# HumanCompatible · Bias Detection A toolbox for measuring bias in data & models
**Maximum Subgroup Discrepancy (MSD)** - bias metric with linear sample complexity *...with a MILP formulation that also tells you **which subgroup is most affected**.*
--- ## Quick install & 60-second demo ```bash python -m pip install git+https://github.com/humancompatible/detect.git ``` ```python from humancompatible.detect import detect_bias_csv msd_val, rule = detect_bias_csv( csv_path="census.csv", # any CSV file target_col="income_50k", # binary target protected_list=["race", "age"], # columns to audit method="MSD", # chosen method ) print(f"MSD = {msd_val:.3f}", "Rule ->", rule) ``` The function returns - **`msd_val`** – the maximum gap (in percentage‐points) between any subgroup and its complement - **`rule`** – the raw subgroup encoding as a list of `(feature_index, Bin)` pairs. To get a human‐readable description, do the following: ```python pretty = " AND ".join(str(cond) for _, cond in rule) print("Subgroup:", pretty) # -> "Subgroup: Race = Blue AND Age = 0-18" ``` ## Contents ```{toctree} :maxdepth: 1 api/detect ``` - [**Tutorial**](https://github.com/humancompatible/detect/blob/main/README.md) -> Your first audit in 5 minutes - [**Examples**](https://github.com/humancompatible/detect/blob/main/examples/) -> Start with a simple [example notebook](https://github.com/humancompatible/detect/blob/main/examples/01_usage.ipynb), or go directly to a [realistic example using Folktables](https://github.com/humancompatible/detect/blob/main/examples/02_folktables.ipynb) --- ## MSD as a distance? Bias detection can be understood as measuring some distance between two distributions (positive X negative samples, some training dataset X general population data...). However, most distances have exponential sample complexity, whereas MSD requires a linear number of samples (w.r.t. the dimension) to achieve the same error. | Classical metric | Needs full d‐dim joint? | Sample cost | Drawbacks | |-------------------------------:|:-----------------------:|:-----------:|-------------------------------------------------| | Wasserstein, TV, MMD, … | yes | Ω(2^d) | exponential samples, no subgroup info | | **MSD (ours)** | only protected attrs | O(d) | ✓ returns exact subgroup & gap | MSD maximises the absolute difference in probability over all protected‐attribute combinations (subgroups), yet is solvable in practice through an exact Mixed‐Integer optimization that scans the doubly‐exponential space effectively. --- ## Citation If you use MSD, please cite: ```bibtex @inproceedings{MSD, author = {Jiří Němeček and Mark Kozdoba and Illia Kryvoviaz and Tomáš Pevný and Jakub Mareček}, title = {Bias Detection via Maximum Subgroup Discrepancy}, year = {2025}, booktitle = {Proceedings of the 31st ACM SIGKDD International Conference on Knowledge Discovery \& Data Mining}, series = {KDD '25} } ``` Looking for the installation matrix, solver details or developer setup? Head to the [**README -> Installation**](https://github.com/humancompatible/detect?tab=readme-ov-file#installation-details) section.