HumanCompatible · Bias Detection
A toolbox for measuring bias in data & models
Maximum Subgroup Discrepancy (MSD) - bias metric with linear sample complexity
…with a mixed-integer linear programming (MILP) formulation that also tells you which subgroup is most affected.
Quick install & 60-second demo
```bash
python -m pip install git+https://github.com/humancompatible/detect.git
```

```python
from humancompatible.detect import detect_bias_csv

msd_val, rule = detect_bias_csv(
    csv_path="census.csv",           # any CSV file
    target_col="income_50k",         # binary target
    protected_list=["race", "age"],  # columns to audit
    method="MSD",                    # chosen method
)
print(f"MSD = {msd_val:.3f}", "Rule ->", rule)
```
The function returns:
- `msd_val`: the maximum gap (in percentage points) between any subgroup and its complement;
- `rule`: the raw subgroup encoding, as a list of `(feature_index, Bin)` pairs.
To get a human-readable description, do the following:

```python
pretty = " AND ".join(str(cond) for _, cond in rule)
print("Subgroup:", pretty)  # -> Subgroup: Race = Blue AND Age = 0-18
```
Contents
- Tutorial -> Your first audit in 5 minutes
- Examples -> Start with a simple example notebook, or go directly to a realistic example using Folktables
MSD as a distance?
Bias detection can be understood as measuring a distance between two distributions (e.g., positive vs. negative samples, or a training dataset vs. general-population data).
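To make the comparison concrete, here is one way to write both views down. The notation is introduced here for illustration: P and Q are the two distributions over the protected attributes, 𝒳 is the set of all joint attribute values, and 𝒮 is the family of subgroups, assumed (as in the example rule `Race = Blue AND Age = 0-18`) to be conjunctions of conditions on protected attributes. Total variation (TV) maximises over every subset of 𝒳; MSD restricts the same maximum to subgroups.

$$
\mathrm{TV}(P, Q) = \max_{A \subseteq \mathcal{X}} \bigl| P(A) - Q(A) \bigr|,
\qquad
\mathrm{MSD}(P, Q) = \max_{S \in \mathcal{S}} \bigl| P(S) - Q(S) \bigr|.
$$

Because every subgroup is itself a subset of 𝒳, this formulation gives MSD(P, Q) ≤ TV(P, Q).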
However, most such distances have sample complexity exponential in the dimension d, whereas MSD needs only a number of samples linear in d to achieve the same error.
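A small illustration of why distances over the full joint distribution get expensive: the sketch below draws two samples from the *same* uniform distribution over d binary attributes (so the true distance is 0) and computes the plug-in total-variation estimate. It is a hypothetical toy experiment, not code from `humancompatible.detect`; `plugin_tv` and `same_distribution_tv` are names made up for this example.

```python
import random
from collections import Counter


def plugin_tv(sample_p, sample_q):
    """Plug-in total-variation distance between two empirical distributions."""
    cp, cq = Counter(sample_p), Counter(sample_q)
    n_p, n_q = len(sample_p), len(sample_q)
    cells = set(cp) | set(cq)  # joint cells observed in either sample
    return 0.5 * sum(abs(cp[c] / n_p - cq[c] / n_q) for c in cells)


def same_distribution_tv(d, n, seed=0):
    """Draw two size-n samples from the SAME uniform distribution on {0,1}^d
    and return their plug-in TV distance (the true distance is 0)."""
    rng = random.Random(seed)

    def draw():
        return tuple(rng.randint(0, 1) for _ in range(d))

    sample_p = [draw() for _ in range(n)]
    sample_q = [draw() for _ in range(n)]
    return plugin_tv(sample_p, sample_q)


for d in (2, 8, 16):
    tv = same_distribution_tv(d, n=1000)
    print(f"d={d:2d}  cells={2 ** d:6d}  estimated TV={tv:.3f}")
```

With the sample size fixed at 1000, the estimate stays near 0 for d = 2 but approaches 1 once the 2^d cells far outnumber the samples, because most cells are then seen in only one of the two samples.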
| Metric | Needs the full d-dimensional joint distribution? | Sample cost | Notes |
|---|---|---|---|
| Wasserstein, TV, MMD, … | yes | Ω(2^d) | exponential samples, no subgroup info |
| MSD (ours) | only protected attributes | O(d) | ✓ returns exact subgroup & gap |
MSD maximises the absolute difference in probability over all protected-attribute combinations (subgroups), yet it can be computed exactly in practice: a mixed-integer optimization searches this doubly-exponential space of subgroups efficiently.
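For intuition only, MSD can also be computed by plain exhaustive search when there are very few protected attributes. The sketch below is a brute-force enumeration, not the package's MILP solver. It assumes subgroups are conjunctions of equality conditions (as in the example rule `Race = Blue AND Age = 0-18`), compares positive vs. negative samples, and estimates each subgroup's probability as the share of rows it covers. The helper names `share` and `msd_brute_force` and the toy data are hypothetical.

```python
from itertools import product


def share(rows, rule):
    """Fraction of rows satisfying every (attribute index -> value) condition."""
    hits = sum(all(row[j] == val for j, val in rule.items()) for row in rows)
    return hits / len(rows)


def msd_brute_force(pos, neg):
    """Exhaustive MSD over conjunctions of equality conditions.

    pos, neg: non-empty lists of equal-length tuples of protected-attribute values.
    Returns (gap, rule), where rule maps attribute index -> value.
    """
    n_attrs = len(pos[0])
    # Candidate values for each attribute, collected from both samples
    values = [sorted({row[j] for row in pos + neg}) for j in range(n_attrs)]
    best_gap, best_rule = 0.0, {}
    # None = attribute left unconstrained; any other entry is an equality condition
    for choice in product(*[[None] + v for v in values]):
        rule = {j: val for j, val in enumerate(choice) if val is not None}
        if not rule:
            continue  # the empty conjunction covers everyone, so its gap is 0
        gap = abs(share(pos, rule) - share(neg, rule))
        if gap > best_gap:
            best_gap, best_rule = gap, rule
    return best_gap, best_rule


# Hypothetical toy data: (race, age) for positive and negative samples
pos = [("Blue", "0-18")] * 3 + [("Red", "19-64")]
neg = [("Blue", "19-64"), ("Red", "0-18"), ("Red", "19-64"), ("Red", "19-64")]
print(msd_brute_force(pos, neg))  # -> (0.75, {0: 'Blue', 1: '0-18'})
```

The loop visits ∏(k_j + 1) candidate conjunctions for attributes with k_j values each, so it only suits a handful of attributes; per this section, the exact mixed-integer formulation is what makes the search practical at realistic sizes.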
Citation
If you use MSD, please cite:
```bibtex
@inproceedings{MSD,
  author    = {Jiří Němeček and Mark Kozdoba and Illia Kryvoviaz and Tomáš Pevný and Jakub Mareček},
  title     = {Bias Detection via Maximum Subgroup Discrepancy},
  year      = {2025},
  booktitle = {Proceedings of the 31st ACM SIGKDD International Conference on Knowledge Discovery \& Data Mining},
  series    = {KDD '25}
}
```
Looking for the installation matrix, solver details or developer setup?
Head to the README -> Installation section.