HumanCompatible · Bias Detection

A toolbox for measuring bias in data & models

Maximum Subgroup Discrepancy (MSD): a bias metric with linear sample complexity
…with a mixed-integer linear programming (MILP) formulation that also tells you which subgroup is most affected.


Quick install & 60-second demo

python -m pip install git+https://github.com/humancompatible/detect.git

from humancompatible.detect import detect_bias_csv

msd_val, rule = detect_bias_csv(
    csv_path="census.csv",                  # any CSV file
    target_col="income_50k",                # binary target
    protected_list=["race", "age"],         # columns to audit
    method="MSD",                           # chosen method
)

print(f"MSD = {msd_val:.3f}", "Rule ->", rule) 

The function returns

  • msd_val – the maximum gap (in percentage‐points) between any subgroup and its complement

  • rule – the raw subgroup encoding as a list of (feature_index, Bin) pairs.
    To get a human‐readable description, do the following:

    pretty = " AND ".join(str(cond) for _, cond in rule)
    print("Subgroup:", pretty)
    # -> "Subgroup: Race = Blue AND Age = 0-18"
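
To try the formatting step without the library installed, here is a minimal sketch that uses a stand-in for the condition objects. `ToyBin`, `describe` and the example rule are hypothetical, made up for illustration; the only thing assumed about the real `Bin` is what the snippet above relies on, namely that `str(cond)` yields a readable condition.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyBin:
    """Hypothetical stand-in for the library's Bin; only __str__ matters here."""
    feature: str
    value: str

    def __str__(self) -> str:
        return f"{self.feature} = {self.value}"


def describe(rule) -> str:
    """Join the string form of every condition, ignoring the feature index."""
    return " AND ".join(str(cond) for _, cond in rule)


# Made-up rule in the (feature_index, Bin) shape described above.
toy_rule = [(0, ToyBin("Race", "Blue")), (1, ToyBin("Age", "0-18"))]
print("Subgroup:", describe(toy_rule))
```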
    


MSD as a distance?

Bias detection can be understood as measuring a distance between two distributions (for example, positive vs. negative samples, or a training dataset vs. general-population data).

However, most distances have sample complexity that is exponential in the dimension d, whereas MSD needs only a number of samples linear in d to achieve the same error.

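| Metric                  | Needs full d‐dim joint? | Sample cost | Drawbacks                             |
|-------------------------|-------------------------|-------------|---------------------------------------|
| Wasserstein, TV, MMD, … | yes                     | Ω(2^d)      | exponential samples, no subgroup info |
| MSD (ours)              | only protected attrs    | O(d)        | ✓ returns exact subgroup & gap        |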

MSD maximises the absolute difference in probability over all protected‐attribute combinations (subgroups), yet it is solvable in practice through an exact mixed‐integer optimization that searches the doubly‐exponential space efficiently.
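
To make the quantity concrete, here is a brute-force sketch of the maximisation. It is not the library's MILP solver: it enumerates every subgroup explicitly, so its cost grows exponentially with the number of protected attributes and it is only usable on toy data. It assumes that a subgroup is a conjunction of `attribute == value` conditions and that the two distributions being compared are the positive and the negative samples. The function name `brute_force_msd` and the data are hypothetical.

```python
from itertools import combinations, product


def brute_force_msd(rows, target, protected):
    """Return (gap, subgroup) maximising |P(S | y=1) - P(S | y=0)|.

    rows: list of dicts; target: name of a 0/1 column;
    protected: names of the protected columns.
    """
    pos = [r for r in rows if r[target] == 1]
    neg = [r for r in rows if r[target] == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative row")

    # Observed values of each protected attribute.
    values = {a: sorted({r[a] for r in rows}) for a in protected}

    def share(sample, subgroup):
        """Fraction of `sample` satisfying every attribute == value condition."""
        hits = sum(all(r[a] == v for a, v in subgroup) for r in sample)
        return hits / len(sample)

    best_gap, best_subgroup = 0.0, ()
    # Every non-empty subset of attributes, every value combination on it.
    for k in range(1, len(protected) + 1):
        for attrs in combinations(protected, k):
            for vals in product(*(values[a] for a in attrs)):
                subgroup = tuple(zip(attrs, vals))
                gap = abs(share(pos, subgroup) - share(neg, subgroup))
                if gap > best_gap:
                    best_gap, best_subgroup = gap, subgroup
    return best_gap, best_subgroup


# Made-up toy data, not taken from any real dataset.
rows = [
    {"race": "Blue", "age": "0-18", "income_50k": 0},
    {"race": "Blue", "age": "0-18", "income_50k": 0},
    {"race": "Blue", "age": "19+", "income_50k": 1},
    {"race": "Green", "age": "0-18", "income_50k": 1},
    {"race": "Green", "age": "19+", "income_50k": 1},
    {"race": "Green", "age": "19+", "income_50k": 0},
]
gap, subgroup = brute_force_msd(rows, "income_50k", ["race", "age"])
print(f"MSD = {gap:.3f}", "Subgroup ->", subgroup)
```

On this toy data the largest gap is 2/3, attained by the subgroup race = Blue AND age = 0-18, which covers two of the three negative rows and none of the positive ones. Each single-attribute subgroup only reaches a gap of 1/3, which is why the search has to consider attribute combinations and not just individual attributes.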


Citation

If you use MSD, please cite:

@inproceedings{MSD,
  author = {Jiří Němeček and Mark Kozdoba and Illia Kryvoviaz and Tomáš Pevný and Jakub Mareček},
  title = {Bias Detection via Maximum Subgroup Discrepancy},
  year = {2025},
  booktitle = {Proceedings of the 31st ACM SIGKDD International Conference on Knowledge Discovery \& Data Mining},
  series = {KDD '25}
}

Looking for the installation matrix, solver details or developer setup?
Head to the README -> Installation section.