AIPS DPP — Canonical Taxonomies
This document defines canonical (stable) value sets and naming conventions used in AIPS DPP instances.
They improve cross-product comparability and reduce ambiguity across marketplaces, dashboards, and audits.
Status: Unless stated otherwise, items here are RECOMMENDED (SHOULD). Items marked REQUIRED (MUST) are enforced by AIPS DPP validation/guidance.
1) Metric Names (evalInline.metrics[].metricName) — Registry
Use the following canonical names. Avoid synonyms unless listed as aliases.
| Canonical Name | Meaning | Notes / Aliases |
|---|---|---|
| `AUC` | ROC Area Under Curve | Alias: `ROC-AUC` (prefer `AUC`) |
| `PR-AUC` | Precision–Recall AUC | |
| `Accuracy` | Overall accuracy | Use with care on imbalanced data |
| `Precision` | Positive predictive value | |
| `Recall` | True positive rate / sensitivity | |
| `F1` | Harmonic mean of Precision/Recall | |
| `ROC-Recall@FPR=x` | Recall at a fixed false positive rate | See parameterized conventions below |
| `Precision@Recall=y` | Precision at a fixed recall | See parameterized conventions below |
| `TopKAccuracy@k` | Top-K accuracy | e.g., `TopKAccuracy@5` |
| `InferenceLatencyMs_p50` | p50 inference latency (ms) | Percentile suffix `_pNN` |
| `InferenceLatencyMs_p95` | p95 inference latency (ms) | |
| `ThroughputQps` | Inferences per second | |
1.1 Parameterized metric naming (REQUIRED format)
- Recall at FPR: `ROC-Recall@FPR=0.10`
- Precision at Recall: `Precision@Recall=0.90`
- Top-K Accuracy: `TopKAccuracy@5`
- Thresholded confusion matrix: provide the threshold in the artifact (e.g., the eval bundle), not as a metric name.

Keep the value in `metricValue` (number), the parameter in the name as above, and the unit in `metricUnit` when applicable.
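The parameterized formats above are regular enough to check mechanically. A minimal sketch of such a check follows; the regex and the helper name `parse_metric_name` are illustrative assumptions, not part of AIPS DPP tooling.

```python
import re

# Matches the three parameterized forms from Section 1.1:
#   Base@FPR=<number>, Base@Recall=<number>, Base@<k>
PARAM_METRIC_RE = re.compile(
    r"^(?P<base>[A-Za-z][A-Za-z-]*)@"
    r"(?:(?P<param>FPR|Recall)=(?P<value>\d*\.?\d+)|(?P<k>\d+))$"
)

def parse_metric_name(name: str):
    """Return (base, parameter, value) for a canonical parameterized name, else None."""
    m = PARAM_METRIC_RE.match(name)
    if not m:
        return None
    if m.group("k") is not None:  # TopKAccuracy@5 style
        return (m.group("base"), "k", int(m.group("k")))
    return (m.group("base"), m.group("param"), float(m.group("value")))
```

Names that violate the convention (e.g., `Recall@FPR10%`) fall through to `None`, which is what a conformance linter would flag.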
1.2 Latency/Throughput naming (REQUIRED format)
- Latency: `InferenceLatencyMs_p50`, `InferenceLatencyMs_p95`, … (milliseconds implied; you may set `metricUnit: "ms"`).
- Throughput: `ThroughputQps` (queries per second; optional `metricUnit: "qps"`).
1.3 Units
- Prefer unitless for dimensionless scores (`AUC`, `F1`).
- Use explicit units for time/size: `"ms"`, `"s"`, `"qps"`, `"MB"`, `"GB"`.
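One way producers apply these unit conventions is a small lookup of default units; the table below is an illustrative assumption, not part of the AIPS DPP schema.

```python
# Default metricUnit per Section 1.2/1.3; absence means the score is dimensionless.
DEFAULT_METRIC_UNITS = {
    "InferenceLatencyMs_p50": "ms",
    "InferenceLatencyMs_p95": "ms",
    "ThroughputQps": "qps",
}

def default_unit(metric_name: str):
    """Return the conventional unit for a metric name, or None for unitless scores."""
    return DEFAULT_METRIC_UNITS.get(metric_name)
```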
2) Risk Severity (riskInline.riskSeverity) — REQUIRED Enum
These values are required and case-sensitive:
Low | Medium | High | Critical
If you maintain a local scale (e.g., “Sev2”), map it internally to one of these four before publishing the DPP.
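A local-to-canonical mapping can be kept as simple data. The sketch below assumes a hypothetical four-level "Sev" scale; both the local labels and their assignments are assumptions for illustration.

```python
# Hypothetical local severity scale mapped onto the REQUIRED enum.
LOCAL_TO_CANONICAL_SEVERITY = {
    "Sev1": "Critical",
    "Sev2": "High",
    "Sev3": "Medium",
    "Sev4": "Low",
}

def to_canonical_severity(local: str) -> str:
    """Map a local severity label to the canonical enum before publishing."""
    try:
        return LOCAL_TO_CANONICAL_SEVERITY[local]
    except KeyError:
        raise ValueError(f"No canonical mapping for local severity {local!r}")
```

Resolving the mapping before publication keeps the enum violation from surfacing later in SHACL/JSON Schema validation (Section 10).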
3) Risk Category (riskInline.riskCategory) — Controlled Vocabulary
Recommended category set (extendable). Use these labels verbatim where applicable:
- `Fairness` (bias, disparate impact)
- `DataDrift` (covariate/label shift, seasonality)
- `Robustness` (adversarial fragility, OOD sensitivity)
- `Security` (model exfiltration, data poisoning)
- `Privacy` (PII leakage, membership inference)
- `Safety` (harmful outputs, misuse)
- `Explainability` (interpretability limitations)
- `Performance` (latency, throughput, availability)
Aliases → Canonical mapping (SHOULD):
- `Bias` → `Fairness`
- `Drift` → `DataDrift`
- `Interpretability` → `Explainability`
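The alias resolution above can be sketched as a two-step lookup; the function name `canonical_category` is an assumption, but the vocabulary and aliases mirror Section 3.

```python
# Controlled vocabulary and alias mapping from Section 3.
CANONICAL_CATEGORIES = {
    "Fairness", "DataDrift", "Robustness", "Security",
    "Privacy", "Safety", "Explainability", "Performance",
}
CATEGORY_ALIASES = {
    "Bias": "Fairness",
    "Drift": "DataDrift",
    "Interpretability": "Explainability",
}

def canonical_category(label: str) -> str:
    """Resolve a known alias, then verify the label is in the vocabulary."""
    label = CATEGORY_ALIASES.get(label, label)
    if label not in CANONICAL_CATEGORIES:
        raise ValueError(f"Unknown risk category: {label!r}")
    return label
```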
4) Policy Results (policyInline[].result) — REQUIRED Enum
Pass | Fail | Warn
Use Warn for non-blocking issues that still warrant attention.
5) Dataset Descriptor Conventions (SHOULD)
- Names: concise, stable labels (e.g., `Transactions-2025Q1Q2`).
- Versions: semantic or date-based (e.g., `2025.06`).
- Splits: specify in `evalInline.evalDataset` (e.g., `Holdout (2025Q3)`).
- Confidentiality: do not include internal identifiers or PII in public DPPs.
6) Windows & Timestamps
- `window` values SHOULD use ISO-8601 durations (e.g., `P90D`, `P30D`).
- `issuedAt`, `capturedAt`, `evaluatedAt` MUST be ISO-8601 UTC timestamps.
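Both conventions are cheap to check before publishing. A minimal sketch, assuming date-only durations (the forms shown above) and Python's `datetime.fromisoformat`; these helpers are illustrative, not part of AIPS DPP tooling.

```python
import re
from datetime import datetime

# Date-only ISO-8601 durations such as P90D, P30D, P2W.
DURATION_RE = re.compile(r"^P(?:\d+Y)?(?:\d+M)?(?:\d+W)?(?:\d+D)?$")

def is_iso_duration(value: str) -> bool:
    """True for date-only ISO-8601 durations; bare 'P' is rejected."""
    return bool(DURATION_RE.match(value)) and value != "P"

def is_utc_timestamp(value: str) -> bool:
    """True for ISO-8601 timestamps with a zero UTC offset (e.g., trailing 'Z')."""
    try:
        dt = datetime.fromisoformat(value.replace("Z", "+00:00"))
    except ValueError:
        return False
    offset = dt.utcoffset()
    return offset is not None and offset.total_seconds() == 0
```

Note that a timestamp with a non-zero offset (e.g., `+02:00`) parses as valid ISO-8601 but fails the UTC requirement, so the offset check matters.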
7) Examples
7.1 Metrics array (good)
```json
{
  "evalInline": {
    "metrics": [
      { "metricName": "AUC", "metricValue": 0.943 },
      { "metricName": "F1", "metricValue": 0.896 },
      { "metricName": "ROC-Recall@FPR=0.10", "metricValue": 0.905 },
      { "metricName": "Precision@Recall=0.90", "metricValue": 0.876 },
      { "metricName": "InferenceLatencyMs_p95", "metricValue": 11.2, "metricUnit": "ms" },
      { "metricName": "ThroughputQps", "metricValue": 1500, "metricUnit": "qps" }
    ],
    "evalDataset": "Internal holdout (2025Q3)",
    "evalProtocol": "Stratified split; 5-fold CV; holdout reporting"
  }
}
```
7.2 Risk entry (good)
```json
{
  "riskInline": {
    "riskCategory": "Fairness",
    "riskSeverity": "Medium",
    "riskDescription": "Performance variance across merchant segments; quarterly audits in place."
  }
}
```
8) Do / Don’t
- Do: `ROC-Recall@FPR=0.10`
- Don’t: `Recall@FPR10%` (ambiguous)
- Do: `InferenceLatencyMs_p95`
- Don’t: `Latency95` (unclear unit/scope)
- Do: `Fairness` / `DataDrift`
- Don’t: custom category strings like `BiasRisk` when a canonical term exists.
9) Extending Taxonomies
Producers MAY extend categories/metrics to meet domain needs. When extending:
- Prefer parameterized naming (Section 1.1).
- Keep units explicit where applicable.
- Maintain internal → canonical mapping for dashboards and search.
If a new metric becomes widely used, propose it for addition to this registry in the next minor AIPS DPP release.
10) Validation Notes
- The risk severity enum is enforced in AIPS SHACL and JSON Schema.
- Metric names are not strictly enumerated to allow extension, but conformance tooling may warn on non-canonical names.
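The warn-but-allow behavior described above can be sketched as a linter pass; the registry set and the function name `metric_name_warnings` are assumptions mirroring Section 1, not the actual conformance tooling.

```python
import re

# Fixed canonical names plus the parameterized patterns from Sections 1.1/1.2.
CANONICAL_METRICS = {"AUC", "PR-AUC", "Accuracy", "Precision", "Recall",
                     "F1", "ThroughputQps"}
CANONICAL_PATTERNS = [
    re.compile(r"^ROC-Recall@FPR=\d*\.?\d+$"),
    re.compile(r"^Precision@Recall=\d*\.?\d+$"),
    re.compile(r"^TopKAccuracy@\d+$"),
    re.compile(r"^InferenceLatencyMs_p\d{1,2}$"),
]

def metric_name_warnings(names):
    """Warn (not fail) on metric names outside the canonical registry."""
    warnings = []
    for name in names:
        if name in CANONICAL_METRICS:
            continue
        if any(p.match(name) for p in CANONICAL_PATTERNS):
            continue
        warnings.append(f"non-canonical metric name: {name!r}")
    return warnings
```

Returning warnings rather than raising keeps extensions (Section 9) publishable while still surfacing names like `Latency95` for review.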