AIPS DPP — Canonical Taxonomies

This document defines canonical (stable) value sets and naming conventions used in AIPS DPP instances.
They improve cross-product comparability and reduce ambiguity across marketplaces, dashboards, and audits.

Status: Unless stated otherwise, items here are RECOMMENDED (SHOULD).
Items marked REQUIRED (MUST) are enforced by AIPS DPP validation/guidance.


1) Metric Names (evalInline.metrics[].metricName) — Registry

Use the following canonical names. Avoid synonyms unless listed as aliases.

Canonical Name          | Meaning                             | Notes / Aliases
AUC                     | ROC Area Under Curve                | Alias: ROC-AUC (prefer AUC)
PR-AUC                  | Precision–Recall AUC                |
Accuracy                | Overall accuracy                    | Use with care on imbalanced data
Precision               | Positive predictive value           |
Recall                  | True positive rate / sensitivity    |
F1                      | Harmonic mean of Precision/Recall   |
ROC-Recall@FPR=x        | Recall at fixed false positive rate | See parameterized conventions below
Precision@Recall=y      | Precision at fixed recall           | See parameterized conventions below
TopKAccuracy@k          | Top-K accuracy                      | e.g., TopKAccuracy@5
InferenceLatencyMs_p50  | p50 inference latency (ms)          | Percentile suffix _pNN
InferenceLatencyMs_p95  | p95 inference latency (ms)          |
ThroughputQps           | Inferences per second               |

1.1 Parameterized metric naming (REQUIRED format)

  • Recall at FPR: ROC-Recall@FPR=0.10
  • Precision at Recall: Precision@Recall=0.90
  • Top-K Accuracy: TopKAccuracy@5
  • Thresholded confusion matrix: provide the threshold in the artifact (e.g., the eval bundle), not in the metric name.

Keep the value in metricValue (number), the parameter in the name as above, and the unit in metricUnit when applicable.
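
The parameterized formats above can be checked mechanically. The following is a minimal sketch, not part of AIPS DPP tooling: the function name parse_parameterized_metric and the regular expressions are illustrative assumptions, and it assumes FPR and recall parameters are written as decimal fractions in [0, 1], as in the examples above.

```python
import re

# Illustrative patterns for the three parameterized forms (an assumption,
# not an official AIPS DPP grammar). Rates are decimal fractions in [0, 1].
_FRACTION = r"(0(?:\.\d+)?|1(?:\.0+)?)"
_PARAM_PATTERNS = {
    "ROC-Recall@FPR": re.compile(r"ROC-Recall@FPR=" + _FRACTION),
    "Precision@Recall": re.compile(r"Precision@Recall=" + _FRACTION),
    "TopKAccuracy": re.compile(r"TopKAccuracy@([1-9]\d*)"),
}


def parse_parameterized_metric(name):
    """Return (family, parameter) for a parameterized name, else None."""
    for family, pattern in _PARAM_PATTERNS.items():
        match = pattern.fullmatch(name)
        if match:
            raw = match.group(1)
            # k is an integer; FPR and recall parameters are floats.
            param = int(raw) if family == "TopKAccuracy" else float(raw)
            return family, param
    return None
```

A name such as Recall@FPR10% matches none of the patterns, so the function returns None for it.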

1.2 Latency/Throughput naming (REQUIRED format)

  • Latency: InferenceLatencyMs_p50, InferenceLatencyMs_p95 … (milliseconds unit implied; you may set metricUnit: "ms").
  • Throughput: ThroughputQps (queries per second; optional metricUnit: "qps").
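
As a sketch of the latency convention, the helpers below build and parse names with the _pNN suffix. The helper names are hypothetical, and the restriction to two-digit integer percentiles (01 to 99) is an assumption read from the "_pNN" notation; this document does not say how fractional percentiles would be written.

```python
import re

# Assumption: the percentile suffix is exactly two digits (_pNN).
_LATENCY_NAME = re.compile(r"InferenceLatencyMs_p(\d{2})")


def latency_metric_name(percentile):
    """Build a latency metric name such as InferenceLatencyMs_p95."""
    if not 1 <= percentile <= 99:
        raise ValueError("percentile must be an integer from 1 to 99")
    return f"InferenceLatencyMs_p{percentile:02d}"


def latency_percentile(name):
    """Return the percentile encoded in a latency metric name, else None."""
    match = _LATENCY_NAME.fullmatch(name)
    return int(match.group(1)) if match else None
```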

1.3 Units

  • Leave dimensionless scores (AUC, F1) unitless, i.e., omit metricUnit.
  • Use explicit units for time/size: "ms", "s", "qps", "MB", "GB".

2) Risk Severity (riskInline.riskSeverity) — REQUIRED Enum

These values are required and case-sensitive:


Low | Medium | High | Critical

If you maintain a local scale (e.g., “Sev2”), map it internally to one of these four before publishing the DPP.
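
One way to apply such a mapping before publishing is sketched below. The Sev1 to Sev4 scale and its assignment to canonical values are hypothetical; substitute your own internal labels. The lookup is deliberately case-sensitive, matching the enum.

```python
# The four canonical values, exactly as spelled in this section.
CANONICAL_SEVERITIES = ("Low", "Medium", "High", "Critical")

# Hypothetical in-house scale (an assumption for illustration only).
LOCAL_TO_CANONICAL = {
    "Sev1": "Critical",
    "Sev2": "High",
    "Sev3": "Medium",
    "Sev4": "Low",
}


def to_canonical_severity(value):
    """Map a local or canonical severity label to the canonical enum."""
    if value in CANONICAL_SEVERITIES:
        return value
    if value in LOCAL_TO_CANONICAL:
        return LOCAL_TO_CANONICAL[value]
    # Fail loudly rather than publish an unmapped or mis-cased value.
    raise ValueError(f"Unmapped risk severity: {value!r}")
```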


3) Risk Category (riskInline.riskCategory) — Controlled Vocabulary

Recommended category set (extendable). Use these labels verbatim where applicable:

  • Fairness (bias, disparate impact)
  • DataDrift (covariate/label shift, seasonality)
  • Robustness (adversarial fragility, OOD sensitivity)
  • Security (model exfiltration, data poisoning)
  • Privacy (PII leakage, membership inference)
  • Safety (harmful outputs, misuse)
  • Explainability (interpretability limitations)
  • Performance (latency, throughput, availability)

Aliases → Canonical mapping (SHOULD):

  • Bias → Fairness
  • Drift → DataDrift
  • Interpretability → Explainability
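
The alias mapping lends itself to a small normalization step. The sketch below is illustrative only; the function names are assumptions. Labels that are neither canonical nor a listed alias pass through unchanged, since the category set is extendable.

```python
CANONICAL_CATEGORIES = frozenset({
    "Fairness", "DataDrift", "Robustness", "Security",
    "Privacy", "Safety", "Explainability", "Performance",
})

# Aliases listed in this section, mapped to their canonical labels.
CATEGORY_ALIASES = {
    "Bias": "Fairness",
    "Drift": "DataDrift",
    "Interpretability": "Explainability",
}


def normalize_risk_category(label):
    """Replace a known alias with its canonical label; keep others as-is."""
    return CATEGORY_ALIASES.get(label, label)


def is_canonical_category(label):
    """True if the label is one of the recommended canonical categories."""
    return label in CANONICAL_CATEGORIES
```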

4) Policy Results (policyInline[].result) — REQUIRED Enum


Pass | Fail | Warn

Use Warn for non-blocking issues that still warrant attention.


5) Dataset Descriptor Conventions (SHOULD)

  • Names: concise, stable labels (e.g., Transactions-2025Q1Q2).
  • Versions: semantic or date-based (e.g., 2025.06).
  • Splits: specify in evalInline.evalDataset (e.g., Holdout (2025Q3)).
  • Confidentiality: do not include internal identifiers or PII in public DPPs.

6) Windows & Timestamps

  • window values SHOULD use ISO-8601 durations (e.g., P90D, P30D).
  • issuedAt, capturedAt, evaluatedAt MUST be ISO-8601 UTC timestamps.
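
A rough pre-publication check for both rules might look like the sketch below. It is not the AIPS validator: the duration pattern is a simplified subset of ISO-8601, and accepting only the "Z" suffix as UTC (rejecting "+00:00") is an assumption made here for strictness.

```python
import re
from datetime import datetime

# Simplified ISO-8601 duration pattern (e.g. P90D, P1Y6M, PT12H).
_DURATION = re.compile(
    r"P(?!$)(?:\d+Y)?(?:\d+M)?(?:\d+W)?(?:\d+D)?"
    r"(?:T(?=\d)(?:\d+H)?(?:\d+M)?(?:\d+S)?)?"
)

# Assumption: UTC is written with a trailing "Z".
_UTC_TIMESTAMP = re.compile(
    r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(\.\d{1,6})?Z"
)


def is_iso_duration(value):
    """True if value looks like an ISO-8601 duration such as P90D."""
    return _DURATION.fullmatch(value) is not None


def is_utc_timestamp(value):
    """True if value is a well-formed, calendar-valid UTC timestamp."""
    match = _UTC_TIMESTAMP.fullmatch(value)
    if not match:
        return False
    fmt = "%Y-%m-%dT%H:%M:%S.%fZ" if match.group(1) else "%Y-%m-%dT%H:%M:%SZ"
    try:
        datetime.strptime(value, fmt)  # rejects dates such as 2025-02-30
    except ValueError:
        return False
    return True
```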

7) Examples

7.1 Metrics array (good)

{
  "evalInline": {
    "metrics": [
      { "metricName": "AUC", "metricValue": 0.943 },
      { "metricName": "F1", "metricValue": 0.896 },
      { "metricName": "ROC-Recall@FPR=0.10", "metricValue": 0.905 },
      { "metricName": "Precision@Recall=0.90", "metricValue": 0.876 },
      { "metricName": "InferenceLatencyMs_p95", "metricValue": 11.2, "metricUnit": "ms" },
      { "metricName": "ThroughputQps", "metricValue": 1500, "metricUnit": "qps" }
    ],
    "evalDataset": "Internal holdout (2025Q3)",
    "evalProtocol": "Stratified split; 5-fold CV; holdout reporting"
  }
}

7.2 Risk entry (good)

{
  "riskInline": {
    "riskCategory": "Fairness",
    "riskSeverity": "Medium",
    "riskDescription": "Performance variance across merchant segments; quarterly audits in place."
  }
}

8) Do / Don’t

  • Do: ROC-Recall@FPR=0.10

  • Don’t: Recall@FPR10% (ambiguous)

  • Do: InferenceLatencyMs_p95

  • Don’t: Latency95 (unclear unit/scope)

  • Do: Fairness / DataDrift

  • Don’t: custom category strings like BiasRisk when a canonical term exists.


9) Extending Taxonomies

Producers MAY extend categories/metrics to meet domain needs. When extending:

  1. Prefer parameterized naming (Section 1.1).
  2. Keep units explicit where applicable.
  3. Maintain internal → canonical mapping for dashboards and search.

If a new metric becomes widely used, propose it for addition to this registry in the next minor AIPS DPP release.


10) Validation Notes

  • The risk severity enum is enforced in AIPS SHACL and JSON Schema.
  • Metric names are not strictly enumerated to allow extension, but conformance tooling may warn on non-canonical names.
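
To illustrate the kind of warning conformance tooling might emit, here is a hypothetical linter for metric names. It is a sketch under stated assumptions, not the actual AIPS tooling: the function name lint_metric_names, the patterns, and the warning text are all invented for this example, and the input is assumed to be a DPP instance already parsed into a Python dict.

```python
import re

# Fixed canonical names from the registry in Section 1.
CANONICAL_METRICS = frozenset({
    "AUC", "PR-AUC", "Accuracy", "Precision", "Recall", "F1", "ThroughputQps",
})

# Illustrative patterns for the parameterized and latency families.
CANONICAL_PATTERNS = (
    re.compile(r"ROC-Recall@FPR=\d+(?:\.\d+)?"),
    re.compile(r"Precision@Recall=\d+(?:\.\d+)?"),
    re.compile(r"TopKAccuracy@[1-9]\d*"),
    re.compile(r"InferenceLatencyMs_p\d{2}"),
)


def lint_metric_names(dpp):
    """Return warnings for metric names outside the canonical registry."""
    warnings = []
    for metric in dpp.get("evalInline", {}).get("metrics", []):
        name = metric.get("metricName", "")
        if name in CANONICAL_METRICS:
            continue
        if any(p.fullmatch(name) for p in CANONICAL_PATTERNS):
            continue
        # Extensions are allowed, so this is a warning, not an error.
        warnings.append(f"non-canonical metricName: {name!r}")
    return warnings
```

Because extensions are permitted, the function only reports names; it never rejects the document.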