Metadata & Self-Description
A true AI Product must be self-describing: it provides metadata that lets both machines and humans understand what it is, how it works, and how it should be used.
Self-description is the foundation for discoverability, governance, and interoperability.
Why Metadata Matters
- Discoverability → Catalogs and marketplaces rely on metadata for search and filtering.
- Governance → Policies and compliance checks are automated via metadata.
- Interoperability → Metadata provides contracts for integration with other products.
- Trust → Transparency builds consumer and regulator confidence.
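As a rough illustration of the first two points, the sketch below shows how a catalog search and an automated policy gate might both run purely off declared metadata. The field names (`tags`, `risk`, `lifecycle`), the second catalog entry, and the `max_risk` threshold are assumptions made for this example, not part of any defined schema.

```python
from typing import Any

# Hypothetical catalog entries. Field names and the second product are
# illustrative assumptions, not a schema defined by this document.
CATALOG: list[dict[str, Any]] = [
    {
        "id": "urn:aiprod:legal-summarizer:v1.0.0",
        "tags": ["legal", "summarization"],
        "risk": "high",
        "lifecycle": "production",
    },
    {
        "id": "urn:aiprod:faq-bot:v0.3.0",  # made-up product for contrast
        "tags": ["support", "chat"],
        "risk": "limited",
        "lifecycle": "experimental",
    },
]

# Risk classes ordered from lowest to highest.
RISK_ORDER = ["minimal", "limited", "high"]


def search(catalog: list[dict[str, Any]], tag: str) -> list[str]:
    """Discoverability: return the IDs of products carrying the given tag."""
    return [product["id"] for product in catalog if tag in product["tags"]]


def deployable(product: dict[str, Any], max_risk: str = "limited") -> bool:
    """Governance: accept only production-state products at or below max_risk."""
    within_risk = RISK_ORDER.index(product["risk"]) <= RISK_ORDER.index(max_risk)
    return product["lifecycle"] == "production" and within_risk


if __name__ == "__main__":
    print(search(CATALOG, "legal"))
    print([p["id"] for p in CATALOG if deployable(p, max_risk="high")])
```

Neither function inspects the product itself; both decisions are made from the declared metadata alone, which is why missing or stale metadata undermines discovery and governance at once.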
Metadata Requirements
An AI Product must declare metadata across multiple categories:
- Identity Metadata
  - Product name, ID, version, description.
  - Tags, keywords, categories.
  - Owner and maintainers.
- Capability Metadata
  - Declared capability type(s) (see Capability Type).
  - Supported tasks and domains.
  - Constraints and limitations.
- Input/Output Metadata
  - Input/output schemas and formats (see Inputs & Outputs).
  - Confidence scores, explainability attachments.
- Lineage Metadata
  - Upstream dependencies and provenance (see Lineage & Provenance).
  - Training datasets and source models.
- Governance Metadata
  - Risk classification (minimal, limited, high).
  - Prohibited uses.
  - Compliance standards (e.g., ISO, EU AI Act, HIPAA).
- Operational Metadata
  - Performance benchmarks.
  - Monitoring hooks and observability endpoints.
  - Lifecycle state (experimental, production, deprecated, retired).
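The six categories above could be enforced with a simple completeness check. The following is a minimal sketch under stated assumptions: the category keys and field names in `REQUIRED_FIELDS` are hypothetical labels chosen for this example, while the allowed risk classes and lifecycle states are taken from the list above.

```python
from typing import Any

# Hypothetical field names for each category; only the category grouping,
# risk classes, and lifecycle states come from the requirements above.
REQUIRED_FIELDS: dict[str, list[str]] = {
    "identity": ["name", "id", "version", "description", "owner"],
    "capability": ["types", "tasks", "constraints"],
    "io": ["input_schema", "output_schema"],
    "lineage": ["dependencies", "training_data"],
    "governance": ["risk_class", "prohibited_uses", "compliance"],
    "operational": ["benchmarks", "monitoring", "lifecycle_state"],
}
RISK_CLASSES = {"minimal", "limited", "high"}
LIFECYCLE_STATES = {"experimental", "production", "deprecated", "retired"}


def validate(metadata: dict[str, Any]) -> list[str]:
    """Return a list of problems; an empty list means the metadata is complete."""
    problems: list[str] = []
    for category, fields in REQUIRED_FIELDS.items():
        section = metadata.get(category)
        if section is None:
            problems.append(f"missing category: {category}")
            continue
        for field in fields:
            if field not in section:
                problems.append(f"missing field: {category}.{field}")

    # Enumerated values are checked only when the field is present.
    risk = metadata.get("governance", {}).get("risk_class")
    if risk is not None and risk not in RISK_CLASSES:
        problems.append(f"invalid risk_class: {risk}")
    state = metadata.get("operational", {}).get("lifecycle_state")
    if state is not None and state not in LIFECYCLE_STATES:
        problems.append(f"invalid lifecycle_state: {state}")
    return problems
```

Returning a list of problems rather than raising on the first one lets a registry report every gap in a single pass, which tends to be more useful to product owners filling in metadata.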
Self-Description Mechanisms
AI Products must provide metadata in:
- Machine-readable formats
  - JSON-LD, RDF, YAML, or other standardized serializations.
  - Aligned with BPS and extended for AI-specific needs.
- Human-readable formats
  - Markdown product cards.
  - Model/system cards (ethical and transparency documentation).
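Metadata must be synchronized across formats: the machine-readable and human-readable representations describe the same product version, and a change to one must be reflected in the other.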
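One way to keep the two representations aligned is to generate both from a single metadata record. The sketch below assumes a flat record with string values; the function names and the card layout are hypothetical, and the `in_sync` check is a deliberately crude substring comparison rather than a robust diff.

```python
import json


def to_json(meta: dict[str, str]) -> str:
    """Machine-readable form: deterministic JSON serialization."""
    return json.dumps(meta, indent=2, sort_keys=True)


def to_markdown_card(meta: dict[str, str]) -> str:
    """Human-readable form: a small Markdown product card (layout is assumed)."""
    lines = [f"# {meta['name']}", ""]
    for key in sorted(meta):
        if key == "name":
            continue
        lines.append(f"- **{key}**: {meta[key]}")
    return "\n".join(lines)


def in_sync(meta_json: str, card: str) -> bool:
    """Crude drift check: every value in the JSON must appear in the card text."""
    meta = json.loads(meta_json)
    return all(str(value) in card for value in meta.values())


if __name__ == "__main__":
    record = {
        "name": "Legal Document Summarizer",
        "id": "urn:aiprod:legal-summarizer:v1.0.0",
        "lifecycle": "production",
    }
    print(to_markdown_card(record))
    print(in_sync(to_json(record), to_markdown_card(record)))
```

Generating both outputs from one record makes drift unlikely by construction; a check like `in_sync` is mainly useful when cards are also edited by hand.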
Example
AI Product: Legal Document Summarizer
- Identity: urn:aiprod:legal-summarizer:v1.0.0
- Capability: Generative, domain = Legal.
- Inputs: Legal text documents (PDF, plain text).
- Outputs: Structured summaries + confidence score.
- Lineage: Fine-tuned LLM (GPT-family) on legal corpora.
- Governance: Risk = High; prohibited use = unauthorized legal advice.
- Operational: SLA = 99.9% uptime, lifecycle = Production.
- Metadata Format: JSON-LD + Markdown model card.
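For concreteness, the example above might be expressed as a JSON-LD style document along the following lines. This is only a sketch: the `@context` URL is a placeholder, and the `AIProduct` type and all property names are hypothetical, since the source does not define a vocabulary. The values themselves come from the example.

```python
import json
from typing import Any

# JSON-LD style rendering of the Legal Document Summarizer example.
# The @context URL, the @type name, and the property names are assumptions;
# only the values are taken from the example above.
legal_summarizer: dict[str, Any] = {
    "@context": "https://example.org/ai-product/context.jsonld",  # placeholder
    "@id": "urn:aiprod:legal-summarizer:v1.0.0",
    "@type": "AIProduct",  # hypothetical type name
    "name": "Legal Document Summarizer",
    "capability": {"type": "Generative", "domain": "Legal"},
    "inputs": {
        "description": "Legal text documents",
        "formats": ["PDF", "plain text"],
    },
    "outputs": {
        "description": "Structured summaries",
        "includesConfidenceScore": True,
    },
    "lineage": {
        "baseModel": "Fine-tuned LLM (GPT-family)",
        "trainingData": "legal corpora",
    },
    "governance": {
        "risk": "High",
        "prohibitedUses": ["unauthorized legal advice"],
    },
    "operational": {"sla": "99.9% uptime", "lifecycle": "Production"},
}


def serialize(doc: dict[str, Any]) -> str:
    """Serialize the metadata document to indented JSON text."""
    return json.dumps(doc, indent=2)


if __name__ == "__main__":
    print(serialize(legal_summarizer))
```

A real deployment would publish the context document so that consumers can resolve each property name to a shared definition.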
Summary
- Metadata is the language of AI Product self-description.
- It must cover identity, capability, I/O, lineage, governance, and operations.
- It must be provided in both machine-readable and human-readable forms.
Principle: An AI Product without rich, synchronized metadata cannot be discovered, governed, or trusted.