Metadata & Self-Description

A true AI Product must be self-describing: it provides metadata that enables both machines and humans to understand what it is, how it works, and how it should be used.
Self-description is the foundation for discoverability, governance, and interoperability.


Why Metadata Matters

  • Discoverability → Catalogs and marketplaces rely on metadata for search and filtering.
  • Governance → Policies and compliance checks are automated via metadata.
  • Interoperability → Metadata provides contracts for integration with other products.
  • Trust → Transparency builds consumer and regulator confidence.

Metadata Requirements

An AI Product must declare metadata across multiple categories:

  1. Identity Metadata

    • Product name, ID, version, description.
    • Tags, keywords, categories.
    • Owner and maintainers.
  2. Capability Metadata

    • Declared capability type(s) (see Capability Type).
    • Supported tasks and domains.
    • Constraints and limitations.
  3. Input/Output Metadata

    • Input/output schemas and formats (see Inputs & Outputs).
    • Confidence scores, explainability attachments.
  4. Lineage Metadata

    • Upstream dependencies and provenance (see Lineage & Provenance).
    • Training datasets and source models.
  5. Governance Metadata

    • Risk classification (minimal, limited, high).
    • Prohibited uses.
    • Compliance standards (e.g., ISO, EU AI Act, HIPAA).
  6. Operational Metadata

    • Performance benchmarks.
    • Monitoring hooks and observability endpoints.
    • Lifecycle state (experimental, production, deprecated, retired).
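
The six categories above can be checked mechanically before a product is registered. The following is a minimal sketch, assuming metadata arrives as a nested dictionary keyed by category. The field names (`input_schema`, `prohibited_uses`, and so on) and the `validate_metadata` helper are hypothetical and chosen for illustration; only the category names, the risk levels, and the lifecycle states come from the list above.

```python
# Hypothetical field names; the six categories mirror the list above.
REQUIRED_FIELDS = {
    "identity": ("name", "id", "version", "description", "owner"),
    "capability": ("types", "tasks", "limitations"),
    "io": ("input_schema", "output_schema"),
    "lineage": ("dependencies", "training_datasets"),
    "governance": ("risk", "prohibited_uses", "compliance"),
    "operational": ("benchmarks", "monitoring", "lifecycle"),
}
RISK_LEVELS = ("minimal", "limited", "high")
LIFECYCLE_STATES = ("experimental", "production", "deprecated", "retired")


def validate_metadata(metadata: dict) -> list[str]:
    """Return a list of human-readable problems; an empty list means valid."""
    problems = []
    for category, fields in REQUIRED_FIELDS.items():
        section = metadata.get(category)
        if not isinstance(section, dict):
            problems.append(f"missing category: {category}")
            continue
        for name in fields:
            if name not in section:
                problems.append(f"missing field: {category}.{name}")

    # Enumerated values are checked only when the field is present.
    enums = (
        ("governance", "risk", RISK_LEVELS),
        ("operational", "lifecycle", LIFECYCLE_STATES),
    )
    for category, name, allowed in enums:
        section = metadata.get(category)
        if isinstance(section, dict) and name in section:
            if section[name] not in allowed:
                problems.append(
                    f"invalid value: {category}.{name}={section[name]!r}"
                )
    return problems
```

A catalog or CI step could call `validate_metadata` and refuse to publish a product while the returned list is non-empty. A real deployment would more likely use a formal schema language, but the shape of the check would be similar.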

Self-Description Mechanisms

AI Products must provide metadata in both of the following forms:

  • Machine-readable formats

    • JSON-LD, RDF, YAML, or other standardized serializations.
    • Aligned with BPS and extended for AI-specific needs.
  • Human-readable formats

    • Markdown product cards.
    • Model/system cards (ethical and transparency documentation).

Metadata must be synchronized across formats.
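
One way to keep the two forms synchronized is to generate the human-readable card from the machine-readable record and stamp it with a fingerprint of that record. The sketch below assumes a flat metadata dictionary with a `name` key. The helper names and the `metadata-sha256` comment marker are hypothetical conventions for illustration, not part of any standard.

```python
import hashlib
import json


def fingerprint(metadata: dict) -> str:
    """SHA-256 over a canonical JSON encoding (sorted keys, no extra spaces)."""
    canonical = json.dumps(metadata, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def render_card(metadata: dict) -> str:
    """Render a Markdown product card and embed the metadata fingerprint."""
    lines = [f"# {metadata['name']}", ""]
    for key in sorted(metadata):
        if key != "name":
            lines.append(f"- **{key}**: {metadata[key]}")
    # Hypothetical marker: an HTML comment that stays invisible when rendered.
    lines += ["", f"<!-- metadata-sha256: {fingerprint(metadata)} -->"]
    return "\n".join(lines)


def card_is_in_sync(card: str, metadata: dict) -> bool:
    """True if the card was generated from exactly this metadata record."""
    return f"metadata-sha256: {fingerprint(metadata)}" in card
```

With this approach, any change to the machine-readable record changes the fingerprint, so a stale card can be detected automatically instead of relying on manual review.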


Example

AI Product: Legal Document Summarizer

  • Identity: urn:aiprod:legal-summarizer:v1.0.0
  • Capability: Generative, domain = Legal.
  • Inputs: Legal text documents (PDF, plain text).
  • Outputs: Structured summaries + confidence score.
  • Lineage: Fine-tuned LLM (GPT-family) on legal corpora.
  • Governance: Risk = High; prohibited use: unauthorized legal advice.
  • Operational: SLA = 99.9% uptime, lifecycle = Production.
  • Metadata Format: JSON-LD + Markdown model card.
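
As an illustration, the example above could be serialized to JSON-LD roughly as follows. This is a sketch under stated assumptions: the `https://example.org/aiprod#` vocabulary, the `AIProduct` type, and property names such as `prohibitedUses` and `slaUptime` are hypothetical placeholders, since the source does not define a concrete schema. Only the values are taken from the example.

```python
import json

legal_summarizer = {
    # Hypothetical vocabulary; a real product would reference its own context.
    "@context": {"@vocab": "https://example.org/aiprod#"},
    "@id": "urn:aiprod:legal-summarizer:v1.0.0",
    "@type": "AIProduct",
    "name": "Legal Document Summarizer",
    "capability": {"type": "Generative", "domain": "Legal"},
    "inputs": ["PDF", "plain text"],
    "outputs": ["structured summary", "confidence score"],
    "lineage": "Fine-tuned LLM (GPT-family) on legal corpora",
    "governance": {
        "risk": "high",
        "prohibitedUses": ["unauthorized legal advice"],
    },
    "operational": {"slaUptime": "99.9%", "lifecycle": "production"},
}


def to_jsonld(document: dict) -> str:
    """Serialize the metadata record as an indented JSON-LD string."""
    return json.dumps(document, indent=2, ensure_ascii=False)


if __name__ == "__main__":
    print(to_jsonld(legal_summarizer))
```

The `@id` reuses the product URN from the Identity line, so the same identifier can be used by catalogs and by the Markdown model card.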

Summary

  • Metadata is the language of AI Product self-description.
  • Metadata must cover identity, capability, I/O, lineage, governance, and operations.
  • It must be provided in both machine-readable and human-readable forms, kept in sync.

Principle: An AI Product without rich, synchronized metadata cannot be discovered, governed, or trusted.