
Transformer Encoder Playground

Token Embeddings + Sinusoidal Positional Encoding

Example input: "time flies like an arrow"

Notes

  • Deterministic random init (fixed seed) for reproducibility.
  • Sinusoidal positional encoding added before attention (see the sketch after this list).
  • Self-attention is unmasked (encoder-style).
  • Simplified post-norm LayerNorm for clarity.
  • Adjust heads, model dimension, and FFN size to see qualitative changes.
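For reference, here is a minimal NumPy sketch of the sinusoidal positional encoding mentioned above, following the standard formulation; the function name and the seq_len/d_model parameters are illustrative, not the playground's actual code.

```python
import numpy as np

def sinusoidal_positional_encoding(seq_len: int, d_model: int) -> np.ndarray:
    """Return a (seq_len, d_model) matrix of sinusoidal position encodings."""
    positions = np.arange(seq_len)[:, None]      # (seq_len, 1)
    dims = np.arange(d_model)[None, :]           # (1, d_model)
    # Each pair of dimensions (2i, 2i+1) shares one frequency.
    angle_rates = 1.0 / np.power(10000.0, (2 * (dims // 2)) / d_model)
    angles = positions * angle_rates             # (seq_len, d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles[:, 0::2])        # even dims use sine
    pe[:, 1::2] = np.cos(angles[:, 1::2])        # odd dims use cosine
    return pe

# Example: the 5-token input "time flies like an arrow" with d_model = 16.
pe = sinusoidal_positional_encoding(seq_len=5, d_model=16)
print(pe.shape)  # (5, 16)
```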

Created by Kai

Not core to AIPS, but since the target product is AI, having some understanding of "Today's AI" will be useful, or at least fun.

About this Tool (created as a fun project in 2019-20 while I was studying BERT; tweaked a few times since)

This page illustrates the encoder sub-architecture of a Transformer (a code sketch of the full block follows the list):

  • Embeddings + Positional Encoding — converts tokens to vectors and injects order.
  • Multi-Head Self-Attention — mixes information across the sequence.
  • Residual Connections + LayerNorm — stabilize training and preserve signal.
  • Position-wise Feed-Forward — non-linear projection per token.
  • Output — encoded token representations for downstream tasks.
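The bullets above map roughly onto the following NumPy sketch of a single post-norm encoder block; the weight names, the 0.02 init scale, and the toy dimensions are assumptions for illustration, not the page's exact implementation.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, mirroring the deterministic-init note

def layer_norm(x, eps=1e-5):
    """Per-token LayerNorm (no learned scale/shift, for clarity)."""
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_self_attention(x, wq, wk, wv, wo, num_heads):
    """Unmasked (encoder-style) multi-head self-attention. x: (seq, d_model)."""
    seq, d_model = x.shape
    d_head = d_model // num_heads
    def split(h):  # (seq, d_model) -> (num_heads, seq, d_head)
        return h.reshape(seq, num_heads, d_head).transpose(1, 0, 2)
    q, k, v = split(x @ wq), split(x @ wk), split(x @ wv)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)   # (heads, seq, seq)
    attn = softmax(scores, axis=-1)
    ctx = attn @ v                                         # (heads, seq, d_head)
    ctx = ctx.transpose(1, 0, 2).reshape(seq, d_model)     # merge heads
    return ctx @ wo

def encoder_block(x, params, num_heads):
    """Post-norm block: attention -> add & norm -> FFN -> add & norm."""
    attn_out = multi_head_self_attention(
        x, params["wq"], params["wk"], params["wv"], params["wo"], num_heads)
    x = layer_norm(x + attn_out)                           # residual + LayerNorm
    ffn = np.maximum(0, x @ params["w1"]) @ params["w2"]   # position-wise FFN (ReLU)
    return layer_norm(x + ffn)

# Toy run: 5 tokens ("time flies like an arrow"), d_model=16, 4 heads, FFN size 32.
seq_len, d_model, num_heads, d_ff = 5, 16, 4, 32
params = {name: rng.normal(0, 0.02, shape) for name, shape in {
    "wq": (d_model, d_model), "wk": (d_model, d_model),
    "wv": (d_model, d_model), "wo": (d_model, d_model),
    "w1": (d_model, d_ff), "w2": (d_ff, d_model)}.items()}
x = rng.normal(0, 1, (seq_len, d_model))                   # embeddings + positions
print(encoder_block(x, params, num_heads).shape)           # (5, 16)
```

Changing num_heads, d_model, or d_ff here corresponds to the knobs mentioned in the Notes; the playground exposes the same parameters interactively.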

Roadmap ideas: CLS pooling, key padding masks, import/export weights (JSON), BPE/WordPiece tokenization, toy backprop views.