Transformer Encoder Playground
Input text: time flies like an arrow
Token Embeddings + Sinusoidal Positional Encoding
[Interactive visualization: each token (time, flies, like, an, arrow) rendered with its embedding + positional-encoding vector]
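The rows visualized above come from a token-embedding lookup followed by the sinusoidal position signal. Below is a minimal NumPy sketch of that step, assuming a toy vocabulary, a fixed seed, and an illustrative d_model of 32; these are stand-ins, not the playground's actual parameters.

```python
import numpy as np

# Illustrative settings; the playground's real seed and sizes may differ.
rng = np.random.default_rng(0)          # fixed seed for reproducibility
tokens = ["time", "flies", "like", "an", "arrow"]
vocab = {tok: i for i, tok in enumerate(tokens)}
d_model = 32

# Token embedding table: one random vector per vocabulary entry.
embedding_table = rng.normal(scale=0.02, size=(len(vocab), d_model))

def sinusoidal_positional_encoding(seq_len: int, d_model: int) -> np.ndarray:
    """Standard sin/cos encoding: even dimensions use sin, odd dimensions use cos."""
    positions = np.arange(seq_len)[:, None]              # (seq_len, 1)
    dims = np.arange(d_model)[None, :]                    # (1, d_model)
    angle_rates = 1.0 / np.power(10000, (2 * (dims // 2)) / d_model)
    angles = positions * angle_rates
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles[:, 0::2])
    pe[:, 1::2] = np.cos(angles[:, 1::2])
    return pe

# Embed the tokens and add the position signal before attention.
x = embedding_table[[vocab[t] for t in tokens]]           # (5, d_model)
x = x + sinusoidal_positional_encoding(len(tokens), d_model)
print(x.shape)  # (5, 32)
```

Adding the encoding element-wise before attention is what injects token order, since the attention operation itself is permutation-invariant.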
Notes
- Deterministic random init (fixed seed) for reproducibility.
- Sinusoidal positional encoding added before attention.
- Self-attention is unmasked (encoder-style); see the sketch after this list.
- Simplified post-norm LayerNorm for clarity.
- Adjust heads, model dimension, and FFN size to see qualitative changes.
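To make the unmasked-attention note concrete: encoder-style attention applies no causal mask, so every token attends to every other token. A single-head scaled dot-product sketch is below; the sizes, seed, and weight initialization are illustrative assumptions, not the playground's internals.

```python
import numpy as np

def softmax(z: np.ndarray, axis: int = -1) -> np.ndarray:
    z = z - z.max(axis=axis, keepdims=True)          # shift for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def unmasked_self_attention(x, wq, wk, wv):
    """Single-head scaled dot-product attention with no mask (encoder-style)."""
    q, k, v = x @ wq, x @ wk, x @ wv                 # (seq_len, d_k) each
    scores = q @ k.T / np.sqrt(k.shape[-1])          # every token scores every token
    return softmax(scores, axis=-1) @ v              # weighted mix of the values

rng = np.random.default_rng(0)                       # fixed seed, as in the notes
seq_len, d_model, d_k = 5, 32, 16                    # illustrative sizes
x = rng.normal(size=(seq_len, d_model))              # stand-in for embedded tokens
wq, wk, wv = (rng.normal(scale=0.02, size=(d_model, d_k)) for _ in range(3))
print(unmasked_self_attention(x, wq, wk, wv).shape)  # (5, 16)
```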
Created by Kai
Not core to AIPS; however, since the target product is AI, having some understanding of "today's AI" will be useful, or at least fun.
About this Tool (created as a fun project in 2019-20 while I was studying BERT; tweaked a few times since)
This page illustrates the encoder sub-architecture of a Transformer:
- Embeddings + Positional Encoding — converts tokens to vectors and injects order.
- Multi-Head Self-Attention — mixes information across the sequence (a full encoder-layer sketch follows this list).
- Residual Connections + LayerNorm — stabilize training and preserve signal.
- Position-wise Feed-Forward — non-linear projection per token.
- Output — encoded token representations for downstream tasks.
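Putting the listed pieces together, here is a minimal NumPy sketch of one post-norm encoder layer: multi-head self-attention, a residual connection plus LayerNorm, then a position-wise feed-forward block with its own residual and LayerNorm. The seed, the sizes (d_model, num_heads, d_ff), and the plain LayerNorm without learned gain and bias are illustrative assumptions in the spirit of the notes above, not the playground's exact implementation.

```python
import numpy as np

rng = np.random.default_rng(0)                      # fixed seed for reproducibility
seq_len, d_model, num_heads, d_ff = 5, 32, 4, 64    # illustrative sizes
d_head = d_model // num_heads

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def layer_norm(x, eps=1e-5):
    """Simplified LayerNorm (no learned gain/bias), applied after the residual."""
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def multi_head_self_attention(x, wq, wk, wv, wo):
    """Unmasked multi-head attention: split into heads, attend, concatenate, project."""
    heads = []
    for h in range(num_heads):
        sl = slice(h * d_head, (h + 1) * d_head)
        q, k, v = x @ wq[:, sl], x @ wk[:, sl], x @ wv[:, sl]
        scores = q @ k.T / np.sqrt(d_head)
        heads.append(softmax(scores) @ v)
    return np.concatenate(heads, axis=-1) @ wo      # output projection

def feed_forward(x, w1, b1, w2, b2):
    """Position-wise FFN: the same two-layer MLP applied to every token."""
    return np.maximum(0.0, x @ w1 + b1) @ w2 + b2   # ReLU non-linearity

def encoder_layer(x, params):
    # Sub-layer 1: attention, then residual + post-norm LayerNorm.
    attn = multi_head_self_attention(x, *params["attn"])
    x = layer_norm(x + attn)
    # Sub-layer 2: position-wise FFN, then residual + post-norm LayerNorm.
    ffn = feed_forward(x, *params["ffn"])
    return layer_norm(x + ffn)

params = {
    "attn": [rng.normal(scale=0.02, size=(d_model, d_model)) for _ in range(4)],
    "ffn": [rng.normal(scale=0.02, size=(d_model, d_ff)), np.zeros(d_ff),
            rng.normal(scale=0.02, size=(d_ff, d_model)), np.zeros(d_model)],
}
x = rng.normal(size=(seq_len, d_model))             # stand-in for embeddings + positions
print(encoder_layer(x, params).shape)               # (5, 32)
```

Stacking a few of these layers on top of the embedded, position-encoded input yields the encoded token representations mentioned in the last bullet; the heads, model dimension, and FFN size are the same knobs the notes suggest adjusting.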
Roadmap ideas: CLS pooling, key padding masks, import/export weights (JSON), BPE/WordPiece tokenization, toy backprop views.