
Transformer Encoder Playground

Token Embeddings + Sinusoidal Positional Encoding

Example input: "time flies like an arrow"

Notes

  • Deterministic random init (fixed seed) for reproducibility.
  • Sinusoidal positional encoding added before attention (see the sketch after this list).
  • Self-attention is unmasked (encoder-style).
  • Simplified post-norm LayerNorm for clarity.
  • Adjust heads, model dimension, and FFN size to see qualitative changes.
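For reference, here is a minimal NumPy sketch of the sinusoidal positional encoding mentioned above, following the standard formulation; the function name and the seq_len/d_model parameters are illustrative, not the playground's actual code.

```python
import numpy as np

def sinusoidal_positional_encoding(seq_len: int, d_model: int) -> np.ndarray:
    """Return a (seq_len, d_model) matrix of sinusoidal position encodings."""
    positions = np.arange(seq_len)[:, None]      # (seq_len, 1)
    dims = np.arange(d_model)[None, :]           # (1, d_model)
    # Each pair of dimensions (2i, 2i+1) shares one frequency.
    angle_rates = 1.0 / np.power(10000.0, (2 * (dims // 2)) / d_model)
    angles = positions * angle_rates             # (seq_len, d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles[:, 0::2])        # even dims use sine
    pe[:, 1::2] = np.cos(angles[:, 1::2])        # odd dims use cosine
    return pe

# Example: the 5-token input "time flies like an arrow" with d_model = 16.
pe = sinusoidal_positional_encoding(seq_len=5, d_model=16)
print(pe.shape)  # (5, 16)
```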

Created by Kai

Not core to AIPS, but since the target product is AI, having some understanding of "Today's AI" will be useful, or at least fun.

About this Tool (created as a fun project in 2019-20 while I was studying BERT; tweaked a few times since)

This page illustrates the encoder sub-architecture of a Transformer (a code sketch of the full block follows the list):

  • Embeddings + Positional Encoding — converts tokens to vectors and injects order.
  • Multi-Head Self-Attention — mixes information across the sequence.
  • Residual Connections + LayerNorm — stabilize training and preserve signal.
  • Position-wise Feed-Forward — non-linear projection per token.
  • Output — encoded token representations for downstream tasks.
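The bullets above map roughly onto the following NumPy sketch of a single post-norm encoder block; the weight names, the 0.02 init scale, and the toy dimensions are assumptions for illustration, not the page's exact implementation.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, mirroring the deterministic-init note

def layer_norm(x, eps=1e-5):
    """Per-token LayerNorm (no learned scale/shift, for clarity)."""
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_self_attention(x, wq, wk, wv, wo, num_heads):
    """Unmasked (encoder-style) multi-head self-attention. x: (seq, d_model)."""
    seq, d_model = x.shape
    d_head = d_model // num_heads
    def split(h):  # (seq, d_model) -> (num_heads, seq, d_head)
        return h.reshape(seq, num_heads, d_head).transpose(1, 0, 2)
    q, k, v = split(x @ wq), split(x @ wk), split(x @ wv)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)   # (heads, seq, seq)
    attn = softmax(scores, axis=-1)
    ctx = attn @ v                                         # (heads, seq, d_head)
    ctx = ctx.transpose(1, 0, 2).reshape(seq, d_model)     # merge heads
    return ctx @ wo

def encoder_block(x, params, num_heads):
    """Post-norm block: attention -> add & norm -> FFN -> add & norm."""
    attn_out = multi_head_self_attention(
        x, params["wq"], params["wk"], params["wv"], params["wo"], num_heads)
    x = layer_norm(x + attn_out)                           # residual + LayerNorm
    ffn = np.maximum(0, x @ params["w1"]) @ params["w2"]   # position-wise FFN (ReLU)
    return layer_norm(x + ffn)

# Toy run: 5 tokens ("time flies like an arrow"), d_model=16, 4 heads, FFN size 32.
seq_len, d_model, num_heads, d_ff = 5, 16, 4, 32
params = {name: rng.normal(0, 0.02, shape) for name, shape in {
    "wq": (d_model, d_model), "wk": (d_model, d_model),
    "wv": (d_model, d_model), "wo": (d_model, d_model),
    "w1": (d_model, d_ff), "w2": (d_ff, d_model)}.items()}
x = rng.normal(0, 1, (seq_len, d_model))                   # embeddings + positions
print(encoder_block(x, params, num_heads).shape)           # (5, 16)
```

Changing num_heads, d_model, or d_ff here corresponds to the knobs mentioned in the Notes; the playground exposes the same parameters interactively.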

Roadmap ideas: CLS pooling, key padding masks, import/export weights (JSON), BPE/WordPiece tokenization, toy backprop views.