LSTM & GAN | Kunyi Yu

A hands-on deep learning project built around two classic modeling tasks: implementing an LSTM cell manually for palindrome sequence prediction, and training a fully connected GAN on MNIST with latent-space interpolation experiments. Both components are built from basic PyTorch layers rather than high-level wrappers such as torch.nn.LSTM, so the underlying mechanics stay visible.

View on GitHub

Highlights

Implemented LSTM gating logic from scratch (input, forget, cell, output gates; hidden and cell state updates) without using torch.nn.LSTM
Validated the custom LSTM on palindrome sequences of varying lengths, analyzing how prediction accuracy degrades as sequence length increases
Built a fully connected GAN for MNIST from scratch: generator with BatchNorm and Tanh, discriminator with LeakyReLU and sigmoid, alternating adversarial optimization loop
Conducted latent-space interpolation experiments to check whether the generator’s learned representation is continuous
Preserved training artifacts throughout: staged sample grids, loss curves, accuracy plots, a serialized generator checkpoint, and a written technical report

Part 1 — Manual LSTM for Sequence Modeling

The LSTM is implemented in lstm.py without torch.nn.LSTM. All four gates and the cell/hidden state update equations are written explicitly with nn.Linear layers, making the recurrent logic fully transparent.
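As a rough illustration of what "written explicitly with nn.Linear layers" can look like, here is a minimal sketch of a gate-by-gate LSTM. It is not the code from lstm.py: the class name `ManualLSTM`, the layer names, the choice to feed each gate the concatenation of the input and the previous hidden state, and all sizes are assumptions made for this example.

```python
import torch
import torch.nn as nn


class ManualLSTM(nn.Module):
    """LSTM written gate by gate. Names and sizes are illustrative assumptions."""

    def __init__(self, input_dim, hidden_dim, num_classes):
        super().__init__()
        self.hidden_dim = hidden_dim
        # One linear map per gate, applied to the concatenation [x_t, h_{t-1}].
        self.input_gate = nn.Linear(input_dim + hidden_dim, hidden_dim)
        self.forget_gate = nn.Linear(input_dim + hidden_dim, hidden_dim)
        self.cell_gate = nn.Linear(input_dim + hidden_dim, hidden_dim)
        self.output_gate = nn.Linear(input_dim + hidden_dim, hidden_dim)
        # Maps the final hidden state to class logits.
        self.classifier = nn.Linear(hidden_dim, num_classes)

    def step(self, x_t, h, c):
        z = torch.cat([x_t, h], dim=1)
        i = torch.sigmoid(self.input_gate(z))   # how much new content to write
        f = torch.sigmoid(self.forget_gate(z))  # how much old cell state to keep
        g = torch.tanh(self.cell_gate(z))       # candidate cell content
        o = torch.sigmoid(self.output_gate(z))  # how much cell state to expose
        c = f * c + i * g                       # cell state update
        h = o * torch.tanh(c)                   # hidden state update
        return h, c

    def forward(self, x):
        # x has shape (batch, seq_len, input_dim); states start at zero.
        batch = x.size(0)
        h = x.new_zeros(batch, self.hidden_dim)
        c = x.new_zeros(batch, self.hidden_dim)
        for t in range(x.size(1)):
            h, c = self.step(x[:, t, :], h, c)
        return self.classifier(h)


model = ManualLSTM(input_dim=10, hidden_dim=32, num_classes=10)
logits = model(torch.randn(4, 9, 10))
print(logits.shape)  # torch.Size([4, 10])
```

Keeping the four gates as separate layers costs a little speed compared with one fused projection, but it makes each equation easy to read and to inspect during debugging.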

Training setup: palindrome dataset with configurable sequence length, gradient clipping, per-epoch loss and accuracy tracking.
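The training setup above can be sketched roughly as follows. This assumes the common formulation of the task in which the model sees every digit of a palindrome except the last and must predict that last digit (which equals the first); the project's actual dataset code may differ. The helper `make_palindrome_batch`, the clipping threshold of 10.0, and the small linear stand-in model are all hypothetical and are used only to keep the example self-contained.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def make_palindrome_batch(batch_size, seq_len, num_digits=10):
    """Return one-hot inputs (all digits but the last) and the last digit as target."""
    half = torch.randint(0, num_digits, (batch_size, (seq_len + 1) // 2))
    # Mirror the first half; for odd lengths, drop the middle digit from the mirror.
    mirrored = half.flip(dims=[1])[:, seq_len % 2:]
    digits = torch.cat([half, mirrored], dim=1)  # (batch, seq_len)
    inputs = F.one_hot(digits[:, :-1], num_digits).float()
    targets = digits[:, -1]
    return inputs, targets


seq_len = 10
# Stand-in classifier (assumption); in the project this would be the manual LSTM.
model = nn.Sequential(nn.Flatten(), nn.Linear((seq_len - 1) * 10, 10))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

inputs, targets = make_palindrome_batch(64, seq_len)
logits = model(inputs)
loss = F.cross_entropy(logits, targets)

optimizer.zero_grad()
loss.backward()
# Rescale gradients so their global L2 norm does not exceed max_norm.
grad_norm = torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=10.0)
optimizer.step()

# Accuracy on this batch, measured with the pre-update logits.
accuracy = (logits.argmax(dim=1) == targets).float().mean().item()
print(f"loss={loss.item():.3f} acc={accuracy:.3f}")
```

Clipping matters most for recurrent models, where gradients are multiplied across many time steps and can grow sharply on longer sequences.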

Training loss — sequence length 10

Accuracy — sequence length 10

Training loss — sequence length 30 (longer dependency)

Accuracy — sequence length 30

The model handles short and medium-length palindromes reliably, and its accuracy degrades more gradually than a basic RNN's as sequence length grows, reflecting the gating mechanism’s role in preserving long-range information.

Part 2 — Fully Connected GAN on MNIST

A multilayer perceptron GAN trained on MNIST using a standard generator-discriminator adversarial setup:

Generator — stacked linear layers, BatchNorm, LeakyReLU activations, Tanh output
Discriminator — linear layers, LeakyReLU, sigmoid output
Training — alternating generator and discriminator updates with BCE loss; sample grids saved at regular intervals
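The three components listed above might fit together as in the sketch below. Layer widths, the latent size of 100, the Adam settings, and the discriminator-first update order are assumptions for illustration, not values taken from the project. A random tensor in [-1, 1] stands in for a batch of flattened MNIST images, on the assumption that real images are scaled to match the Tanh output range.

```python
import torch
import torch.nn as nn

latent_dim = 100  # assumed latent size

generator = nn.Sequential(
    nn.Linear(latent_dim, 128), nn.LeakyReLU(0.2),
    nn.Linear(128, 256), nn.BatchNorm1d(256), nn.LeakyReLU(0.2),
    nn.Linear(256, 784), nn.Tanh(),  # 784 = 28 * 28 pixels, values in [-1, 1]
)
discriminator = nn.Sequential(
    nn.Linear(784, 256), nn.LeakyReLU(0.2),
    nn.Linear(256, 1), nn.Sigmoid(),  # probability that the input is real
)
bce = nn.BCELoss()
opt_g = torch.optim.Adam(generator.parameters(), lr=2e-4, betas=(0.5, 0.999))
opt_d = torch.optim.Adam(discriminator.parameters(), lr=2e-4, betas=(0.5, 0.999))


def train_step(real):
    """One discriminator update followed by one generator update."""
    batch = real.size(0)
    ones = torch.ones(batch, 1)
    zeros = torch.zeros(batch, 1)
    fake = generator(torch.randn(batch, latent_dim))

    # Discriminator: push real toward 1 and fake toward 0.
    # detach() keeps this loss from sending gradients into the generator.
    loss_d = bce(discriminator(real), ones) + bce(discriminator(fake.detach()), zeros)
    opt_d.zero_grad()
    loss_d.backward()
    opt_d.step()

    # Generator: try to make the updated discriminator label fakes as real.
    loss_g = bce(discriminator(fake), ones)
    opt_g.zero_grad()
    loss_g.backward()
    opt_g.step()
    return loss_d.item(), loss_g.item()


real = torch.rand(64, 784) * 2 - 1  # stand-in for a batch of scaled MNIST images
loss_d, loss_g = train_step(real)
print(f"loss_d={loss_d:.3f} loss_g={loss_g:.3f}")
```

In a full loop, `train_step` would be called once per MNIST batch, with a grid of generated samples written to disk every fixed number of steps.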

Early training — noisy, unstructured samples

Mid training — digit structure emerging

Final stage — legible digit samples

Using fully connected layers (rather than convolutions) makes the training objective and model behavior easier to inspect at each stage.

Latent-Space Interpolation

Interpolating between pairs of latent vectors tests whether the generator has learned a smooth, continuous representation or simply memorized discrete samples.

Interpolation between two latent vectors

Smooth transitions indicate a continuous latent structure
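An interpolation experiment of this kind can be sketched as follows. The source does not say whether linear or spherical interpolation was used, so this example assumes plain linear interpolation. The `interpolate` helper is hypothetical, and the untrained generator is a stand-in; in the project, the saved generator checkpoint would be loaded instead.

```python
import torch
import torch.nn as nn

latent_dim = 100  # assumed latent size

# Untrained stand-in generator (assumption), used only to make the sketch runnable.
generator = nn.Sequential(
    nn.Linear(latent_dim, 256), nn.BatchNorm1d(256), nn.LeakyReLU(0.2),
    nn.Linear(256, 784), nn.Tanh(),
)


def interpolate(generator, z_start, z_end, steps=9):
    """Generate images along the straight line from z_start to z_end."""
    alphas = torch.linspace(0.0, 1.0, steps).unsqueeze(1)  # (steps, 1)
    z = (1 - alphas) * z_start + alphas * z_end            # (steps, latent_dim)
    generator.eval()  # BatchNorm uses running statistics, not batch statistics
    with torch.no_grad():
        images = generator(z)
    return images.view(steps, 1, 28, 28)


z_a, z_b = torch.randn(latent_dim), torch.randn(latent_dim)
frames = interpolate(generator, z_a, z_b)
print(frames.shape)  # torch.Size([9, 1, 28, 28])
```

With a trained generator, laying the frames out in a single row makes it easy to see whether one digit morphs gradually into another or jumps abruptly between them.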

Technical Summary


Language	Python 3
Framework	PyTorch
Models	Manual LSTM, fully connected GAN (MLP)
Tasks	Sequence modeling, image generation, latent interpolation
Key components	Custom gate equations, adversarial training loop, BatchNorm, gradient clipping
Artifacts	Loss/accuracy curves, staged sample grids, generator checkpoint, written report