LearnCurve

An interactive tool for understanding how neural networks learn. © 2025 Chris Rowen — MIT License

Interface panels (controls and live readouts omitted):

① Create Training Data — define a data recipe, then generate training data
② Train the Model
Network Design (layers, width)
Plots: Examples vs Model · Training Trace · Error Landscape
③ Evaluate & Compare
📐 Equations

📖 How Neural Networks Learn

The Big Picture

We use machine learning to build equations that serve as predictive models — predicting the next word in a sentence or identifying objects in a picture. You have examples of typical inputs and their desired outputs, but not the underlying pattern. A neural network finds a function that fits the data — like recreating a curve from scattered dots, but for arbitrarily complex functions.

The Three Phases

Create Data — Generate examples from a hidden "recipe."

Train — Measure error, adjust parameters, repeat.

Evaluate — Test on held-out data to verify learning.
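The three phases above can be sketched in a few lines. This is a minimal illustration, not the tool's actual code: the function name `make_dataset`, the 80/20 split, and the Gaussian noise model are assumptions chosen for the sketch.

```python
import math
import random

def make_dataset(recipe, n=100, noise=0.02, lo=-3.0, hi=3.0, seed=0):
    """Sample (x, y) pairs from a hidden recipe, adding Gaussian noise."""
    rng = random.Random(seed)
    data = [(x := rng.uniform(lo, hi), recipe(x) + rng.gauss(0.0, noise))
            for _ in range(n)]
    split = int(0.8 * n)          # reserve 20% as held-out data
    return data[:split], data[split:]

# Phase 1: create data from a hidden recipe (here, sin)
train, held_out = make_dataset(math.sin, n=100, noise=0.02)
# Phase 2 would fit a model to `train`; phase 3 would score it on `held_out`.
```

The held-out split is what makes phase 3 meaningful: a model that merely memorized the training examples will do poorly on points it never saw.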

Key Terms

Data Recipe — Function that generates data

Training Data — Examples the model learns from

Held-Out Data — Reserved to test real learning

Noise — Random variation (real data is messy)

Step — Update from one example

Epoch — Full pass through training data

Error (E) — Prediction error (lower = better)

Learning Rate (η) — Step size for updates

Activation (σ) — Nonlinear function (ReLU/Sigmoid)

Weights (w) — Multipliers on connections

Biases (b) — Offset added at each neuron

Gradient (∂E/∂w) — Steepest uphill direction

Overfitting — Memorizing instead of learning

Extrapolation — Predicting outside training range

The Math

Forward: z = w·x + b → h = σ(z) → y

Error: E = ½(y − t)² where t = target

Backward: Chain rule finds ∂E/∂w

∂[f(g(x))]/∂x = f'(g(x)) · g'(x)

Update: w ← w − η · ∂E/∂w (downhill step)
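The forward, error, backward, and update equations above can be traced for a single hidden neuron. This is a worked sketch under assumed values (model shape y = w2·σ(w1·x + b1) + b2, sigmoid activation, η = 0.1), not the tool's implementation:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def step(params, x, t, eta=0.1):
    """One gradient step for a one-hidden-neuron model (illustrative)."""
    w1, b1, w2, b2 = params
    # Forward: z = w·x + b → h = σ(z) → y
    z = w1 * x + b1
    h = sigmoid(z)
    y = w2 * h + b2
    # Error: E = ½(y − t)²
    E = 0.5 * (y - t) ** 2
    # Backward: chain rule, starting from ∂E/∂y = (y − t)
    dy = y - t
    dw2, db2 = dy * h, dy
    dz = dy * w2 * h * (1 - h)    # σ'(z) = h·(1 − h)
    dw1, db1 = dz * x, dz
    # Update: w ← w − η · ∂E/∂w (downhill step)
    return (w1 - eta * dw1, b1 - eta * db1,
            w2 - eta * dw2, b2 - eta * db2), E

params = (0.5, 0.0, 0.5, 0.0)
for _ in range(200):
    params, E = step(params, x=1.0, t=0.8)
```

Repeating the step drives E toward zero on this one example; with many examples, the same update is applied per example (a step) across the whole dataset (an epoch).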

Optimizers

Simple (SGD) — w ← w − η · ∂E/∂w

Adam — Adapts η per-weight. Usually faster.
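The two update rules differ only in how the step is scaled. Here is a minimal sketch of both on the toy error E = ½w², whose gradient is simply w; the hyperparameter names (b1, b2, eps) follow the standard Adam formulation and are assumptions of this sketch:

```python
import math

def sgd_update(w, grad, eta=0.1):
    # Simple (SGD): w ← w − η · ∂E/∂w
    return w - eta * grad

def adam_update(w, grad, state, eta=0.1, b1=0.9, b2=0.999, eps=1e-8):
    # Adam keeps running averages of the gradient (m) and its square (v),
    # then scales the step per weight: a large v shrinks that weight's step.
    m, v, t = state
    t += 1
    m = b1 * m + (1 - b1) * grad
    v = b2 * v + (1 - b2) * grad * grad
    m_hat = m / (1 - b1 ** t)      # bias correction for early steps
    v_hat = v / (1 - b2 ** t)
    return w - eta * m_hat / (math.sqrt(v_hat) + eps), (m, v, t)

# Minimize E = ½w² (gradient ∂E/∂w = w) with both optimizers
w_sgd, w_adam, state = 2.0, 2.0, (0.0, 0.0, 0)
for _ in range(100):
    w_sgd = sgd_update(w_sgd, w_sgd)
    w_adam, state = adam_update(w_adam, w_adam, state)
```

Both walk downhill toward w = 0; Adam's per-weight scaling is what often makes it faster on real networks, where different weights see very different gradient magnitudes.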

Try These

📈 Simple: x, x^2, x^3-3*x

🌊 Waves: sin(x), sin(x)+sin(3*x)

📐 Sharp: abs(x), sign(x), floor(x)

🔬 Restrict training range → extrapolation fails

⚡ Compare SGD vs Adam

📊 Increase noise → learning breaks down

💡 ML = adjusting w to minimize E!