LearnCurve

An interactive tool for understanding how neural networks learn. © 2025 Chris Rowen — MIT License

Interface panels (controls and live readouts omitted):

① Create Training Data — define a data recipe, then generate training data
② Train the Model
Network Design (layers, width)
Plots: Examples vs Model · Training Trace · Error Landscape
③ Evaluate & Compare
📐 Equations

📖 How Neural Networks Learn

The Big Picture

We use machine learning to build equations that serve as predictive models — predicting the next word in a sentence or identifying objects in a picture. You have examples of typical inputs and their desired outputs, but not the underlying pattern. A neural network finds a function that fits the data — like recreating a curve from scattered dots, but for arbitrarily complex functions.

The Three Phases

Create Data — Generate examples from a hidden "recipe."

Train — Measure error, adjust parameters, repeat.

Evaluate — Test on held-out data to verify learning.
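The three phases above can be sketched in a few lines. This is a minimal illustration, not the tool's actual code: the function name `make_dataset`, the 80/20 split, and the Gaussian noise model are assumptions chosen for the sketch.

```python
import math
import random

def make_dataset(recipe, n=100, noise=0.02, lo=-3.0, hi=3.0, seed=0):
    """Sample (x, y) pairs from a hidden recipe, adding Gaussian noise."""
    rng = random.Random(seed)
    data = [(x := rng.uniform(lo, hi), recipe(x) + rng.gauss(0.0, noise))
            for _ in range(n)]
    split = int(0.8 * n)          # reserve 20% as held-out data
    return data[:split], data[split:]

# Phase 1: create data from a hidden recipe (here, sin)
train, held_out = make_dataset(math.sin, n=100, noise=0.02)
# Phase 2 would fit a model to `train`; phase 3 would score it on `held_out`.
```

The held-out split is what makes phase 3 meaningful: a model that merely memorized the training examples will do poorly on points it never saw.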

Key Terms

Data Recipe — Function that generates data

Training Data — Examples the model learns from

Held-Out Data — Reserved to test real learning

Noise — Random variation (real data is messy)

Step — Update from one example

Epoch — Full pass through training data

Error (E) — Prediction error (lower = better)

Learning Rate (η) — Step size for updates

Activation (σ) — Nonlinear function (ReLU/Sigmoid)

Weights (w) — Multipliers on connections

Biases (b) — Offset added at each neuron

Gradient (∂E/∂w) — Steepest uphill direction

Overfitting — Memorizing instead of learning

Extrapolation — Predicting outside training range

The Math

Forward: z = w·x + b → h = σ(z) → y

Error: E = ½(y − t)² where t = target

Backward: Chain rule finds ∂E/∂w

∂[f(g(x))]/∂x = f'(g(x)) · g'(x)

Update: w ← w − η · ∂E/∂w (downhill step)
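The forward, error, backward, and update equations above can be traced for a single hidden neuron. This is a worked sketch under assumed values (model shape y = w2·σ(w1·x + b1) + b2, sigmoid activation, η = 0.1), not the tool's implementation:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def step(params, x, t, eta=0.1):
    """One gradient step for a one-hidden-neuron model (illustrative)."""
    w1, b1, w2, b2 = params
    # Forward: z = w·x + b → h = σ(z) → y
    z = w1 * x + b1
    h = sigmoid(z)
    y = w2 * h + b2
    # Error: E = ½(y − t)²
    E = 0.5 * (y - t) ** 2
    # Backward: chain rule, starting from ∂E/∂y = (y − t)
    dy = y - t
    dw2, db2 = dy * h, dy
    dz = dy * w2 * h * (1 - h)    # σ'(z) = h·(1 − h)
    dw1, db1 = dz * x, dz
    # Update: w ← w − η · ∂E/∂w (downhill step)
    return (w1 - eta * dw1, b1 - eta * db1,
            w2 - eta * dw2, b2 - eta * db2), E

params = (0.5, 0.0, 0.5, 0.0)
for _ in range(200):
    params, E = step(params, x=1.0, t=0.8)
```

Repeating the step drives E toward zero on this one example; with many examples, the same update is applied per example (a step) across the whole dataset (an epoch).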

Optimizers

Simple (SGD) — w ← w − η · ∂E/∂w

Adam — Adapts η per-weight. Usually faster.
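The two update rules differ only in how the step is scaled. Here is a minimal sketch of both on the toy error E = ½w², whose gradient is simply w; the hyperparameter names (b1, b2, eps) follow the standard Adam formulation and are assumptions of this sketch:

```python
import math

def sgd_update(w, grad, eta=0.1):
    # Simple (SGD): w ← w − η · ∂E/∂w
    return w - eta * grad

def adam_update(w, grad, state, eta=0.1, b1=0.9, b2=0.999, eps=1e-8):
    # Adam keeps running averages of the gradient (m) and its square (v),
    # then scales the step per weight: a large v shrinks that weight's step.
    m, v, t = state
    t += 1
    m = b1 * m + (1 - b1) * grad
    v = b2 * v + (1 - b2) * grad * grad
    m_hat = m / (1 - b1 ** t)      # bias correction for early steps
    v_hat = v / (1 - b2 ** t)
    return w - eta * m_hat / (math.sqrt(v_hat) + eps), (m, v, t)

# Minimize E = ½w² (gradient ∂E/∂w = w) with both optimizers
w_sgd, w_adam, state = 2.0, 2.0, (0.0, 0.0, 0)
for _ in range(100):
    w_sgd = sgd_update(w_sgd, w_sgd)
    w_adam, state = adam_update(w_adam, w_adam, state)
```

Both walk downhill toward w = 0; Adam's per-weight scaling is what often makes it faster on real networks, where different weights see very different gradient magnitudes.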

Try These

📈 Simple: x, x^2, x^3-3*x

🌊 Waves: sin(x), sin(x)+sin(3*x)

📐 Sharp: abs(x), sign(x), floor(x)

🔬 Restrict training range → extrapolation fails

⚡ Compare SGD vs Adam

📊 Increase noise → learning breaks down

💡 ML = adjusting w to minimize E!