The 1979 arcade classic, rebuilt as a reinforcement-learning arena. One pure-Python physics core (stdlib only, ~30 KB wheel) does everything: it trains your agents through Gymnasium, and it runs live in the browser via Pyodide (CPython on WebAssembly). JS never simulates — it only draws what Python computes, 60 times a second. Same seed → same episode, byte-for-byte, in CI, in training, and in your browser. The title screen is a live demo: a built-in autopilot flying three real landers, collisions welcome.
- ←/A and →/D rotate · ↑/W/Space thrust · R new episode · O agent view · P AI pilot · 1/2/3 difficulty
- Touch: arcade buttons, ← / → and THRUST; tap the difficulty line on the title screen · ?seed=123 in the URL pins the terrain
Landing: both feet on a pad, upright, slow.
|vx| ≤ 12 and |vy| ≤ 18 is a perfect landing
(50 × multiplier); up to |vx| ≤ 25 and |vy| ≤ 35 is a hard landing
that still pays 15 × multiplier; anything else (fast, tilted, off-pad,
off-world) is a crash. Narrower pads pay more (2X/3X/5X). Landers are
solid: meet one in flight and you both crash; a landed lander blocks
its pad, legally.
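The landing rules above can be sketched as a small classifier. This is an illustration, not the core's actual code: the `Touchdown` class and `classify` function are hypothetical names, the reading of "25/35" as |vx| ≤ 25 and |vy| ≤ 35 is an assumption, and so is the choice to score a crash as 0 points. The tilt tolerance is not stated, so it is passed in as a flag.

```python
from dataclasses import dataclass

# Speed thresholds taken from the landing rules above.
PERFECT_VX, PERFECT_VY = 12.0, 18.0
HARD_VX, HARD_VY = 25.0, 35.0   # assumed reading of "25/35"


@dataclass
class Touchdown:
    """Hypothetical snapshot of a lander at ground contact."""
    vx: float
    vy: float
    both_feet_on_pad: bool
    upright: bool    # tilt tolerance is not documented, so it is a flag here
    pad_mult: int    # 2, 3 or 5


def classify(t: Touchdown) -> tuple[str, int]:
    """Return (kind, points) for a touchdown."""
    if not (t.both_feet_on_pad and t.upright):
        return "crash", 0          # assumption: a crash scores nothing
    if abs(t.vx) <= PERFECT_VX and abs(t.vy) <= PERFECT_VY:
        return "perfect", 50 * t.pad_mult
    if abs(t.vx) <= HARD_VX and abs(t.vy) <= HARD_VY:
        return "hard", 15 * t.pad_mult
    return "crash", 0
```

For example, a gentle touchdown on a 5X pad would score 250 under these assumptions, while the same touchdown with one foot off the pad is a crash.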
Three presets — terrain ruggedness, pad widths, fuel budget and spawn drift change; the physics never does. The selection persists and applies to both your episodes and the attract mode.
| preset | displacement | decay | y_max | pad widths 2X/3X/5X | fuel | spawn vx |
|---|---|---|---|---|---|---|
| TRAINEE | 140 | 0.52 | 380 | 130 / 95 / 60 | 1200 | ±15 |
| CADET | 210 | 0.62 | 480 | 110 / 75 / 45 | 1000 | ±25 |
| COMMANDER | 260 | 0.68 | 560 | 90 / 60 / 36 | 850 | ±40 |
The ladder doubles as an RL curriculum: train trainee → cadet → commander with the same reward and physics throughout.
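One way to drive the curriculum is to promote the agent once its recent success rate clears a bar. The sketch below is only a scheduling helper: `next_preset` and the 0.8 threshold are hypothetical, and the lowercase names "cadet" and "commander" are assumed to follow the documented `preset="trainee"` spelling.

```python
# Preset order from the table above; lowercase spellings are an assumption
# based on the documented preset="trainee".
PRESETS = ["trainee", "cadet", "commander"]


def next_preset(current: str, success_rate: float, threshold: float = 0.8) -> str:
    """Move one rung up the ladder when success_rate reaches threshold."""
    i = PRESETS.index(current)
    if success_rate >= threshold and i + 1 < len(PRESETS):
        return PRESETS[i + 1]
    return current  # stay put: not good enough yet, or already at the top
```

A training loop could call this after each evaluation round and rebuild the env with the returned preset, keeping the reward and physics unchanged.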
import gymnasium, moonlander
env = gymnasium.make("MoonLander-v0") # classic rotate+thrust
# or: MoonLanderEnv(mode="gym") # LunarLander-style engines
# or: MoonLanderEnv(obs_mode="radar") # partial observability
# or: MoonLanderEnv(preset="trainee") # the difficulty curriculum
# or: MoonLanderEnv(frame_skip=4) # 4 physics ticks per step
obs, info = env.reset(seed=42) # info["terrain"] = terrain dict
obs, r, term, trunc, info = env.step(env.action_space.sample())
if term: print(info["outcome"]) # {"kind": "perfect", "mult": 5, ...}
action_space = Discrete(4) ·
observation_space = Box(-10, 10, (14,), float32).
In classic mode the four actions are noop / rotate-left /
rotate-right / thrust; in gym mode they fire engines — noop / left /
main / right. Episodes truncate at max_steps physics
ticks; stepping a finished episode raises RuntimeError.
On the terminal step info carries outcome,
is_success and score.
14 floats, world-size-relative. The target pad is the nearest by euclidean distance to the pad center.
| idx | value | formula |
|---|---|---|
| 0 | x (normalized) | x / world_w * 2 - 1 |
| 1 | y (normalized) | y / world_h * 2 - 1 |
| 2 | vx | vx / 60 |
| 3 | vy | vy / 60 |
| 4 | sin(angle) | sin(angle) |
| 5 | cos(angle) | cos(angle) |
| 6 | angular velocity | ang_vel / 3 |
| 7 | fuel fraction | fuel / fuel_init |
| 8 | dx to target pad | (pad_cx - x) / world_w |
| 9 | dy to target pad | (pad_y - y) / world_h |
| 10 | pad half-width | (x1 - x0) / 2 / 100 |
| 11 | terrain clearance | (y - lander_bottom - ground_y(x)) / world_h |
| 12 | pad multiplier | mult / 5 |
| 13 | pad visible | 1.0 or 0.0 |
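As a reading aid, the table can be written out as a function from raw state to the 14 floats. This is a sketch under assumptions, not the env's implementation: `Pad`, `build_obs`, and the argument names are hypothetical, it covers only the "full" case (index 13 fixed at 1.0), and it skips any clipping to the Box bounds.

```python
import math
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class Pad:
    x0: float   # left edge
    x1: float   # right edge
    y: float    # pad surface height
    mult: int   # 2, 3 or 5


def build_obs(x, y, vx, vy, angle, ang_vel, fuel, fuel_init,
              pads: Sequence[Pad], ground_y: Callable[[float], float],
              lander_bottom: float, world_w: float, world_h: float) -> list[float]:
    """Assemble the 14-float observation, index by index, from the table."""
    # Target pad: nearest by euclidean distance to the pad center.
    pad = min(pads, key=lambda p: math.hypot((p.x0 + p.x1) / 2 - x, p.y - y))
    pad_cx = (pad.x0 + pad.x1) / 2
    return [
        x / world_w * 2 - 1,                          # 0  x (normalized)
        y / world_h * 2 - 1,                          # 1  y (normalized)
        vx / 60,                                      # 2  vx
        vy / 60,                                      # 3  vy
        math.sin(angle),                              # 4  sin(angle)
        math.cos(angle),                              # 5  cos(angle)
        ang_vel / 3,                                  # 6  angular velocity
        fuel / fuel_init,                             # 7  fuel fraction
        (pad_cx - x) / world_w,                       # 8  dx to target pad
        (pad.y - y) / world_h,                        # 9  dy to target pad
        (pad.x1 - pad.x0) / 2 / 100,                  # 10 pad half-width
        (y - lander_bottom - ground_y(x)) / world_h,  # 11 terrain clearance
        pad.mult / 5,                                 # 12 pad multiplier
        1.0,                                          # 13 pad visible
    ]
```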
Truth lives in the core; perception is a filter. The frame JSON always
carries the truth — obs is what a policy sees, shaped by
obs_mode:
- "full": indices 8–12 always populated, index 13 always 1.0
- "radar": beyond radar_range of the nearest pad, indices 8, 9, 10, 12 read 0.0 and index 13 reads 0.0: the agent must explore to find a pad
Documented but not yet implemented: a lidar mode (terrain rays, no pad oracle), seeded sensor noise, and other-lander slots for the multi-agent env.
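The radar filter described above amounts to zeroing the pad-related slots when no pad is in range. A minimal sketch, assuming the caller already knows the distance to the nearest pad (how the core measures that distance is not stated here); `apply_radar` is a hypothetical name.

```python
# Slots that "radar" mode blanks out: dx, dy, half-width, multiplier, visible.
PAD_SLOTS = (8, 9, 10, 12, 13)


def apply_radar(obs: list[float], dist_to_nearest_pad: float,
                radar_range: float) -> list[float]:
    """Return a copy of a full 14-float observation as seen in radar mode."""
    out = list(obs)
    if dist_to_nearest_pad > radar_range:
        for i in PAD_SLOTS:
            out[i] = 0.0
    return out
```

Index 11 (terrain clearance) is left untouched, matching the list of masked indices above.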
Press P (or the AI button) to hand the lander
to a neural-network pilot: a small MLP whose forward pass runs in pure
Python, right here in your browser. Touch any flight control and it hands
the stick back. The game ships without a trained brain — you bring
your own.
Bring your own brain: drag a policy .json
onto the game (or use the LOAD AI footer link) and your
network flies, labeled CUSTOM AI. Train and export one with
examples/train_template.py — an annotated starting
point that tours the whole world API; the forward pass it exports is the
exact one this page runs.
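To give a feel for what a pure-Python forward pass looks like, here is a minimal sketch. The JSON layout (`layers`, `W`, `b`), the tanh hidden activation, and the argmax action choice are all assumptions made for illustration; the real schema is whatever examples/train_template.py exports.

```python
import json
import math


def forward(policy: dict, obs: list[float]) -> int:
    """Run a small MLP with plain lists and return the argmax action index."""
    h = obs
    layers = policy["layers"]              # hypothetical schema
    for k, layer in enumerate(layers):
        # One dense layer: z = W @ h + b, written out with stdlib only.
        z = [sum(w * x for w, x in zip(row, h)) + b
             for row, b in zip(layer["W"], layer["b"])]
        is_last = k == len(layers) - 1
        h = z if is_last else [math.tanh(v) for v in z]
    return max(range(len(h)), key=h.__getitem__)


# A toy 2-input, 4-action policy, just to exercise the function.
toy_policy = json.loads("""
{"layers": [
  {"W": [[1.0, 0.0], [0.0, 1.0]], "b": [0.0, 0.0]},
  {"W": [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0], [0.5, 0.5]],
   "b": [0.0, 0.0, 0.0, 0.0]}
]}
""")
action = forward(toy_policy, [0.2, 0.9])
```

Because it uses only `json` and `math`, a function of this shape runs unchanged under Pyodide.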
φ(s) = -1.0 * dist - 0.5 * speed - 0.5 * |sin angle|
dist = (min over pads of euclidean distance to pad center) / world_w
speed = hypot(vx, vy) / 60
r_t = 10 * (φ(s') - φ(s)) - 0.06 * (1 if main engine actually fired else 0)
terminal: perfect → +100 + 10*mult; hard → +30; crash → -100
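The shaping term is a difference of potentials, so it is policy-invariant (strictly so for undiscounted returns, since the term carries no γ factor), and the min-over-pads potential stays continuous when the nearest pad switches, so there is no fake reward at the boundary.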
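The reward formulas above translate almost line for line into code. The function names and argument layout below are hypothetical, and whether the terminal reward is added on top of the final shaping step is not stated here, so the two are kept separate.

```python
import math


def potential(x, y, vx, vy, angle, pad_centers, world_w):
    """phi(s) from the formula above; pad_centers is a list of (cx, cy)."""
    dist = min(math.hypot(cx - x, cy - y) for cx, cy in pad_centers) / world_w
    speed = math.hypot(vx, vy) / 60
    return -1.0 * dist - 0.5 * speed - 0.5 * abs(math.sin(angle))


def step_reward(phi_prev: float, phi_next: float, main_engine_fired: bool) -> float:
    """r_t = 10 * (phi(s') - phi(s)) - 0.06 if the main engine actually fired."""
    return 10 * (phi_next - phi_prev) - (0.06 if main_engine_fired else 0.0)


def terminal_reward(kind: str, mult: int) -> float:
    """Terminal bonus by outcome kind."""
    return {"perfect": 100 + 10 * mult, "hard": 30, "crash": -100}[kind]
```

As a worked case: with one pad centered at (500, 0) in a world 1000 wide, descending from y = 300 at vy = -60 to y = 290 at vy = -30 raises φ from -0.8 to -0.54, so a thrusting step earns 10 × 0.26 - 0.06 = 2.54.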
Training note (audit-verified): at
frame_skip=1 a good landing takes ~1300+ decisions, so with
γ = 0.99 a +100 terminal reward is discounted to ~0.0002
by the time it reaches the first step, and hovering beats landing in
discounted return. Train with
frame_skip=4 and gamma ≥ 0.997 (0.999
recommended). A frame_skip of up to 8 preserves the autopilot landing rate
(within one seed of frame_skip=1 on cadet seeds 0–29).
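The arithmetic behind this note is easy to reproduce. The sketch below assumes a terminal reward of 100 and a 1300-tick landing, so frame_skip=4 gives about 325 decisions; `discounted` is a hypothetical helper.

```python
def discounted(reward: float, gamma: float, steps: int) -> float:
    """Value at step 0 of a reward received `steps` decisions later."""
    return reward * gamma ** steps


ticks = 1300                    # physics ticks in a good landing
decisions_k1 = ticks            # frame_skip=1: one decision per tick
decisions_k4 = ticks // 4       # frame_skip=4: 325 decisions

for gamma, steps in [(0.99, decisions_k1), (0.997, decisions_k4), (0.999, decisions_k4)]:
    print(f"gamma={gamma} steps={steps} value={discounted(100, gamma, steps):.4g}")
```

Under these assumptions the first case comes out near 0.0002, while the frame_skip=4 cases keep roughly 38 (γ = 0.997) and 72 (γ = 0.999) of the original 100.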
The single-agent env wraps Game(n_landers=1), but the core
is multi-lander: one world, shared terrain, solid collisions, each
lander with its own fuel, score, and fate. (A PettingZoo wrapper is a
later phase.)
from moonlander.core.game import Game
g = Game(n_landers=3) # one world, three landers
g.reset(seed=7) # shared terrain, spread spawns
g.step_all('[[1, true], [0, false], [-1, true]]') # solid: collisions crash both
- All randomness flows through one random.Random(seed) per episode: same preset + same seed = byte-identical terrain, pads, stars, and spawns (different presets differ on the same seed, by design).
- env.reset(seed=k) follows the gymnasium np_random chain: the Game seed is derived, so the terrain's seed field ≠ k.
- env.reset(options={"game_seed": k}) seeds the Game directly, byte-identical to the web game's ?seed=k. That is the bridge between a training run and what you watch in the browser.
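The single-RNG pattern from the first point can be illustrated with a toy generator. Nothing here is the real terrain code: `make_world` and its fields are hypothetical, and the value ranges are borrowed loosely from the TRAINEE row. The point is only that every draw comes from one `random.Random(seed)` in a fixed order, so the serialized result is reproducible.

```python
import json
import random


def make_world(seed: int, n_points: int = 8, n_pads: int = 3) -> dict:
    """Toy world generator: every random draw comes from one seeded RNG."""
    rng = random.Random(seed)   # the only source of randomness
    heights = [rng.uniform(0, 380) for _ in range(n_points)]
    pad_slots = sorted(rng.sample(range(n_points), n_pads))
    spawn_vx = rng.uniform(-15, 15)
    return {"seed": seed, "heights": heights,
            "pad_slots": pad_slots, "spawn_vx": spawn_vx}


# Same seed, same bytes; a different seed gives a different world.
a = json.dumps(make_world(123), sort_keys=True)
b = json.dumps(make_world(123), sort_keys=True)
c = json.dumps(make_world(124), sort_keys=True)
```

Comparing serialized output, as in the last three lines, is a simple way to check byte-for-byte determinism in CI.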
Algorithm arena (train different algorithms, watch them fly side by side on identical seeds) → competition (multi-agent, pad-blocking strategy, collision risk, comm channels) → human + AI co-op (you fly one lander, the agent flies the other).
MultiLander is built and maintained by Bijan Mehralizadeh as an open-source playground for teaching and researching reinforcement learning — an homage to Atari's 1979 vector-monitor original, rebuilt so the same physics that trains agents flies in your browser. Python + Gymnasium on the inside; Pyodide/WebAssembly and a hand-drawn vector stroke font on the outside. Source, tests, and the full Python⇆JS contract live at github.com/bijanmehr/MultiLander.