← BACK TO THE GAME

The 1979 arcade classic, rebuilt as a reinforcement-learning arena. One pure-Python physics core (stdlib only, ~30 KB wheel) does everything: it trains your agents through Gymnasium, and it runs live in the browser via Pyodide (CPython on WebAssembly). JS never simulates — it only draws what Python computes, 60 times a second. Same seed → same episode, byte-for-byte, in CI, in training, and in your browser. The title screen is a live demo: a built-in autopilot flying three real landers, collisions welcome.

Landing: both feet on a pad, upright, and slow. |vx| ≤ 12 and |vy| ≤ 18 is a perfect landing (50 × multiplier); up to |vx| ≤ 25 and |vy| ≤ 35 is a hard landing that still pays 15 × multiplier; anything else (fast, tilted, off-pad, off-world) is a crash. Narrower pads pay more (2X/3X/5X). Landers are solid: meet one in flight and you both crash; a landed lander blocks its pad, legally.
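The landing rule can be sketched as a small classifier. This is a minimal illustration built only from the thresholds quoted above, not the core's actual code: the function name, the boolean `on_pad` and `upright` inputs (the tilt tolerance is not stated here), and a crash score of 0 are all assumptions.

```python
def classify_landing(vx, vy, on_pad, upright, mult):
    """Return (kind, score) for a touchdown, using the thresholds quoted above.

    on_pad / upright are passed in as booleans because the exact geometry and
    tilt tolerance are not given in this text (assumption).
    """
    if not (on_pad and upright):
        return "crash", 0              # crash score of 0 is an assumption
    if abs(vx) <= 12 and abs(vy) <= 18:
        return "perfect", 50 * mult    # perfect: 50 x pad multiplier
    if abs(vx) <= 25 and abs(vy) <= 35:
        return "hard", 15 * mult       # hard but survivable: 15 x multiplier
    return "crash", 0                  # too fast


print(classify_landing(5.0, -10.0, True, True, 5))   # ('perfect', 250)
print(classify_landing(20.0, -30.0, True, True, 3))  # ('hard', 45)
print(classify_landing(5.0, -40.0, True, True, 2))   # ('crash', 0)
```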

Three presets vary terrain ruggedness, pad widths, fuel budget, and spawn drift; the physics never changes. The selection persists and applies to both your episodes and the attract mode.

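| preset | displacement | decay | y_max | pad widths 2X/3X/5X | fuel | spawn vx |
|---|---|---|---|---|---|---|
| TRAINEE | 140 | 0.52 | 380 | 130 / 95 / 60 | 1200 | ±15 |
| CADET | 210 | 0.62 | 480 | 110 / 75 / 45 | 1000 | ±25 |
| COMMANDER | 260 | 0.68 | 560 | 90 / 60 / 36 | 850 | ±40 |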

The ladder doubles as an RL curriculum: train trainee → cadet → commander with the same reward and physics throughout.
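One way to drive that curriculum is a promotion rule on recent success rate. The sketch below copies the preset values from the table; the `PRESETS` dict layout, the `next_preset` helper, and the 0.8 promotion threshold are hypothetical and are not part of the moonlander package.

```python
# Preset values copied from the table; the dict layout is an assumption.
PRESETS = {
    "trainee": {"displacement": 140, "decay": 0.52, "y_max": 380,
                "pad_widths": (130, 95, 60), "fuel": 1200, "spawn_vx": 15},
    "cadet": {"displacement": 210, "decay": 0.62, "y_max": 480,
              "pad_widths": (110, 75, 45), "fuel": 1000, "spawn_vx": 25},
    "commander": {"displacement": 260, "decay": 0.68, "y_max": 560,
                  "pad_widths": (90, 60, 36), "fuel": 850, "spawn_vx": 40},
}
LADDER = ["trainee", "cadet", "commander"]


def next_preset(current, success_rate, threshold=0.8):
    """Move up one rung once the recent success rate clears the threshold."""
    i = LADDER.index(current)
    if success_rate >= threshold and i < len(LADDER) - 1:
        return LADDER[i + 1]
    return current  # stay put (or already at the top rung)


print(next_preset("trainee", 0.85))    # cadet
print(next_preset("cadet", 0.40))      # cadet
print(next_preset("commander", 0.95))  # commander
```

In a training loop the returned name would be passed as `preset=` when the environment is rebuilt between stages.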

import gymnasium, moonlander

env = gymnasium.make("MoonLander-v0")            # classic rotate+thrust
# or: MoonLanderEnv(mode="gym")                  # LunarLander-style engines
# or: MoonLanderEnv(obs_mode="radar")            # partial observability
# or: MoonLanderEnv(preset="trainee")            # the difficulty curriculum
# or: MoonLanderEnv(frame_skip=4)                # 4 physics ticks per step

obs, info = env.reset(seed=42)                   # info["terrain"] = terrain dict
obs, r, term, trunc, info = env.step(env.action_space.sample())
if term: print(info["outcome"])                  # {"kind": "perfect", "mult": 5, ...}

action_space = Discrete(4)  ·  observation_space = Box(-10, 10, (14,), float32). In classic mode the four actions are noop / rotate-left / rotate-right / thrust; in gym mode they fire engines — noop / left / main / right. Episodes truncate at max_steps physics ticks; stepping a finished episode raises RuntimeError. On the terminal step info carries outcome, is_success and score.
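A rollout loop that respects this contract stops as soon as either flag is set, since one more `step` would raise. The sketch below runs against a stand-in `StubEnv` (hypothetical, written here only so the example is self-contained); `run_episode` itself assumes nothing beyond the Gymnasium `reset`/`step` signatures, so it should work unchanged with the real environment.

```python
import random


class StubEnv:
    """Tiny stand-in that mimics the reset/step contract described above."""

    def __init__(self, max_steps=5):
        self.max_steps = max_steps
        self._t = 0
        self._done = True  # must call reset() before stepping

    def reset(self, seed=None):
        self._t = 0
        self._done = False
        return [0.0] * 14, {}

    def step(self, action):
        if self._done:
            raise RuntimeError("episode is finished; call reset()")
        self._t += 1
        truncated = self._t >= self.max_steps  # time limit reached
        self._done = truncated
        return [0.0] * 14, 0.0, False, truncated, {}


def run_episode(env, policy, seed=None):
    """Roll out one episode and return (total_reward, steps, final_info)."""
    obs, info = env.reset(seed=seed)
    total, steps = 0.0, 0
    while True:
        obs, reward, terminated, truncated, info = env.step(policy(obs))
        total += reward
        steps += 1
        if terminated or truncated:  # stop here: stepping again would raise
            return total, steps, info


rng = random.Random(0)
total, steps, info = run_episode(StubEnv(), lambda obs: rng.randrange(4), seed=42)
print(steps)  # 5
```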

The observation is 14 floats, normalized relative to world size. The target pad is the nearest one by Euclidean distance to the pad center.

| idx | value | formula |
|---|---|---|
| 0 | x (normalized) | x / world_w * 2 - 1 |
| 1 | y (normalized) | y / world_h * 2 - 1 |
| 2 | vx | vx / 60 |
| 3 | vy | vy / 60 |
| 4 | sin(angle) | sin(angle) |
| 5 | cos(angle) | cos(angle) |
| 6 | angular velocity | ang_vel / 3 |
| 7 | fuel fraction | fuel / fuel_init |
| 8 | dx to target pad | (pad_cx - x) / world_w |
| 9 | dy to target pad | (pad_y - y) / world_h |
| 10 | pad half-width | (x1 - x0) / 2 / 100 |
| 11 | terrain clearance | (y - lander_bottom - ground_y(x)) / world_h |
| 12 | pad multiplier | mult / 5 |
| 13 | pad visible | 1.0 or 0.0 |
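The formulas above translate directly into code. This is a sketch for reading the table, not the core's implementation: the dict-based state and pad layout, the `build_obs` and `nearest_pad` names, and the 1000 × 600 world in the example are assumptions.

```python
import math


def nearest_pad(x, y, pads):
    """Pick the pad whose center is closest in Euclidean distance."""
    return min(pads, key=lambda p: math.hypot((p["x0"] + p["x1"]) / 2 - x, p["y"] - y))


def build_obs(s, pad, world_w, world_h, ground_y, lander_bottom, pad_visible=True):
    """Assemble the 14-float observation, one entry per table row."""
    pad_cx = (pad["x0"] + pad["x1"]) / 2
    return [
        s["x"] / world_w * 2 - 1,                                 # 0 x (normalized)
        s["y"] / world_h * 2 - 1,                                 # 1 y (normalized)
        s["vx"] / 60,                                             # 2 vx
        s["vy"] / 60,                                             # 3 vy
        math.sin(s["angle"]),                                     # 4 sin(angle)
        math.cos(s["angle"]),                                     # 5 cos(angle)
        s["ang_vel"] / 3,                                         # 6 angular velocity
        s["fuel"] / s["fuel_init"],                               # 7 fuel fraction
        (pad_cx - s["x"]) / world_w,                              # 8 dx to target pad
        (pad["y"] - s["y"]) / world_h,                            # 9 dy to target pad
        (pad["x1"] - pad["x0"]) / 2 / 100,                        # 10 pad half-width
        (s["y"] - lander_bottom - ground_y(s["x"])) / world_h,    # 11 terrain clearance
        pad["mult"] / 5,                                          # 12 pad multiplier
        1.0 if pad_visible else 0.0,                              # 13 pad visible
    ]


# Hypothetical world and state, chosen only to exercise the formulas.
pads = [{"x0": 100, "x1": 200, "y": 50, "mult": 5},
        {"x0": 450, "x1": 550, "y": 80, "mult": 2}]
state = {"x": 500.0, "y": 300.0, "vx": 6.0, "vy": -12.0, "angle": 0.0,
         "ang_vel": 0.0, "fuel": 500.0, "fuel_init": 1000.0}
target = nearest_pad(state["x"], state["y"], pads)
obs = build_obs(state, target, world_w=1000, world_h=600,
                ground_y=lambda x: 80.0, lander_bottom=10.0)
print(len(obs))  # 14
```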

Truth lives in the core; perception is a filter. The frame JSON always carries the truth — obs is what a policy sees, shaped by obs_mode:

Documented but not yet implemented: a lidar mode (terrain rays, no pad oracle), seeded sensor noise, and other-lander slots for the multi-agent env.

Press P (or the AI button) to hand the lander to a neural-network pilot: a small MLP whose forward pass runs in pure Python, right here in your browser. Touch any flight control and it hands the stick back. The game ships without a trained brain — you bring your own.

Bring your own brain: drag a policy .json onto the game (or use the LOAD AI footer link) and your network flies, labeled CUSTOM AI. Train and export one with examples/train_template.py — an annotated starting point that tours the whole world API; the forward pass it exports is the exact one this page runs.
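For intuition, a pure-Python MLP forward pass can be as short as the sketch below. The JSON layout shown (a `layers` list of `W`/`b` entries), the tanh hidden activation, and greedy argmax action selection are assumptions made for illustration; the real export format and activation are whatever examples/train_template.py defines.

```python
import json
import math


def mlp_forward(layers, obs):
    """Run a dense MLP: tanh on hidden layers, raw scores on the last one."""
    h = list(obs)
    for i, layer in enumerate(layers):
        # One output per weight row: dot(row, h) + bias.
        z = [sum(w * v for w, v in zip(row, h)) + b
             for row, b in zip(layer["W"], layer["b"])]
        h = z if i == len(layers) - 1 else [math.tanh(v) for v in z]
    return h


def act(layers, obs):
    """Greedy action: index of the largest output score."""
    scores = mlp_forward(layers, obs)
    return max(range(len(scores)), key=scores.__getitem__)


# Hypothetical policy file contents: 14 inputs -> 2 hidden units -> 4 actions.
policy_json = json.dumps({"layers": [
    {"W": [[0.1] * 14, [-0.1] * 14], "b": [0.0, 0.0]},
    {"W": [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [-1.0, -1.0]],
     "b": [0.0, 0.0, 0.0, 0.0]},
]})
layers = json.loads(policy_json)["layers"]
print(act(layers, [1.0] * 14))  # 0
```

Keeping the forward pass free of NumPy is what lets the same function run under Pyodide without loading extra packages.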

φ(s)  = -1.0 * dist - 0.5 * speed - 0.5 * |sin angle|
        dist  = (min over pads of euclidean distance to pad center) / world_w
        speed = hypot(vx, vy) / 60
r_t   = 10 * (φ(s') - φ(s)) - 0.06 * (1 if main engine actually fired else 0)
terminal: perfect → +100 + 10*mult;  hard → +30;  crash → -100

The shaping is potential-based, so it is policy-invariant, and the min-over-pads potential stays continuous when the nearest pad switches — no fake reward at the boundary.
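The reward definition can be written out as below. This is a sketch that mirrors the formulas as stated; the function names, the `(cx, cy)` pad-center tuples, and the sample trajectory are assumptions. The final check shows the telescoping property: with no engine penalty, the shaping terms along a trajectory sum to 10 × (φ(last) − φ(first)), regardless of the path taken.

```python
import math


def potential(x, y, vx, vy, angle, pads, world_w):
    """phi(s): closer to a pad, slower, and more upright is better."""
    dist = min(math.hypot(cx - x, cy - y) for cx, cy in pads) / world_w
    speed = math.hypot(vx, vy) / 60
    return -1.0 * dist - 0.5 * speed - 0.5 * abs(math.sin(angle))


def step_reward(phi_prev, phi_next, main_fired):
    """Shaping difference minus the fuel-use penalty for the main engine."""
    return 10 * (phi_next - phi_prev) - 0.06 * (1 if main_fired else 0)


def terminal_reward(kind, mult):
    """Terminal bonus or penalty by outcome kind."""
    return {"perfect": 100 + 10 * mult, "hard": 30, "crash": -100}[kind]


# Hypothetical pad centers and a three-state trajectory (x, y, vx, vy, angle).
pads = [(300.0, 50.0), (700.0, 80.0)]
traj = [(500.0, 400.0, 20.0, -5.0, 0.3),
        (560.0, 300.0, 15.0, -20.0, 0.1),
        (680.0, 120.0, 4.0, -10.0, 0.0)]
phis = [potential(*s, pads, 1000.0) for s in traj]
shaped = sum(step_reward(a, b, False) for a, b in zip(phis, phis[1:]))
print(abs(shaped - 10 * (phis[-1] - phis[0])) < 1e-9)  # True
```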

Training note (audit-verified): at frame_skip=1 a good landing is ~1300+ decisions, so with γ = 0.99 the terminal reward is discounted to ~0.0002 and hovering beats landing in discounted return. Train with frame_skip=4 and gamma ≥ 0.997 (0.999 recommended). frame_skip up to 8 preserves the autopilot landing rate (within one seed of k=1 on cadet seeds 0–29).
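The arithmetic behind that note is easy to reproduce. The sketch below assumes a perfect landing on a 2X pad (terminal reward 100 + 10 × 2 = 120) and takes the ~1300-tick figure at face value, so frame_skip=4 gives about 325 decisions.

```python
def discounted_terminal(reward, gamma, decisions):
    """Value today of a terminal reward received `decisions` steps from now."""
    return reward * gamma ** decisions


TICKS = 1300   # physics ticks for a good landing, from the note above
REWARD = 120   # perfect landing on a 2X pad: 100 + 10 * 2 (assumed example)

for gamma, frame_skip in [(0.99, 1), (0.997, 4), (0.999, 4)]:
    decisions = TICKS // frame_skip
    value = discounted_terminal(REWARD, gamma, decisions)
    print(f"gamma={gamma} frame_skip={frame_skip} decisions={decisions} value={value:.4g}")
```

With γ = 0.99 and 1300 decisions the discount factor is on the order of 10⁻⁶, which leaves a few ten-thousandths of the reward; with frame_skip=4 and γ ≥ 0.997 a substantial share of the 120 survives.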

The single-agent env wraps Game(n_landers=1), but the core is multi-lander: one world, shared terrain, solid collisions, each lander with its own fuel, score, and fate. (A PettingZoo wrapper is a later phase.)

from moonlander.core.game import Game
g = Game(n_landers=3)                              # one world, three landers
g.reset(seed=7)                                    # shared terrain, spread spawns
g.step_all('[[1, true], [0, false], [-1, true]]')  # solid: collisions crash both
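Since `step_all` is shown taking a JSON string, the payload can be built from Python values with the standard `json` module rather than typed by hand. Reading each pair as [rotation, thrust on/off] is an assumption made for this sketch; the source does not spell out the meaning of the two fields.

```python
import json

# Hypothetical reading of each pair: [rotation in {-1, 0, 1}, thrust on/off].
actions = [[1, True], [0, False], [-1, True]]

# json.dumps turns Python booleans into JSON true/false.
payload = json.dumps(actions)
print(payload)  # [[1, true], [0, false], [-1, true]]
```

The resulting string matches the literal passed to `g.step_all(...)` above.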

Planned phases: algorithm arena (train different algorithms, watch them fly side by side on identical seeds) → competition (multi-agent, pad-blocking strategy, collision risk, comm channels) → human + AI co-op (you fly one lander, the agent flies the other).

MultiLander is built and maintained by Bijan Mehralizadeh as an open-source playground for teaching and researching reinforcement learning — an homage to Atari's 1979 vector-monitor original, rebuilt so the same physics that trains agents flies in your browser. Python + Gymnasium on the inside; Pyodide/WebAssembly and a hand-drawn vector stroke font on the outside. Source, tests, and the full Python⇆JS contract live at github.com/bijanmehr/MultiLander.
