Atari's 1979 Asteroids, rebuilt as a deterministic reinforcement-learning environment — one pure-Python physics core that runs identically in your browser and in training.
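The whole game is a single pure-Python, standard-library-only physics core. It is the single source of truth and is surfaced two ways: in the browser, where Pyodide runs the core and JavaScript draws the result, and as a Gymnasium environment for training.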
Because both run the same step() bytecode, the game you play and the environment an agent will train on are byte-for-byte the same simulation. The boundary rule: Python simulates, JavaScript draws, everything crossing the line is plain data.
| Action | Keys | Touch |
|---|---|---|
| Rotate | ← → · A D | ↺ ↻ buttons |
| Thrust (inertia) | ↑ · W | ▲ button |
| Fire | Space — one shot per press | ● button |
| Hyperspace | Shift — random jump | ✦ button |
| Back to menu | R | MENU |
Movement is Newtonian — you keep drifting after you stop thrusting, and every edge of the screen wraps around.
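As a rough illustration of the movement rule above, here is a minimal sketch of drift plus screen wrap. The playfield size, thrust constant, timestep, and the `Ship` / `step_ship` names are assumptions made for the example, not values taken from the game, and it calls `math.cos` / `math.sin` directly for brevity.

```python
import math
from dataclasses import dataclass

# All constants below are illustrative assumptions, not the game's real values.
WIDTH, HEIGHT = 800.0, 600.0   # assumed playfield size
THRUST = 200.0                 # assumed acceleration while thrusting
DT = 1.0 / 60.0                # assumed fixed timestep


@dataclass
class Ship:
    x: float
    y: float
    vx: float = 0.0
    vy: float = 0.0
    angle: float = 0.0  # heading in radians


def step_ship(ship: Ship, thrusting: bool, dt: float = DT) -> Ship:
    """Advance one tick: thrust changes velocity, nothing ever damps it."""
    if thrusting:
        ship.vx += math.cos(ship.angle) * THRUST * dt
        ship.vy += math.sin(ship.angle) * THRUST * dt
    # No friction term: velocity persists, so the ship keeps drifting.
    # The modulo wraps every edge of the screen to the opposite side.
    ship.x = (ship.x + ship.vx * dt) % WIDTH
    ship.y = (ship.y + ship.vy * dt) % HEIGHT
    return ship
```

Python's `%` always returns a non-negative result for a positive modulus, so a ship leaving the left or top edge reappears on the right or bottom without a special case.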
Large asteroids score 20, medium 50, small 100 points.

Pick a preset on the title screen. The presets are an RL curriculum as much as a difficulty knob.
| Preset | Lives | Rock speed | Start wave |
|---|---|---|---|
| Rookie | 5 | ×0.7 | 1 |
| Pilot (default) | 3 | ×1.0 | 1 |
| Ace | 3 | ×1.3 | 3 |
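One way to picture the presets as a curriculum is a small lookup table that a training script could walk through in order. This is a sketch only: the `Preset` class, the `PRESETS` dict, and `next_preset` are hypothetical names, and only the `"ace"` key is taken from the usage example in this README.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Preset:
    lives: int
    rock_speed: float  # multiplier on asteroid speed
    start_wave: int


# Values copied from the table above; the lowercase keys are an assumption.
PRESETS = {
    "rookie": Preset(lives=5, rock_speed=0.7, start_wave=1),
    "pilot": Preset(lives=3, rock_speed=1.0, start_wave=1),
    "ace": Preset(lives=3, rock_speed=1.3, start_wave=3),
}

CURRICULUM = ("rookie", "pilot", "ace")  # easiest to hardest


def next_preset(current: str) -> str:
    """Return the next harder preset, staying on the last one at the top."""
    i = CURRICULUM.index(current)
    return CURRICULUM[min(i + 1, len(CURRICULUM) - 1)]
```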
gymnasium.utils.env_checker.check_env passes. Model training, JSON policy export, and in-browser AI replay are on the roadmap and not wired up yet. This build is the game + the environment.

```python
import gymnasium as gym
import asteroidhunter

asteroidhunter.register()
env = gym.make("AsteroidHunter-v0")  # or: AsteroidHunterEnv(preset="ace")
obs, info = env.reset(seed=0)
obs, reward, terminated, truncated, info = env.step(env.action_space.sample())
print(info["true_score"])  # the honest benchmark metric
```
A fixed Box(-1, 1, shape=(100,)) float vector — every field normalized into [-1, 1]. Asteroids and bullets are listed nearest-first, sorted by time-to-collision, so slot 0 is always the most dangerous object.
| Block | Fields per slot | Slots | Floats |
|---|---|---|---|
| Self (ship) | cosθ, sinθ, vx, vy, speed, lives, fire-cooldown, invuln | 1 | 8 |
| Asteroids | present, dx, dy, dist, closing-rate, rel-vx, rel-vy, size | 8 | 64 |
| Bullets (incoming) | present, dx, dy, dist, closing-rate, rel-vx, rel-vy | 4 | 28 |
OBS_DIM = 8 + 8·k_ast + 7·k_bul = 100 (default k_ast=8, k_bul=4). The UFO block (+8) and other-ship blocks (+8 each, for multi-agent) are reserved for the roadmap.
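To make the layout concrete, the sketch below splits a flat observation into its three blocks. It assumes the blocks are stored in the order the table lists them (ship, then asteroids, then bullets) with each slot's fields contiguous; the README does not state the memory order, and `split_obs` is a hypothetical helper name.

```python
K_AST, K_BUL = 8, 4              # default slot counts from the formula above
SELF_F, AST_F, BUL_F = 8, 8, 7   # fields per slot, from the table
OBS_DIM = SELF_F + AST_F * K_AST + BUL_F * K_BUL  # 8 + 64 + 28 = 100


def split_obs(obs):
    """Split a flat observation into (ship, asteroid slots, bullet slots).

    Assumes block order ship -> asteroids -> bullets, slot-major.
    """
    if len(obs) != OBS_DIM:
        raise ValueError(f"expected {OBS_DIM} floats, got {len(obs)}")
    ast_start = SELF_F
    bul_start = ast_start + AST_F * K_AST
    ship = list(obs[:SELF_F])
    asteroids = [
        list(obs[ast_start + i * AST_F: ast_start + (i + 1) * AST_F])
        for i in range(K_AST)
    ]
    bullets = [
        list(obs[bul_start + i * BUL_F: bul_start + (i + 1) * BUL_F])
        for i in range(K_BUL)
    ]
    return ship, asteroids, bullets
```

Under that assumed ordering, `asteroids[0]` is the slot with the smallest time-to-collision, and its first field is the `present` flag.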
Factored MultiDiscrete([3, 2, 2, 2]) — the agent can do several things at once, like a real player:
| Index | Meaning | Values |
|---|---|---|
| 0 | rotate | 0 = left, 1 = none, 2 = right |
| 1 | thrust | 0 / 1 |
| 2 | fire | 0 / 1 |
| 3 | hyperspace | 0 / 1 |
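Some algorithms expect a single discrete action rather than a factored one. The sketch below enumerates the 3 × 2 × 2 × 2 = 24 combinations and converts between a flat index and the factored tuple. It is an illustration of the action space's size and structure, not part of the environment; `to_flat` and `from_flat` are hypothetical helpers.

```python
from itertools import product

NVEC = (3, 2, 2, 2)  # rotate, thrust, fire, hyperspace

# Every legal factored action, in mixed-radix order (rotate most significant).
ALL_ACTIONS = list(product(*(range(n) for n in NVEC)))


def to_flat(action):
    """Encode a factored action as one integer in [0, 24)."""
    idx = 0
    for a, n in zip(action, NVEC):
        if not 0 <= a < n:
            raise ValueError(f"component {a} out of range for size {n}")
        idx = idx * n + a
    return idx


def from_flat(idx):
    """Decode a flat index back into (rotate, thrust, fire, hyperspace)."""
    out = []
    for n in reversed(NVEC):
        idx, a = divmod(idx, n)
        out.append(a)
    return tuple(reversed(out))
```

With this encoding, index 8 is `(1, 0, 0, 0)`, the no-op action: no rotation, no thrust, no fire, no hyperspace.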
```
Φ(s) = 0.3 · (27 − n_asteroids)     # non-negative potential

r =   Δscore / 100                  # 20 / 50 / 100 per rock
    + 10 · wave_cleared
    − 10 · life_lost
    − 0.01                          # per-step time cost
    − 0.3 · hyperspace_used
    + γ·Φ(s′) − Φ(s)                # potential-based shaping (policy-invariant)
```
The shaping speeds up credit assignment for clearing the field without changing the optimal policy (Ng et al., 1999). There is no reward clipping — that would collapse 20/50/100 into one value and kill the aiming skill. info["true_score"] (the raw game score) is the only benchmark metric: a high shaped reward with a low true score is the reward-hacking detector.
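The reward terms can be written out as a plain function, which makes it easy to check individual cases by hand. This is a sketch of the formula as stated, not the environment's actual code: the function and argument names are hypothetical, and the discount `GAMMA = 0.99` is an assumption because the README does not give a value for γ.

```python
GAMMA = 0.99  # assumed discount factor; not specified in this README


def potential(n_asteroids: int) -> float:
    """Phi(s) = 0.3 * (27 - n_asteroids)."""
    return 0.3 * (27 - n_asteroids)


def shaped_reward(d_score, wave_cleared, life_lost, hyperspace_used,
                  n_before, n_after, gamma=GAMMA):
    """Per-step reward; event arguments are 0/1 flags for this step."""
    r = d_score / 100.0                 # 0.2 / 0.5 / 1.0 per rock
    r += 10.0 * wave_cleared
    r -= 10.0 * life_lost
    r -= 0.01                           # per-step time cost
    r -= 0.3 * hyperspace_used
    # Potential-based shaping term: gamma * Phi(s') - Phi(s).
    r += gamma * potential(n_after) - potential(n_before)
    return r
```

For example, a step where nothing happens and the field is at 27 asteroids before and after yields only the time cost, `-0.01`, since both potentials are zero.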
Same (seed, action sequence) → byte-identical episode, in training and in the browser. Every random draw comes from a single seeded random.Random; there is no wall-clock, no OS entropy, and trig is read from a precomputed 256-entry table so native Python and Pyodide can't diverge on the last floating-point bit. env.reset(seed=s) fully reseeds and restores a fresh episode.
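The two ingredients named above, a table for trig and a single seeded `random.Random`, can be shown in a toy rollout. This is a sketch under assumptions: the table here is generated at import time and rounded to 6 decimals for brevity, whereas a real core would more likely ship the 256 values as literals; `sin8`, `cos8`, and `rollout` are hypothetical names, and the "physics" is a stand-in.

```python
import math
import random

TABLE_SIZE = 256  # one full turn is split into 256 heading steps

# Rounded so the stored values do not depend on the last bit of math.sin.
SIN_TABLE = tuple(
    round(math.sin(2.0 * math.pi * i / TABLE_SIZE), 6)
    for i in range(TABLE_SIZE)
)


def sin8(idx: int) -> float:
    return SIN_TABLE[idx % TABLE_SIZE]


def cos8(idx: int) -> float:
    # cos(t) = sin(t + quarter turn)
    return SIN_TABLE[(idx + TABLE_SIZE // 4) % TABLE_SIZE]


def rollout(seed: int, actions):
    """Toy episode: all randomness comes from one seeded generator."""
    rng = random.Random(seed)
    heading, x, y = 0, 0.0, 0.0
    trace = []
    for a in actions:
        heading = (heading + a + rng.randrange(-1, 2)) % TABLE_SIZE
        x += cos8(heading)
        y += sin8(heading)
        trace.append((heading, x, y))
    return trace
```

Calling `rollout` twice with the same seed and action list returns equal traces, which is the property a replay test would assert.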
One core, two surfaces. The core speaks dict-of-agents so it is already N-ship-ready; the single-agent env is the default surface. Each browser frame, the JS clock packs the held keys into one integer bitmask, calls Python step(bits), and gets back a flat float buffer it draws — no game logic in JavaScript.
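The key-packing step can be sketched as follows. In the game the packing happens on the JavaScript side; this Python version mirrors the idea and also shows one plausible way to turn the bitmask into the factored action. The bit assignments and the `pack_keys` / `bits_to_action` names are assumptions, since the README does not document the actual bit layout.

```python
# Hypothetical bit layout; the real assignment is not documented here.
BIT_LEFT, BIT_RIGHT, BIT_THRUST, BIT_FIRE, BIT_HYPER = 1, 2, 4, 8, 16


def pack_keys(left=False, right=False, thrust=False, fire=False, hyper=False):
    """Pack the currently held keys into one integer."""
    bits = 0
    if left:
        bits |= BIT_LEFT
    if right:
        bits |= BIT_RIGHT
    if thrust:
        bits |= BIT_THRUST
    if fire:
        bits |= BIT_FIRE
    if hyper:
        bits |= BIT_HYPER
    return bits


def bits_to_action(bits: int):
    """Decode a bitmask into (rotate, thrust, fire, hyperspace)."""
    left = 1 if bits & BIT_LEFT else 0
    right = 1 if bits & BIT_RIGHT else 0
    rotate = 1 + right - left  # 0 = left, 1 = none, 2 = right; both cancel
    return (
        rotate,
        1 if bits & BIT_THRUST else 0,
        1 if bits & BIT_FIRE else 0,
        1 if bits & BIT_HYPER else 0,
    )
```

Only one integer crosses the boundary per frame, which fits the rule that everything crossing the line is plain data.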
```
render buffer = [ score, game_over, wave, lives,
                  then 6 floats per entity: kind, x, y, angle, radius, flags ]
kind: 0 ship · 2 bullet · 3 asteroid-L · 4 asteroid-M · 5 asteroid-S · 6 UFO
```
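A consumer of the buffer could decode it roughly as below. This sketch assumes the number of entities is implied by the buffer length (the README does not say whether there is a count field or fixed padding), and `parse_render_buffer` is a hypothetical helper, written in Python for illustration even though the real consumer is the JavaScript renderer.

```python
HEADER = 4   # score, game_over, wave, lives
STRIDE = 6   # kind, x, y, angle, radius, flags

KIND_NAMES = {
    0: "ship", 2: "bullet",
    3: "asteroid-L", 4: "asteroid-M", 5: "asteroid-S",
    6: "UFO",
}


def parse_render_buffer(buf):
    """Split a flat float buffer into a header dict and a list of entities.

    Assumes entity count = (len(buf) - 4) / 6, which is a guess.
    """
    if len(buf) < HEADER or (len(buf) - HEADER) % STRIDE != 0:
        raise ValueError("buffer length does not match 4 + 6 * n")
    score, game_over, wave, lives = buf[:HEADER]
    header = {
        "score": int(score),
        "game_over": bool(game_over),
        "wave": int(wave),
        "lives": int(lives),
    }
    entities = []
    for i in range(HEADER, len(buf), STRIDE):
        kind, x, y, angle, radius, flags = buf[i:i + STRIDE]
        entities.append({
            "kind": KIND_NAMES.get(int(kind), "unknown"),
            "x": x, "y": y, "angle": angle,
            "radius": radius, "flags": int(flags),
        })
    return header, entities
```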