ASTEROIDHUNTER
MANUAL

Atari's 1979 Asteroids, rebuilt as a deterministic reinforcement-learning environment — one pure-Python physics core that runs identically in your browser and in training.

How it works
Controls
Gameplay & scoring
Difficulty presets
The RL environment
Observation space
Action space
Reward
Determinism & seeding
Architecture
Roadmap

How it works

The whole game is a single pure-Python, standard-library-only physics core. It is the single source of truth and is surfaced two ways:

natively, as a Gymnasium-style environment for reinforcement-learning use;
in the browser, loaded by Pyodide / WebAssembly to drive this Canvas game.

Because both surfaces run the same step() code, the game you play and the environment an agent trains on are the same simulation, step for step. The boundary rule: Python simulates, JavaScript draws, and everything crossing the line is plain data.

Controls

Action	Keys	Touch
Rotate	← → · A D	↺ ↻ buttons
Thrust (inertia)	↑ · W	▲ button
Fire	Space — one shot per press	● button
Hyperspace	Shift — random jump	✦ button
Back to menu	R	MENU

Movement is Newtonian — you keep drifting after you stop thrusting, and every edge of the screen wraps around.
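The drift and wrap-around described above can be sketched as a single integration step. This is an illustration only: the field size, the thrust constant, and the function name are assumptions, not values from the AsteroidHunter core, and the sketch calls math.cos / math.sin directly where the real core is described as using a lookup table.

```python
import math

# Hypothetical constants for illustration; this manual does not give the real values.
FIELD_W, FIELD_H = 800.0, 600.0
THRUST_ACCEL = 200.0  # units per second squared (assumed)


def integrate_ship(x, y, vx, vy, angle, thrusting, dt):
    """Advance a ship by dt seconds: no friction, toroidal screen wrap."""
    if thrusting:
        # Thrust changes velocity, never position directly.
        vx += math.cos(angle) * THRUST_ACCEL * dt
        vy += math.sin(angle) * THRUST_ACCEL * dt
    # Velocity is kept when thrust stops, so the ship keeps drifting.
    # Python's % returns a non-negative result, so both edges wrap.
    x = (x + vx * dt) % FIELD_W
    y = (y + vy * dt) % FIELD_H
    return x, y, vx, vy
```

A ship at x = 790 drifting right at 100 units/s reappears near the left edge after 0.2 s, with its velocity unchanged.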

Gameplay & scoring

Asteroids split. A large rock breaks into two mediums, a medium into two smalls, a small is destroyed.
Smaller is worth more: large 20, medium 50, small 100 points.
Bullets are capped at 6 on screen and live ~1.25 s (they cross most of the field, then expire).
Hyperspace teleports you to a random spot — an escape hatch with a real self-destruct risk that rises with the number of rocks on screen.
Waves: clear every rock and the next wave spawns with more of them.
Lives: you start with 3 or 5 (depending on the preset) and earn an extra life every 10,000 points. Respawns get ~2 s of invulnerability (the ship blinks).
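The splitting, scoring, and extra-life rules above fit in a few lines. The sketch below is a hedged illustration; the names (POINTS, CHILD, destroy, and so on) are hypothetical and not taken from the core.

```python
# Point values and split chain as listed in this manual.
POINTS = {"large": 20, "medium": 50, "small": 100}
CHILD = {"large": "medium", "medium": "small", "small": None}
EXTRA_LIFE_EVERY = 10_000


def destroy(size):
    """Return (points awarded, child sizes spawned) for one destroyed rock."""
    child = CHILD[size]
    children = [child, child] if child else []
    return POINTS[size], children


def full_clear_value(size):
    """Total points for destroying a rock and every fragment it produces."""
    points, children = destroy(size)
    return points + sum(full_clear_value(c) for c in children)


def extra_lives_earned(old_score, new_score):
    """Number of 10,000-point thresholds crossed between two scores."""
    return new_score // EXTRA_LIFE_EVERY - old_score // EXTRA_LIFE_EVERY
```

Under these rules a single large rock is worth 20 + 2 · (50 + 2 · 100) = 520 points once fully cleared, which is why most of the score comes from small fragments.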

Difficulty presets

Pick one on the title screen. They are an RL curriculum as much as a difficulty knob.

Preset	Lives	Rock speed	Start wave
Rookie	5	×0.7	1
Pilot (default)	3	×1.0	1
Ace	3	×1.3	3
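One way to hold the preset table as data, so a training script can walk it as a curriculum. This is a sketch under assumptions: the Preset class, the field names, and the lowercase keys other than "ace" (which appears in this manual's own example) are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Preset:
    lives: int
    rock_speed: float  # multiplier applied to base rock speed
    start_wave: int


# Values copied from the preset table above.
PRESETS = {
    "rookie": Preset(lives=5, rock_speed=0.7, start_wave=1),
    "pilot": Preset(lives=3, rock_speed=1.0, start_wave=1),
    "ace": Preset(lives=3, rock_speed=1.3, start_wave=3),
}

# Easiest to hardest, for curriculum-style training.
CURRICULUM = ["rookie", "pilot", "ace"]
```

A curriculum loop would then train on each name in CURRICULUM in turn, moving on once the agent's true score on the current preset stops improving.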

The RL environment

Status. The environment is fully built — observation, action, and reward are implemented and gymnasium.utils.env_checker.check_env passes. Model training, JSON policy export, and in-browser AI replay are on the roadmap and not wired up yet. This build is the game + the environment.

import gymnasium as gym
import asteroidhunter
asteroidhunter.register()

env = gym.make("AsteroidHunter-v0")          # or: AsteroidHunterEnv(preset="ace")
obs, info = env.reset(seed=0)
obs, reward, terminated, truncated, info = env.step(env.action_space.sample())
print(info["true_score"])                    # the honest benchmark metric

Observation space

A fixed Box(-1, 1, shape=(100,)) float vector, with every field normalized into [-1, 1]. Asteroids and bullets are listed most-urgent-first, sorted by ascending time-to-collision, so slot 0 is always the most dangerous object.

Block	Fields per slot	Slots	Floats
Self (ship)	cosθ, sinθ, vx, vy, speed, lives, fire-cooldown, invuln	1	8
Asteroids	present, dx, dy, dist, closing-rate, rel-vx, rel-vy, size	8	64
Bullets (incoming)	present, dx, dy, dist, closing-rate, rel-vx, rel-vy	4	28

OBS_DIM = 8 + 8·k_ast + 7·k_bul = 100 (default k_ast=8, k_bul=4). The UFO block (+8) and other-ship blocks (+8 each, for multi-agent) are reserved for the roadmap.
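The layout arithmetic above can be checked with a small helper that splits a flat observation into named slots. It assumes the blocks are concatenated in table order (self, then asteroids, then bullets), which this manual implies but does not state; the field names and function names are illustrative.

```python
SELF_FIELDS = ("cos_theta", "sin_theta", "vx", "vy",
               "speed", "lives", "fire_cooldown", "invuln")
AST_FIELDS = ("present", "dx", "dy", "dist",
              "closing_rate", "rel_vx", "rel_vy", "size")
BUL_FIELDS = ("present", "dx", "dy", "dist",
              "closing_rate", "rel_vx", "rel_vy")


def obs_dim(k_ast=8, k_bul=4):
    """OBS_DIM = 8 + 8*k_ast + 7*k_bul."""
    return len(SELF_FIELDS) + len(AST_FIELDS) * k_ast + len(BUL_FIELDS) * k_bul


def _slots(obs, start, fields, count):
    """Cut `count` consecutive slots of len(fields) floats starting at `start`."""
    n = len(fields)
    return [dict(zip(fields, obs[start + i * n:start + (i + 1) * n]))
            for i in range(count)]


def split_obs(obs, k_ast=8, k_bul=4):
    """Split a flat observation into self / asteroid / bullet slots.

    Assumes block order self, asteroids, bullets (not confirmed by the manual).
    """
    if len(obs) != obs_dim(k_ast, k_bul):
        raise ValueError("unexpected observation length")
    ast_start = len(SELF_FIELDS)
    bul_start = ast_start + len(AST_FIELDS) * k_ast
    return {
        "self": dict(zip(SELF_FIELDS, obs[:ast_start])),
        "asteroids": _slots(obs, ast_start, AST_FIELDS, k_ast),
        "bullets": _slots(obs, bul_start, BUL_FIELDS, k_bul),
    }
```

With the defaults, obs_dim() returns 100, and split_obs(obs)["asteroids"][0] is the slot the manual describes as the most dangerous rock.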

Action space

Factored MultiDiscrete([3, 2, 2, 2]) — the agent can do several things at once, like a real player:

Index	Meaning	Values
0	rotate	0 = left, 1 = none, 2 = right
1	thrust	0 / 1
2	fire	0 / 1
3	hyperspace	0 / 1
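For readability in logs or scripted agents, the four-component action can be decoded into named fields. The Action tuple and decode_action below are hypothetical helpers, not part of the environment's API; only the index meanings come from the table above.

```python
from typing import NamedTuple, Sequence

NVEC = (3, 2, 2, 2)  # sizes of the MultiDiscrete components


class Action(NamedTuple):
    rotate: int       # -1 = left, 0 = none, +1 = right
    thrust: bool
    fire: bool
    hyperspace: bool


def decode_action(a: Sequence[int]) -> Action:
    """Validate a MultiDiscrete([3, 2, 2, 2]) sample and name its parts."""
    if len(a) != len(NVEC) or any(not 0 <= int(v) < n for v, n in zip(a, NVEC)):
        raise ValueError(f"action {list(a)} is outside MultiDiscrete({list(NVEC)})")
    # Index 0 uses 0/1/2 for left/none/right; shift it to -1/0/+1.
    return Action(int(a[0]) - 1, bool(a[1]), bool(a[2]), bool(a[3]))
```

The factored space has 3 · 2 · 2 · 2 = 24 joint actions, so rotating right while thrusting and firing is a single step, [2, 1, 1, 0].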

Reward

Φ(s) = 0.3 · (27 − n_asteroids)          # non-negative potential

r =  Δscore / 100                        # rocks score 20 / 50 / 100, so 0.2 / 0.5 / 1.0 each
  + 10 · wave_cleared
  − 10 · life_lost
  − 0.01                                 # per-step time cost
  − 0.3 · hyperspace_used
  + γ·Φ(s′) − Φ(s)                        # potential-based shaping (policy-invariant)

The shaping speeds up credit assignment for clearing the field without changing the optimal policy (Ng et al., 1999); for that invariance to hold, γ must be the same discount factor the agent trains with. There is no reward clipping, because clipping would collapse 20/50/100 into one value and kill the aiming skill. info["true_score"] (the raw game score) is the only benchmark metric: a high shaped reward paired with a low true score is the sign of reward hacking.
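The reward formula above, written as a plain function. This is a sketch, not the environment's code: the function and argument names are hypothetical, and the default gamma of 0.99 is an assumption, since this manual does not state the discount value.

```python
def potential(n_asteroids):
    """Phi(s) = 0.3 * (27 - n_asteroids), as given in the formula above."""
    return 0.3 * (27 - n_asteroids)


def shaped_reward(delta_score, wave_cleared, life_lost, hyperspace_used,
                  n_before, n_after, gamma=0.99):
    """One-step reward; gamma=0.99 is an assumed default, not from the manual."""
    base = (delta_score / 100.0
            + 10.0 * wave_cleared
            - 10.0 * life_lost
            - 0.01                      # per-step time cost
            - 0.3 * hyperspace_used)
    # Potential-based shaping term: gamma * Phi(s') - Phi(s).
    shaping = gamma * potential(n_after) - potential(n_before)
    return base + shaping
```

For example, with gamma = 1.0, destroying a small rock that takes the field from 5 rocks to 4 gives 1.0 − 0.01 + 0.3 = 1.29. Note that splitting a large rock raises the rock count by one, so the shaping term is slightly negative on that step even though the score term is positive.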

Determinism & seeding

Same (seed, action sequence) → byte-identical episode, in training and in the browser. Every random draw comes from a single seeded random.Random; there is no wall-clock, no OS entropy, and trig is read from a precomputed 256-entry table so native Python and Pyodide can't diverge on the last floating-point bit. env.reset(seed=s) fully reseeds and restores a fresh episode.
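A toy illustration of the two ingredients named above: one seeded random.Random for every draw, and trig read from a 256-entry table indexed by an integer heading. TinySim is a hypothetical stand-in, not the real core. The table is built with math.sin at import time for brevity; a core that needs bit-identical results across platforms would more plausibly ship the table as literal constants.

```python
import math
import random

TABLE_SIZE = 256
# Built at import for brevity (see the caveat in the lead-in).
SIN_TABLE = tuple(math.sin(2 * math.pi * i / TABLE_SIZE) for i in range(TABLE_SIZE))


def sin_idx(i):
    return SIN_TABLE[i % TABLE_SIZE]


def cos_idx(i):
    # cos(t) = sin(t + quarter turn)
    return SIN_TABLE[(i + TABLE_SIZE // 4) % TABLE_SIZE]


class TinySim:
    """Toy simulation: all randomness flows through one seeded generator."""

    def __init__(self):
        self.rng = random.Random()
        self.reset(0)

    def reset(self, seed):
        self.rng.seed(seed)  # full reseed, no wall clock or OS entropy
        self.heading = self.rng.randrange(TABLE_SIZE)
        self.x = self.y = 0.0

    def step(self, turn):
        self.heading = (self.heading + turn) % TABLE_SIZE
        speed = self.rng.random()
        self.x += cos_idx(self.heading) * speed
        self.y += sin_idx(self.heading) * speed
        return (self.heading, self.x, self.y)


def rollout(seed, actions):
    sim = TinySim()
    sim.reset(seed)
    return [sim.step(a) for a in actions]
```

Calling rollout twice with the same seed and action list returns identical trajectories, which is the property a regression test for the real environment would check.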

Architecture

One core, two surfaces. The core speaks dict-of-agents so it is already N-ship-ready; the single-agent env is the default surface. Each browser frame, the JS clock packs the held keys into one integer bitmask, calls Python step(bits), and gets back a flat float buffer it draws — no game logic in JavaScript.

render buffer = [ score, game_over, wave, lives,
                 then 6 floats per entity: kind, x, y, angle, radius, flags ]
kind: 0 ship · 2 bullet · 3 asteroid-L · 4 asteroid-M · 5 asteroid-S · 6 UFO
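The buffer layout above can be unpacked as follows. In the actual game the drawing side is JavaScript; this Python sketch is only meant to make the layout concrete (for example, for debugging a dumped frame), and parse_render_buffer is a hypothetical helper. Kind 1 is not listed in this manual, so the sketch reports it as "unknown".

```python
HEADER = ("score", "game_over", "wave", "lives")
ENTITY = ("kind", "x", "y", "angle", "radius", "flags")
# Kind codes as listed above; code 1 is not documented in this manual.
KIND_NAMES = {0: "ship", 2: "bullet", 3: "asteroid-L",
              4: "asteroid-M", 5: "asteroid-S", 6: "UFO"}


def parse_render_buffer(buf):
    """Split a flat render buffer into a header dict and a list of entities."""
    n_head, n_ent = len(HEADER), len(ENTITY)
    if len(buf) < n_head or (len(buf) - n_head) % n_ent != 0:
        raise ValueError("buffer length does not match 4 + 6 * n_entities")
    header = dict(zip(HEADER, buf[:n_head]))
    entities = []
    for i in range(n_head, len(buf), n_ent):
        ent = dict(zip(ENTITY, buf[i:i + n_ent]))
        ent["kind_name"] = KIND_NAMES.get(int(ent["kind"]), "unknown")
        entities.append(ent)
    return header, entities
```

A buffer with one ship and one large asteroid is therefore 4 + 2 · 6 = 16 floats long.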

Roadmap

Training enablement — a training template, JSON policy export, and in-browser LOAD AI / AGENT VIEW replay (the agent runs in Pyodide on the exact training observations).
UFO / saucer — the firing enemy, which also seeds a ships-vs-UFO adversarial mode.
Multi-agent — cooperative (clear a shared field) and competitive (zero-sum) modes via the dict-of-agents core.

AsteroidHunter — built by Bijan Mehr · source on GitHub · play
