
ASTEROIDHUNTER
MANUAL

Atari's 1979 Asteroids, rebuilt as a deterministic reinforcement-learning environment — one pure-Python physics core that runs identically in your browser and in training.

How it works
Controls
Gameplay & scoring
Difficulty presets
The RL environment
Observation space
Action space
Reward
Determinism & seeding
Architecture
Roadmap

How it works

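The whole game is one pure-Python, standard-library-only physics core. That core is the single source of truth, and it is surfaced two ways: as the browser game, where Python simulates and JavaScript only draws, and as the Gymnasium environment an agent trains on.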

Because both run the same step() bytecode, the game you play and the environment an agent will train on are byte-for-byte the same simulation. The boundary rule: Python simulates, JavaScript draws, everything crossing the line is plain data.

Controls

| Action | Keys | Touch |
|---|---|---|
| Rotate | A D | ↺ ↻ buttons |
| Thrust (inertia) | W | ▲ button |
| Fire | Space (one shot per press) | ● button |
| Hyperspace | Shift (random jump) | ✦ button |
| Back to menu | R | MENU |

Movement is Newtonian — you keep drifting after you stop thrusting, and every edge of the screen wraps around.
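
The inertia-plus-wraparound rule can be sketched in a few lines. This is a minimal illustration, not the core's actual code: the playfield size, the thrust constant, and the function name `step_ship` are all assumptions made for the example.

```python
import math

WIDTH, HEIGHT = 800.0, 600.0   # assumed playfield size (not from the manual)
THRUST = 0.1                   # assumed acceleration per step (not from the manual)


def step_ship(x, y, vx, vy, angle, thrusting):
    """Advance a hypothetical ship by one tick of Newtonian motion."""
    if thrusting:
        # Thrust changes velocity along the ship's heading.
        vx += THRUST * math.cos(angle)
        vy += THRUST * math.sin(angle)
    # There is no friction term, so velocity persists after thrust stops.
    # The modulo wraps the position around every edge of the screen.
    x = (x + vx) % WIDTH
    y = (y + vy) % HEIGHT
    return x, y, vx, vy
```

Calling `step_ship` with `thrusting=False` leaves `vx` and `vy` untouched, which is the "keep drifting" behaviour, and a ship leaving the right edge reappears on the left.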

Gameplay & scoring

Difficulty presets

Pick one on the title screen. They are an RL curriculum as much as a difficulty knob.

| Preset | Lives | Rock speed | Start wave |
|---|---|---|---|
| Rookie | 5 | ×0.7 | 1 |
| Pilot (default) | 3 | ×1.0 | 1 |
| Ace | 3 | ×1.3 | 3 |
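
One way to treat the presets as a curriculum is to hold them as plain data and step through them in order. The sketch below copies the table values; the `Preset` class, the lowercase keys other than `"ace"`, and the `next_preset` helper are hypothetical names for illustration, not part of the package.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Preset:
    lives: int
    rock_speed: float   # multiplier on asteroid speed
    start_wave: int


# Values copied from the presets table; key names are assumed.
PRESETS = {
    "rookie": Preset(lives=5, rock_speed=0.7, start_wave=1),
    "pilot": Preset(lives=3, rock_speed=1.0, start_wave=1),
    "ace": Preset(lives=3, rock_speed=1.3, start_wave=3),
}

# Easiest to hardest, as a curriculum order.
CURRICULUM = ["rookie", "pilot", "ace"]


def next_preset(name):
    """Return the next harder preset, staying on the last one once reached."""
    i = CURRICULUM.index(name)
    return CURRICULUM[min(i + 1, len(CURRICULUM) - 1)]
```

A training loop could call `next_preset` whenever the agent's true score clears some threshold; that threshold is a design choice the manual does not specify.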

The RL environment

Status. The environment is fully built — observation, action, and reward are implemented and gymnasium.utils.env_checker.check_env passes. Model training, JSON policy export, and in-browser AI replay are on the roadmap and not wired up yet. This build is the game + the environment.
import gymnasium as gym
import asteroidhunter
asteroidhunter.register()

env = gym.make("AsteroidHunter-v0")          # or: AsteroidHunterEnv(preset="ace")
obs, info = env.reset(seed=0)
obs, reward, terminated, truncated, info = env.step(env.action_space.sample())
print(info["true_score"])                    # the honest benchmark metric

Observation space

A fixed Box(-1, 1, shape=(100,)) float vector — every field normalized into [-1, 1]. Asteroids and bullets are listed nearest-first, sorted by time-to-collision, so slot 0 is always the most dangerous object.
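
To illustrate why sorting by time-to-collision differs from sorting by raw distance, here is a rough sketch. It assumes a simple estimate (distance divided by closing rate, ignoring object radii), which may not match the formula the core uses; the function names are hypothetical.

```python
import math


def closing_rate(dx, dy, rel_vx, rel_vy):
    """Speed at which the object approaches the ship (positive = approaching)."""
    dist = math.hypot(dx, dy)
    if dist == 0.0:
        return 0.0
    return -(dx * rel_vx + dy * rel_vy) / dist


def time_to_collision(obj):
    """Assumed estimate: dist / closing rate, infinite if the object is receding."""
    dx, dy, rel_vx, rel_vy = obj
    dist = math.hypot(dx, dy)
    if dist == 0.0:
        return 0.0
    rate = closing_rate(dx, dy, rel_vx, rel_vy)
    return dist / rate if rate > 0.0 else math.inf


def most_dangerous_first(objects, k):
    """Keep the k objects with the smallest time-to-collision."""
    return sorted(objects, key=time_to_collision)[:k]
```

Under this estimate a rock 50 units away but drifting off ranks behind a rock 200 units away that is closing fast, so slot 0 holds the fast one.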

| Block | Fields per slot | Slots | Floats |
|---|---|---|---|
| Self (ship) | cosθ, sinθ, vx, vy, speed, lives, fire-cooldown, invuln | 1 | 8 |
| Asteroids | present, dx, dy, dist, closing-rate, rel-vx, rel-vy, size | 8 | 64 |
| Bullets (incoming) | present, dx, dy, dist, closing-rate, rel-vx, rel-vy | 4 | 28 |

OBS_DIM = 8 + 8·k_ast + 7·k_bul = 100 (default k_ast=8, k_bul=4). The UFO block (+8) and other-ship blocks (+8 each, for multi-agent) are reserved for the roadmap.
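
The dimension formula can be checked mechanically, and the same constants give the offsets of each block. This sketch assumes the blocks sit in the vector in table order (self, asteroids, bullets); the manual does not state the order, and the helper names are made up for the example.

```python
SELF_FIELDS = 8   # fields in the ship block
AST_FIELDS = 8    # fields per asteroid slot
BUL_FIELDS = 7    # fields per bullet slot


def obs_dim(k_ast=8, k_bul=4):
    """OBS_DIM = 8 + 8*k_ast + 7*k_bul."""
    return SELF_FIELDS + AST_FIELDS * k_ast + BUL_FIELDS * k_bul


def block_slices(k_ast=8, k_bul=4):
    """Slices for each block, assuming self -> asteroids -> bullets ordering."""
    ast_start = SELF_FIELDS
    bul_start = ast_start + AST_FIELDS * k_ast
    return {
        "self": slice(0, ast_start),
        "asteroids": slice(ast_start, bul_start),
        "bullets": slice(bul_start, bul_start + BUL_FIELDS * k_bul),
    }


def asteroid_slot(obs, i):
    """Return the 8 floats of asteroid slot i under the assumed layout."""
    start = SELF_FIELDS + AST_FIELDS * i
    return obs[start:start + AST_FIELDS]
```

With the defaults, `obs_dim()` returns 100, and the bullet block would occupy indices 72 through 99.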

Action space

Factored MultiDiscrete([3, 2, 2, 2]) — the agent can do several things at once, like a real player:

| Index | Meaning | Values |
|---|---|---|
| 0 | rotate | 0 = left, 1 = none, 2 = right |
| 1 | thrust | 0 / 1 |
| 2 | fire | 0 / 1 |
| 3 | hyperspace | 0 / 1 |
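
A factored action is just four small integers, one per index in the table. The sketch below validates and decodes one such action in plain Python, without Gymnasium; `decode_action` and the signed rotate convention (-1, 0, +1) are illustrative assumptions rather than the environment's internals.

```python
import math

NVEC = (3, 2, 2, 2)  # sizes from MultiDiscrete([3, 2, 2, 2])


def decode_action(action):
    """Turn a [rotate, thrust, fire, hyperspace] action into named commands."""
    if len(action) != len(NVEC):
        raise ValueError("expected 4 components")
    for value, n in zip(action, NVEC):
        if not 0 <= value < n:
            raise ValueError(f"component {value} out of range 0..{n - 1}")
    rotate, thrust, fire, hyperspace = action
    return {
        "rotate": rotate - 1,          # 0/1/2 -> -1 (left), 0 (none), +1 (right)
        "thrust": bool(thrust),
        "fire": bool(fire),
        "hyperspace": bool(hyperspace),
    }


# Number of distinct joint actions the agent can choose from.
N_JOINT_ACTIONS = math.prod(NVEC)
```

Because the components are independent, the agent picks from 3 × 2 × 2 × 2 = 24 joint actions, for example rotating left while thrusting and firing in the same step.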

Reward

Φ(s) = 0.3 · (27 − n_asteroids)          # non-negative potential

r =  Δscore / 100                        # 20 / 50 / 100 per rock
  + 10 · wave_cleared
  − 10 · life_lost
  − 0.01                                 # per-step time cost
  − 0.3 · hyperspace_used
  + γ·Φ(s′) − Φ(s)                        # potential-based shaping (policy-invariant)

The shaping speeds up credit assignment for clearing the field without changing the optimal policy (Ng et al., 1999). There is no reward clipping — that would collapse 20/50/100 into one value and kill the aiming skill. info["true_score"] (the raw game score) is the only benchmark metric: a high shaped reward with a low true score is the reward-hacking detector.
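
The reward formula above translates almost line for line into code. This is a sketch under assumptions: the function and argument names are hypothetical, and the discount `GAMMA = 0.99` is a placeholder because the manual does not give a value for γ.

```python
GAMMA = 0.99  # assumed placeholder; the manual does not state γ


def potential(n_asteroids):
    """Φ(s) = 0.3 · (27 − n_asteroids)."""
    return 0.3 * (27 - n_asteroids)


def shaping_term(n_before, n_after, gamma=GAMMA):
    """γ·Φ(s′) − Φ(s), the potential-based shaping term."""
    return gamma * potential(n_after) - potential(n_before)


def shaped_reward(delta_score, wave_cleared, life_lost, hyperspace_used,
                  n_before, n_after, gamma=GAMMA):
    """Per-step reward: scaled score change, events, time cost, plus shaping."""
    base = (
        delta_score / 100
        + 10 * wave_cleared
        - 10 * life_lost
        - 0.01                      # per-step time cost
        - 0.3 * hyperspace_used
    )
    return base + shaping_term(n_before, n_after, gamma)
```

With `gamma=1.0`, destroying a 100-point rock while the field drops from 5 to 4 asteroids gives 1.0 − 0.01 + 0.3 = 1.29. Also with `gamma=1.0`, the shaping terms summed over an episode telescope to Φ(last) − Φ(first), which is why they cannot reward anything other than the net change in field size.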

Determinism & seeding

Same (seed, action sequence) → byte-identical episode, in training and in the browser. Every random draw comes from a single seeded random.Random; there is no wall-clock, no OS entropy, and trig is read from a precomputed 256-entry table so native Python and Pyodide can't diverge on the last floating-point bit. env.reset(seed=s) fully reseeds and restores a fresh episode.
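
The two ingredients named above, a single seeded `random.Random` and table-lookup trig, can be demonstrated in isolation. This is a simplified sketch, not the core: headings are assumed to be integers in 0..255, and the table is built with `math.sin` for brevity, whereas a table meant to be identical across platforms would more likely be stored as literal constants.

```python
import math
import random

TABLE_SIZE = 256
# Built at import time here for brevity; shipping literal values would avoid
# any dependence on the platform's sin() implementation.
SIN_TABLE = [math.sin(2 * math.pi * i / TABLE_SIZE) for i in range(TABLE_SIZE)]


def sin_cos(heading):
    """Look up (sin, cos) for an integer heading; cos is sin shifted a quarter turn."""
    i = heading % TABLE_SIZE
    return SIN_TABLE[i], SIN_TABLE[(i + TABLE_SIZE // 4) % TABLE_SIZE]


def rollout(seed, n_steps):
    """Draw n_steps random headings from one seeded generator and resolve them."""
    rng = random.Random(seed)   # the only source of randomness
    trace = []
    for _ in range(n_steps):
        heading = rng.randrange(TABLE_SIZE)
        trace.append((heading, *sin_cos(heading)))
    return trace
```

Two calls to `rollout` with the same seed return equal traces, which is the property the environment relies on when replaying an episode from `(seed, action sequence)`.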

Architecture

One core, two surfaces. The core speaks dict-of-agents so it is already N-ship-ready; the single-agent env is the default surface. Each browser frame, the JS clock packs the held keys into one integer bitmask, calls Python step(bits), and gets back a flat float buffer it draws — no game logic in JavaScript.

render buffer = [ score, game_over, wave, lives,
                 then 6 floats per entity: kind, x, y, angle, radius, flags ]
kind: 0 ship · 2 bullet · 3 asteroid-L · 4 asteroid-M · 5 asteroid-S · 6 UFO
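
Both directions of the boundary are plain data: an integer going in, a flat list of floats coming out. The sketch below decodes each. The render-buffer layout and kind codes follow the listing above; the bit assignments for the key mask are assumptions, since the manual does not list them, and the function names are hypothetical.

```python
# Assumed bit assignments for the key mask (not specified in the manual).
BIT_LEFT, BIT_RIGHT, BIT_THRUST, BIT_FIRE, BIT_HYPER = (1 << i for i in range(5))

HEADER = ("score", "game_over", "wave", "lives")
ENTITY_FIELDS = ("kind", "x", "y", "angle", "radius", "flags")
KIND_NAMES = {0: "ship", 2: "bullet", 3: "asteroid-L",
              4: "asteroid-M", 5: "asteroid-S", 6: "UFO"}


def unpack_keys(bits):
    """Decode the integer bitmask into named held-key flags."""
    return {
        "left": bool(bits & BIT_LEFT),
        "right": bool(bits & BIT_RIGHT),
        "thrust": bool(bits & BIT_THRUST),
        "fire": bool(bits & BIT_FIRE),
        "hyperspace": bool(bits & BIT_HYPER),
    }


def decode_render_buffer(buf):
    """Split the flat buffer into a header dict and a list of entity dicts."""
    n_head, stride = len(HEADER), len(ENTITY_FIELDS)
    if len(buf) < n_head or (len(buf) - n_head) % stride:
        raise ValueError("buffer length does not match header + 6 floats per entity")
    header = dict(zip(HEADER, buf[:n_head]))
    entities = []
    for i in range(n_head, len(buf), stride):
        entity = dict(zip(ENTITY_FIELDS, buf[i:i + stride]))
        entity["kind"] = KIND_NAMES.get(int(entity["kind"]), "unknown")
        entities.append(entity)
    return header, entities
```

A drawing layer only needs `decode_render_buffer`: it reads `kind`, `x`, `y`, `angle`, and `radius` for each entity and never has to know any game rules.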

Roadmap

AsteroidHunter — built by Bijan Mehr · source on GitHub · play