Building AI Game Agents with Gymnasium and Pygame

Introduction

One of the most exciting frontiers in artificial intelligence is training agents to play games. From simple mazes to complex strategy games, this pursuit pushes the boundaries of machine learning and decision-making. To build and test these AI agents, we need two critical pieces of infrastructure: a standardized way to represent the game as an environment, and a way to visualize what’s happening.

This is where two cornerstone Python libraries come into play:

  • Gymnasium: The successor to OpenAI’s Gym, Gymnasium is the de facto standard for creating reinforcement learning (RL) environments. It provides a simple, universal API that lets AI agents interact with any compatible game or simulation.
  • Pygame: A beloved, cross-platform set of Python modules designed for writing video games. It provides a straightforward way to handle graphics, sound, and user input, making it an excellent choice for building and rendering 2D game environments.

This post will explore how these two libraries work together to create a powerful platform for developing and testing game-playing AI agents, drawing on concepts seen in complex game engines and AI agent code.

The Core of AI Training: The Environment API

To train an AI agent, we can’t have it just “look at the screen.” We need a programmatic interface through which it can interact with the game world. A standard interface is crucial because it lets AI algorithms be developed independently of the specific game they are playing: an agent written against the Gymnasium API can, in theory, be plugged into any Gymnasium environment, whether it’s a chess game, a maze, or a simulation of battling fish, as hinted at by the complex Game logic mentioned earlier.

The Gymnasium Env class is this standard interface. Any game environment you create will inherit from this class and implement its core methods:

  • __init__(): Initializes the environment, setting up the game state, defining action and observation spaces, etc.
  • step(action): This is the engine of the environment. The agent passes an action to this method. The environment updates the game state based on that action and returns a tuple of five values:
    1. observation: The new state of the world after the action.
    2. reward: A numerical reward signal that tells the agent how well it’s doing.
    3. terminated: A boolean that is True if the game has ended (e.g., the agent won or lost).
    4. truncated: A boolean that is True if the episode was ended for a reason other than a natural conclusion (e.g., a time limit was reached).
    5. info: A dictionary for auxiliary diagnostic information.
  • reset(): Resets the environment to a starting condition and returns the initial observation. This is called at the beginning of every new game episode.
  • render(): Renders the current state of the game for a human to see. This is where a library like Pygame comes in.
  • close(): Cleans up any open resources.

The BabyAI and Lmrlgym_MazeEnv classes you’ve seen are perfect examples of this pattern, wrapping complex game logic inside this standard Gymnasium API.
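
To see this API in action before building anything custom, here is a minimal sketch using the built-in CartPole-v1 environment that ships with Gymnasium; the random policy is purely a stand-in for a real agent:

import gymnasium as gym

# Every Gymnasium environment exposes the same reset/step/render/close interface
env = gym.make("CartPole-v1")

observation, info = env.reset(seed=42)
for _ in range(100):
    action = env.action_space.sample()  # Random action, stand-in for a real agent
    observation, reward, terminated, truncated, info = env.step(action)
    if terminated or truncated:
        observation, info = env.reset()
env.close()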

Building a Custom Game Environment

In many advanced projects, the core game logic (the “game engine”) is separated from the environment wrapper. This engine might be a complex C++ class like the Game class you’ve seen, which manages players, states, and rules.

The pattern to make this compatible with the AI ecosystem is to create a Python wrapper class:

import gymnasium as gym
import numpy as np

# Assume 'MyGameEngine' is your custom game logic (like the C++ Game class)
from my_game_engine import MyGameEngine

class CustomGameEnv(gym.Env):
    def __init__(self):
        super().__init__()
        self.game_engine = MyGameEngine()
        # Define action and observation spaces
        self.action_space = gym.spaces.Discrete(4)  # e.g., 4 discrete actions
        self.observation_space = gym.spaces.Box(low=0, high=255, shape=(84, 84, 3), dtype=np.uint8)

    def step(self, action):
        # 1. Pass the action to the internal game engine
        engine_state, reward, is_done = self.game_engine.update(action)

        # 2. Convert the engine's state into an observation
        observation = self.game_engine.get_observation()

        # 3. Return the standard five-value tuple:
        #    (observation, reward, terminated, truncated, info)
        return observation, reward, is_done, False, {}

    def reset(self, seed=None, options=None):
        # Seed the base class RNG, then reset the internal game engine
        super().reset(seed=seed)
        self.game_engine.reset()
        observation = self.game_engine.get_observation()
        return observation, {}

This wrapper acts as a bridge, translating the standard step and reset calls into commands for your specific game logic.
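
Once the wrapper exists, you can optionally register it with Gymnasium so it can be created via gym.make, and run the built-in environment checker to catch API mistakes early. A minimal sketch, assuming the class lives in a module called my_game_env (the module path is illustrative):

import gymnasium as gym
from gymnasium.utils.env_checker import check_env

# Register the custom environment under an id
gym.register(id="CustomGame-v0", entry_point="my_game_env:CustomGameEnv")

env = gym.make("CustomGame-v0")
# Validates spaces, reset/step signatures, etc.; rendering checks skipped here
check_env(env.unwrapped, skip_render_check=True)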

Visualizing the Action with Pygame

While an AI agent only needs the observation data, humans need to see what’s going on! The render() method is our window into the game world. You can save rendered frames to image files (as seen in the BabyAI example), but Pygame allows for real-time, interactive visualization.

Here’s a conceptual example of how you might implement the render() method using Pygame:

import pygame

# Inside your CustomGameEnv class...
class CustomGameEnv(gym.Env):
    def __init__(self):
        # ... other setup ...
        pygame.init()
        self.screen = pygame.display.set_mode((640, 480))
        self.clock = pygame.time.Clock()

    def render(self):
        # Process window events so the OS keeps the window responsive
        pygame.event.pump()

        # Clear the screen
        self.screen.fill((20, 20, 20))  # Dark background

        # Get game state from the core game engine
        # This is where you'd get positions of fish, players, maze walls, etc.
        player_data = self.game_engine.get_player_data()
        enemy_data = self.game_engine.get_enemy_data()

        # Draw game elements using pygame.draw functions
        for player in player_data:
            pygame.draw.rect(self.screen, (0, 255, 0), (*player['position'], 20, 20))  # Green square for player

        for enemy in enemy_data:
            pygame.draw.rect(self.screen, (255, 0, 0), (*enemy['position'], 20, 20))  # Red square for enemy

        # Update the display and control the frame rate
        pygame.display.flip()
        self.clock.tick(60)  # Limit to 60 FPS

    def close(self):
        pygame.quit()
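
If you also want the rendered frame back as a pixel array (for example, to feed the image-based observation space defined earlier), Pygame can convert the screen surface to a NumPy array. A minimal sketch, assuming the same self.screen surface; the method name render_rgb_array is just illustrative:

import numpy as np
import pygame

# Inside your CustomGameEnv class...
def render_rgb_array(self):
    # Convert the Pygame surface to an (H, W, 3) uint8 array
    frame = pygame.surfarray.array3d(self.screen)  # shape: (width, height, 3)
    return np.transpose(frame, (1, 0, 2))          # shape: (height, width, 3)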

Plugging in an AI Agent

Once you have a Gymnasium environment, you can connect any compatible agent to it. The agent’s job is simple: receive an observation and return an action. This is where the complex logic seen in snippets like _guess comes into play. That method is the “brain” of the agent.

The main training loop looks like this:

# 1. Create the environment
env = CustomGameEnv()

# 2. Create the agent (this could be a simple algorithm or a complex LLM-based agent)
# The agent's logic might live in a class with a method like `choose_action`
agent = MyAIAgent()

observation, info = env.reset()
terminated = truncated = False
total_reward = 0

while not (terminated or truncated):
    # 3. Render the environment for human viewing
    env.render()

    # 4. The agent chooses an action based on the current observation
    action = agent.choose_action(observation)

    # 5. The environment responds to the action
    observation, reward, terminated, truncated, info = env.step(action)
    total_reward += reward

print(f"Game over! Total reward: {total_reward}")
env.close()

In this loop, the agent.choose_action method could be a traditional RL algorithm (like Q-learning) or, in a more modern approach, it could be a function that formats the observation into a prompt, calls an LLM (like Gemma), and parses the response to get an action—exactly the pattern seen in advanced agentic systems.
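
As a concrete illustration of that LLM-driven pattern, here is a minimal sketch of such an agent. The query_llm helper, the action names, and the prompt/response format are purely hypothetical placeholders for whatever model API you actually use:

class LLMAgent:
    """Sketch of an agent whose 'brain' is a language model (query_llm is hypothetical)."""

    ACTIONS = {"up": 0, "down": 1, "left": 2, "right": 3}

    def choose_action(self, observation):
        # 1. Format the observation into a text prompt
        prompt = f"Game state: {observation}\nChoose one action: up, down, left, or right."

        # 2. Call the language model (placeholder, not a real API)
        response = query_llm(prompt)

        # 3. Parse the response back into a discrete action id
        for word, action_id in self.ACTIONS.items():
            if word in response.lower():
                return action_id
        return 0  # Fallback if the response could not be parsed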

Conclusion

The combination of a core game engine, a Gymnasium wrapper, an AI agent, and a Pygame renderer forms a powerful and modular architecture for modern AI research.

  • Gymnasium provides the universal language that connects agents to environments.
  • Pygame provides the visual feedback essential for development and debugging.

This setup allows researchers and developers to experiment with different games and different AI decision-making strategies, from classic RL to the latest LLM-based agents, all within a standardized and extensible framework.

This post is licensed under CC BY 4.0 by the author.