Using RL with Gymnasium
Programming / AI / Reinforcement Learning
Gymnasium main concepts
- Observation Space: Set of possible states that the agent can observe in the environment
- Action Space: Set of actions that the agent can take in the environment
- Episode: A complete run through the environment from the initial state until a terminal state is reached; each episode is composed of a sequence of states, actions, and rewards
- Wrapper: A tool in Gymnasium that allows modifying an environment's behavior without changing its code, for example adding a time limit or action masking (see the sketch after this list)
- Benchmark: Helps compare different RL algorithms
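As a concrete sketch of the Wrapper idea, assuming the standard Gymnasium wrappers TimeLimit and RecordEpisodeStatistics and the CartPole-v1 environment id (any other environment works the same way):

```python
import gymnasium as gym
from gymnasium.wrappers import TimeLimit, RecordEpisodeStatistics

# Base environment; "CartPole-v1" is only used as an example id.
env = gym.make("CartPole-v1")

# Add a time constraint without touching the environment's code:
# episodes are truncated after max_episode_steps steps.
env = TimeLimit(env, max_episode_steps=200)

# Record episode return and length in `info`, which is useful when
# benchmarking different algorithms on the same environment.
env = RecordEpisodeStatistics(env)
```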
```mermaid
graph TD
S[State/Observation\ns_t] --> A[Agent]
A -->|Select Action a_t| ENV[Environment]
ENV -->|Reward r_t| A
ENV -->|Next Observation s_{t+1}| A
style S fill:#d9eaff,stroke:#0052a3,stroke-width:1.5px
style A fill:#c7ffd9,stroke:#007a3d,stroke-width:1.5px
style ENV fill:#ffe2bc,stroke:#cc7a00,stroke-width:1.5px
```
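In code, the loop in the diagram maps to reset and step calls; a minimal sketch with a random policy (CartPole-v1 assumed again):

```python
import gymnasium as gym

env = gym.make("CartPole-v1")
obs, info = env.reset(seed=0)           # initial observation s_0

episode_over = False
while not episode_over:
    action = env.action_space.sample()  # agent selects action a_t (random policy here)
    obs, reward, terminated, truncated, info = env.step(action)  # env returns r_t and next observation s_{t+1}
    episode_over = terminated or truncated

env.close()
```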
Observation space
What information the environment gives the agent
Action space
What actions the agent is allowed to take
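To see both spaces for a concrete environment (CartPole-v1 assumed; the exact bounds and sizes differ per environment):

```python
import gymnasium as gym

env = gym.make("CartPole-v1")

# Observation space: what information the environment gives the agent.
# For CartPole this is a Box of 4 floats (cart position/velocity, pole angle/velocity).
print(env.observation_space)

# Action space: what actions the agent is allowed to take.
# For CartPole this is Discrete(2): push the cart left or right.
print(env.action_space)

# Both spaces can be sampled, e.g. for a random baseline policy.
print(env.action_space.sample())
```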
Environment step
Calling env.step(action) returns:
- next_state: the observation after taking the action
- reward: the reward received after taking the action
- terminated: boolean, true if the episode ended in a terminal state
- truncated: boolean, true if the episode ended by early truncation (e.g. a time limit was reached)
- info: a dictionary containing additional environment information (for example, in Atari games it holds the number of remaining lives)
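When collecting many steps for training, terminated and truncated are typically checked together to decide when to reset; a sketch with random actions (CartPole-v1 assumed):

```python
import gymnasium as gym

env = gym.make("CartPole-v1")
obs, info = env.reset()

for _ in range(1000):
    action = env.action_space.sample()
    next_obs, reward, terminated, truncated, info = env.step(action)

    # terminated: the task itself ended (e.g. the pole fell past the limit).
    # truncated: the episode was cut short, e.g. a time limit was reached.
    if terminated or truncated:
        obs, info = env.reset()
    else:
        obs = next_obs

env.close()
```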