Using RL with Gymnasium
Programming / AI / Reinforcement Learning
Gymnasium main concepts
- Observation Space: Set of possible states that the agent can observe in the environment
- Action Space: Set of actions that the agent can take in the environment
- Episode: A complete run through the environment from the initial state until a terminal state is reached; each episode is composed of a sequence of states, actions, and rewards
- Wrapper: A tool in Gymnasium that allows modifying an environment's behavior without changing its code, for example adding a time limit or action masking (see the sketch after this list)
- Benchmark: Helps compare different RL algorithms
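As a concrete sketch of the Wrapper idea, assuming the standard Gymnasium wrappers TimeLimit and RecordEpisodeStatistics and the CartPole-v1 environment id (any other environment works the same way):

```python
import gymnasium as gym
from gymnasium.wrappers import TimeLimit, RecordEpisodeStatistics

# Base environment; "CartPole-v1" is only used as an example id.
env = gym.make("CartPole-v1")

# Add a time constraint without touching the environment's code:
# episodes are truncated after max_episode_steps steps.
env = TimeLimit(env, max_episode_steps=200)

# Record episode return and length in `info`, which is useful when
# benchmarking different algorithms on the same environment.
env = RecordEpisodeStatistics(env)
```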
```mermaid
graph TD
S[State/Observation\ns_t] --> A[Agent]
A -->|Select Action a_t| ENV[Environment]
ENV -->|Reward r_t| A
ENV -->|Next Observation s_{t+1}| A
style S fill:#d9eaff,stroke:#0052a3,stroke-width:1.5px
style A fill:#c7ffd9,stroke:#007a3d,stroke-width:1.5px
style ENV fill:#ffe2bc,stroke:#cc7a00,stroke-width:1.5px
```
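In code, the loop in the diagram maps to reset and step calls; a minimal sketch with a random policy (CartPole-v1 assumed again):

```python
import gymnasium as gym

env = gym.make("CartPole-v1")
obs, info = env.reset(seed=0)           # initial observation s_0

episode_over = False
while not episode_over:
    action = env.action_space.sample()  # agent selects action a_t (random policy here)
    obs, reward, terminated, truncated, info = env.step(action)  # env returns r_t and next observation s_{t+1}
    episode_over = terminated or truncated

env.close()
```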
Observation space
What information the environment gives the agent
Action space
What actions the agent is allowed to take
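To see both spaces for a concrete environment (CartPole-v1 assumed; the exact bounds and sizes differ per environment):

```python
import gymnasium as gym

env = gym.make("CartPole-v1")

# Observation space: what information the environment gives the agent.
# For CartPole this is a Box of 4 floats (cart position/velocity, pole angle/velocity).
print(env.observation_space)

# Action space: what actions the agent is allowed to take.
# For CartPole this is Discrete(2): push the cart left or right.
print(env.action_space)

# Both spaces can be sampled, e.g. for a random baseline policy.
print(env.action_space.sample())
```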
Environment step
Calling env.step(action) returns:
- next_state: the observation after taking the action
- reward: the reward received after taking the action
- terminated: boolean, true if the episode ended in a terminal state
- truncated: boolean, true if the episode ended by early truncation (e.g. a time limit was reached)
- info: a dictionary containing additional environment information (for example, in Atari games it holds the number of remaining lives)
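When collecting many steps for training, terminated and truncated are typically checked together to decide when to reset; a sketch with random actions (CartPole-v1 assumed):

```python
import gymnasium as gym

env = gym.make("CartPole-v1")
obs, info = env.reset()

for _ in range(1000):
    action = env.action_space.sample()
    next_obs, reward, terminated, truncated, info = env.step(action)

    # terminated: the task itself ended (e.g. the pole fell past the limit).
    # truncated: the episode was cut short, e.g. a time limit was reached.
    if terminated or truncated:
        obs, info = env.reset()
    else:
        obs = next_obs

env.close()
```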