Kioku
Kioku is a reinforcement learning library built in PyTorch. It implements classic and modern RL algorithms with a focus on clean abstractions and modularity for agents, value functions, memory buffers and training loops.
The framework currently includes implementations of DQN (with an optional prioritized experience replay variant), A2C, and PPO.
Technical Design
Kioku is implemented in Python using PyTorch and Gymnasium. Agents are built from modular components, including policy/value networks, replay or on-policy buffers, schedulers (for epsilon, learning rate, etc.) and trainers. This structure keeps the codebase clean and facilitates construction of new agents.
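To illustrate the component pattern described above, here is a minimal sketch in plain Python. The class and attribute names (`EpsilonScheduler`, `ReplayBuffer`, `Agent`) are hypothetical stand-ins for illustration, not Kioku's actual API:

```python
from dataclasses import dataclass, field
from collections import deque
import random

# Illustrative sketch of the component pattern; names are hypothetical,
# not Kioku's actual API.

class EpsilonScheduler:
    """Multiplicative epsilon decay with a lower bound."""
    def __init__(self, start=1.0, end=0.05, decay=0.995):
        self.value, self.end, self.decay = start, end, decay

    def step(self):
        self.value = max(self.end, self.value * self.decay)
        return self.value

class ReplayBuffer:
    """Fixed-capacity buffer with uniform sampling."""
    def __init__(self, capacity=10_000):
        self.buffer = deque(maxlen=capacity)

    def push(self, transition):
        self.buffer.append(transition)

    def sample(self, batch_size):
        return random.sample(list(self.buffer), batch_size)

@dataclass
class Agent:
    """An agent is a bundle of swappable components."""
    buffer: ReplayBuffer = field(default_factory=ReplayBuffer)
    epsilon: EpsilonScheduler = field(default_factory=EpsilonScheduler)
```

Because each piece is independent, a new agent can swap in, say, a prioritized buffer or a different scheduler without touching the rest.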
It currently supports the following agents:
- DQN
- Based on: Playing Atari with Deep Reinforcement Learning
- Uses a Q-value function with a Polyak-averaged target network and epsilon-greedy exploration with decaying epsilon
- DQN with PER
- Adds a prioritized experience replay buffer based on: Prioritized Experience Replay
- A2C
- Based on: Asynchronous Methods for Deep Reinforcement Learning
- Uses N-step returns
- PPO
- Based on: Proximal Policy Optimization Algorithms
- Uses Generalized Advantage Estimation, N-step returns, mini-batch learning, and multiple learning epochs per batch
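The soft target update used by the DQN agent can be written in a few lines. This is a generic sketch of Polyak averaging and epsilon-greedy action selection over plain Python lists (real implementations operate on network parameter tensors); the function names are illustrative, not Kioku's API:

```python
import random

def polyak_update(target, online, tau=0.005):
    """Soft-update target parameters toward online parameters:
    theta_target <- tau * theta_online + (1 - tau) * theta_target."""
    return [tau * o + (1 - tau) * t for t, o in zip(target, online)]

def epsilon_greedy(q_values, epsilon, rng=random):
    """With probability epsilon pick a random action, else the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=q_values.__getitem__)
```

With a small `tau`, the target network tracks the online network slowly, which stabilizes the bootstrapped Q-targets.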
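Generalized Advantage Estimation, used by the PPO agent, reduces to a single backward pass over a rollout. A minimal sketch (illustrative, not Kioku's actual implementation), assuming `values` contains one extra bootstrap entry for the state after the rollout:

```python
def gae(rewards, values, dones, gamma=0.99, lam=0.95):
    """Generalized Advantage Estimation over one rollout.

    delta_t = r_t + gamma * V(s_{t+1}) * (1 - done_t) - V(s_t)
    A_t     = delta_t + gamma * lam * (1 - done_t) * A_{t+1}

    `values` must have len(rewards) + 1 entries (bootstrap value appended).
    """
    advantages = [0.0] * len(rewards)
    adv = 0.0
    for t in reversed(range(len(rewards))):
        nonterminal = 1.0 - dones[t]
        delta = rewards[t] + gamma * values[t + 1] * nonterminal - values[t]
        adv = delta + gamma * lam * nonterminal * adv
        advantages[t] = adv
    return advantages
```

Setting `lam=0` recovers one-step TD advantages, while `lam=1` recovers full Monte Carlo returns minus the baseline, so `lam` trades bias against variance.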
Why Kioku?
Kioku started as an attempt to deeply understand reinforcement learning algorithms by implementing them from scratch instead of relying on high-level libraries. I was particularly interested in implementing PPO, since none of my previous attempts had succeeded.
I started this project in fall 2024, while taking COGS 100: Exploring the Mind, an introductory cognitive science course at SFU. There I learned about constructionist and constructivist approaches to cognition, which inspired me to build Kioku as a modular framework for constructing agents.
Challenges
The primary challenge was translating math-heavy algorithms like A2C and PPO into reliable implementations. Small details can cause drastic changes in performance, so a lot of trial-and-error was necessary.
Hyperparameter optimization was also difficult. I mostly tuned parameters by hand (luckily, the agent constructors make this easy), but in the future I plan to automate this with hyperparameter sweeps.
Results
DQN on CartPole:

PPO on CartPole:

PPO on LunarLander:
