Beating 2048 with Hybrid RL & C++

December 24, 2025

The Challenge: Why 2048?

At first glance, 2048 looks simple. But for an AI, it’s a deceptive nightmare of stochasticity and delayed rewards. Pure Reinforcement Learning (RL) often fails to plan far enough ahead, while traditional search algorithms (like Minimax) are too slow without a solid heuristic evaluation function.

I wanted to bridge this gap. My goal wasn’t just to “solve” the game, but to engineer a high-performance system that combines the intuition of Deep Learning with the rigor of Algorithmic Search.

Instead of relying on a single method, I designed a hybrid architecture:

  1. The “Intuition” (Masked PPO): I started by training a Masked Proximal Policy Optimization (PPO) agent in Python. The “masking” was crucial: it prevented the agent from attempting illegal moves, which significantly sped up convergence (see the sketch after this list). The network learned to estimate the value of a board state, essentially teaching the AI to recognize “good” positions from “bad” ones.
  2. The “Brain” (C++ Expectimax Engine): Knowing that Python would be too slow for deep search trees, I implemented a custom Expectimax search engine in C++ and exposed it to Python with pybind11. The engine uses the PPO-learned value function as its heuristic. This lets the agent “think” several moves ahead, handling the random tile spawns at the chance nodes, while using the neural network’s intuition to evaluate the leaf nodes (see the second sketch after this list).
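
The post doesn’t name the RL library, so here is only a minimal sketch of the masking idea, assuming a Gymnasium-style environment and sb3-contrib’s MaskablePPO. `Game2048Env` and its `is_legal` check are hypothetical placeholders, not code from the project.

```python
import numpy as np
from sb3_contrib import MaskablePPO
from sb3_contrib.common.wrappers import ActionMasker

# Hypothetical Gymnasium-style 2048 environment (not from the original post).
from game_2048_env import Game2048Env

def mask_illegal_moves(env):
    """Boolean mask over the 4 actions (up, down, left, right):
    True only where the move would actually change the board."""
    return np.array([env.unwrapped.is_legal(a) for a in range(4)], dtype=bool)

env = ActionMasker(Game2048Env(), mask_illegal_moves)

# MaskablePPO only samples actions where the mask is True,
# so no training signal is wasted on no-op moves.
model = MaskablePPO("MlpPolicy", env, verbose=1)
model.learn(total_timesteps=1_000_000)
```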

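The engine itself lives in C++, but the recursion is easy to illustrate in Python. This is a simplified sketch of an expectimax step, with hypothetical board helpers (`legal_moves`, `apply_move`, `empty_cells`, `spawn`) standing in for the real engine and `value_fn` standing in for the PPO-learned value head.

```python
def expectimax(board, depth, value_fn):
    """Max over the player's moves, expectation over random tile spawns.

    `value_fn` stands in for the PPO value head used at the leaves;
    the board helpers are placeholders for the actual C++ engine."""
    if depth == 0 or not board.legal_moves():
        return value_fn(board)

    best = float("-inf")
    for move in board.legal_moves():           # max node: our move
        after = board.apply_move(move)
        cells = after.empty_cells()
        expected = 0.0
        for cell in cells:                     # chance node: tile spawn
            for tile, prob in ((2, 0.9), (4, 0.1)):
                child = after.spawn(cell, tile)
                expected += (prob / len(cells)) * expectimax(child, depth - 1, value_fn)
        best = max(best, expected)
    return best
```
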
Optimization: Tuning with Optuna

A complex system has complex hyperparameters. Guessing wasn’t an option. I implemented a Bayesian Hyperparameter Optimization loop using Optuna. I ran over 100 trials to tune 8 critical hyperparameters, balancing exploration vs. exploitation during the training phase.
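
The post doesn’t list which eight hyperparameters were searched, so the ranges below are illustrative only, and the `train_and_evaluate` helper is a hypothetical stand-in for a full training run. The Optuna calls themselves are the standard API.

```python
import optuna

def objective(trial):
    # A few typical PPO knobs; the actual tuned hyperparameters and
    # ranges are not listed in the post.
    params = {
        "learning_rate": trial.suggest_float("learning_rate", 1e-5, 1e-3, log=True),
        "gamma": trial.suggest_float("gamma", 0.95, 0.9999),
        "ent_coef": trial.suggest_float("ent_coef", 1e-4, 0.1, log=True),
        "n_steps": trial.suggest_categorical("n_steps", [512, 1024, 2048, 4096]),
    }
    # Hypothetical helper: trains a masked PPO agent with `params`
    # and returns the mean score over evaluation games.
    return train_and_evaluate(params)

# Optuna's default TPE sampler performs the Bayesian-style search.
study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=100)
print(study.best_params)
```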

View Source Code on GitHub