Chess AI with Deep Reinforcement Learning

Category: AI/Machine Learning
Client: Personal Project
Duration: January 2026 - Present
Year: 2026
A neural network-based chess engine built with PPO and adaptive optimization, trained through 27 iterative versions to achieve strategic play against Stockfish.
Architecture:
SE-Residual Network with 6 blocks and 128 filters
Dual-head design: Policy Head (move selection) + Value Head (position evaluation)
12-channel board encoding including castling rights and turn indicator
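The architecture above can be sketched in PyTorch. This is a minimal illustration, not the project's exact implementation: the policy output size of 4672 (AlphaZero-style move planes), the SE reduction ratio, and the head layouts are assumptions.

```python
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    """Squeeze-and-excitation: reweight channels by global context."""
    def __init__(self, channels, reduction=8):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(),
            nn.Linear(channels // reduction, channels), nn.Sigmoid())

    def forward(self, x):
        w = self.fc(x.mean(dim=(2, 3)))    # squeeze: global average pool
        return x * w[:, :, None, None]     # excite: per-channel rescale

class SEResidualBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.se = SEBlock(channels)

    def forward(self, x):
        out = torch.relu(self.bn1(self.conv1(x)))
        out = self.se(self.bn2(self.conv2(out)))
        return torch.relu(out + x)

class ChessNet(nn.Module):
    """6 SE-residual blocks with 128 filters, feeding a dual head:
    policy logits over moves and a scalar value in [-1, 1]."""
    def __init__(self, blocks=6, filters=128, policy_size=4672):
        super().__init__()
        self.stem = nn.Sequential(
            nn.Conv2d(12, filters, 3, padding=1, bias=False),  # 12 input planes
            nn.BatchNorm2d(filters), nn.ReLU())
        self.tower = nn.Sequential(*[SEResidualBlock(filters) for _ in range(blocks)])
        self.policy_head = nn.Sequential(
            nn.Conv2d(filters, 2, 1), nn.Flatten(), nn.Linear(2 * 64, policy_size))
        self.value_head = nn.Sequential(
            nn.Conv2d(filters, 1, 1), nn.Flatten(),
            nn.Linear(64, 64), nn.ReLU(), nn.Linear(64, 1), nn.Tanh())

    def forward(self, x):
        h = self.tower(self.stem(x))
        return self.policy_head(h), self.value_head(h)
```

A forward pass on a batch of encoded boards, `net(torch.zeros(2, 12, 8, 8))`, yields policy logits of shape `(2, 4672)` and values of shape `(2, 1)`.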
Training Pipeline:
Supervised pre-training on Lichess games dataset
Self-play reinforcement learning with PPO
Stockfish-guided learning with depth 10 evaluations
Policy distillation from Stockfish best moves
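The Stockfish-guided stage can be sketched as a combined loss: cross-entropy toward Stockfish's best-move index (policy distillation) plus value regression against the depth-10 evaluation. The equal weighting and the centipawn squashing scale of 600 are assumptions, not the project's actual hyperparameters.

```python
import torch
import torch.nn.functional as F

def distillation_loss(policy_logits, stockfish_best, value_pred, stockfish_cp):
    """Sketch of a Stockfish-guided loss (weighting and scale are assumptions).

    policy_logits: (B, num_moves) raw network logits
    stockfish_best: (B,) index of Stockfish's best move
    value_pred: (B, 1) network value head output in [-1, 1]
    stockfish_cp: (B,) Stockfish centipawn evaluation
    """
    # Policy distillation: push probability mass onto Stockfish's choice.
    policy_loss = F.cross_entropy(policy_logits, stockfish_best)
    # Value target: squash centipawns into the value head's [-1, 1] range.
    target_value = torch.tanh(stockfish_cp / 600.0)
    value_loss = F.mse_loss(value_pred.squeeze(-1), target_value)
    return policy_loss + value_loss
```

With uniform logits the policy term reduces to `log(num_moves)`, which makes the function easy to sanity-check before wiring it into the training loop.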
Adaptive Optimization Features:
Learning rate warmup & cosine annealing
Dynamic gradient clipping (global norm, per-parameter, adaptive)
Entropy scheduling for exploration-exploitation balance
Auto-freeze mechanism to prevent model collapse
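The warmup/annealing and entropy schedules can be sketched as plain functions. All hyperparameter values here (base LR, warmup length, entropy bounds) are illustrative assumptions.

```python
import math

def lr_schedule(step, total_steps, base_lr=3e-4, warmup_steps=1000, min_lr=1e-6):
    """Linear warmup to base_lr, then cosine annealing down to min_lr."""
    if step < warmup_steps:
        return base_lr * (step + 1) / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return min_lr + 0.5 * (base_lr - min_lr) * (1 + math.cos(math.pi * progress))

def entropy_coef(step, total_steps, start=0.02, end=0.001):
    """Linearly anneal the PPO entropy bonus: explore early, exploit late."""
    frac = min(1.0, step / total_steps)
    return start + frac * (end - start)
```

The schedule peaks exactly at the end of warmup and decays smoothly to the floor, which avoids the abrupt LR drops that can destabilize PPO updates.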
Key Technical Challenges Solved:
Fixed BatchNorm issues causing policy degradation during RL training
Solved draw loops with asymmetric self-play strategy
Improved checkmate execution using tactic puzzles dataset
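One common remedy for BatchNorm-induced policy degradation under RL's shifting data distribution is to freeze the normalization layers; whether this matches the project's exact fix is an assumption, but it illustrates the failure mode.

```python
import torch.nn as nn

def freeze_batchnorm(model: nn.Module) -> nn.Module:
    """Stop BatchNorm from drifting during RL fine-tuning: switch BN layers
    to eval mode (running statistics no longer update) and freeze their
    affine parameters, while leaving the rest of the network trainable."""
    for m in model.modules():
        if isinstance(m, (nn.BatchNorm1d, nn.BatchNorm2d)):
            m.eval()
            for p in m.parameters():
                p.requires_grad_(False)
    return model
```

Because self-play batches are highly correlated, BN's batch statistics diverge from those seen during supervised pre-training; freezing them keeps the policy's activations stable while the convolutional weights continue to learn.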
Evaluation:
Win rate vs random opponents: 93-99%
Win rate vs Stockfish depth 0: 15-35%
Deployment:
Flask-based web interface for human vs AI gameplay
Deployed to Hugging Face Spaces as backend API
Mobile-responsive UI with click-to-move support
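A backend API of this kind typically exposes a single move endpoint: the client posts the position, the server replies with the engine's move. The route name, payload fields, and placeholder move below are assumptions for illustration, not the project's actual API.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

@app.route("/move", methods=["POST"])
def move():
    """Hypothetical endpoint: receive the board as a FEN string,
    return the engine's chosen move in UCI notation."""
    payload = request.get_json(force=True)
    fen = payload["fen"]
    # Placeholder: the real server encodes the board, runs the network,
    # and decodes the policy head's top legal move here.
    best_move = "e2e4"
    return jsonify({"fen": fen, "move": best_move})
```

Flask's built-in test client makes the endpoint easy to exercise without a running server, which is convenient before deploying to Hugging Face Spaces.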
Tech Stack: Python, PyTorch, python-chess, Flask, Stockfish, Matplotlib, TensorBoard



