IRL Pokédex

A real-life Pokédex that classifies all 151 Gen 1 Pokémon using a fine-tuned CNN and reads their Pokédex entries aloud with a custom text-to-speech voice.

Overview

  • Classifies all 151 Gen 1 Pokémon with 94.71% validation accuracy
  • EfficientNet-B0 trained on ~11k scraped images of Pokémon cards, anime, and merch
  • XTTS v2.0 voice reads the matching Pokédex entry aloud
  • Exported to ONNX for deployment on a Pi Zero 2 W

App

Runs as a Python app on a Pi 5 with a camera, LCD, speaker, and GPIO buttons. Point the camera at a Pokémon and press a button, and the app identifies it and reads its Pokédex entry aloud. The app also includes a browsable entry list with sprites and stats for all 151 original Pokémon.

Breadboarded Raspberry Pi Pokédex with camera, amp, and buttons wired in

Dataset + classification

Close-up training photo of a Pokémon plush for the initial dataset
Training images scraped from online searches: cards, anime, plushes, figurines.
Fine-tuning interface showing Pokémon samples grouped by class
Balanced batches grouped by class for fine-tuning.
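
One way to build class-balanced batches like those described above is sketched here. It is an illustration, not the project's actual sampler: the function name `balanced_batches`, its parameters, and the choice to oversample small classes with replacement are all assumptions.

```python
import random
from collections import defaultdict


def balanced_batches(samples, classes_per_batch, per_class, seed=0):
    """Yield batches holding `per_class` images for each class in the batch.

    `samples` is a list of (path, label) pairs. Hypothetical helper: classes
    with fewer than `per_class` images are sampled with replacement.
    """
    rng = random.Random(seed)

    # Group image paths by their class label.
    by_class = defaultdict(list)
    for path, label in samples:
        by_class[label].append(path)

    # Visit the classes in a shuffled order, a few classes per batch.
    labels = sorted(by_class)
    rng.shuffle(labels)
    for start in range(0, len(labels), classes_per_batch):
        batch = []
        for label in labels[start:start + classes_per_batch]:
            pool = by_class[label]
            if len(pool) >= per_class:
                picks = rng.sample(pool, per_class)      # no repeats
            else:
                picks = rng.choices(pool, k=per_class)   # oversample
            batch.extend((path, label) for path in picks)
        yield batch
```

Each batch then contains the same number of images per class, so a class with few scraped images is not drowned out by one with many.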

Training images were scraped from online searches and cover cards, anime screenshots, and merch such as plushes and figurines, rather than just game sprites. After deduplication, the dataset came to about 11,000 images across 151 classes. EfficientNet-B0 was trained in two stages: first the classification head alone, then the full network. Over the two stages, accuracy rose from 67% to the final 94.71% validation accuracy. The final model was exported to ONNX with the class labels embedded.
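
The deduplication step is not described in detail, so the following is only a minimal sketch of one common approach: dropping files whose bytes are identical. The helper name `dedupe_by_hash` is hypothetical, and exact hashing is an assumption. It would not catch resized or recompressed copies, which need a perceptual hash instead.

```python
import hashlib
from pathlib import Path


def dedupe_by_hash(paths):
    """Return the paths to keep, one per distinct file content.

    Hypothetical helper: files are compared by the SHA-256 of their bytes,
    and the first path in sorted order wins for each distinct digest.
    """
    kept = {}
    for path in sorted(paths):
        digest = hashlib.sha256(Path(path).read_bytes()).hexdigest()
        kept.setdefault(digest, path)  # ignore later files with the same bytes
    return sorted(kept.values())
```

For example, `dedupe_by_hash(Path("data/pikachu").glob("*.jpg"))` would return one path per distinct image in that (assumed) folder.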

On-device prediction output.

Text-to-speech

XTTS v2.0 was fine-tuned on audio scraped from the Pokémon show to create a custom Pokédex voice. Flavor text for each species was then pulled from PokeAPI and paired with pronunciation data from a fan wiki so that names get rewritten phonetically before inference. The result is a spoken Pokédex entry for every Gen 1 Pokémon.
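
The phonetic rewriting step could look roughly like the sketch below. It assumes the pronunciation data is available as a name-to-respelling mapping. The `PRONUNCIATIONS` entries and the `respell_names` helper are made up for illustration and are not taken from the project or the wiki.

```python
import re

# Hypothetical respellings for illustration only.
PRONUNCIATIONS = {
    "Arcanine": "AR-kuh-nine",
    "Rattata": "ruh-TAT-uh",
}


def respell_names(text, pronunciations):
    """Replace each known species name in `text` with its phonetic respelling."""
    if not pronunciations:
        return text

    # Longest names first so a short name never shadows a longer one.
    names = sorted(pronunciations, key=len, reverse=True)
    alternatives = "|".join(re.escape(name) for name in names)

    # Lookarounds keep the match from firing inside a longer word.
    pattern = re.compile(r"(?<!\w)(" + alternatives + r")(?!\w)", re.IGNORECASE)

    lookup = {name.lower(): spelling for name, spelling in pronunciations.items()}
    return pattern.sub(lambda m: lookup[m.group(0).lower()], text)
```

The rewritten string, rather than the original flavor text, would then be passed to the TTS model, so the voice reads the respelled name instead of guessing at the original spelling.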
