Kitten TTS, an open-source text-to-speech library built on ONNX, has reached version 0.8 with three new model variants designed for resource-constrained environments. The models range from 15 million to 80 million parameters, with disk footprints as small as 25 MB. That makes them well suited to edge deployments where storage and compute are limited.

Key Capabilities

The library is designed for efficiency while aiming to preserve output quality. All models run entirely on CPU with no GPU required, and they synthesize 24 kHz audio through ONNX-based inference. Users can choose from eight voices (Bella, Jasper, Luna, Bruno, Rosie, Hugo, Kiki, and Leo) and adjust the speech rate at synthesis time. Built-in text preprocessing handles numbers, currencies, units, and similar non-word tokens before synthesis.
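The following toy sketch shows the kind of normalization a TTS front end performs. It is not Kitten TTS's implementation. The function names, the unit table, and the limit to integers from 0 to 999 are hypothetical simplifications chosen for illustration.

```python
import re

# Toy TTS text normalization (hypothetical; NOT Kitten TTS's own preprocessing).
ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
        "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
        "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]

# Hypothetical unit table: abbreviation -> spoken plural form.
UNITS = {"km": "kilometers", "kg": "kilograms", "ms": "milliseconds"}


def number_to_words(n: int) -> str:
    """Spell out an integer in the range 0..999."""
    if not 0 <= n <= 999:
        raise ValueError("this sketch only handles 0..999")
    if n < 20:
        return ONES[n]
    if n < 100:
        tens, rest = divmod(n, 10)
        return TENS[tens] + ("" if rest == 0 else "-" + ONES[rest])
    hundreds, rest = divmod(n, 100)
    words = ONES[hundreds] + " hundred"
    return words if rest == 0 else words + " " + number_to_words(rest)


def normalize(text: str) -> str:
    """Expand "$N", "N <unit>", and bare 1-3 digit integers into spoken words."""
    # Currency first, so "$12" is not treated as a bare number.
    def money(m):
        n = int(m.group(1))
        return number_to_words(n) + (" dollar" if n == 1 else " dollars")
    text = re.sub(r"\$(\d{1,3})\b", money, text)

    # Numbers followed by a known unit abbreviation (singular for 1).
    def unit(m):
        n, u = int(m.group(1)), UNITS[m.group(2)]
        return number_to_words(n) + " " + (u[:-1] if n == 1 else u)
    pattern = r"\b(\d{1,3})\s?(" + "|".join(UNITS) + r")\b"
    text = re.sub(pattern, unit, text)

    # Any remaining standalone 1-3 digit integers.
    return re.sub(r"\b(\d{1,3})\b", lambda m: number_to_words(int(m.group(1))), text)
```

A real front end would also cover larger numbers, decimals, dates, and other currencies. The ordering here (currency, then units, then bare integers) keeps the more specific patterns from being consumed by the generic one.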

Available Models

Three model tiers cater to different performance needs. The mini variant has 80 million parameters and occupies 80 MB on disk. The micro model has 40 million parameters at 41 MB. The nano model has 15 million parameters in a 56 MB standard build. An int8-quantized nano variant shrinks to 25 MB, though users have reported occasional stability issues with this smallest option.
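The trade-offs among tiers can be expressed as a small selection routine. This sketch encodes the sizes reported for 0.8 and picks the highest-parameter model that fits a disk budget. The tier labels are illustrative, not official model IDs. Treating parameter count as a proxy for quality is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ModelTier:
    name: str              # illustrative label, not an official model ID
    params_millions: int
    disk_mb: int
    int8: bool = False


# Figures as reported for the Kitten TTS 0.8 release.
TIERS = [
    ModelTier("mini", 80, 80),
    ModelTier("micro", 40, 41),
    ModelTier("nano", 15, 56),
    ModelTier("nano-int8", 15, 25, int8=True),
]


def pick_tier(disk_budget_mb: int, allow_int8: bool = False) -> Optional[ModelTier]:
    """Return the largest model (by parameter count) that fits the disk budget.

    The int8 nano build is excluded by default because of reported stability
    issues. Ties on parameter count go to the smaller file. Returns None if
    nothing fits.
    """
    candidates = [t for t in TIERS
                  if t.disk_mb <= disk_budget_mb and (allow_int8 or not t.int8)]
    if not candidates:
        return None
    return max(candidates, key=lambda t: (t.params_millions, -t.disk_mb))
```

With these figures, a disk-only criterion never selects the standard nano build, because micro is both smaller on disk and larger in parameter count. Other criteria, such as latency or memory use, could still favor nano.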

Getting Started

Installation requires Python 3.8 or later and takes a single pip command. Basic synthesis involves loading a model from the Hugging Face Hub and calling its generate method with the desired text and voice. To write audio straight to disk, use the generate_to_file method instead. The available_voices property lists the supported voices programmatically.
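The basic workflow is to load a model, call generate, and save the audio. The following sketch shows one way to do this. It assumes that generate(text, voice=...) returns a 1-D sequence of float samples in [-1, 1] at 24 kHz. That signature and return type are assumptions, not confirmed API. FakeModel is a hypothetical stand-in so the example runs without downloading a model. The WAV writing uses only Python's standard wave module. In practice, generate_to_file would likely make this step unnecessary.

```python
import math
import struct
import wave
from typing import Protocol, Sequence

SAMPLE_RATE = 24_000  # output rate stated for Kitten TTS


class SupportsGenerate(Protocol):
    # Assumed shape of the model object: generate(text, voice=...) -> float samples.
    def generate(self, text: str, voice: str) -> Sequence[float]: ...


def write_wav(samples: Sequence[float], path: str, rate: int = SAMPLE_RATE) -> None:
    """Write mono float samples in [-1.0, 1.0] as 16-bit PCM WAV."""
    clipped = (max(-1.0, min(1.0, float(s))) for s in samples)
    frames = b"".join(struct.pack("<h", int(round(s * 32767))) for s in clipped)
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit
        wav.setframerate(rate)
        wav.writeframes(frames)


def synthesize_to_wav(model: SupportsGenerate, text: str, voice: str, path: str) -> float:
    """Generate speech with `model`, save it to `path`, and return duration in seconds."""
    audio = model.generate(text, voice=voice)
    write_wav(audio, path)
    return len(audio) / SAMPLE_RATE


class FakeModel:
    """Hypothetical stand-in: returns a 0.5 s, 440 Hz tone instead of speech."""

    def generate(self, text: str, voice: str) -> list:
        n = SAMPLE_RATE // 2
        return [0.5 * math.sin(2 * math.pi * 440 * i / SAMPLE_RATE) for i in range(n)]


# With the real library, something like the following may work. The model ID is a
# placeholder, and the generate() signature used above is an assumption:
#   from kittentts import KittenTTS
#   model = KittenTTS("<hugging-face-model-id>")
#   synthesize_to_wav(model, "Hello from Kitten TTS.", "Bella", "hello.wav")
```

Typing the model parameter as a Protocol keeps the helper independent of the library. Any object with a compatible generate method works, which also makes the helper easy to test offline.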

The project is currently in developer preview, so APIs may change between releases. Commercial licensing, integration support, and custom voice development are offered to enterprise users. A live demo on Hugging Face Spaces allows testing in the browser.

What's Ahead

The development roadmap includes an optimized inference engine, mobile SDK support, higher-fidelity models, multilingual capabilities, and a companion automatic speech recognition system called KittenASR.

Source: Hacker News