IBM's New Speech Model Packs a Punch at Half the Size

IBM has released a more efficient speech recognition model designed to run directly on devices like phones and sensors rather than on powerful cloud servers. The model, named Granite 4.0 1B Speech, is half the size of its predecessor while improving accuracy and adding features.

Engineers shrank the model to one billion parameters, half as many as its predecessor, while improving its English transcription accuracy. It now also handles transcription and translation across six languages: English, French, German, Spanish, Portuguese, and Japanese. A new feature lets developers supply lists of specific keywords, such as technical acronyms or product names, which helps the model transcribe specialized business vocabulary more reliably.

Granite 4.0 1B Speech recently took the top spot on the OpenASR leaderboard for open-source speech recognition. Benchmarks show it matches or beats the accuracy of much larger models, which matters for applications where processing power and battery life are limited.
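
For readers wondering what "accuracy" means here: speech recognition benchmarks commonly report word error rate (WER), the number of word substitutions, deletions, and insertions needed to turn the model's transcript into the reference transcript, divided by the number of reference words. Lower is better. The following is a minimal sketch of that calculation, not IBM's or the leaderboard's evaluation code. The function name and sample sentences are made up for illustration, and real evaluations apply fuller text normalization than the simple lowercasing assumed here.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER: word-level edit distance divided by reference length.

    Hypothetical helper for illustration only. Normalization is limited
    to lowercasing and whitespace splitting (an assumption).
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words matches an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr

    return prev[-1] / len(ref)


if __name__ == "__main__":
    reference = "ship the pallet to dock nine"
    hypothesis = "ship a pallet to dock"
    # One substitution ("the" -> "a") and one deletion ("nine") over 6 words.
    print(f"WER: {word_error_rate(reference, hypothesis):.3f}")
```

A model that "matches or beats" a larger one on such a benchmark would show an equal or lower WER on the same test audio.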

Available now under a permissive Apache 2.0 license, the model is built for engineers to integrate into real-world applications, from handheld logistics scanners to global customer service tools. IBM suggests pairing it with its Granite Guardian system for deployments that need an extra layer of security monitoring.

Source: Hugging Face Blog