Hey Everyone,
Lost in the PR shuffle about AI this week was an odd story out of Amazon.
It concerns Amazon’s new text-to-speech (TTS) model, BASE TTS, which stands for Big Adaptive Streamable TTS with Emergent abilities.
Researchers at Amazon say they have trained the largest text-to-speech model to date, and they claim it exhibits “emergent” qualities that improve its ability to speak even complex sentences naturally.
The size is what makes this interesting for both audio and NLP.
At 980 million parameters, BASE TTS is the largest text-to-speech model created so far. The researchers trained models of various sizes on up to 100,000 hours of public domain speech data to see whether they would observe the same performance leaps that occur in natural language processing models once they grow past a certain scale.
Alexa Upgraded
As you know, Big Tech is racing to make Alexa, Siri and Google Assistant helpful again with generative AI makeovers. Text-to-speech (TTS) models are a core part of voice assistants for smart devices: they convert written text into spoken words, allowing the assistant to communicate with users in a natural, human-like manner.
When OpenAI Sora videos are already being dubbed with ElevenLabs audio, you know generative AI is on a roll. TTS models now produce output that closely resembles natural speech, incorporating elements such as intonation, emphasis, and inflection. Text-to-speech and NLP are about to open a whole new world of ambient computing, if you think about it.
Amazon’s claims about being the biggest or having emergent abilities are a bit funny, though, given that this PR slipped through the cracks of most outlets.