Google Makes DeepMind's AI-Powered Cloud Text-to-Speech Service Available to Developers

By Indo-Asian News Service | Updated: 28 March 2018 19:01 IST


Google on Wednesday launched a voice synthesiser called "Cloud Text-to-Speech" which is powered by its Britain-based Artificial Intelligence (AI) subsidiary DeepMind.

The service is now available for developers to add to their own applications.

A text-to-speech service is a form of speech synthesis that converts text into spoken voice output. Google's text-to-speech powers the voices in services such as Google Assistant, Search and Maps.

"'Cloud Text-to-Speech' lets developers choose from 32 different voices from 12 languages and variants," Dan Aharon, Product Manager, Cloud AI, said in a blog post.

"Cloud Text-to-Speech" correctly pronounces complex text such as names, dates, times and addresses for authentic-sounding speech, the company claimed.

It also allows developers to customise pitch, speaking rate and volume gain, and supports a variety of audio formats, including MP3 and WAV.
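
As a rough illustration of those tuning options, the Python sketch below assembles a request body with pitch, speaking rate, volume gain and output format. The field names (`audioConfig`, `speakingRate`, `pitch`, `volumeGainDb`, `audioEncoding`) and the numeric limits come from the service's public REST reference as understood here, not from this article, so treat them as assumptions to verify against the current documentation. `build_synthesis_request` is a hypothetical helper, and the code makes no network call.

```python
import json

# Assumed limits, taken from the public API reference rather than from this article.
SPEAKING_RATE_RANGE = (0.25, 4.0)     # 1.0 is normal speed
PITCH_RANGE = (-20.0, 20.0)           # semitones relative to the default pitch
VOLUME_GAIN_DB_RANGE = (-96.0, 16.0)  # decibels relative to the default volume


def _check(name, value, bounds):
    """Return value unchanged if it lies within bounds, otherwise raise."""
    low, high = bounds
    if not low <= value <= high:
        raise ValueError(f"{name}={value} is outside [{low}, {high}]")
    return value


def build_synthesis_request(text, language_code="en-US", audio_encoding="MP3",
                            speaking_rate=1.0, pitch=0.0, volume_gain_db=0.0):
    """Hypothetical helper: build a dict shaped like a synthesis request body."""
    return {
        "input": {"text": text},
        "voice": {"languageCode": language_code},
        "audioConfig": {
            # "MP3", or "LINEAR16" for uncompressed WAV-style audio (assumed names)
            "audioEncoding": audio_encoding,
            "speakingRate": _check("speaking_rate", speaking_rate, SPEAKING_RATE_RANGE),
            "pitch": _check("pitch", pitch, PITCH_RANGE),
            "volumeGainDb": _check("volume_gain_db", volume_gain_db, VOLUME_GAIN_DB_RANGE),
        },
    }


if __name__ == "__main__":
    body = build_synthesis_request("Your order ships on 28 March.",
                                   speaking_rate=0.9, pitch=-2.0)
    print(json.dumps(body, indent=2))
```

Validating the values locally, as this sketch does, is a design choice rather than a requirement; the service would presumably reject out-of-range settings on its own.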

According to Google, "Cloud Text-to-Speech" can be used in a variety of ways: to power voice response systems for call centres (IVRs) and enable real-time natural language conversations, to let Internet of Things (IoT) devices talk back, and to convert text-based media into spoken format.

Google said that "Cloud Text-to-Speech" includes a selection of high-fidelity voices built using WaveNet - a neural network trained with a large volume of speech samples that is able to create raw audio waveforms from scratch.

DeepMind introduced the first version of WaveNet in late 2016.

WaveNet synthesises more natural-sounding speech and, on average, produces speech audio that people prefer over other text-to-speech technologies.

During training, the network extracts the structure of the speech, including tones and what shape a realistic speech waveform should have.

When given text input, the trained WaveNet model generates the corresponding speech waveforms, one sample at a time, achieving higher accuracy than alternative approaches.
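
The phrase "one sample at a time" describes an autoregressive loop, in which each new sample is predicted from the samples already produced. The Python sketch below shows only that loop. `toy_predictor` is a hypothetical stand-in with a fixed rule; it is not WaveNet, and the sketch leaves out both the neural network and the conditioning on input text.

```python
def generate_autoregressive(predict_next, num_samples, context_size):
    """Generate samples one at a time, feeding each output back in as input."""
    if context_size < 1:
        raise ValueError("context_size must be at least 1")
    samples = []
    for _ in range(num_samples):
        context = samples[-context_size:]      # only the most recent samples
        samples.append(predict_next(context))  # next sample depends on earlier ones
    return samples


def toy_predictor(context):
    """Hypothetical stand-in for a trained model: halve the previous sample."""
    if not context:
        return 1.0
    return 0.5 * context[-1]


if __name__ == "__main__":
    print(generate_autoregressive(toy_predictor, num_samples=4, context_size=2))
    # prints [1.0, 0.5, 0.25, 0.125]
```

Because every sample waits on the one before it, a loop like this is hard to parallelise, which is one reason generation speed matters for this kind of model.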

Today's improved WaveNet model generates raw waveforms 1,000 times faster than the original model and can generate one second of speech in just 50 milliseconds.
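
To put those figures in perspective, the short Python sketch below works out the arithmetic. The implied time for the original model assumes the 1,000-times speed-up applies to the same one-second workload, which the article does not state explicitly; the function names are illustrative only.

```python
def real_time_factor(compute_ms_per_audio_second):
    """How many seconds of audio are produced per second of compute."""
    return 1000 / compute_ms_per_audio_second


def implied_original_seconds(new_compute_ms, speedup):
    """Compute time the slower model would need, assuming the same workload."""
    return new_compute_ms * speedup / 1000


if __name__ == "__main__":
    NEW_MODEL_MS = 50  # from the article: 50 ms per second of speech
    SPEEDUP = 1000     # from the article: 1,000 times faster

    print(real_time_factor(NEW_MODEL_MS))                   # prints 20.0
    print(implied_original_seconds(NEW_MODEL_MS, SPEEDUP))  # prints 50.0
```

In other words, the improved model runs about 20 times faster than real time, while the original would, under that assumption, have needed roughly 50 seconds per second of speech.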

The model also has higher fidelity and can create waveforms with 24,000 samples per second.

"We have also increased the resolution of each sample from 8 bits to 16 bits, producing higher quality audio for a more human sound," Aharon added.
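
A quick calculation shows what the move from 8 to 16 bits means at 24,000 samples per second. The Python sketch below is plain arithmetic; the raw data rate assumes uncompressed single-channel audio, which the article does not specify, and the function names are illustrative.

```python
def quantisation_levels(bits_per_sample):
    """Number of distinct amplitude values a sample can take."""
    return 2 ** bits_per_sample


def raw_bytes_per_second(sample_rate_hz, bits_per_sample, channels=1):
    """Uncompressed data rate; mono is an assumption, not from the article."""
    return sample_rate_hz * bits_per_sample * channels // 8


if __name__ == "__main__":
    print(quantisation_levels(8))           # prints 256
    print(quantisation_levels(16))          # prints 65536
    print(raw_bytes_per_second(24000, 16))  # prints 48000
```

Each 16-bit sample can therefore take 256 times as many distinct values as an 8-bit one.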

With these adjustments, the latest WaveNet model produces more natural-sounding speech, and people have given the new US English WaveNet voices a mean opinion score (MOS) of 4.1 on a scale of one to five.

 
