Microsoft Unveils VALL-E, Audio AI That Can Simulate Any Voice From 3-Second Prompts

Microsoft trained VALL-E's speech synthesis capabilities on an audio library called LibriLight.

Written by Sucharita Ganguly, Edited by Siddharth Suvarna | Updated: 10 January 2023 19:07 IST
Highlights
  • Microsoft calls VALL-E a "neural codec language model"
  • VALL-E could be used for high-quality text-to-speech applications
  • It could also be used for speech editing and audio content creation

Microsoft says VALL-E generates discrete audio codec codes from text and acoustic prompts

Photo Credit: Reuters

Microsoft researchers recently announced VALL-E, a new text-to-speech AI model that can accurately mimic a person's voice when given a three-second audio sample. Once it has learned a specific voice, VALL-E can synthesise audio of that person saying anything, while attempting to retain the speaker's emotional tone. VALL-E's creators believe that, combined with other generative AI models like GPT-3, it could be used for high-quality text-to-speech applications, for speech editing in which a recording of a person is altered from a text transcript (making them say something they did not actually say), and for audio content creation.

According to Microsoft, VALL-E is primarily a "neural codec language model," and is based on EnCodec, which Meta revealed in October 2022. VALL-E creates discrete audio codec codes from text and acoustic prompts, as opposed to other text-to-speech methods that typically synthesise speech by manipulating waveforms. It processes how a person sounds, breaks the relevant data down into discrete components (referred to as "tokens") using EnCodec, and then uses training data to match what it "knows" about how that voice might sound if it spoke other phrases beyond the three-second sample.
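To make the idea of "discrete audio codec codes" concrete, here is a minimal sketch in Python. It is not EnCodec, which is a learned neural codec, and it is not how VALL-E itself is implemented. It assumes a hand-made codebook of eight evenly spaced amplitude levels and uses a short sine wave in place of recorded speech. The function names `make_codebook`, `encode` and `decode` are illustrative only. The point is simply that a waveform can be turned into a sequence of integer tokens and approximately rebuilt from them.

```python
import math


def make_codebook(levels: int) -> list[float]:
    """Evenly spaced amplitude levels in [-1, 1] (a toy, hand-made codebook)."""
    return [-1.0 + 2.0 * i / (levels - 1) for i in range(levels)]


def encode(samples: list[float], codebook: list[float]) -> list[int]:
    """Map each sample to the index (token) of its nearest codebook entry."""
    return [
        min(range(len(codebook)), key=lambda i: abs(codebook[i] - s))
        for s in samples
    ]


def decode(tokens: list[int], codebook: list[float]) -> list[float]:
    """Turn tokens back into approximate sample values."""
    return [codebook[t] for t in tokens]


# A short sine wave stands in for a snippet of recorded speech (assumption).
wave = [math.sin(2 * math.pi * n / 16) for n in range(16)]

codebook = make_codebook(8)
tokens = encode(wave, codebook)          # discrete integers in 0..7
reconstructed = decode(tokens, codebook)

# Quantisation loses some detail; measure the worst-case difference.
max_error = max(abs(a - b) for a, b in zip(wave, reconstructed))

print("tokens:", tokens)
print("max reconstruction error:", round(max_error, 3))
```

A language model working on such tokens predicts the next integer in a sequence rather than raw waveform values, which is the broad distinction the article draws between VALL-E and waveform-based methods.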

Microsoft trained VALL-E's speech synthesis capabilities on Meta's LibriLight audio library. It includes 60,000 hours of English language speech from over 7,000 speakers, sourced primarily from LibriVox public domain audiobooks. For VALL-E to produce a good result, the voice in the three-second sample should closely resemble a voice in the training data.
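For a sense of scale, the figures quoted above can be put through some quick arithmetic. The sketch below uses only the numbers in this article (60,000 hours, 7,000 speakers, a three-second prompt). Since the article says "over 7,000" speakers, the per-speaker average is an upper bound, and the variable names are illustrative.

```python
HOURS = 60_000        # total hours of speech, as reported
SPEAKERS = 7_000      # "over 7,000", so the true count is somewhat higher
PROMPT_SECONDS = 3    # length of the speaker prompt

# Average amount of speech per speaker (an upper bound, see note above).
avg_hours_per_speaker = HOURS / SPEAKERS

# How many three-second clips the whole library would hold end to end.
total_seconds = HOURS * 3600
prompts_in_corpus = total_seconds // PROMPT_SECONDS

print(f"at most about {avg_hours_per_speaker:.1f} hours per speaker")
print(f"{prompts_in_corpus:,} three-second clips in total")
```

In other words, the prompt VALL-E receives for a new voice is one 72-millionth of the length of the audio it was trained on, and each training speaker contributed under nine hours on average.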


The American technology giant offers dozens of audio examples of the AI model in action on the VALL-E example website. The "Speaker Prompt" is the three-second audio sample that VALL-E must try to emulate. The "Ground Truth" is an existing recording of that same speaker saying a specific phrase, included for comparison (sort of like the "control" in the experiment). The "Baseline" sample is generated by a traditional text-to-speech synthesis method, and the "VALL-E" sample is generated by the VALL-E model.


A block diagram of VALL-E as shown in the example website by Microsoft researchers
Photo Credit: Microsoft


To get those results, researchers supplied VALL-E with only the three-second "Speaker Prompt" sample and a text string (what they wanted the voice to say). Some VALL-E results sound computer-generated, but others could be mistaken for human speech, which is the model's goal. Because of VALL-E's potential to fuel misuse and deception, Microsoft has not made the VALL-E code available for others to explore. The researchers appear to be aware of the potential social harm that this technology may cause.

They write in the paper's conclusion: "Since VALL-E could synthesize speech that maintains speaker identity, it may carry potential risks in misuse of the model, such as spoofing voice identification or impersonating a specific speaker. To mitigate such risks, it is possible to build a detection model to discriminate whether an audio clip was synthesized by VALL-E. We will also put Microsoft AI Principles into practice when further developing the models."



© Copyright Red Pixels Ventures Limited 2026. All rights reserved.