Mayukh Saha | Truth Theory | Source URL
Google’s new AI has been developed to translate a person’s speech while preserving the original voice. Put simply, think of any conventional speech translation system: most produce a flat, robotic monotone, because the speech is first converted into text, then translated, and finally resynthesized into speech. The original voice is lost along the way.
Google’s AI cuts out the intermediate steps by converting the input audio directly into output audio.
The system has been named Translatotron, and it has three components that work together to make translation smoother. The first is a neural network that maps the input audio spectrogram (a picture of how the sound's frequency content changes over time) to an output audio spectrogram. The second converts that output spectrogram into audio that can be played back, and the third puts the characteristics of the original voice back onto the new output.
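The data flow between the three components can be sketched in a few lines of Python. This is only a toy illustration, not Google's implementation: the function names, the list-of-lists spectrogram, and the arithmetic inside each function are all assumptions made up for this example, standing in for trained neural networks.

```python
from typing import List

# Toy spectrogram: a list of time frames, each a list of frequency-bin values.
Spectrogram = List[List[float]]


def encode_speaker(source: Spectrogram) -> List[float]:
    """Hypothetical speaker encoder: average each frequency bin over time."""
    n_frames = len(source)
    n_bins = len(source[0])
    return [sum(frame[b] for frame in source) / n_frames for b in range(n_bins)]


def translate_spectrogram(source: Spectrogram, speaker: List[float]) -> Spectrogram:
    """Placeholder for the network that maps input spectrogram to output spectrogram.

    A real model is learned from data. Here the frames are simply reversed and
    the speaker vector is added, so the flow of information stays visible.
    """
    return [[v + s for v, s in zip(frame, speaker)] for frame in reversed(source)]


def vocode(target: Spectrogram) -> List[float]:
    """Placeholder vocoder: collapse each frame to a single audio sample (its mean)."""
    return [sum(frame) / len(frame) for frame in target]


def translate_speech(source: Spectrogram) -> List[float]:
    """Audio in, audio out: no text appears at any point."""
    speaker = encode_speaker(source)                  # voice characteristics
    target = translate_spectrogram(source, speaker)   # spectrogram to spectrogram
    return vocode(target)                             # spectrogram to playable samples


if __name__ == "__main__":
    toy_input = [[1.0, 3.0], [3.0, 5.0]]
    print(translate_speech(toy_input))
```

The point of the sketch is the shape of `translate_speech`: the input never passes through a text representation, which is what separates this design from the convert-to-text, translate, resynthesize chain described earlier.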
This approach has a two-fold advantage: it produces a more nuanced translation, because nonverbal cues are not discarded, and it reduces translation errors, because there are fewer steps in which they can creep in.
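A quick back-of-the-envelope calculation shows why fewer steps can mean fewer errors. The sketch below assumes, purely for illustration, that each stage is right 90% of the time and that mistakes in different stages are independent; neither figure is a measurement of any real system, and `pipeline_accuracy` is a hypothetical helper.

```python
def pipeline_accuracy(stage_accuracies):
    """Chance that every stage succeeds, assuming independent stage errors."""
    accuracy = 1.0
    for stage in stage_accuracies:
        accuracy *= stage
    return accuracy


# Illustrative numbers only, not measured results.
cascade = pipeline_accuracy([0.9, 0.9, 0.9])  # speech-to-text, translation, text-to-speech
direct = pipeline_accuracy([0.9])             # a single audio-to-audio step

print(f"three-stage cascade: {cascade:.3f}")
print(f"single direct step:  {direct:.3f}")
```

Under these assumptions the three-stage chain comes out noticeably less reliable than the single step, because every extra stage multiplies in another chance of failure. Whether a real direct model actually beats a real cascade depends on how accurate each model is in practice.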
Although it is not ready for the market just yet, researchers have used it to translate Spanish into English, working from highly curated and analyzed data. It would not be surprising to see a more developed version reach the market fairly soon.
You can listen to some of their samples here.