Researchers at search giant Google have published a paper describing a speech generator whose voice is almost indistinguishable from that of a real person. The system is called Tacotron 2, and it converts text to speech with very high quality.
The program consists of two interconnected deep neural networks. The first network generates a spectrogram from the text and passes it to the second, WaveNet, which turns it into audible speech. Tacotron 2 handles many nuances: it copes easily with words that are difficult to pronounce and takes punctuation into account when reading. Thanks to this, for example, it distinguishes the end of one sentence from the beginning of the next and marks the boundary with intonation.
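To make the two-stage design concrete, here is a minimal sketch of the data flow only: text goes in, a spectrogram comes out of the first stage, and the second stage turns that spectrogram into audio samples. Nothing here is Google's code or a real model. Both functions are toy stand-ins, and every name and constant (`text_to_spectrogram`, `spectrogram_to_waveform`, `N_BANDS`, `FRAMES_PER_CHAR`, `HOP_LENGTH`, `SAMPLE_RATE`) is an assumption chosen for illustration.

```python
import numpy as np

# All constants below are illustrative assumptions, not values from the article.
N_BANDS = 80          # frequency bands per spectrogram frame
FRAMES_PER_CHAR = 5   # toy rule: each character produces five frames
HOP_LENGTH = 256      # audio samples generated per spectrogram frame
SAMPLE_RATE = 22050   # audio samples per second


def text_to_spectrogram(text: str) -> np.ndarray:
    """Toy stand-in for the first network: text -> spectrogram (frames x bands)."""
    rng = np.random.default_rng(0)
    # A fixed pseudo-random "spectrum" for each possible character code.
    table = rng.random((256, N_BANDS))
    codes = np.array([min(ord(c), 255) for c in text], dtype=np.intp)
    # Repeat each character's spectrum so it lasts several frames.
    return np.repeat(table[codes], FRAMES_PER_CHAR, axis=0)


def spectrogram_to_waveform(spec: np.ndarray) -> np.ndarray:
    """Toy stand-in for the second stage (WaveNet in the article): spectrogram -> audio."""
    n_samples = spec.shape[0] * HOP_LENGTH
    t = np.arange(n_samples)
    # The mean energy of each frame sets the loudness of a plain 220 Hz tone.
    amplitude = np.repeat(spec.mean(axis=1), HOP_LENGTH)
    return amplitude * np.sin(2 * np.pi * 220 * t / SAMPLE_RATE)


def synthesize(text: str) -> np.ndarray:
    """Chain the two stages: text -> spectrogram -> waveform."""
    return spectrogram_to_waveform(text_to_spectrogram(text))


audio = synthesize("Hello, world.")
print(audio.shape)
```

In the real system both stages are trained neural networks. The sketch only shows why the spectrogram is a convenient hand-off point: the first stage decides what should be said and how, and the second stage only has to turn that description into sound.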
The researchers have already posted samples of the system's output on a page dedicated to the project. They sound much better than the monotonous, mechanical voices of current text-to-speech programs, so I believe Google will quickly find a use for the technology. WaveNet is already used in Google Assistant, so Tacotron 2 would make an excellent addition to it.
At this stage of development, Tacotron 2 speaks only in a pleasant female voice, but it will likely gain a male version in the future, and given its ability to learn, it may also be able to learn to imitate other voices.