This multimodal task uses natural language processing and machine learning to produce human-like speech from text.
Natural human language in textual form
Synthetic speech audio
To convert text into a natural-sounding voice
Text-to-speech synthesis using deep learning
Naturalness, intelligibility, accuracy, and similarity to human voice
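
One common way to put a number on intelligibility, offered here as an illustrative sketch rather than this task's prescribed method, is to transcribe the synthesized audio with a speech recognizer and compute the word error rate (WER) against the input text. The function name `word_error_rate` and the example strings are hypothetical, and the recognizer transcript is assumed to be available already as a plain string.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    reference:  the text that was given to the synthesizer.
    hypothesis: a transcript of the synthesized audio (assumed to come
                from some speech recognizer; not produced here).
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from transcript
            insertion = curr[j - 1] + 1  # extra word in transcript
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    text = "the cat sat on the mat"
    transcript = "the cat sat on a mat"  # hypothetical recognizer output
    print(word_error_rate(text, transcript))
```

A lower WER suggests the synthesized speech is easier to understand. It does not capture naturalness or similarity to a human voice, which are usually judged with listening tests.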