Microsoft’s new AI voice can interpret the emotion lying beneath the human voice

Bipasha Mandal
Bipasha Mondal is a writer at TechGenyz

Microsoft recently launched a new voice technology that makes its intelligent voices more nuanced and controllable. The new AI speech technology lets users adjust the level of emotion across multiple emotional styles.

It is generally agreed that human emotions can be detected in the slightest change of voice and tone. The new Smart Voice technology from Microsoft can detect the underlying human emotion in a voice by distinguishing happy, sad, angry, fearful, disgruntled, serious, affectionate, gentle, depressed, embarrassed, calm, and other emotions. The calm tone serves as the zero point, and the emotional level is adjusted in units of one percent. The technology also covers Chinese voices: Xiaoxiao, Yunxi, Yunye, Xiaohan, Xiaoxuan, Xiaomo, and Xiaorui all support the “emotional level” adjustment. Moreover, the Smart Voice lineup spans different ages, genders, and personalities.

The new intelligent speech emotion adjustment is based on an adaptive neural network. Developers use SSML tags to control the degree of emotion. However, the feature is not limited to developers who know SSML markup; users without any programming training can also access it through the audio content creation platform.
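As a rough illustration of the SSML approach, the sketch below builds such markup by hand. It assumes the `mstts:express-as` element with its `style` and `styledegree` attributes from Azure's speech markup, which the reports do not name. The helper `build_emotional_ssml`, the default voice name `zh-CN-XiaoxiaoNeural`, and the 0.01 to 2 range check are illustrative assumptions, and that range is not the same as the percentage scale described above.

```python
from xml.sax.saxutils import escape, quoteattr


def build_emotional_ssml(text, voice="zh-CN-XiaoxiaoNeural",
                         style="cheerful", degree=1.0, lang="zh-CN"):
    """Return an SSML string asking for `text` in a given emotional style.

    Hypothetical helper for illustration only. `degree` is assumed to map
    to the `styledegree` attribute, where 1 is the default intensity.
    """
    # Assumed valid range for styledegree; reject anything outside it.
    if not 0.01 <= degree <= 2.0:
        raise ValueError("degree must be between 0.01 and 2")

    # Format the number compactly (1.0 -> "1", 1.5 -> "1.5") and quote it.
    degree_attr = quoteattr(f"{degree:g}")

    return (
        '<speak version="1.0" '
        'xmlns="http://www.w3.org/2001/10/synthesis" '
        'xmlns:mstts="https://www.w3.org/2001/mstts" '
        f"xml:lang={quoteattr(lang)}>"
        f"<voice name={quoteattr(voice)}>"
        f"<mstts:express-as style={quoteattr(style)} styledegree={degree_attr}>"
        f"{escape(text)}"  # escape &, <, > so the text cannot break the markup
        "</mstts:express-as></voice></speak>"
    )


if __name__ == "__main__":
    # A sad reading at one and a half times the default intensity.
    print(build_emotional_ssml("今天的天气真糟糕。", style="sad", degree=1.5))
```

The function only produces the markup; sending it to a speech service for synthesis is a separate step that is not shown here.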

The Smart Voice technology integrates automatic text sentiment analysis to predict the range of emotional categories and interpret the emotion underlying the speech. This will come in handy when interpreting any speech pattern. Some reports have compared the new technology to a director casting a production, since it can better interpret works by pairing them with the most suitable voice and the most appropriate emotions.

It is suitable for chatbots, audiobook reading, automatic film and television dubbing, games, and many other scenarios. Hopefully, in the future, this feature will expand to encompass other services as well.

