CN108831436A - Method for synthesizing translated text into speech that simulates the speaker's emotion - Google Patents

Method for synthesizing translated text into speech that simulates the speaker's emotion

Info

Publication number
CN108831436A
CN108831436A (application CN201810601584.6A)
Authority
CN
China
Prior art keywords
text
translation
obtains
interface
voice
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201810601584.6A
Other languages
Chinese (zh)
Inventor
张岩
林彦
熊涛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Heyan Mdt Infotech Ltd
Original Assignee
Shenzhen Heyan Mdt Infotech Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Heyan Mdt Infotech Ltd filed Critical Shenzhen Heyan Mdt Infotech Ltd
Priority to CN201810601584.6A priority Critical patent/CN108831436A/en
Publication of CN108831436A publication Critical patent/CN108831436A/en
Pending legal-status Critical Current

Links

Classifications

    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L13/00 - Speech synthesis; Text to speech systems
    • G10L13/02 - Methods for producing synthetic speech; Speech synthesisers
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L13/00 - Speech synthesis; Text to speech systems
    • G10L13/08 - Text analysis or generation of parameters for speech synthesis out of text, e.g. grapheme to phoneme translation, prosody generation or stress or intonation determination
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00 - Speaker identification or verification
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L25/51 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
    • G10L25/63 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination for estimating an emotional state

Abstract

The invention discloses a method for synthesizing translated text into speech that simulates the speaker's emotion. First, the user's voice is captured. The backend analyzes the audio file to obtain frequency and speaking-rate parameters, and obtains parameters such as gender and age by importing the audio into a voiceprint recognition system. Speech recognition converts the voice into text, and sentence-level analysis of the text's grammar and word choice yields emotion parameters. Combining the multiple features of frequency, speaking rate, gender, age, and emotion, a value is set for each feature. The feature values are then combined with SSML speech-synthesis markup, configuring the speaking rate, volume, and word pauses in the SSML of the output voice, so that the synthesized foreign-language broadcast reflects the emotional characteristics of the speaker's native-language utterance. By recognizing acoustic and linguistic features of the speaker such as tone, intonation, word choice, and grammar, the present invention makes the final translated, synthesized speech faithfully convey the current speaker's emotion.
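The pipeline the abstract describes can be sketched as follows. Every interface here (analyze_audio, recognize_voiceprint, and so on) is a hypothetical stub standing in for the patent's backend services; the names, return values, and call order beyond what the abstract states are illustrative assumptions, not a real API.

```python
# Hypothetical stubs for the patent's backend interfaces; the returned
# values are placeholders, not outputs of real analysis systems.
def analyze_audio(wav):        return {"frequency_hz": 180.0, "speech_rate_wpm": 150}
def recognize_voiceprint(wav): return {"gender": "female", "age": 30}
def speech_to_text(wav):       return "I am very happy today"
def analyze_sentiment(text):   return {"emotion": "happy"}
def translate(text, target):   return "(translated) " + text
def synthesize(text, feats):   return f"<speak><!-- feats={feats} -->{text}</speak>"

def translate_with_emotion(wav, target_lang="zh"):
    """Chain the analysis and translation services, then synthesize speech."""
    features = {}
    features.update(analyze_audio(wav))          # frequency, speaking rate
    features.update(recognize_voiceprint(wav))   # gender, age
    text = speech_to_text(wav)                   # speech recognition
    features.update(analyze_sentiment(text))     # emotion parameters
    translated = translate(text, target_lang)    # target-language text
    return synthesize(translated, features)      # SSML-driven synthesis

ssml = translate_with_emotion(b"\x00\x01")  # stand-in for WAV bytes
```

Each stub corresponds to one of the interfaces the abstract names; swapping in real services would not change the orchestration shape.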

Description

A method of text speech synthesis after simulation speaker's mood optimization translation
Technical field
The present invention relates to a speech synthesis method, and in particular to a method for synthesizing translated text into speech that simulates the speaker's emotion, belonging to the technical field of speech translation.
Background technique
Current speech synthesis technology merely converts text to speech and broadcasts it mechanically; it cannot accurately convey the speaker's emotion. By recognizing acoustic and linguistic features of the speaker such as tone, intonation, word choice, and grammar, the present invention dynamically adjusts the speech-synthesis rules when the speaker's utterance has been translated into text in another language, so that the final synthesized speech broadcast faithfully reflects the current speaker's emotion.
Summary of the invention
The technical problem to be solved by the present invention is to overcome the defect that current text-to-speech synthesis merely broadcasts text mechanically and cannot accurately convey the speaker's emotion, and to provide a method for synthesizing translated text into speech that simulates the speaker's emotion.
To solve the above technical problem, the present invention provides the following technical solution:
The present invention provides a method for synthesizing translated text into speech that simulates the speaker's emotion, involving a translation device connected by signal to a business backend; through the business backend, the translation device is connected by signal to a speech recognition interface, a voiceprint recognition interface, a syntactic analysis interface, a translation interface, and a speech synthesis interface.
As a preferred technical solution of the present invention, the speech-translation synthesis steps are:
Step 1: The translation device captures the user's speech and obtains it in WAV format;
Step 2: The business backend analyzes the audio file to obtain frequency and speaking-rate parameters;
Step 3: The business backend imports the voice data into the voiceprint recognition interface, where the voiceprint recognition system identifies parameters such as the user's gender and age;
Step 4: The business backend imports the voice data into the speech recognition interface, where the speech recognition system produces the text;
Step 5: The business backend imports the recognized text into the syntactic analysis interface, where the syntactic analysis system analyzes the text's grammar and word choice sentence by sentence to obtain emotion parameters, e.g. happy, angry, indignant, negative;
Step 6: The business backend combines the frequency, speaking-rate, gender, age, and emotion feature parameters obtained by the individual analyses and sets a value for each feature;
Step 7: The business backend imports the recognized text into the translation interface, where the translation system produces text in the target language;
Step 8: The business backend imports the translated text and the analyzed feature values into the speech synthesis interface, where the speech synthesis system combines the feature values with SSML speech-synthesis markup, configuring the speaking rate, volume, and word pauses in the output SSML, so that the synthesized foreign-language broadcast reflects the emotional characteristics of the speaker's native-language utterance.
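Step 8's mapping from analyzed feature values onto SSML prosody settings can be sketched as follows. The feature names, thresholds, and mapping rules are illustrative assumptions; the patent does not specify concrete values, only that speaking rate, volume, and pauses are configured in the SSML.

```python
# Sketch of step 8: map feature values onto SSML rate, volume, and pauses.
# All thresholds and emotion labels below are assumed for illustration.
from xml.sax.saxutils import escape

def build_ssml(text, features):
    """Wrap translated text in SSML configured from the feature values."""
    # Speaking rate: carry the source speaker's measured rate over.
    wpm = features.get("speech_rate_wpm", 150)
    rate = "fast" if wpm > 180 else "slow" if wpm < 110 else "medium"
    # Volume: raise it for high-arousal emotions such as anger.
    volume = "loud" if features.get("emotion") in ("anger", "indignation") else "medium"
    # Pauses: longer breaks between sentences for negative, low-energy moods.
    pause_ms = 600 if features.get("emotion") == "negative" else 300
    sentences = [s for s in text.replace("!", ".").split(".") if s.strip()]
    body = f'<break time="{pause_ms}ms"/>'.join(
        f"<s>{escape(s.strip())}.</s>" for s in sentences)
    return f'<speak><prosody rate="{rate}" volume="{volume}">{body}</prosody></speak>'

ssml = build_ssml("I am thrilled. This works", {"speech_rate_wpm": 200, "emotion": "happy"})
```

The `<prosody>` and `<break>` elements used here are standard SSML; a real backend would hand the resulting markup to its speech synthesis interface.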
The beneficial effects of the present invention are as follows: by recognizing acoustic and linguistic features of the speaker such as tone, intonation, word choice, and grammar, the invention dynamically adjusts the speech-synthesis rules when the speaker's utterance has been translated into text in another language, so that the final synthesized speech broadcast faithfully reflects the current speaker's emotion.
Detailed description of the invention
The accompanying drawings are provided for further understanding of the present invention and constitute part of the specification; together with the embodiments, they serve to explain the present invention and are not to be construed as limiting it. In the drawings:
Fig. 1 is a structural schematic diagram of the invention;
Fig. 2 is a front view of the invention.
Specific embodiment
Preferred embodiments of the present invention are described below with reference to the accompanying drawings. It should be understood that the preferred embodiments described here serve only to illustrate and explain the present invention and are not intended to limit it.
Embodiment 1
As shown in Figs. 1-2, the present invention provides a method for synthesizing translated text into speech that simulates the speaker's emotion, involving a translation device connected by signal to a business backend; through the business backend, the translation device is connected by signal to a speech recognition interface, a voiceprint recognition interface, a syntactic analysis interface, a translation interface, and a speech synthesis interface.
Specifically, the speech-translation synthesis steps are:
Step 1: The translation device captures the user's speech and obtains it in WAV format;
Step 2: The business backend analyzes the audio file to obtain frequency and speaking-rate parameters;
Step 3: The business backend imports the voice data into the voiceprint recognition interface, where the voiceprint recognition system identifies parameters such as the user's gender and age;
Step 4: The business backend imports the voice data into the speech recognition interface, where the speech recognition system produces the text;
Step 5: The business backend imports the recognized text into the syntactic analysis interface, where the syntactic analysis system analyzes the text's grammar and word choice sentence by sentence to obtain emotion parameters, e.g. happy, angry, indignant, negative;
Step 6: The business backend combines the frequency, speaking-rate, gender, age, and emotion feature parameters obtained by the individual analyses and sets a value for each feature;
Step 7: The business backend imports the recognized text into the translation interface, where the translation system produces text in the target language;
Step 8: The business backend imports the translated text and the analyzed feature values into the speech synthesis interface, where the speech synthesis system combines the feature values with SSML speech-synthesis markup, configuring the speaking rate, volume, and word pauses in the output SSML, so that the synthesized foreign-language broadcast reflects the emotional characteristics of the speaker's native-language utterance.
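Step 2's audio analysis can be illustrated with a minimal fundamental-frequency estimate by autocorrelation. This is a generic signal-processing sketch, not the patent's actual algorithm; a real backend would also measure speaking rate, for example from energy peaks per second.

```python
# Sketch of step 2: estimate the fundamental frequency (pitch) of a mono
# audio signal via autocorrelation, searching lags in a plausible voice
# range. Pure-Python and illustrative only.
import math

def estimate_f0(samples, sample_rate, fmin=60.0, fmax=500.0):
    """Return the lag with the strongest autocorrelation in [fmin, fmax],
    converted to a frequency estimate in Hz."""
    n = len(samples)
    mean = sum(samples) / n
    x = [s - mean for s in samples]          # remove DC offset
    best_lag, best_corr = 0, 0.0
    lo = int(sample_rate / fmax)             # smallest lag = highest pitch
    hi = min(int(sample_rate / fmin), n - 1)
    for lag in range(lo, hi + 1):
        corr = sum(x[i] * x[i + lag] for i in range(n - lag))
        if corr > best_corr:
            best_corr, best_lag = corr, lag
    return sample_rate / best_lag if best_lag else 0.0

# Usage on a synthetic 220 Hz tone sampled at 8 kHz.
sr = 8000
tone = [math.sin(2 * math.pi * 220 * t / sr) for t in range(2000)]
f0 = estimate_f0(tone, sr)
```

On real WAV audio the samples would first be decoded (e.g. with the standard `wave` module) and analyzed frame by frame rather than over the whole file.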
The beneficial effects of the present invention are as follows: by recognizing acoustic and linguistic features of the speaker such as tone, intonation, word choice, and grammar, the invention dynamically adjusts the speech-synthesis rules when the speaker's utterance has been translated into text in another language, so that the final synthesized speech broadcast faithfully reflects the current speaker's emotion.
Finally, it should be noted that the above are merely preferred embodiments of the present invention and are not intended to limit it. Although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art may still modify the technical solutions described therein or substitute equivalents for some of their technical features. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.

Claims (2)

1. A method for synthesizing translated text into speech that simulates the speaker's emotion, involving a translation device connected by signal to a business backend, characterized in that, through the business backend, the translation device is connected by signal to a speech recognition interface, a voiceprint recognition interface, a syntactic analysis interface, a translation interface, and a speech synthesis interface.
2. The method for synthesizing translated text into speech that simulates the speaker's emotion according to claim 1, characterized in that the speech-translation synthesis steps are:
Step 1: The translation device captures the user's speech and obtains it in WAV format;
Step 2: The business backend analyzes the audio file to obtain frequency and speaking-rate parameters;
Step 3: The business backend imports the voice data into the voiceprint recognition interface, where the voiceprint recognition system identifies parameters such as the user's gender and age;
Step 4: The business backend imports the voice data into the speech recognition interface, where the speech recognition system produces the text;
Step 5: The business backend imports the recognized text into the syntactic analysis interface, where the syntactic analysis system analyzes the text's grammar and word choice sentence by sentence to obtain emotion parameters, e.g. happy, angry, indignant, negative;
Step 6: The business backend combines the frequency, speaking-rate, gender, age, and emotion feature parameters obtained by the individual analyses and sets a value for each feature;
Step 7: The business backend imports the recognized text into the translation interface, where the translation system produces text in the target language;
Step 8: The business backend imports the translated text and the analyzed feature values into the speech synthesis interface, where the speech synthesis system combines the feature values with SSML speech-synthesis markup, configuring the speaking rate, volume, and word pauses in the output SSML, so that the synthesized foreign-language broadcast reflects the emotional characteristics of the speaker speaking his or her native language.
CN201810601584.6A 2018-06-12 2018-06-12 Method for synthesizing translated text into speech that simulates the speaker's emotion Pending CN108831436A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810601584.6A CN108831436A (en) 2018-06-12 2018-06-12 Method for synthesizing translated text into speech that simulates the speaker's emotion

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810601584.6A CN108831436A (en) 2018-06-12 2018-06-12 Method for synthesizing translated text into speech that simulates the speaker's emotion

Publications (1)

Publication Number Publication Date
CN108831436A true CN108831436A (en) 2018-11-16

Family

ID=64144893

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810601584.6A Pending CN108831436A (en) 2018-06-12 2018-06-12 Method for synthesizing translated text into speech that simulates the speaker's emotion

Country Status (1)

Country Link
CN (1) CN108831436A (en)

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109584858A (en) * 2019-01-08 2019-04-05 武汉西山艺创文化有限公司 A kind of virtual dubbing method and its device based on AI artificial intelligence
CN109658917A (en) * 2019-01-17 2019-04-19 深圳壹账通智能科技有限公司 E-book chants method, apparatus, computer equipment and storage medium
CN109712646A (en) * 2019-02-20 2019-05-03 百度在线网络技术(北京)有限公司 Voice broadcast method, device and terminal
CN110008481A (en) * 2019-04-10 2019-07-12 南京魔盒信息科技有限公司 Translated speech generation method, device, computer equipment and storage medium
CN110930977A (en) * 2019-11-12 2020-03-27 北京搜狗科技发展有限公司 Data processing method and device and electronic equipment
CN111508469A (en) * 2020-04-26 2020-08-07 北京声智科技有限公司 Text-to-speech conversion method and device
CN111986647A (en) * 2020-08-26 2020-11-24 北京声智科技有限公司 Voice synthesis method and device
CN112151064A (en) * 2020-09-25 2020-12-29 北京捷通华声科技股份有限公司 Voice broadcast method, device, computer readable storage medium and processor
CN112349271A (en) * 2020-11-06 2021-02-09 北京乐学帮网络技术有限公司 Voice information processing method and device, electronic equipment and storage medium
CN112509567A (en) * 2020-12-25 2021-03-16 北京百度网讯科技有限公司 Method, device, equipment, storage medium and program product for processing voice data
WO2021051588A1 (en) * 2019-09-19 2021-03-25 北京搜狗科技发展有限公司 Data processing method and apparatus, and apparatus used for data processing
WO2021134592A1 (en) * 2019-12-31 2021-07-08 深圳市欢太科技有限公司 Speech processing method, apparatus and device, and storage medium
WO2021217433A1 (en) * 2020-04-28 2021-11-04 青岛海信传媒网络技术有限公司 Content-based voice playback method and display device

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102122297A (en) * 2011-03-04 2011-07-13 北京航空航天大学 Semantic-based Chinese network text emotion extracting method
US20120078607A1 (en) * 2010-09-29 2012-03-29 Kabushiki Kaisha Toshiba Speech translation apparatus, method and program
CN102723078A (en) * 2012-07-03 2012-10-10 武汉科技大学 Emotion speech recognition method based on natural language comprehension
CN107315742A (en) * 2017-07-03 2017-11-03 中国科学院自动化研究所 The Interpreter's method and system that personalize with good in interactive function
CN107731232A (en) * 2017-10-17 2018-02-23 深圳市沃特沃德股份有限公司 Voice translation method and device
CN107944008A (en) * 2017-12-08 2018-04-20 神思电子技术股份有限公司 A kind of method that Emotion identification is carried out for natural language

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120078607A1 (en) * 2010-09-29 2012-03-29 Kabushiki Kaisha Toshiba Speech translation apparatus, method and program
CN102122297A (en) * 2011-03-04 2011-07-13 北京航空航天大学 Semantic-based Chinese network text emotion extracting method
CN102723078A (en) * 2012-07-03 2012-10-10 武汉科技大学 Emotion speech recognition method based on natural language comprehension
CN107315742A (en) * 2017-07-03 2017-11-03 中国科学院自动化研究所 The Interpreter's method and system that personalize with good in interactive function
CN107731232A (en) * 2017-10-17 2018-02-23 深圳市沃特沃德股份有限公司 Voice translation method and device
CN107944008A (en) * 2017-12-08 2018-04-20 神思电子技术股份有限公司 A kind of method that Emotion identification is carried out for natural language

Cited By (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109584858A (en) * 2019-01-08 2019-04-05 武汉西山艺创文化有限公司 A kind of virtual dubbing method and its device based on AI artificial intelligence
CN109658917A (en) * 2019-01-17 2019-04-19 深圳壹账通智能科技有限公司 E-book chants method, apparatus, computer equipment and storage medium
CN109712646A (en) * 2019-02-20 2019-05-03 百度在线网络技术(北京)有限公司 Voice broadcast method, device and terminal
CN110008481A (en) * 2019-04-10 2019-07-12 南京魔盒信息科技有限公司 Translated speech generation method, device, computer equipment and storage medium
CN110008481B (en) * 2019-04-10 2023-04-28 南京魔盒信息科技有限公司 Translated voice generating method, device, computer equipment and storage medium
WO2021051588A1 (en) * 2019-09-19 2021-03-25 北京搜狗科技发展有限公司 Data processing method and apparatus, and apparatus used for data processing
CN110930977A (en) * 2019-11-12 2020-03-27 北京搜狗科技发展有限公司 Data processing method and device and electronic equipment
WO2021134592A1 (en) * 2019-12-31 2021-07-08 深圳市欢太科技有限公司 Speech processing method, apparatus and device, and storage medium
CN111508469A (en) * 2020-04-26 2020-08-07 北京声智科技有限公司 Text-to-speech conversion method and device
WO2021217433A1 (en) * 2020-04-28 2021-11-04 青岛海信传媒网络技术有限公司 Content-based voice playback method and display device
CN113940049A (en) * 2020-04-28 2022-01-14 青岛海信传媒网络技术有限公司 Voice playing method and display device based on content
CN113940049B (en) * 2020-04-28 2023-10-31 Vidaa(荷兰)国际控股有限公司 Voice playing method based on content and display equipment
CN111986647A (en) * 2020-08-26 2020-11-24 北京声智科技有限公司 Voice synthesis method and device
CN112151064A (en) * 2020-09-25 2020-12-29 北京捷通华声科技股份有限公司 Voice broadcast method, device, computer readable storage medium and processor
CN112349271A (en) * 2020-11-06 2021-02-09 北京乐学帮网络技术有限公司 Voice information processing method and device, electronic equipment and storage medium
CN112509567A (en) * 2020-12-25 2021-03-16 北京百度网讯科技有限公司 Method, device, equipment, storage medium and program product for processing voice data

Similar Documents

Publication Publication Date Title
CN108831436A (en) Method for synthesizing translated text into speech that simulates the speaker's emotion
CN101030368B (en) Method and system for communicating across channels simultaneously with emotion preservation
US9368104B2 (en) System and method for synthesizing human speech using multiple speakers and context
US10229668B2 (en) Systems and techniques for producing spoken voice prompts
US20160365087A1 (en) High end speech synthesis
US9508338B1 (en) Inserting breath sounds into text-to-speech output
US20230206897A1 (en) Electronic apparatus and method for controlling thereof
Abushariah et al. Modern standard Arabic speech corpus for implementing and evaluating automatic continuous speech recognition systems
KR20150105075A (en) Apparatus and method for automatic interpretation
CA3160315C (en) Real-time speech-to-speech generation (rssg) apparatus, method and a system therefore
CN110767233A (en) Voice conversion system and method
Onaolapo et al. A simplified overview of text-to-speech synthesis
TW201322250A (en) Polyglot speech synthesis method
Hirose et al. Temporal rate change of dialogue speech in prosodic units as compared to read speech
Aylett et al. Combining statistical parameteric speech synthesis and unit-selection for automatic voice cloning
Westall et al. Speech technology for telecommunications
Meyer Coding human languages for long-range communication in natural ecological environments: shouting, whistling, and drumming
Jaiswal et al. Concatenative Text-to-Speech Synthesis System for Communication Recognition
US11501091B2 (en) Real-time speech-to-speech generation (RSSG) and sign language conversion apparatus, method and a system therefore
US11783813B1 (en) Methods and systems for improving word discrimination with phonologically-trained machine learning models
Mohammad et al. Phonetically rich and balanced text and speech corpora for Arabic language
Hande A review on speech synthesis an artificial voice production
Lyu Acoustic Energy Analysis of Vowels in Western Yugur Language
Ojala Auditory quality evaluation of present Finnish text-to-speech systems
TWM621764U (en) A system for customized speech

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20181116