CN108831436A - Method for synthesizing translated text into speech simulating the speaker's emotion - Google Patents
Method for synthesizing translated text into speech simulating the speaker's emotion Download PDF Info
- Publication number
- CN108831436A CN108831436A CN201810601584.6A CN201810601584A CN108831436A CN 108831436 A CN108831436 A CN 108831436A CN 201810601584 A CN201810601584 A CN 201810601584A CN 108831436 A CN108831436 A CN 108831436A
- Authority
- CN
- China
- Prior art keywords
- text
- translation
- obtains
- interface
- voice
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L13/00—Speech synthesis; Text to speech systems
- G10L13/02—Methods for producing synthetic speech; Speech synthesisers
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L13/00—Speech synthesis; Text to speech systems
- G10L13/08—Text analysis or generation of parameters for speech synthesis out of text, e.g. grapheme to phoneme translation, prosody generation or stress or intonation determination
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/48—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
- G10L25/51—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
- G10L25/63—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination for estimating an emotional state
Abstract
The invention discloses a method for synthesizing translated text into speech that simulates the speaker's emotion. First, the user's voice is captured; the back end analyzes the audio file to obtain frequency and speech-rate parameters; the back end imports the audio into a voiceprint recognition system to obtain parameters such as gender and age; speech recognition converts the voice into text; the grammar and wording of the text are analyzed sentence by sentence to obtain emotion parameters; combining the multiple features of frequency, speech rate, gender, age, and emotion, a value is set for each feature; the feature values are then combined with SSML speech-synthesis markup, configuring the broadcast speed, volume, and word pauses in the SSML of the speech. The synthesized speech broadcast in the target language thus reflects the emotional characteristics of the speaker's native-language utterance. By recognizing acoustic and linguistic features of the speaker such as tone, intonation, wording, and grammar, the invention makes the final translated and synthesized speech faithfully reflect the speaker's current emotion.
Description
Technical field
The present invention relates to a speech synthesis method, and in particular to a method for synthesizing translated text into speech simulating the speaker's emotion; it belongs to the technical field of speech translation.
Background technique
Current speech-synthesis technology merely converts text to speech mechanically and cannot accurately express the speaker's emotion. The present invention recognizes acoustic and linguistic features of the speaker, such as tone, intonation, wording, and grammar, and dynamically adjusts the speech-synthesis rules when the speaker's utterance has been translated into text in another language, so that the final synthesized speech broadcast faithfully reflects the current speaker's emotion.
Summary of the invention
The technical problem to be solved by the present invention is to overcome the defect that current text-to-speech synthesis merely broadcasts text mechanically and cannot accurately express the speaker's emotion, by providing a method for synthesizing translated text into speech that simulates the speaker's emotion.
In order to solve the above-mentioned technical problems, the present invention provides the following technical solutions:
The present invention provides a method for synthesizing translated text into speech simulating the speaker's emotion, comprising a translation device connected by signal to a business back end; through the business back end, the translation device is connected to a speech recognition interface, a voiceprint recognition interface, a syntactic analysis interface, a translation interface, and a speech synthesis interface.
As a preferred technical solution of the present invention, the speech translation and synthesis steps are:
Step 1: the translation device captures the user's speech and saves it in WAV format;
Step 2: the business back end analyzes the audio file to obtain frequency and speech-rate parameters;
Step 3: the business back end imports the voice data into the voiceprint recognition interface, and the voiceprint recognition system identifies parameters such as the user's gender and age;
Step 4: the business back end imports the voice data into the speech recognition interface, and the speech recognition system produces the text;
Step 5: the business back end imports the recognized text into the syntactic analysis interface, and the syntactic analysis system analyzes the grammar and wording of the text sentence by sentence to obtain emotion parameters, e.g. happiness, anger, indignation, negativity;
Step 6: the business back end combines the multiple feature parameters obtained from each interface's analysis (frequency, speech rate, gender, age, emotion) and sets a value for each feature;
Step 7: the business back end imports the recognized text into the translation interface, and the translation system translates it into text in the target language;
Step 8: the business back end imports the translated text and the analyzed feature values into the speech synthesis interface, and the speech synthesis system combines the feature values with SSML speech-synthesis markup, configuring the broadcast speed, volume, and word pauses in the SSML of the speech, so that the synthesized target-language speech broadcast reflects the emotional characteristics of the speaker's native-language utterance.
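Step 8's combination of feature values with SSML markup can be sketched as follows. This is a minimal illustration only: the feature names, thresholds, and the emotion-to-prosody mapping are assumptions for demonstration, and `render_ssml` is an invented helper; the patent does not disclose concrete values or mappings.

```python
# Sketch of step 8: mapping analyzed feature values onto SSML prosody
# attributes and break elements. All names, thresholds, and mappings
# below are illustrative assumptions, not the patent's actual rules.

def render_ssml(text, features):
    """Wrap translated text in SSML, configuring rate, volume, and pauses."""
    # Assumed mapping from the emotion feature to prosody adjustments.
    emotion = features.get("emotion", "neutral")
    rate = {"happy": "fast", "angry": "fast", "negative": "slow"}.get(emotion, "medium")
    volume = {"angry": "loud", "negative": "soft"}.get(emotion, "medium")

    # A higher measured speech rate nudges the synthesized rate up as well.
    if features.get("speech_rate_wpm", 0) > 180 and rate == "medium":
        rate = "fast"

    # Insert a pause after sentence-ending punctuation (the "word pause"
    # configuration described in the patent).
    body = text.replace("。", '。<break time="300ms"/>').replace(
        ".", '.<break time="300ms"/>')

    return (
        '<speak version="1.0">'
        f'<prosody rate="{rate}" volume="{volume}">{body}</prosody>'
        "</speak>"
    )

ssml = render_ssml("Hello world.", {"emotion": "happy", "speech_rate_wpm": 200})
print(ssml)
```

The resulting SSML string would then be handed to whatever synthesis engine the speech synthesis interface fronts.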
The beneficial effect of the present invention is as follows: by recognizing acoustic and linguistic features of the speaker such as tone, intonation, wording, and grammar, and dynamically adjusting the speech-synthesis rules when the speaker's utterance has been translated into text in another language, the final synthesized speech broadcast faithfully reflects the current speaker's emotion.
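The emotion parameters of step 5 could be obtained, for instance, by a simple lexicon-based analysis of wording. The following rule-based sketch is purely illustrative: the lexicon, labels, and `emotion_of` helper are assumptions, not the syntactic-analysis system the patent actually employs.

```python
# Illustrative lexicon-based stand-in for step 5's syntactic analysis:
# classify a sentence's emotion from its wording. The lexicon and labels
# are assumptions; the patent does not disclose the actual analysis.

EMOTION_LEXICON = {
    "great": "happy", "wonderful": "happy", "love": "happy",
    "hate": "angry", "furious": "indignant",
    "sad": "negative", "unfortunately": "negative",
}

def emotion_of(sentence):
    # Lowercase, strip sentence-final punctuation, and scan word by word;
    # the first lexicon hit determines the emotion label.
    words = sentence.lower().strip(".!?").split()
    for w in words:
        if w in EMOTION_LEXICON:
            return EMOTION_LEXICON[w]
    return "neutral"

print(emotion_of("I love this weather!"))      # -> happy
print(emotion_of("Unfortunately it rained."))  # -> negative
```

A production system would of course use full syntactic and sentiment analysis rather than a keyword lookup, but the interface contract (text in, emotion parameter out) is the same.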
Brief description of the drawings
The accompanying drawings provide a further understanding of the invention and form part of the specification; together with the embodiments of the invention they serve to explain the invention and are not to be construed as limiting it. In the drawings:
Fig. 1 is structural schematic diagram of the invention;
Fig. 2 is front view of the invention.
Detailed description of the embodiments
Preferred embodiments of the present invention are described below with reference to the accompanying drawings. It should be understood that the preferred embodiments described herein serve only to illustrate and explain the invention and are not intended to limit it.
Embodiment 1
As shown in Figs. 1-2, the present invention provides a method for synthesizing translated text into speech simulating the speaker's emotion, comprising a translation device connected by signal to a business back end; through the business back end, the translation device is connected to a speech recognition interface, a voiceprint recognition interface, a syntactic analysis interface, a translation interface, and a speech synthesis interface.
Specifically, the speech translation and synthesis steps are:
Step 1: the translation device captures the user's speech and saves it in WAV format;
Step 2: the business back end analyzes the audio file to obtain frequency and speech-rate parameters;
Step 3: the business back end imports the voice data into the voiceprint recognition interface, and the voiceprint recognition system identifies parameters such as the user's gender and age;
Step 4: the business back end imports the voice data into the speech recognition interface, and the speech recognition system produces the text;
Step 5: the business back end imports the recognized text into the syntactic analysis interface, and the syntactic analysis system analyzes the grammar and wording of the text sentence by sentence to obtain emotion parameters, e.g. happiness, anger, indignation, negativity;
Step 6: the business back end combines the multiple feature parameters obtained from each interface's analysis (frequency, speech rate, gender, age, emotion) and sets a value for each feature;
Step 7: the business back end imports the recognized text into the translation interface, and the translation system translates it into text in the target language;
Step 8: the business back end imports the translated text and the analyzed feature values into the speech synthesis interface, and the speech synthesis system combines the feature values with SSML speech-synthesis markup, configuring the broadcast speed, volume, and word pauses in the SSML of the speech, so that the synthesized target-language speech broadcast reflects the emotional characteristics of the speaker's native-language utterance.
The beneficial effect of the present invention is as follows: by recognizing acoustic and linguistic features of the speaker such as tone, intonation, wording, and grammar, and dynamically adjusting the speech-synthesis rules when the speaker's utterance has been translated into text in another language, the final synthesized speech broadcast faithfully reflects the current speaker's emotion.
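The eight steps of the embodiment amount to a pipeline over the five interfaces. A minimal orchestration sketch follows, with every interface stubbed out; the function names, return shapes, and sample values are all assumptions introduced for illustration, not APIs disclosed by the patent.

```python
# Minimal sketch of the embodiment's pipeline. Each "interface" below is a
# stub standing in for the real back-end service; names, return shapes, and
# sample values are assumed for illustration.

def analyze_audio(wav):          # step 2: frequency / speech-rate analysis
    return {"frequency_hz": 210.0, "speech_rate_wpm": 190}

def voiceprint_recognize(wav):   # step 3: gender / age from the voiceprint
    return {"gender": "female", "age": 30}

def speech_to_text(wav):         # step 4: speech recognition
    return "今天天气真好。"

def analyze_sentiment(text):     # step 5: grammar / wording -> emotion
    return {"emotion": "happy"}

def translate(text, target):     # step 7: translation into the target language
    return "The weather is really nice today."

def synthesize(text, features):  # step 8: TTS driven by SSML + feature values
    return b"RIFF..."            # stand-in for the returned audio bytes

def translate_and_speak(wav, target_lang="en"):
    features = {}
    features.update(analyze_audio(wav))              # step 2
    features.update(voiceprint_recognize(wav))       # step 3
    source_text = speech_to_text(wav)                # step 4
    features.update(analyze_sentiment(source_text))  # step 5
    # step 6: the merged dict is the per-feature value set
    target_text = translate(source_text, target_lang)  # step 7
    return synthesize(target_text, features), features

audio, feats = translate_and_speak(b"<wav bytes>")
print(sorted(feats))
```

Replacing each stub with a call to the corresponding real interface yields the architecture of Fig. 1: the business back end merely merges feature dictionaries and forwards text between services.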
Finally, it should be noted that the foregoing is only a preferred embodiment of the present invention and is not intended to limit it. Although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art may still modify the technical solutions described therein or make equivalent replacements of some of their technical features. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall be included within its scope of protection.
Claims (2)
1. A method for synthesizing translated text into speech simulating the speaker's emotion, comprising a translation device connected by signal to a business back end, characterized in that, through the business back end, the translation device is connected to a speech recognition interface, a voiceprint recognition interface, a syntactic analysis interface, a translation interface, and a speech synthesis interface.
2. The method for synthesizing translated text into speech simulating the speaker's emotion according to claim 1, characterized in that the speech translation and synthesis steps are:
Step 1: the translation device captures the user's speech and saves it in WAV format;
Step 2: the business back end analyzes the audio file to obtain frequency and speech-rate parameters;
Step 3: the business back end imports the voice data into the voiceprint recognition interface, and the voiceprint recognition system identifies parameters such as the user's gender and age;
Step 4: the business back end imports the voice data into the speech recognition interface, and the speech recognition system produces the text;
Step 5: the business back end imports the recognized text into the syntactic analysis interface, and the syntactic analysis system analyzes the grammar and wording of the text sentence by sentence to obtain emotion parameters, e.g. happiness, anger, indignation, negativity;
Step 6: the business back end combines the multiple feature parameters obtained from each interface's analysis (frequency, speech rate, gender, age, emotion) and sets a value for each feature;
Step 7: the business back end imports the recognized text into the translation interface, and the translation system translates it into text in the target language;
Step 8: the business back end imports the translated text and the analyzed feature values into the speech synthesis interface, and the speech synthesis system combines the feature values with SSML speech-synthesis markup, configuring the broadcast speed, volume, and word pauses in the SSML of the speech, so that the synthesized target-language speech broadcast reflects the emotional characteristics of the speaker's native-language utterance.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810601584.6A CN108831436A (en) | 2018-06-12 | 2018-06-12 | Method for synthesizing translated text into speech simulating the speaker's emotion |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810601584.6A CN108831436A (en) | 2018-06-12 | 2018-06-12 | Method for synthesizing translated text into speech simulating the speaker's emotion |
Publications (1)
Publication Number | Publication Date |
---|---|
CN108831436A true CN108831436A (en) | 2018-11-16 |
Family
ID=64144893
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810601584.6A Pending CN108831436A (en) | 2018-06-12 | 2018-06-12 | Method for synthesizing translated text into speech simulating the speaker's emotion |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108831436A (en) |
Cited By (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109584858A (en) * | 2019-01-08 | 2019-04-05 | 武汉西山艺创文化有限公司 | A kind of virtual dubbing method and its device based on AI artificial intelligence |
CN109658917A (en) * | 2019-01-17 | 2019-04-19 | 深圳壹账通智能科技有限公司 | E-book chants method, apparatus, computer equipment and storage medium |
CN109712646A (en) * | 2019-02-20 | 2019-05-03 | 百度在线网络技术(北京)有限公司 | Voice broadcast method, device and terminal |
CN110008481A (en) * | 2019-04-10 | 2019-07-12 | 南京魔盒信息科技有限公司 | Translated speech generation method, device, computer equipment and storage medium |
CN110930977A (en) * | 2019-11-12 | 2020-03-27 | 北京搜狗科技发展有限公司 | Data processing method and device and electronic equipment |
CN111508469A (en) * | 2020-04-26 | 2020-08-07 | 北京声智科技有限公司 | Text-to-speech conversion method and device |
CN111986647A (en) * | 2020-08-26 | 2020-11-24 | 北京声智科技有限公司 | Voice synthesis method and device |
CN112151064A (en) * | 2020-09-25 | 2020-12-29 | 北京捷通华声科技股份有限公司 | Voice broadcast method, device, computer readable storage medium and processor |
CN112349271A (en) * | 2020-11-06 | 2021-02-09 | 北京乐学帮网络技术有限公司 | Voice information processing method and device, electronic equipment and storage medium |
CN112509567A (en) * | 2020-12-25 | 2021-03-16 | 北京百度网讯科技有限公司 | Method, device, equipment, storage medium and program product for processing voice data |
WO2021051588A1 (en) * | 2019-09-19 | 2021-03-25 | 北京搜狗科技发展有限公司 | Data processing method and apparatus, and apparatus used for data processing |
WO2021134592A1 (en) * | 2019-12-31 | 2021-07-08 | 深圳市欢太科技有限公司 | Speech processing method, apparatus and device, and storage medium |
WO2021217433A1 (en) * | 2020-04-28 | 2021-11-04 | 青岛海信传媒网络技术有限公司 | Content-based voice playback method and display device |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102122297A (en) * | 2011-03-04 | 2011-07-13 | 北京航空航天大学 | Semantic-based Chinese network text emotion extracting method |
US20120078607A1 (en) * | 2010-09-29 | 2012-03-29 | Kabushiki Kaisha Toshiba | Speech translation apparatus, method and program |
CN102723078A (en) * | 2012-07-03 | 2012-10-10 | 武汉科技大学 | Emotion speech recognition method based on natural language comprehension |
CN107315742A (en) * | 2017-07-03 | 2017-11-03 | 中国科学院自动化研究所 | The Interpreter's method and system that personalize with good in interactive function |
CN107731232A (en) * | 2017-10-17 | 2018-02-23 | 深圳市沃特沃德股份有限公司 | Voice translation method and device |
CN107944008A (en) * | 2017-12-08 | 2018-04-20 | 神思电子技术股份有限公司 | A kind of method that Emotion identification is carried out for natural language |
- 2018-06-12: CN application CN201810601584.6A filed; published as CN108831436A (en); status: active, Pending
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20120078607A1 (en) * | 2010-09-29 | 2012-03-29 | Kabushiki Kaisha Toshiba | Speech translation apparatus, method and program |
CN102122297A (en) * | 2011-03-04 | 2011-07-13 | 北京航空航天大学 | Semantic-based Chinese network text emotion extracting method |
CN102723078A (en) * | 2012-07-03 | 2012-10-10 | 武汉科技大学 | Emotion speech recognition method based on natural language comprehension |
CN107315742A (en) * | 2017-07-03 | 2017-11-03 | 中国科学院自动化研究所 | The Interpreter's method and system that personalize with good in interactive function |
CN107731232A (en) * | 2017-10-17 | 2018-02-23 | 深圳市沃特沃德股份有限公司 | Voice translation method and device |
CN107944008A (en) * | 2017-12-08 | 2018-04-20 | 神思电子技术股份有限公司 | A kind of method that Emotion identification is carried out for natural language |
Cited By (16)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109584858A (en) * | 2019-01-08 | 2019-04-05 | 武汉西山艺创文化有限公司 | A kind of virtual dubbing method and its device based on AI artificial intelligence |
CN109658917A (en) * | 2019-01-17 | 2019-04-19 | 深圳壹账通智能科技有限公司 | E-book chants method, apparatus, computer equipment and storage medium |
CN109712646A (en) * | 2019-02-20 | 2019-05-03 | 百度在线网络技术(北京)有限公司 | Voice broadcast method, device and terminal |
CN110008481A (en) * | 2019-04-10 | 2019-07-12 | 南京魔盒信息科技有限公司 | Translated speech generation method, device, computer equipment and storage medium |
CN110008481B (en) * | 2019-04-10 | 2023-04-28 | 南京魔盒信息科技有限公司 | Translated voice generating method, device, computer equipment and storage medium |
WO2021051588A1 (en) * | 2019-09-19 | 2021-03-25 | 北京搜狗科技发展有限公司 | Data processing method and apparatus, and apparatus used for data processing |
CN110930977A (en) * | 2019-11-12 | 2020-03-27 | 北京搜狗科技发展有限公司 | Data processing method and device and electronic equipment |
WO2021134592A1 (en) * | 2019-12-31 | 2021-07-08 | 深圳市欢太科技有限公司 | Speech processing method, apparatus and device, and storage medium |
CN111508469A (en) * | 2020-04-26 | 2020-08-07 | 北京声智科技有限公司 | Text-to-speech conversion method and device |
WO2021217433A1 (en) * | 2020-04-28 | 2021-11-04 | 青岛海信传媒网络技术有限公司 | Content-based voice playback method and display device |
CN113940049A (en) * | 2020-04-28 | 2022-01-14 | 青岛海信传媒网络技术有限公司 | Voice playing method and display device based on content |
CN113940049B (en) * | 2020-04-28 | 2023-10-31 | Vidaa(荷兰)国际控股有限公司 | Voice playing method based on content and display equipment |
CN111986647A (en) * | 2020-08-26 | 2020-11-24 | 北京声智科技有限公司 | Voice synthesis method and device |
CN112151064A (en) * | 2020-09-25 | 2020-12-29 | 北京捷通华声科技股份有限公司 | Voice broadcast method, device, computer readable storage medium and processor |
CN112349271A (en) * | 2020-11-06 | 2021-02-09 | 北京乐学帮网络技术有限公司 | Voice information processing method and device, electronic equipment and storage medium |
CN112509567A (en) * | 2020-12-25 | 2021-03-16 | 北京百度网讯科技有限公司 | Method, device, equipment, storage medium and program product for processing voice data |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108831436A (en) | Method for synthesizing translated text into speech simulating the speaker's emotion | |
CN101030368B (en) | Method and system for communicating across channels simultaneously with emotion preservation | |
US9368104B2 (en) | System and method for synthesizing human speech using multiple speakers and context | |
US10229668B2 (en) | Systems and techniques for producing spoken voice prompts | |
US20160365087A1 (en) | High end speech synthesis | |
US9508338B1 (en) | Inserting breath sounds into text-to-speech output | |
US20230206897A1 (en) | Electronic apparatus and method for controlling thereof | |
Abushariah et al. | Modern standard Arabic speech corpus for implementing and evaluating automatic continuous speech recognition systems | |
KR20150105075A (en) | Apparatus and method for automatic interpretation | |
CA3160315C (en) | Real-time speech-to-speech generation (rssg) apparatus, method and a system therefore | |
CN110767233A (en) | Voice conversion system and method | |
Onaolapo et al. | A simplified overview of text-to-speech synthesis | |
TW201322250A (en) | Polyglot speech synthesis method | |
Hirose et al. | Temporal rate change of dialogue speech in prosodic units as compared to read speech | |
Aylett et al. | Combining statistical parameteric speech synthesis and unit-selection for automatic voice cloning | |
Westall et al. | Speech technology for telecommunications | |
Meyer | Coding human languages for long-range communication in natural ecological environments: shouting, whistling, and drumming | |
Jaiswal et al. | Concatenative Text-to-Speech Synthesis System for Communication Recognition | |
US11501091B2 (en) | Real-time speech-to-speech generation (RSSG) and sign language conversion apparatus, method and a system therefore | |
US11783813B1 (en) | Methods and systems for improving word discrimination with phonologically-trained machine learning models | |
Mohammad et al. | Phonetically rich and balanced text and speech corpora for Arabic language | |
Hande | A review on speech synthesis an artificial voice production | |
Lyu | Acoustic Energy Analysis of Vowels in Western Yugur Language | |
Ojala | Auditory quality evaluation of present Finnish text-to-speech systems | |
TWM621764U (en) | A system for customized speech |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | ||
Application publication date: 20181116 |