CN108831436A - Method for synthesizing translated text into speech that simulates the speaker's emotion - Google Patents

Method for synthesizing translated text into speech that simulates the speaker's emotion

Info

Publication number
CN108831436A
CN108831436A (application CN201810601584.6A)
Authority
CN
China
Prior art keywords
text
translation
obtains
interface
voice
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201810601584.6A
Other languages
Chinese (zh)
Inventor
张岩
林彦
熊涛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Heyan Mdt Infotech Ltd
Original Assignee
Shenzhen Heyan Mdt Infotech Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Heyan Mdt Infotech Ltd filed Critical Shenzhen Heyan Mdt Infotech Ltd
Priority to CN201810601584.6A priority Critical patent/CN108831436A/en
Publication of CN108831436A publication Critical patent/CN108831436A/en
Pending legal-status Critical Current

Links

Classifications

    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L13/00 - Speech synthesis; Text to speech systems
    • G10L13/02 - Methods for producing synthetic speech; Speech synthesisers
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L13/00 - Speech synthesis; Text to speech systems
    • G10L13/08 - Text analysis or generation of parameters for speech synthesis out of text, e.g. grapheme to phoneme translation, prosody generation or stress or intonation determination
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00 - Speaker identification or verification
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L25/51 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
    • G10L25/63 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination for estimating an emotional state

Abstract

The invention discloses a method for synthesizing translated text into speech that simulates the speaker's emotion. First, the user's voice is captured. The backend analyzes the audio file to obtain frequency and speaking-rate parameters, and obtains parameters such as gender and age by importing the audio into a voiceprint recognition system. Speech recognition converts the voice into text, and sentence-level analysis of the text's grammar and word choice yields emotion parameters. Combining the multiple features of frequency, speaking rate, gender, age, and emotion, a value is set for each feature. The feature values are then combined with SSML speech-synthesis markup, configuring the speaking rate, volume, and word pauses in the SSML of the output voice, so that the synthesized foreign-language broadcast reflects the emotional characteristics of the speaker's native-language utterance. By recognizing acoustic and linguistic features of the speaker such as tone, intonation, word choice, and grammar, the present invention makes the final translated, synthesized speech faithfully convey the current speaker's emotion.
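The pipeline the abstract describes can be sketched as follows. Every interface here (analyze_audio, recognize_voiceprint, and so on) is a hypothetical stub standing in for the patent's backend services; the names, return values, and call order beyond what the abstract states are illustrative assumptions, not a real API.

```python
# Hypothetical stubs for the patent's backend interfaces; the returned
# values are placeholders, not outputs of real analysis systems.
def analyze_audio(wav):        return {"frequency_hz": 180.0, "speech_rate_wpm": 150}
def recognize_voiceprint(wav): return {"gender": "female", "age": 30}
def speech_to_text(wav):       return "I am very happy today"
def analyze_sentiment(text):   return {"emotion": "happy"}
def translate(text, target):   return "(translated) " + text
def synthesize(text, feats):   return f"<speak><!-- feats={feats} -->{text}</speak>"

def translate_with_emotion(wav, target_lang="zh"):
    """Chain the analysis and translation services, then synthesize speech."""
    features = {}
    features.update(analyze_audio(wav))          # frequency, speaking rate
    features.update(recognize_voiceprint(wav))   # gender, age
    text = speech_to_text(wav)                   # speech recognition
    features.update(analyze_sentiment(text))     # emotion parameters
    translated = translate(text, target_lang)    # target-language text
    return synthesize(translated, features)      # SSML-driven synthesis

ssml = translate_with_emotion(b"\x00\x01")  # stand-in for WAV bytes
```

Each stub corresponds to one of the interfaces the abstract names; swapping in real services would not change the orchestration shape.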

Description

A method of text speech synthesis after simulation speaker's mood optimization translation
Technical field
The present invention relates to a speech synthesis method, and in particular to a method for synthesizing translated text into speech that simulates the speaker's emotion, belonging to the technical field of speech translation.
Background technique
Current speech synthesis technology merely converts text to speech and broadcasts it mechanically; it cannot accurately convey the speaker's emotion. By recognizing acoustic and linguistic features of the speaker such as tone, intonation, word choice, and grammar, the present invention dynamically adjusts the speech-synthesis rules when the speaker's utterance has been translated into text in another language, so that the final synthesized speech broadcast faithfully reflects the current speaker's emotion.
Summary of the invention
The technical problem to be solved by the present invention is to overcome the defect that current text-to-speech synthesis merely broadcasts text mechanically and cannot accurately convey the speaker's emotion, and to provide a method for synthesizing translated text into speech that simulates the speaker's emotion.
To solve the above technical problem, the present invention provides the following technical solution:
The present invention provides a method for synthesizing translated text into speech that simulates the speaker's emotion, involving a translation device connected by signal to a business backend; through the business backend, the translation device is connected by signal to a speech recognition interface, a voiceprint recognition interface, a syntactic analysis interface, a translation interface, and a speech synthesis interface.
As a preferred technical solution of the present invention, the speech-translation synthesis steps are:
Step 1: The translation device captures the user's speech and obtains it in WAV format;
Step 2: The business backend analyzes the audio file to obtain frequency and speaking-rate parameters;
Step 3: The business backend imports the voice data into the voiceprint recognition interface, where the voiceprint recognition system identifies parameters such as the user's gender and age;
Step 4: The business backend imports the voice data into the speech recognition interface, where the speech recognition system produces the text;
Step 5: The business backend imports the recognized text into the syntactic analysis interface, where the syntactic analysis system analyzes the text's grammar and word choice sentence by sentence to obtain emotion parameters, e.g. happy, angry, indignant, negative;
Step 6: The business backend combines the frequency, speaking-rate, gender, age, and emotion feature parameters obtained by the individual analyses and sets a value for each feature;
Step 7: The business backend imports the recognized text into the translation interface, where the translation system produces text in the target language;
Step 8: The business backend imports the translated text and the analyzed feature values into the speech synthesis interface, where the speech synthesis system combines the feature values with SSML speech-synthesis markup, configuring the speaking rate, volume, and word pauses in the output SSML, so that the synthesized foreign-language broadcast reflects the emotional characteristics of the speaker's native-language utterance.
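Step 8's mapping from analyzed feature values onto SSML prosody settings can be sketched as follows. The feature names, thresholds, and mapping rules are illustrative assumptions; the patent does not specify concrete values, only that speaking rate, volume, and pauses are configured in the SSML.

```python
# Sketch of step 8: map feature values onto SSML rate, volume, and pauses.
# All thresholds and emotion labels below are assumed for illustration.
from xml.sax.saxutils import escape

def build_ssml(text, features):
    """Wrap translated text in SSML configured from the feature values."""
    # Speaking rate: carry the source speaker's measured rate over.
    wpm = features.get("speech_rate_wpm", 150)
    rate = "fast" if wpm > 180 else "slow" if wpm < 110 else "medium"
    # Volume: raise it for high-arousal emotions such as anger.
    volume = "loud" if features.get("emotion") in ("anger", "indignation") else "medium"
    # Pauses: longer breaks between sentences for negative, low-energy moods.
    pause_ms = 600 if features.get("emotion") == "negative" else 300
    sentences = [s for s in text.replace("!", ".").split(".") if s.strip()]
    body = f'<break time="{pause_ms}ms"/>'.join(
        f"<s>{escape(s.strip())}.</s>" for s in sentences)
    return f'<speak><prosody rate="{rate}" volume="{volume}">{body}</prosody></speak>'

ssml = build_ssml("I am thrilled. This works", {"speech_rate_wpm": 200, "emotion": "happy"})
```

The `<prosody>` and `<break>` elements used here are standard SSML; a real backend would hand the resulting markup to its speech synthesis interface.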
The beneficial effects of the present invention are as follows: by recognizing acoustic and linguistic features of the speaker such as tone, intonation, word choice, and grammar, the invention dynamically adjusts the speech-synthesis rules when the speaker's utterance has been translated into text in another language, so that the final synthesized speech broadcast faithfully reflects the current speaker's emotion.
Detailed description of the invention
The accompanying drawings are provided for further understanding of the present invention and constitute part of the specification; together with the embodiments, they serve to explain the present invention and are not to be construed as limiting it. In the drawings:
Fig. 1 is a structural schematic diagram of the invention;
Fig. 2 is a front view of the invention.
Specific embodiment
Preferred embodiments of the present invention are described below with reference to the accompanying drawings. It should be understood that the preferred embodiments described here serve only to illustrate and explain the present invention and are not intended to limit it.
Embodiment 1
As shown in Figs. 1-2, the present invention provides a method for synthesizing translated text into speech that simulates the speaker's emotion, involving a translation device connected by signal to a business backend; through the business backend, the translation device is connected by signal to a speech recognition interface, a voiceprint recognition interface, a syntactic analysis interface, a translation interface, and a speech synthesis interface.
Specifically, the speech-translation synthesis steps are:
Step 1: The translation device captures the user's speech and obtains it in WAV format;
Step 2: The business backend analyzes the audio file to obtain frequency and speaking-rate parameters;
Step 3: The business backend imports the voice data into the voiceprint recognition interface, where the voiceprint recognition system identifies parameters such as the user's gender and age;
Step 4: The business backend imports the voice data into the speech recognition interface, where the speech recognition system produces the text;
Step 5: The business backend imports the recognized text into the syntactic analysis interface, where the syntactic analysis system analyzes the text's grammar and word choice sentence by sentence to obtain emotion parameters, e.g. happy, angry, indignant, negative;
Step 6: The business backend combines the frequency, speaking-rate, gender, age, and emotion feature parameters obtained by the individual analyses and sets a value for each feature;
Step 7: The business backend imports the recognized text into the translation interface, where the translation system produces text in the target language;
Step 8: The business backend imports the translated text and the analyzed feature values into the speech synthesis interface, where the speech synthesis system combines the feature values with SSML speech-synthesis markup, configuring the speaking rate, volume, and word pauses in the output SSML, so that the synthesized foreign-language broadcast reflects the emotional characteristics of the speaker's native-language utterance.
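Step 2's audio analysis can be illustrated with a minimal fundamental-frequency estimate by autocorrelation. This is a generic signal-processing sketch, not the patent's actual algorithm; a real backend would also measure speaking rate, for example from energy peaks per second.

```python
# Sketch of step 2: estimate the fundamental frequency (pitch) of a mono
# audio signal via autocorrelation, searching lags in a plausible voice
# range. Pure-Python and illustrative only.
import math

def estimate_f0(samples, sample_rate, fmin=60.0, fmax=500.0):
    """Return the lag with the strongest autocorrelation in [fmin, fmax],
    converted to a frequency estimate in Hz."""
    n = len(samples)
    mean = sum(samples) / n
    x = [s - mean for s in samples]          # remove DC offset
    best_lag, best_corr = 0, 0.0
    lo = int(sample_rate / fmax)             # smallest lag = highest pitch
    hi = min(int(sample_rate / fmin), n - 1)
    for lag in range(lo, hi + 1):
        corr = sum(x[i] * x[i + lag] for i in range(n - lag))
        if corr > best_corr:
            best_corr, best_lag = corr, lag
    return sample_rate / best_lag if best_lag else 0.0

# Usage on a synthetic 220 Hz tone sampled at 8 kHz.
sr = 8000
tone = [math.sin(2 * math.pi * 220 * t / sr) for t in range(2000)]
f0 = estimate_f0(tone, sr)
```

On real WAV audio the samples would first be decoded (e.g. with the standard `wave` module) and analyzed frame by frame rather than over the whole file.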
The beneficial effects of the present invention are as follows: by recognizing acoustic and linguistic features of the speaker such as tone, intonation, word choice, and grammar, the invention dynamically adjusts the speech-synthesis rules when the speaker's utterance has been translated into text in another language, so that the final synthesized speech broadcast faithfully reflects the current speaker's emotion.
Finally, it should be noted that the above are merely preferred embodiments of the present invention and are not intended to limit it. Although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art may still modify the technical solutions described therein or substitute equivalents for some of their technical features. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.

Claims (2)

1. A method for synthesizing translated text into speech that simulates the speaker's emotion, involving a translation device connected by signal to a business backend, characterized in that, through the business backend, the translation device is connected by signal to a speech recognition interface, a voiceprint recognition interface, a syntactic analysis interface, a translation interface, and a speech synthesis interface.
2. The method for synthesizing translated text into speech that simulates the speaker's emotion according to claim 1, characterized in that the speech-translation synthesis steps are:
Step 1: The translation device captures the user's speech and obtains it in WAV format;
Step 2: The business backend analyzes the audio file to obtain frequency and speaking-rate parameters;
Step 3: The business backend imports the voice data into the voiceprint recognition interface, where the voiceprint recognition system identifies parameters such as the user's gender and age;
Step 4: The business backend imports the voice data into the speech recognition interface, where the speech recognition system produces the text;
Step 5: The business backend imports the recognized text into the syntactic analysis interface, where the syntactic analysis system analyzes the text's grammar and word choice sentence by sentence to obtain emotion parameters, e.g. happy, angry, indignant, negative;
Step 6: The business backend combines the frequency, speaking-rate, gender, age, and emotion feature parameters obtained by the individual analyses and sets a value for each feature;
Step 7: The business backend imports the recognized text into the translation interface, where the translation system produces text in the target language;
Step 8: The business backend imports the translated text and the analyzed feature values into the speech synthesis interface, where the speech synthesis system combines the feature values with SSML speech-synthesis markup, configuring the speaking rate, volume, and word pauses in the output SSML, so that the synthesized foreign-language broadcast reflects the emotional characteristics of the speaker speaking his or her native language.
CN201810601584.6A 2018-06-12 2018-06-12 Method for synthesizing translated text into speech that simulates the speaker's emotion Pending CN108831436A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810601584.6A CN108831436A (en) 2018-06-12 2018-06-12 Method for synthesizing translated text into speech that simulates the speaker's emotion

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810601584.6A CN108831436A (en) 2018-06-12 2018-06-12 Method for synthesizing translated text into speech that simulates the speaker's emotion

Publications (1)

Publication Number Publication Date
CN108831436A true CN108831436A (en) 2018-11-16

Family

ID=64144893

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810601584.6A Pending CN108831436A (en) 2018-06-12 2018-06-12 Method for synthesizing translated text into speech that simulates the speaker's emotion

Country Status (1)

Country Link
CN (1) CN108831436A (en)

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109584858A (en) * 2019-01-08 2019-04-05 武汉西山艺创文化有限公司 A kind of virtual dubbing method and its device based on AI artificial intelligence
CN109658917A (en) * 2019-01-17 2019-04-19 深圳壹账通智能科技有限公司 E-book chants method, apparatus, computer equipment and storage medium
CN109712646A (en) * 2019-02-20 2019-05-03 百度在线网络技术(北京)有限公司 Voice broadcast method, device and terminal
CN110008481A (en) * 2019-04-10 2019-07-12 南京魔盒信息科技有限公司 Translated speech generation method, device, computer equipment and storage medium
CN110930977A (en) * 2019-11-12 2020-03-27 北京搜狗科技发展有限公司 Data processing method and device and electronic equipment
CN111508469A (en) * 2020-04-26 2020-08-07 北京声智科技有限公司 Text-to-speech conversion method and device
CN111986647A (en) * 2020-08-26 2020-11-24 北京声智科技有限公司 Voice synthesis method and device
CN112151064A (en) * 2020-09-25 2020-12-29 北京捷通华声科技股份有限公司 Voice broadcast method, device, computer readable storage medium and processor
CN112349271A (en) * 2020-11-06 2021-02-09 北京乐学帮网络技术有限公司 Voice information processing method and device, electronic equipment and storage medium
CN112509567A (en) * 2020-12-25 2021-03-16 北京百度网讯科技有限公司 Method, device, equipment, storage medium and program product for processing voice data
WO2021051588A1 (en) * 2019-09-19 2021-03-25 北京搜狗科技发展有限公司 Data processing method and apparatus, and apparatus used for data processing
WO2021134592A1 (en) * 2019-12-31 2021-07-08 深圳市欢太科技有限公司 Speech processing method, apparatus and device, and storage medium
WO2021217433A1 (en) * 2020-04-28 2021-11-04 青岛海信传媒网络技术有限公司 Content-based voice playback method and display device

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102122297A (en) * 2011-03-04 2011-07-13 北京航空航天大学 Semantic-based Chinese network text emotion extracting method
US20120078607A1 (en) * 2010-09-29 2012-03-29 Kabushiki Kaisha Toshiba Speech translation apparatus, method and program
CN102723078A (en) * 2012-07-03 2012-10-10 武汉科技大学 Emotion speech recognition method based on natural language comprehension
CN107315742A (en) * 2017-07-03 2017-11-03 中国科学院自动化研究所 The Interpreter's method and system that personalize with good in interactive function
CN107731232A (en) * 2017-10-17 2018-02-23 深圳市沃特沃德股份有限公司 Voice translation method and device
CN107944008A (en) * 2017-12-08 2018-04-20 神思电子技术股份有限公司 A kind of method that Emotion identification is carried out for natural language

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120078607A1 (en) * 2010-09-29 2012-03-29 Kabushiki Kaisha Toshiba Speech translation apparatus, method and program
CN102122297A (en) * 2011-03-04 2011-07-13 北京航空航天大学 Semantic-based Chinese network text emotion extracting method
CN102723078A (en) * 2012-07-03 2012-10-10 武汉科技大学 Emotion speech recognition method based on natural language comprehension
CN107315742A (en) * 2017-07-03 2017-11-03 中国科学院自动化研究所 The Interpreter's method and system that personalize with good in interactive function
CN107731232A (en) * 2017-10-17 2018-02-23 深圳市沃特沃德股份有限公司 Voice translation method and device
CN107944008A (en) * 2017-12-08 2018-04-20 神思电子技术股份有限公司 A kind of method that Emotion identification is carried out for natural language

Cited By (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109584858A (en) * 2019-01-08 2019-04-05 武汉西山艺创文化有限公司 A kind of virtual dubbing method and its device based on AI artificial intelligence
CN109658917A (en) * 2019-01-17 2019-04-19 深圳壹账通智能科技有限公司 E-book chants method, apparatus, computer equipment and storage medium
CN109712646A (en) * 2019-02-20 2019-05-03 百度在线网络技术(北京)有限公司 Voice broadcast method, device and terminal
CN110008481A (en) * 2019-04-10 2019-07-12 南京魔盒信息科技有限公司 Translated speech generation method, device, computer equipment and storage medium
CN110008481B (en) * 2019-04-10 2023-04-28 南京魔盒信息科技有限公司 Translated voice generating method, device, computer equipment and storage medium
WO2021051588A1 (en) * 2019-09-19 2021-03-25 北京搜狗科技发展有限公司 Data processing method and apparatus, and apparatus used for data processing
CN110930977A (en) * 2019-11-12 2020-03-27 北京搜狗科技发展有限公司 Data processing method and device and electronic equipment
WO2021134592A1 (en) * 2019-12-31 2021-07-08 深圳市欢太科技有限公司 Speech processing method, apparatus and device, and storage medium
CN111508469A (en) * 2020-04-26 2020-08-07 北京声智科技有限公司 Text-to-speech conversion method and device
WO2021217433A1 (en) * 2020-04-28 2021-11-04 青岛海信传媒网络技术有限公司 Content-based voice playback method and display device
CN113940049A (en) * 2020-04-28 2022-01-14 青岛海信传媒网络技术有限公司 Voice playing method and display device based on content
CN113940049B (en) * 2020-04-28 2023-10-31 Vidaa(荷兰)国际控股有限公司 Voice playing method based on content and display equipment
CN111986647A (en) * 2020-08-26 2020-11-24 北京声智科技有限公司 Voice synthesis method and device
CN112151064A (en) * 2020-09-25 2020-12-29 北京捷通华声科技股份有限公司 Voice broadcast method, device, computer readable storage medium and processor
CN112349271A (en) * 2020-11-06 2021-02-09 北京乐学帮网络技术有限公司 Voice information processing method and device, electronic equipment and storage medium
CN112509567A (en) * 2020-12-25 2021-03-16 北京百度网讯科技有限公司 Method, device, equipment, storage medium and program product for processing voice data

Similar Documents

Publication Publication Date Title
CN108831436A (en) Method for synthesizing translated text into speech that simulates the speaker's emotion
CN101030368B (en) Method and system for communicating across channels simultaneously with emotion preservation
US9368104B2 (en) System and method for synthesizing human speech using multiple speakers and context
US10229668B2 (en) Systems and techniques for producing spoken voice prompts
US20160365087A1 (en) High end speech synthesis
US9508338B1 (en) Inserting breath sounds into text-to-speech output
US20230206897A1 (en) Electronic apparatus and method for controlling thereof
Abushariah et al. Modern standard Arabic speech corpus for implementing and evaluating automatic continuous speech recognition systems
KR20150105075A (en) Apparatus and method for automatic interpretation
CA3160315C (en) Real-time speech-to-speech generation (rssg) apparatus, method and a system therefore
CN110767233A (en) Voice conversion system and method
Onaolapo et al. A simplified overview of text-to-speech synthesis
TW201322250A (en) Polyglot speech synthesis method
Hirose et al. Temporal rate change of dialogue speech in prosodic units as compared to read speech
Aylett et al. Combining statistical parameteric speech synthesis and unit-selection for automatic voice cloning
Westall et al. Speech technology for telecommunications
Meyer Coding human languages for long-range communication in natural ecological environments: shouting, whistling, and drumming
Jaiswal et al. Concatenative Text-to-Speech Synthesis System for Communication Recognition
US11501091B2 (en) Real-time speech-to-speech generation (RSSG) and sign language conversion apparatus, method and a system therefore
US11783813B1 (en) Methods and systems for improving word discrimination with phonologically-trained machine learning models
Mohammad et al. Phonetically rich and balanced text and speech corpora for Arabic language
Hande A review on speech synthesis an artificial voice production
Lyu Acoustic Energy Analysis of Vowels in Western Yugur Language
Ojala Auditory quality evaluation of present Finnish text-to-speech systems
TWM621764U (en) A system for customized speech

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20181116