CN1945692B - Intelligent method for improving prompting voice matching effect in voice synthetic system - Google Patents

Publication number
CN1945692B
CN1945692B CN200610096676A CN200610096676A CN1945692B CN 1945692 B CN1945692 B CN 1945692B CN 200610096676 A CN200610096676 A CN 200610096676A CN 200610096676 A CN200610096676 A CN 200610096676A CN 1945692 B CN1945692 B CN 1945692B
Authority
CN
China
Prior art keywords
prompt tone
synthesis system
character
text
speech
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN200610096676A
Other languages
Chinese (zh)
Other versions
CN1945692A (en)
Inventor
王仁华
刘庆峰
吴晓如
严峻
赵志伟
熊厚余
李文兵
于继栋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
iFlytek Co Ltd
Original Assignee
ZHONGKEDA XUNFEI INFORMATION SCIENCE & TECHNOLOGY Co Ltd ANHUI PROV
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by ZHONGKEDA XUNFEI INFORMATION SCIENCE & TECHNOLOGY Co Ltd ANHUI PROV
Priority to CN200610096676A
Publication of CN1945692A
Application granted
Publication of CN1945692B
Anticipated expiration

Abstract

The invention discloses an intelligent method for improving prompt tone matching in a speech synthesis system. The method includes building a prompt tone library resource and a resource index. When the speech synthesis system synthesizes text, the text to be synthesized is intelligently matched against the prompt tone resources through character-layer matching, pinyin (phonetic) layer matching and automatic construction. This completes the conversion from text to prompt tone audio and increases the usability of the recorded prompt tones.

Description

An intelligent method for improving prompt tone matching in a speech synthesis system
Technical field
The present invention relates to a speech synthesis method, and in particular to a method that, when a computer converts text into natural speech, intelligently matches the text against pre-recorded speech and outputs high-quality speech.
Background
At present, in telephone voice systems such as IVR and call centers, a service flow often needs to use both pre-recorded prompt tones and synthesized speech. Pre-recorded prompt tones are recorded by a human speaker: they sound natural, convey more emotion and style, and give the user a friendly experience. Synthesized speech is clear and accurate, but still falls short of human recordings in intonation and emotion. In practice, pre-recorded speech is used for the relatively fixed content of a voice service system, typically the greeting on entering the system and the instructions for operating it. Synthesized speech is used for text whose content changes often, carries a lot of information and must be synthesized on the fly. Combining the two meets the demand for a human touch in telephone voice services while still allowing dynamic information to be announced immediately.
The current strategy for combining pre-recorded speech with synthesized speech in a speech synthesis system is as follows. For the text to be synthesized entered by the user, the synthesis system first compares it, at the character level, with the text of each prompt tone in the prompt tone library. If the two match exactly, the audio data of that prompt tone is output; if there is any difference, the speech synthesis engine synthesizes the text and outputs synthesized speech.
In practice this strategy has shortcomings, chiefly:
1. Because the text to be synthesized and the prompt tone text must match exactly at the character level, no match is found if the user replaces a syllable in the prompt tone sentence with a different Chinese character that has the same pinyin.
For example, if the prompt tone "2, balance inquiry" has been recorded in the library and the user enters "two, balance inquiry" (the numeral written as a character), the text does not match that prompt tone.
2. For information with the same meaning, a separate recording is needed for every variant that differs in full-width versus half-width characters, punctuation marks or other symbols. Real applications have a very large number of prompt tones. If every prompt tone had to be considered at the character level and recorded in many character forms, the work of building the prompt tone library would grow, the production cycle would lengthen, and the library would contain a great deal of redundancy, wasting resources.
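The first shortcoming can be illustrated with a minimal sketch of the exact-match strategy. The names `PROMPT_LIBRARY` and `baseline_lookup`, the file name, and the Chinese strings (a plausible rendering of the "2, balance inquiry" example) are assumptions made for illustration and do not come from the patent.

```python
from typing import Optional

# Hypothetical prompt tone library: prompt text -> recorded audio file.
PROMPT_LIBRARY = {"2、余额查询": "balance_inquiry.wav"}


def baseline_lookup(text: str) -> Optional[str]:
    """Return the recording only on an exact character match.

    None means the caller would fall back to the synthesis engine.
    """
    return PROMPT_LIBRARY.get(text)


# Identical characters: the recording is found.
exact = baseline_lookup("2、余额查询")
# Same pronunciation, numeral written as a Chinese character: no match.
variant = baseline_lookup("二、余额查询")
```

Under these assumptions the second lookup fails even though both texts are read aloud identically, which is the gap the later matching layers are meant to close.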
Summary of the invention
The purpose of the invention is to provide a method that improves the utilization of prompt tones during speech synthesis. It addresses the shortcomings of matching prompt tones at the character layer alone and makes effective use of the prompt tone library, so that the naturalness and fluency of human recordings are fully exploited while library redundancy is reduced, thereby improving the quality of the voice service.
The invention is achieved through the following technical solution:
An intelligent method for improving prompt tone matching in a speech synthesis system comprises making the prompt tone resources required by the synthesis system from pre-recorded speech data. Making the prompt tone resources includes building a prompt tone index file, which contains the name, speaker, character content and speech data storage location of each prompt tone. The user then provides the text to be synthesized to the synthesis system, which performs character-layer matching, in which speech data can be output only after the character content of the text to be synthesized is found to be identical to the character content of a prompt tone. In the present invention, while the synthesis system synthesizes text, pinyin (phonetic) layer matching and intelligent analysis for automatic prompt tone construction are also performed. The index file further contains pinyin information and the storage path of the speech data in the automatically constructed prompt tone library.
In the method, when the synthesis system synthesizes text, character-layer matching is performed first; if it does not succeed, pinyin-layer matching is performed; if that also does not succeed, the intelligent analysis for automatic construction is performed.
In the method, for character-layer matching, the text to be synthesized and a prompt tone are also considered to match when any one, two or all three of the following differences occur:
(1) the character content differs in full-width versus half-width forms;
(2) symbols differ in the middle of the text, including single quotation marks, double quotation marks, dashes, slashes and book title marks;
(3) symbols differ at the end of the text, including full stops, exclamation marks, question marks, semicolons and commas.
In the method, pinyin (phonetic) layer matching means searching the prompt tone index file to determine whether the pinyin information of the text to be synthesized is identical to that of some prompt tone; the match succeeds only if they are identical. If pinyin-layer matching succeeds, the speech data is retrieved from the storage location given in the prompt tone index file and played.
In the method, the automatically constructed prompt tone library is made by using a tool to extract the speech parameter information of the prompt tone resources and store it as a binary file.
In the method, if neither character-layer nor pinyin-layer matching succeeds, the synthesis system, based on the information of the text to be synthesized and following a large-corpus synthesis algorithm, selects basic speech units from the automatically constructed prompt tone library, concatenates their waveforms, and finally outputs and plays the speech data.
In the method, the basic speech unit is the word.
The beneficial effects of the invention are as follows:
First, character-layer matching of prompt tones is extended from requiring all characters to be identical to ignoring full-width/half-width differences, sentence-final punctuation and other symbols within the sentence. Pinyin (phonetic) layer matching can match texts that sound the same but are written with different characters or symbols. Automatic construction goes further and matches and assembles prompt tones at the word level. These improvements effectively raise the success rate of prompt tone matching, reduce the cost of re-recording prompt tones after slight changes to the text, and greatly reduce the work of building the prompt tone library.
Second, in managing prompt speech, the speech synthesis system uses algorithms that ensure a natural transition between prompt speech and synthesized speech, converts speech formats automatically inside the synthesis system, and provides visual tools that help users meet special requirements in real applications, such as adjusting the relative energy of prompt speech and synthesized speech. These mechanisms better satisfy the flexibility and customization that real applications require.
In addition, the system provides customized, industry-oriented prompt tone libraries to meet the prompt tone needs of different industries.
Description of drawings
The accompanying drawing is a block diagram of the intelligent prompt tone matching workflow.
Embodiment
Refer to the accompanying drawing.
First, the prompt tone resources are made and placed among the synthesis system's resources. The user then enables the prompt tone function in the speech synthesis system. After text to be synthesized is entered, the synthesis system intelligently matches it, according to the characteristics of the text, against the prompt tone resources in the prompt tone library. The matching process has three layers. The first is character-layer matching. The second, pinyin (phonetic) layer matching, handles text whose characters differ from a prompt tone but whose pinyin is the same. The third, automatic prompt tone construction, addresses how to make effective use of existing resources once a large number of prompt tones have been recorded. When any layer matches, the corresponding prompt tone speech data is retrieved and spliced, and the speech data is output.
Intelligent prompt tone matching is implemented in four steps: making the prompt tone base resources, character-layer matching, pinyin-layer matching and automatic construction.
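The order in which the three matching layers are tried can be sketched as a simple cascade. This is only an illustration: `match_prompt`, the `Matcher` type and the stub layers are hypothetical names, and the final fallback to ordinary synthesis when no layer matches is an assumption, since the text does not say what happens in that case.

```python
from typing import Callable, Optional, Sequence, Tuple

# A matcher takes the text to be synthesized and returns audio (here just a
# label) or None when its layer does not match.
Matcher = Callable[[str], Optional[str]]


def match_prompt(
    text: str, layers: Sequence[Tuple[str, Matcher]]
) -> Tuple[str, Optional[str]]:
    """Try each layer in order and return (layer name, audio) for the first hit."""
    for name, matcher in layers:
        audio = matcher(text)
        if audio is not None:
            return name, audio
    # Assumption: if nothing matches, the ordinary synthesis engine is used.
    return "synthesis", None


# Stub layers standing in for the real character, pinyin and construction steps.
layers = [
    ("character", lambda t: {"hello": "hello.wav"}.get(t)),
    ("pinyin", lambda t: None),
    ("auto-construct", lambda t: "spliced.wav" if t else None),
]
```

Each later layer runs only when every earlier layer has failed, which mirrors the first-match-wins order described above.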
Step 1: resource making.
Intelligent prompt tone matching involves three resources:
(1) The prompt tone index file records all recorded prompt tone entries, including each prompt tone's name, speaker, character content, pinyin information and the storage path of its speech data (in the automatically constructed library or in a packaged resource).
(2) The automatically constructed prompt tone library. Speech technology professionals use a resource-making tool to extract the speech parameter information of the prompt tones and store it as a binary file, forming the prompt tone library. Because this process is fully automated, the library is called the automatically constructed prompt tone library.
(3) Packaged prompt tone resources. Some prompt tone data has not been built into the automatically constructed library, for lack of time or other reasons, yet still needs to be available as a resource. In that case the user can use a tool supplied by the speech technology professionals to package the prompt tone speech data (for example, wav files) into a packaged prompt tone resource. Note: prompt tones in a packaged resource can be used only for character-layer and pinyin-layer matching.
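The index described in item (1) can be pictured as a list of records with two lookup tables built over it. The sketch below is an assumed in-memory layout, not the patent's file format; `PromptEntry`, `build_indexes`, the field names and the sample values are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple


@dataclass(frozen=True)
class PromptEntry:
    name: str           # prompt tone name
    speaker: str        # who recorded it
    text: str           # character content
    pinyin: str         # pinyin information
    audio_path: str     # where the speech data is stored
    in_auto_bank: bool  # False: packaged resource, sentence-level matching only


def build_indexes(
    entries: Iterable[PromptEntry],
) -> Tuple[Dict[str, PromptEntry], Dict[str, PromptEntry]]:
    """Build lookup tables keyed by character content and by pinyin."""
    entries = list(entries)
    by_text = {e.text: e for e in entries}
    by_pinyin = {e.pinyin: e for e in entries}
    return by_text, by_pinyin


entry = PromptEntry(
    name="balance_inquiry",
    speaker="speaker_a",
    text="余额查询",
    pinyin="yu2 e2 cha2 xun2",
    audio_path="prompts/balance_inquiry.wav",
    in_auto_bank=True,
)
by_text, by_pinyin = build_indexes([entry])
```

Keeping a flag such as `in_auto_bank` is one way to record the restriction in item (3) that packaged prompt tones take part only in the two sentence-level layers.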
Step 2: character-layer matching.
After the user enters the text to be synthesized, the synthesis system first performs text analysis, including sentence splitting and character processing. After text analysis the system can carry out the first layer of matching, character-layer matching.
Character-layer matching proceeds as follows:
The character content of the current text is compared, at the character level, with the character content of each prompt tone. Here "identical" covers the following cases:
(1) the character content is exactly the same;
(2) the character content differs in full-width versus half-width forms;
(3) symbols differ in the middle of the text, including ' (single quotation mark), " " (double quotation marks), - (dash), / (slash) and 《》 (book title marks);
(4) symbols differ at the end of the text, including full stops, exclamation marks, question marks, semicolons and commas.
For example, suppose there is a prompt tone "The final right of interpretation of this incentive scheme belongs to China Merchants Bank." The following altered texts entered by the user still match it:
(1) This incentive scheme/final right of interpretation/belongs to China Merchants Bank.
(2) The final right of interpretation of this incentive scheme belongs to "China Merchants Bank".
(3) The final right of interpretation of this incentive scheme belongs to 《China Merchants Bank》.
(4) This incentive scheme---the final right of interpretation belongs to China Merchants Bank?
If the character layer matches, the path of the audio data is looked up in the prompt tone index file, and the speech data is extracted from the automatically constructed prompt tone library or the packaged prompt tone resource at that path and played.
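One way to realize the tolerant comparison described above is to reduce both texts to a normalized key and compare the keys. The sketch below is an assumption, not the patent's implementation: it uses Unicode NFKC normalization to fold full-width forms to half-width, then drops the listed inner symbols and trailing punctuation. `character_key`, the symbol sets and the Chinese example strings (a plausible rendering of the China Merchants Bank example) are illustrative.

```python
import unicodedata

# Symbols ignored anywhere in the text: quotation marks, dashes, slashes and
# book title marks (an assumed set based on the list in the description).
IGNORED_INNER = set("'\"-/《》“”‘’\u2014")
# Symbols ignored at the end of the text: full stop, exclamation mark,
# question mark, semicolon and comma (after full-width forms are folded).
IGNORED_TRAILING = ".!?;,。"


def character_key(text: str) -> str:
    """Reduce text to a key that ignores width, inner symbols and final punctuation."""
    folded = unicodedata.normalize("NFKC", text.strip())
    kept = "".join(ch for ch in folded if ch not in IGNORED_INNER)
    return kept.rstrip(IGNORED_TRAILING)


PROMPT = "本奖励办法最终解释权归属招商银行。"
VARIANTS = [
    "本奖励办法/最终解释权/归属招商银行。",
    "本奖励办法最终解释权归属“招商银行”。",
    "本奖励办法最终解释权归属《招商银行》。",
    "本奖励办法---最终解释权归属招商银行？",
]
matches = [character_key(v) == character_key(PROMPT) for v in VARIANTS]
```

Storing `character_key(prompt_text)` in the index when the resources are made would turn each lookup into a single dictionary access; whether the original system works this way is not stated.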
Step 3: pinyin (phonetic) layer matching.
If the first layer fails to match, the system performs prosodic analysis on the text to be synthesized and determines its pinyin information (Hanyu Pinyin for Chinese, phonetic symbols for English). With the pinyin information available, the second layer of matching, pinyin-layer matching, can be carried out.
Pinyin-layer matching condition: the match succeeds only when the pinyin information of the text to be synthesized is exactly the same as that of some prompt tone. For example:
(1) for the prompt tone "Press the pound key to finish input" (with "pound" written out in characters), the user can enter "Press the # key to finish input" and still match;
(2) for the prompt tone "The new password is shorter than six digits", the user can enter "The new password is shorter than 6 digits";
(5) for the prompt tone "Please enter the sixteen-digit card number", the user can enter "Please enter the 16-digit card number";
(6) for the prompt tone "one-third position order", the user can enter "1/3 position order";
(7) for the prompt tone "中科大讯飞" (USTC iFlytek), the user can enter the homophone "中科大迅飞", in which "讯" is replaced by "迅".
If pinyin-layer matching succeeds, the synthesis system looks up the path of the audio data in the prompt tone index file, and the speech data is extracted from the automatically constructed prompt tone library or the packaged prompt tone resource at that path and played.
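The pinyin-layer condition can be sketched as a lookup keyed by the pinyin string. A real system would derive pinyin through text normalization and prosodic analysis; here a tiny hand-written table stands in for that step. `TOY_READINGS`, `to_pinyin`, `pinyin_match` and the file name are hypothetical, and the Chinese strings are assumed renderings of the card number and iFlytek examples.

```python
from typing import Dict, Optional

# Toy reading table (assumption): a token or character -> pinyin with tone digits.
TOY_READINGS: Dict[str, str] = {
    "16": "shi2 liu4",
    "十": "shi2", "六": "liu4", "位": "wei4", "卡": "ka3", "号": "hao4",
    "请": "qing3", "输": "shu1", "入": "ru4",
    "讯": "xun4", "迅": "xun4", "飞": "fei1",
}


def to_pinyin(text: str) -> str:
    """Convert text to a pinyin string by greedy longest match over the toy table."""
    max_len = max(len(key) for key in TOY_READINGS)
    parts = []
    i = 0
    while i < len(text):
        for n in range(min(max_len, len(text) - i), 0, -1):
            chunk = text[i:i + n]
            if chunk in TOY_READINGS:
                parts.append(TOY_READINGS[chunk])
                i += n
                break
        else:
            raise KeyError(f"no toy reading for {text[i]!r}")
    return " ".join(parts)


def pinyin_match(text: str, index: Dict[str, str]) -> Optional[str]:
    """Return the audio path of the prompt tone with identical pinyin, if any."""
    return index.get(to_pinyin(text))


# Pinyin index built from one recorded prompt tone.
pinyin_index = {to_pinyin("请输入十六位卡号"): "card_number.wav"}
hit = pinyin_match("请输入16位卡号", pinyin_index)
```

Because the comparison is on the whole pinyin string, a text that differs in even one syllable does not match at this layer and falls through to automatic construction.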
Step 4: automatic construction.
If both character-layer matching and pinyin (phonetic) layer matching fail, the third layer of intelligent matching is needed: automatic construction matching. Automatic construction matching splices speech from the basic speech units in the prompt tone library, with the aim of a result that is better than synthesized speech and close to natural speech.
Automatic prompt tone construction is implemented as follows:
Based on the information of the text to be synthesized and following a large-corpus synthesis algorithm, the synthesis system selects basic speech units from the automatically constructed prompt tone library, concatenates their waveforms, and finally outputs the speech data. Note: to ensure the splicing quality, the basic speech unit is required to be the word rather than the syllable.
Automatic construction matching differs from the first two layers in its unit of matching. The unit of the first two layers is the sentence: if a match succeeds, the audio data of the matching prompt tone sentence is played directly. The unit of automatic construction matching is the word: after words are matched, they must be spliced into a sentence before playback. The user therefore does not need to record more prompt tones, which reduces recording cost and makes effective use of existing prompt tone resources.
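Word-level construction can be sketched in a heavily simplified form: segment the text into words that exist in the library, then concatenate their stored samples. This is not the large-corpus synthesis algorithm referred to above, which would also choose among candidate units and smooth the joins; it only illustrates the word as the unit of matching. `segment_into_words`, `auto_construct` and the toy `bank` with integer "samples" are assumptions.

```python
from typing import Dict, Iterable, List, Optional


def segment_into_words(text: str, vocabulary: Iterable[str]) -> Optional[List[str]]:
    """Greedy longest-match segmentation; None if part of the text is not covered."""
    vocab = set(vocabulary)
    max_len = max((len(word) for word in vocab), default=0)
    words: List[str] = []
    i = 0
    while i < len(text):
        for n in range(min(max_len, len(text) - i), 0, -1):
            if text[i:i + n] in vocab:
                words.append(text[i:i + n])
                i += n
                break
        else:
            return None
    return words


def auto_construct(text: str, bank: Dict[str, List[int]]) -> Optional[List[int]]:
    """Concatenate the stored samples of each matched word, in text order."""
    words = segment_into_words(text, bank)
    if words is None:
        return None
    samples: List[int] = []
    for word in words:
        samples.extend(bank[word])
    return samples


# Toy library: word -> waveform samples (small integers stand in for audio).
bank = {"余额": [1, 2], "查询": [3, 4], "密码": [5]}
spliced = auto_construct("查询余额", bank)
missing = auto_construct("查询积分", bank)
```

Greedy segmentation can reject a text that a different split would cover, so a fuller version would search over alternative segmentations; when a word is missing, the sketch returns `None` and leaves the fallback to the caller.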

Claims (7)

1. An intelligent method for improving prompt tone matching in a speech synthesis system, comprising: making the prompt tone resources required by the synthesis system from pre-recorded speech data, wherein making the prompt tone resources includes building a prompt tone index file containing the name, speaker, character content and speech data storage location of each prompt tone; the user then providing text to be synthesized to the synthesis system; and the synthesis system performing character-layer matching, in which speech data can be output only after the character content of the text to be synthesized is found to be identical to the character content of a prompt tone; characterized in that, while the synthesis system synthesizes the text, pinyin (phonetic) layer matching and intelligent analysis for automatic prompt tone construction are also performed, and the index file further contains pinyin information and the storage path of the speech data in the automatically constructed prompt tone library.
2. The intelligent method for improving prompt tone matching in a speech synthesis system according to claim 1, characterized in that, when the synthesis system synthesizes text, character-layer matching is performed first; if character-layer matching does not succeed, pinyin-layer matching is performed; and if pinyin-layer matching also does not succeed, the intelligent analysis for automatic construction is performed.
3. The intelligent method for improving prompt tone matching in a speech synthesis system according to claim 1, characterized in that, for the character-layer matching, the text to be synthesized and a prompt tone are also considered to match when any one, two or all three of the following differences occur:
(1) the character content differs in full-width versus half-width forms;
(2) symbols differ in the middle of the text, including single quotation marks, double quotation marks, dashes, slashes and book title marks;
(3) symbols differ at the end of the text, including full stops, exclamation marks, question marks, semicolons and commas.
4. The intelligent method for improving prompt tone matching in a speech synthesis system according to claim 1, characterized in that the pinyin-layer matching means searching the prompt tone index file to determine whether the pinyin information of the text to be synthesized is identical to that of some prompt tone, the match succeeding only if they are identical; and, if pinyin-layer matching succeeds, retrieving the speech data from the storage location given in the prompt tone index file and playing it.
5. The intelligent method for improving prompt tone matching in a speech synthesis system according to claim 1, characterized in that the automatically constructed prompt tone library is made by using a tool to extract the speech parameter information of the prompt tone resources and store it as a binary file.
6. The intelligent method for improving prompt tone matching in a speech synthesis system according to claim 1 or 5, characterized in that, if neither character-layer nor pinyin-layer matching succeeds, the synthesis system, based on the information of the text to be synthesized and following a large-corpus synthesis algorithm, selects basic speech units from the automatically constructed prompt tone library, concatenates their waveforms, and finally outputs and plays the speech data.
7. The intelligent method for improving prompt tone matching in a speech synthesis system according to claim 6, characterized in that the basic speech unit is the word.
CN200610096676A 2006-10-16 2006-10-16 Intelligent method for improving prompting voice matching effect in voice synthetic system Active CN1945692B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN200610096676A CN1945692B (en) 2006-10-16 2006-10-16 Intelligent method for improving prompting voice matching effect in voice synthetic system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN200610096676A CN1945692B (en) 2006-10-16 2006-10-16 Intelligent method for improving prompting voice matching effect in voice synthetic system

Publications (2)

Publication Number Publication Date
CN1945692A CN1945692A (en) 2007-04-11
CN1945692B true CN1945692B (en) 2010-05-12

Family

ID=38045070

Family Applications (1)

Application Number Title Priority Date Filing Date
CN200610096676A Active CN1945692B (en) 2006-10-16 2006-10-16 Intelligent method for improving prompting voice matching effect in voice synthetic system

Country Status (1)

Country Link
CN (1) CN1945692B (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101425939B (en) * 2008-12-23 2011-01-12 武汉噢易科技有限公司 Intelligent bionic speech service system and serving method
US10134385B2 (en) * 2012-03-02 2018-11-20 Apple Inc. Systems and methods for name pronunciation
CN103366732A (en) * 2012-04-06 2013-10-23 上海博泰悦臻电子设备制造有限公司 Voice broadcast method and device and vehicle-mounted system
CN105589843B (en) * 2014-10-24 2019-02-26 科大讯飞股份有限公司 A kind of text word string matching process and system
CN108091321B (en) * 2017-11-06 2021-07-16 芋头科技(杭州)有限公司 Speech synthesis method
CN109119066A (en) * 2018-09-30 2019-01-01 苏州浪潮智能软件有限公司 A kind of method of quick carry out voice broadcasting
CN113516962B (en) * 2021-04-08 2024-04-02 Oppo广东移动通信有限公司 Voice broadcasting method and device, storage medium and electronic equipment

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1584980A (en) * 2004-06-01 2005-02-23 安徽中科大讯飞信息科技有限公司 Method for synthetic output with prompting sound and text sound in speech synthetic system
CN1787072A (en) * 2004-12-07 2006-06-14 北京捷通华声语音技术有限公司 Method for synthesizing pronunciation based on rhythm model and parameter selecting voice

Also Published As

Publication number Publication date
CN1945692A (en) 2007-04-11

Similar Documents

Publication Publication Date Title
CN101079301B (en) Time sequence mapping method for text to audio realized by computer
CN1945692B (en) Intelligent method for improving prompting voice matching effect in voice synthetic system
US7062437B2 (en) Audio renderings for expressing non-audio nuances
CN101872615A (en) System and method for distributed text-to-speech synthesis and intelligibility
CN105869446B (en) A kind of electronic reading device and voice reading loading method
US20110224972A1 (en) Localization for Interactive Voice Response Systems
CA3011397A1 (en) Natural expression processing method, processing and response method, device and system
EP1490861A1 (en) Text structure for voice synthesis, voice synthesis method, voice synthesis apparatus, and computer program thereof
CN106486121A (en) It is applied to the voice-optimizing method and device of intelligent robot
CN110740275B (en) Nonlinear editing system
CN110211562B (en) Voice synthesis method, electronic equipment and readable storage medium
GB2444539A (en) Altering text attributes in a text-to-speech converter to change the output speech characteristics
CN103098124B (en) Method and system for text to speech conversion
CN111142667A (en) System and method for generating voice based on text mark
CN102822889A (en) Pre-saved data compression for tts concatenation cost
CN109243450A (en) A kind of audio recognition method and system of interactive mode
CN110164413A (en) Phoneme synthesizing method, device, computer equipment and storage medium
CN101123089B (en) Voice mixing method for Chinese voice code
CN1455386A (en) Imbedded voice synthesis method and system
CN110808028B (en) Embedded voice synthesis method and device, controller and medium
CN101825953A (en) Chinese character input product with combined voice input and Chinese phonetic alphabet input functions
CN101441626A (en) Multimedia retrieval system and method
CN109492126B (en) Intelligent interaction method and device
CN110600004A (en) Voice synthesis playing method and device and storage medium
CN110767233A (en) Voice conversion system and method

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
C56 Change in the name or address of the patentee

Owner name: ANHUI USTC IFLYTEK CO., LTD.

Free format text: FORMER NAME: ZHONGKEDA XUNFEI INFORMATION SCIENCE & TECHNOLOGY CO., LTD., ANHUI PROV.

CP01 Change in the name or title of a patent holder

Address after: 230088 Mount Huangshan road Anhui High-tech Development Zone, Hefei City No. 616 Xunfei building

Patentee after: Anhui USTC iFLYTEK Co., Ltd.

Address before: 230088 Mount Huangshan road Anhui High-tech Development Zone, Hefei City No. 616 Xunfei building

Patentee before: Zhongkeda Xunfei Information Science & Technology Co., Ltd., Anhui Prov.

C56 Change in the name or address of the patentee

Owner name: IFLYTEK CO., LTD.

Free format text: FORMER NAME: ANHUI USTC IFLYTEK CO., LTD.

CP03 Change of name, title or address

Address after: Wangjiang Road high tech Development Zone Hefei city Anhui province 230088 No. 666

Patentee after: Iflytek Co., Ltd.

Address before: 230088 Mount Huangshan road Anhui High-tech Development Zone, Hefei City No. 616 Xunfei building

Patentee before: Anhui USTC iFLYTEK Co., Ltd.