CN108091321B - Speech synthesis method - Google Patents

Speech synthesis method

Info

Publication number
CN108091321B
CN108091321B (application CN201711080122.6A)
Authority
CN
China
Prior art keywords
character
speaking
role
text
synthesizer
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201711080122.6A
Other languages
Chinese (zh)
Other versions
CN108091321A (en)
Inventor
孟猛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Yutou Technology Hangzhou Co Ltd
Original Assignee
Yutou Technology Hangzhou Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Yutou Technology Hangzhou Co Ltd
Priority to CN201711080122.6A
Publication of CN108091321A
Application granted
Publication of CN108091321B
Status: Active

Classifications

    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 13/00 - Speech synthesis; Text to speech systems
    • G10L 13/02 - Methods for producing synthetic speech; Speech synthesisers
    • G10L 13/06 - Elementary speech units used in speech synthesisers; Concatenation rules

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Machine Translation (AREA)

Abstract

The invention discloses a speech synthesis method, belonging to the technical field of speech processing. In the method, a plurality of character roles are preset, each with a preset synthesizer parameter set, and the method further includes: obtaining a sentence text; parsing each quoted part and the speaking role corresponding to each quoted part from the sentence text; globally normalizing the speaking roles over the sentence text, matching each speaking role with a preset character role, and determining the character role corresponding to each speaking role and its synthesizer parameter set according to the matching result; and performing speech synthesis on the corresponding quoted parts according to the synthesizer parameter set of each speaking role, thereby forming and outputting synthesized speech corresponding to the sentence text. The beneficial effects of this technical scheme are: the traits of different characters are distinguished and reflected in the synthesized speech, the recognizability of each character is improved, the synthesized speech comes closer to the way people narrate a text, and the user experience is improved.

Description

Speech synthesis method
Technical Field
The invention relates to the technical field of voice processing, in particular to a voice synthesis method.
Background
With the continuous development of speech technology, more and more software applications incorporate speech recognition and processing. For example, an application may recognize text input by the user and synthesize and output the corresponding speech according to the recognition result.
Generally, in real life, and especially when narrating stories, the same speaker often distinguishes different characters and scenes by changing the tone of voice. For example, when a mother tells her child the story of the wolf and the lamb, she uses a relatively deep, dull voice for the wolf and a relatively sweet, high-pitched voice for the lamb. As another example, in some narrations different characters are shaped with different voice qualities, so that the dialogue between characters can easily be distinguished without the need for narration.
However, in conventional speech software applications, the speech synthesized from a long text is generally played uniformly in a flat tone, which gives the user the experience of listening to a machine speaking without any fluctuation of mood. This manner of playback easily confuses the speaking roles of the different characters set in the text, and the user has to listen to the synthesized speech carefully and distinguish the speaking roles from the content itself. The output synthesized speech is therefore completely unlike the way people narrate a text in real life, which degrades the user experience.
Disclosure of Invention
In view of the above problems in the prior art, a technical solution of a speech synthesis method is provided, which aims to distinguish the traits of different characters and reflect them in the synthesized speech, and to improve the recognizability of each character in the synthesized speech, so that the synthesized speech comes closer to the way people narrate a text, thereby improving the user experience.
The technical scheme specifically comprises the following steps:
A speech synthesis method, wherein a plurality of types of character roles are preset and a synthesizer parameter set is preset for each type of character role, the method further comprising the following steps:
step S1, obtaining a sentence text to be synthesized;
step S2, analyzing each quoted part from the sentence text and obtaining the speaking role corresponding to each quoted part;
step S3, globally normalizing the speaking roles over the sentence text to identify multiple occurrences of the same speaking role in the sentence text, matching the speaking roles with the preset character roles, and determining, according to the matching result, the character role corresponding to each speaking role and its synthesizer parameter set;
step S4, performing speech synthesis on the corresponding quoted part according to the synthesizer parameter set of each speaking role, thereby forming and outputting the synthesized speech corresponding to the sentence text;
wherein the quoted part is a statement between two quotation marks;
globally normalizing the speaking roles over the sentence text unifies multiple occurrences of the same speaking role so that they correspond to one character role and its synthesizer parameter set;
and matching a speaking role with a preset character role finds the type of character role corresponding to that speaking role and its synthesizer parameter set.
Preferably, the speech synthesis method is characterized in that step S2 specifically includes:
step S21, decomposing the sentence text into a plurality of independent sentences;
step S22, analyzing, from each sentence, the quoted part and the speaking role corresponding to that quoted part.
Preferably, the speech synthesis method is characterized in that, in step S22, a quoted part is obtained from each sentence by text analysis means according to the constraints of punctuation marks, and the corresponding speaking role is obtained by analysis based on the quoted part.
Preferably, the speech synthesis method is characterized in that the synthesizer parameter set comprises a plurality of synthesizer parameters;
the synthesizer parameters include a formant parameter, and/or a fundamental frequency fluctuation ratio parameter, and/or a speech rate parameter.
Preferably, the speech synthesis method is characterized in that the preset character roles comprise a voice-over character role for representing narration (voice-over);
in step S3, the part of the sentence text other than the quoted parts and the speaking roles is matched with the voice-over character role;
in step S4, speech synthesis is performed on the part of the sentence text other than the quoted parts and the speaking roles by using the synthesizer parameter set corresponding to the voice-over character role.
Preferably, the speech synthesis method is characterized in that each preset character role comprises a plurality of sub-characters;
in step S3, for a speaking character, a sub-character is selected from the corresponding characters according to the matching result and determined as the character corresponding to the speaking character.
Preferably, in the speech synthesis method, in step S3, the matching result is output for the user to view after the corresponding character is matched for each speaking character, and the process goes to step S4 after the user views and confirms the matching result.
Preferably, the speech synthesis method is characterized in that a role label is set in advance for each type of character;
in step S3, the output matching result is a character text formed by adding a corresponding character label to each position of the speaking character in the sentence text.
The beneficial effects of the above technical scheme are: the speech synthesis method can distinguish the traits of different characters and reflect them in the synthesized speech, improving the recognizability of each character in the synthesized speech, so that the synthesized speech comes closer to the way people narrate a text, which improves the user experience.
Drawings
FIG. 1 is a schematic flow chart of a speech synthesis method according to a preferred embodiment of the present invention;
FIG. 2 is a flow chart illustrating the finding of a reference part and a corresponding speaking role in the preferred embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that the embodiments and features of the embodiments may be combined with each other without conflict.
The invention is further described with reference to the following drawings and specific examples, which are not intended to be limiting.
In light of the above problems in the prior art, a speech synthesis method is provided which, while synthesizing speech from a text, distinguishes the speaking roles with different traits and settings in the text, so that the output synthesized speech comes closer to the way people narrate.
In the speech synthesis method, a plurality of types of character roles are preset, and a synthesizer parameter set is preset for each type of character role. The steps shown in Fig. 1 are then performed:
step S1, obtaining a sentence text to be synthesized;
step S2, analyzing each quoted part from the sentence text and obtaining the speaking role corresponding to each quoted part;
step S3, globally normalizing the speaking roles according to the sentence text, matching the speaking roles with the preset character roles, and respectively determining the character role corresponding to each speaking role and its synthesizer parameter set according to the matching result;
and step S4, performing voice synthesis on the corresponding quoted part according to the synthesizer parameter set of each speaking role, thereby forming and outputting synthesized voice corresponding to the sentence text.
Specifically, in this embodiment, before the speech synthesis method is executed, a plurality of types of character roles are preset, and a synthesizer parameter set is preset for each type of character role. The preset character roles may correspond to the basic character types frequently involved in real-life speech, such as men and women, or, at a finer granularity, men, women, the elderly and children. A different synthesizer parameter set is set for each category of character role. Each synthesizer parameter set comprises a plurality of synthesizer parameters; feeding a given synthesizer parameter set into a speech synthesis engine simulates the specific voice, intonation and even speech rate of the corresponding character, so that the speaking effect of that character is realized in the synthesized speech.
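As a minimal sketch of this presetting step (the parameter names, values and role categories below are illustrative assumptions, not values prescribed by the patent), the character roles and their synthesizer parameter sets could be held in a simple mapping:

```python
from dataclasses import dataclass

@dataclass
class SynthesizerParams:
    """One synthesizer parameter set; field names and units are assumptions."""
    fundamental_frequency_hz: float  # base pitch of the voice
    f0_fluctuation_ratio: float      # how strongly the pitch rises and falls
    formant_shift: float             # shifts formants to change the timbre
    speech_rate: float               # 1.0 = normal speaking speed

# One preset parameter set per character-role category (values are illustrative).
PRESET_ROLES = {
    "man":        SynthesizerParams(120.0, 1.0, 0.95, 1.00),
    "woman":      SynthesizerParams(220.0, 1.2, 1.05, 1.00),
    "elderly":    SynthesizerParams(110.0, 0.8, 0.90, 0.85),
    "child":      SynthesizerParams(300.0, 1.4, 1.15, 1.10),
    "voice_over": SynthesizerParams(160.0, 0.6, 1.00, 1.00),
}
```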
In this embodiment, a standard sentence text to be synthesized usually includes large sections of sentences, and these sentences can be roughly classified into several categories:
1) A statement between two quotation marks, which may express either a word or phrase with a special meaning, or the words spoken by some speaking role. Which of the two is intended can usually be distinguished by the length of the content between the quotation marks. Such statements are hereinafter referred to as "quoted parts".
2) Words preceding a quoted part that represents a segment of speech, which typically indicate the speaking role to which that quoted part corresponds. Such words are hereinafter referred to as "speaking roles".
3) All statements other than the above quoted parts and speaking roles, which usually express descriptive content, such as a description of the scene in which the dialogue takes place or a description of the speaking roles. Such statements are hereinafter referred to as "non-quoted parts". Quoted parts of category 1) that merely express words or phrases with special meanings are also treated as non-quoted parts.
In this embodiment, the sentence text to be recognized is first obtained. It may be input directly by the user through an input device, captured from the network by a crawler engine, or downloaded from a network address specified by the user, which is not described in detail here.
In this embodiment, after the sentence text to be recognized is obtained, it is first segmented so that the whole text is decomposed into a plurality of independent sentences, which facilitates subsequent analysis and processing. The sentence segmentation may be performed by a processor.
In this embodiment, after segmentation the whole text forms a plurality of independent sentences, which are then analyzed to obtain the quoted parts in the sentences and the speaking role corresponding to each quoted part. The analysis after segmentation may likewise be executed by the processor.
In this embodiment, since one speaking role may appear many times in the sentence text, after all speaking roles in the full text have been identified, the speaking roles are normalized over the full text, so that multiple occurrences of the same speaking role are grouped for uniform processing. For example, if "the woman says" appears several times in the sentence text to be recognized, these occurrences are processed uniformly after normalization, that is, they all correspond to the same type of character role and its synthesizer parameter set.
In this embodiment, after the full-text normalization, each speaking role is matched against the preset character roles to find the type of character role corresponding to that speaking role and its synthesizer parameter set. Then, for each speaking role, the corresponding synthesizer parameter set is fed into the speech synthesis engine to perform speech synthesis on the quoted part attached to that speaking role, thereby forming and outputting the synthesized speech for the whole sentence text. The above processing may be performed by a speech synthesis engine or speech synthesizer in the processor.
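A minimal sketch of this normalization-and-matching step follows (the keyword table, the role names and the normalize_and_match helper are illustrative assumptions; the patent does not prescribe a concrete matching rule):

```python
# Keywords that map a speaking-role phrase onto a preset character-role category.
# The keyword lists are assumptions for this sketch, not part of the patent.
ROLE_KEYWORDS = {
    "woman": ["woman", "girl", "mother", "grandma"],
    "man":   ["man", "boy", "father", "grandpa"],
    "child": ["child", "kid", "baby"],
}

def normalize_and_match(speaking_roles):
    """Unify identical speaking-role strings and match each unique role to one
    preset character-role category (falling back to a default category)."""
    matched = {}
    for role in speaking_roles:
        if role in matched:           # repeated occurrences of the same speaking
            continue                  # role share one character role / parameter set
        words = role.lower().split()
        category = "default"
        for cat, keys in ROLE_KEYWORDS.items():
            if any(k in words for k in keys):
                category = cat
                break
        matched[role] = category
    return matched

print(normalize_and_match(["the woman", "the wolf", "the woman"]))
# {'the woman': 'woman', 'the wolf': 'default'}
```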
In a preferred embodiment of the present invention, as shown in fig. 2, the step S2 specifically includes:
step S21, decomposing the sentence text into a plurality of independent sentences;
step S22, analyzing, from each sentence, the quoted part and the speaking role corresponding to that quoted part.
In this embodiment, the sentence text is first decomposed into a plurality of independent sentences according to the punctuation marks in the sentence text. Specifically, the sentence text may be split at punctuation marks such as periods, commas, exclamation marks, question marks and semicolons, while the content between two quotation marks is kept within the same sentence so that each quoted part remains complete.
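A minimal sketch of this splitting step (the sketch splits only at sentence-final punctuation and assumes straight double quotation marks; per the description above, commas could be added as delimiters as well):

```python
def split_sentences(text):
    """Split a text into independent sentences at sentence-final punctuation,
    but never inside a pair of quotation marks, so that each quoted part
    stays within a single sentence."""
    sentences, current, in_quotes = [], "", False
    for ch in text:
        current += ch
        if ch == '"':
            in_quotes = not in_quotes
            # a quote that closes right after sentence punctuation also ends the sentence
            if not in_quotes and len(current) >= 2 and current[-2] in ".!?":
                sentences.append(current.strip())
                current = ""
        elif ch in ".!?;" and not in_quotes:
            sentences.append(current.strip())
            current = ""
    if current.strip():
        sentences.append(current.strip())
    return sentences

print(split_sentences('The woman said: "Come here! Quickly." Then she left.'))
# ['The woman said: "Come here! Quickly."', 'Then she left.']
```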
Subsequently, in the above step S22, the quoted part is extracted from each sentence by text analysis means according to the constraints of the punctuation marks, and the corresponding speaking role is obtained by analysis relative to the quoted part.
Specifically, the quoted parts and the corresponding speaking roles can be identified by text analysis means such as syntactic analysis and part-of-speech tagging, as follows:
For the quoted part, the judgment may be made according to one or several of the following rules:
1) a statement between two adjacent quotation marks;
2) the length of the statement between the adjacent quotation marks is greater than a preset value, for example more than 4 words, or the statement between the adjacent quotation marks contains a certain symbol, for example a period, a question mark, an exclamation mark or another punctuation mark;
3) a comma or a colon appears before the opening quotation mark of the statement between the adjacent quotation marks.
For the speaking role, once a quoted part has been confirmed, the word or phrase associated with that quoted part can be taken as its speaking role. For example:
According to the structure of the sentence, if a colon precedes the quoted part, the words before the colon are taken as the speaking role of that quoted part. For example, in: A says: "XXXX.", A is taken as the speaking role of the quoted part "XXXX.".
Or, if a quoted part is followed by a comma, the comma is followed by a word, and the word is followed by another quoted part, the word between the two quoted parts is taken as the speaking role of both. For example, in: "XXXX.", A says, "XXX.", A is taken as the speaking role of the two quoted parts "XXXX." and "XXX.".
Or, if a quoted part is followed by a comma, the comma is followed by a word, and the word is followed by a period, the word is taken as the speaking role of the preceding quoted part. For example, in: "XXXX.", A says. In this case A is taken as the speaking role of the preceding quoted part "XXXX.".
Moreover, the position of a speaking role in the sentence text can also be located by means of verbs of speech such as "say", "speak" or other similar verbs used to indicate speaking.
The analysis of the quoted parts and the corresponding speaking roles can also be realized by other text analysis means such as syntactic analysis and part-of-speech tagging; the above are only some typical analysis methods and do not limit the protection scope of the present invention.
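As a minimal sketch of the colon pattern and the quote-then-speaker pattern in the examples above (the regular expressions and the English verbs "says"/"said" are assumptions for illustration; the patent leaves the concrete text analysis means open):

```python
import re

# Pattern 1: <speaker> [says|said] : "<quote>"
COLON_BEFORE_QUOTE = re.compile(r'(?P<speaker>[^,.!?"]+?)\s*(?:says|said)?\s*:\s*"(?P<quote>[^"]+)"')
# Pattern 2: "<quote>", <speaker> says|said
QUOTE_THEN_SPEAKER = re.compile(r'"(?P<quote>[^"]+)"\s*,\s*(?P<speaker>[^,."]+?)\s+(?:says|said)')

def extract_quotes(sentence):
    """Return (speaking_role, quoted_part) pairs found in one sentence."""
    pairs = []
    for pattern in (COLON_BEFORE_QUOTE, QUOTE_THEN_SPEAKER):
        for m in pattern.finditer(sentence):
            pairs.append((m.group("speaker").strip(), m.group("quote")))
    return pairs

print(extract_quotes('The wolf said: "I will eat you."'))
# [('The wolf', 'I will eat you.')]
print(extract_quotes('"Please do not.", the lamb said.'))
# [('the lamb', 'Please do not.')]
```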
In a preferred embodiment of the present invention, the synthesizer parameter set includes a plurality of synthesizer parameters;
the synthesizer parameters include a formant parameter, and/or a fundamental frequency fluctuation ratio parameter, and/or a speech rate parameter.
Specifically, in this embodiment, each synthesizer parameter mentioned above may affect a certain aspect of the human voice. For example:
the fundamental frequency parameter is determined by the conditions of the length, width, tightness and the like of the vocal cords of the speaker, and the longer, thicker and looser the vocal cords are, the fewer the times of vocal cord vibration are, and the lower the sound is; the shorter, thinner, and stronger the vocal cords are, the more the vocal cords vibrate, the sharper the sound is made. Therefore, the fundamental frequency parameters of female voice and child voice are higher, and the fundamental frequency parameters of male voice are lower.
The fundamental frequency fluctuation ratio parameter can be used to adjust the range over which the fundamental frequency varies: the larger the variation of the fundamental frequency, the more pronounced the sense of cadence, with the voice rising and falling; the smaller the variation, the more the produced sound resembles a monotone, robot-like voice. The fundamental frequency fluctuation ratio parameter can thus be used to adjust and simulate the mood fluctuations of an utterance.
Timbre is related to the fundamental frequency parameters and is also closely related to the size, shape and use of the resonance cavities. After a sound is produced, it resonates through the vocal resonance cavities (pharyngeal cavity, oral cavity, nasal cavity and so on); the thoracic cavity, nasal cavity and similar cavities are fixed, non-adjustable resonance cavities, while the laryngeal cavity, pharyngeal cavity, oral cavity and the like are adjustable, variable resonance cavities. The formant parameter can be used as the main characteristic parameter of the resonance-cavity variation.
In this embodiment, different timbres can be obtained by adjusting a series of parameters such as the fundamental frequency parameter, the fundamental frequency fluctuation ratio parameter, the formant parameter and the speech rate parameter. The synthesizer parameter sets are combined and tuned in advance to obtain a preset timbre for each type of character role, each preset timbre corresponding to one synthesizer parameter set.
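As a small numeric illustration of the fundamental frequency fluctuation ratio (assuming, for this sketch only, that the synthesizer exposes a per-frame pitch contour in Hz):

```python
def apply_f0_fluctuation(contour_hz, base_hz, fluctuation_ratio):
    """Scale the deviations of a pitch contour around a base frequency:
    a ratio above 1 exaggerates the intonation, a ratio below 1 flattens
    the voice toward a monotone, robot-like delivery."""
    return [base_hz + (f - base_hz) * fluctuation_ratio for f in contour_hz]

contour = [200, 220, 260, 230, 190]             # original per-frame pitch in Hz
print(apply_f0_fluctuation(contour, 220, 1.5))  # livelier, wider pitch swings
# [190.0, 220.0, 280.0, 235.0, 175.0]
print(apply_f0_fluctuation(contour, 220, 0.2))  # almost monotone
# [216.0, 220.0, 228.0, 222.0, 214.0]
```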
In a preferred embodiment of the present invention, the preset character roles include a voice-over character role for representing narration (voice-over);
in the step S3, the part of the sentence text other than the quoted parts and the speaking roles is matched with the voice-over character role;
in step S4, speech synthesis is performed on the part of the sentence text other than the quoted parts and the speaking roles by using the synthesizer parameter set corresponding to the voice-over character role.
Specifically, in this embodiment, a voice-over role is set among the preset character roles, together with its corresponding synthesizer parameter set. The synthesizer parameters in this set may all be default parameters, and the resulting timbre may be a voice without mood fluctuation, similar to machine speech, or a voice whose timbre differs from that of the other character roles, for example one modeled on the timbre of a newscaster or a well-known host.
In this embodiment, the non-quoted parts are uniformly synthesized using the synthesizer parameter set corresponding to the voice-over role, finally forming a sound that can be regarded as the "voice-over" and that is distinguished from the sounds corresponding to the speaking roles.
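A minimal sketch of how the segments of a sentence text could be routed to their parameter sets in step S4 (the synthesize function is only a stand-in for an unspecified speech synthesis engine; its name and signature are assumptions):

```python
def synthesize(text, params):
    """Stand-in for a real speech-synthesis engine call (assumption)."""
    return f"<audio of {text!r} using {params}>"

def synthesize_sentence_text(segments, role_to_params, voice_over_params):
    """segments: list of (kind, text, speaking_role) tuples with kind either
    'quoted' or 'narration'. Quoted segments use the parameter set of their
    speaking role; all other segments use the voice-over parameter set."""
    audio = []
    for kind, text, role in segments:
        if kind == "quoted":
            params = role_to_params.get(role, voice_over_params)
        else:
            params = voice_over_params
        audio.append(synthesize(text, params))
    return audio  # concatenated in order, this forms the output speech

segments = [
    ("narration", "The wolf said:", None),
    ("quoted", "I will eat you.", "the wolf"),
]
print(synthesize_sentence_text(segments, {"the wolf": "man-params"}, "voice-over-params"))
```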
In a preferred embodiment of the present invention, each preset character comprises a plurality of sub-characters;
then, in step S3, for a speaking character, a sub-character is selected from the corresponding group of characters according to the matching result and determined as the character corresponding to the speaking character.
Specifically, in this embodiment, each type of character role includes a plurality of sub-characters. For example, the character role "man" may include three sub-characters "man 1", "man 2" and "man 3", each with a different synthesizer parameter set, so that each sub-character presents a clearly different timbre: "man 1" presents the voice of a young male, "man 2" the voice of a more mature and steady adult male, and "man 3" the voice of a slightly older middle-aged male. If a speaking role is matched with a character role such as "man", one of its sub-characters can be selected as the character role corresponding to that speaking role, and speech synthesis is performed on the quoted part corresponding to that speaking role according to the corresponding synthesizer parameter set.
Further, in a preferred embodiment of the present invention, the manner of selecting the sub-character may include the following (see the sketch after this list):
1) random selection, that is, after a speaking role is matched with a type of character role, one of its sub-characters is selected at random and determined as the character role corresponding to the speaking role.
2) assigning two sub-characters with clearly different timbres to two adjacent speaking roles belonging to the same character role. For example, for two adjacent speaking roles that both belong to the character role "man", the former speaking role is assigned the sub-character "man 1" and the latter the sub-character "man 2", so that the user can distinguish the two speaking roles.
3) if the words representing the speaking role have a specific reference, for example "young man", the sub-character "man 1" is assigned to that speaking role directly. That is, a plurality of specifically referring words are preset for each sub-character, and if the words representing a speaking role match a preset word under some sub-character, that sub-character is assigned to the speaking role directly.
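A minimal sketch of these three selection strategies (the sub-character names, the keyword table and the pick_sub_role helper are assumptions for illustration):

```python
import random

SUB_ROLES = {"man": ["man 1", "man 2", "man 3"]}
# Specifically referring words that directly select a sub-character (assumption).
DIRECTED_WORDS = {"young man": "man 1", "middle-aged man": "man 3"}

def pick_sub_role(speaking_role, category, previous_pick=None):
    """Choose a sub-character within a matched character-role category."""
    # 3) direct assignment when the speaking-role wording points at a sub-character
    for phrase, sub in DIRECTED_WORDS.items():
        if phrase in speaking_role.lower():
            return sub
    candidates = SUB_ROLES[category]
    # 2) keep adjacent speakers of the same category apart by avoiding the previous pick
    if previous_pick in candidates and len(candidates) > 1:
        candidates = [c for c in candidates if c != previous_pick]
    # 1) otherwise choose randomly among the remaining sub-characters
    return random.choice(candidates)

first = pick_sub_role("a man", "man")
second = pick_sub_role("another man", "man", previous_pick=first)
print(first, second)                         # two different male sub-characters
print(pick_sub_role("a young man", "man"))   # always 'man 1'
```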
In a preferred embodiment of the present invention, a default type of character role can also be set, to which speaking roles that cannot otherwise be classified are assigned, for example a speaking role represented directly by a person's name. The default character role likewise includes a plurality of sub-characters, so that if a sentence text to be recognized contains several unclassifiable speaking roles, different sub-characters are assigned to them to prevent confusion between different speaking roles. Meanwhile, the synthesizer parameter sets corresponding to the default character role need to differ from the synthesizer parameter set corresponding to the voice-over role, to avoid confusion.
In a preferred embodiment of the present invention, in the step S3, the matching result is output for the user to view after the corresponding human character is matched for each speaking character, and the process goes to the step S4 after the user views and confirms the matching result.
Specifically, in this embodiment, in order to ensure that the output synthesized speech is consistent with the way people narrate in real life, the matching result of the character roles is output to the user for review before the synthesized speech is finally output, so as to obtain the user's confirmation.
Further, in a preferred embodiment of the present invention, a role label is set in advance for each type of character;
in step S3, the matching result is output as a role text formed by adding the corresponding role label at the position of each speaking role in the sentence text.
Specifically, when each type of character role and even each sub-character is set, a corresponding role label is set, such as "man", "man 1", "man 2" and "man 3" described above, or more descriptive labels such as "man", "young man", "mature man" and "middle-aged man". When the matching result is output, the role labels corresponding to the assigned character roles can be added at the corresponding positions of the original sentence text, so that a role text is formed and output to the user for review. After the user confirms the role text, the final synthesized speech is output. The user can also modify the assigned character roles, so that a new synthesized speech is produced and output; that is, the user can intervene manually in the synthesized speech to achieve a better output effect.
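A minimal sketch of building such a role text (the bracketed label format and the build_role_text helper are assumptions; the patent only requires that the corresponding role label be added at the position of each speaking role):

```python
def build_role_text(sentence_text, role_assignments):
    """Insert the assigned role label after each speaking-role occurrence so the
    user can review, confirm or modify the matching result.
    role_assignments maps a speaking-role string to its role label."""
    annotated = sentence_text
    for speaking_role, label in role_assignments.items():
        annotated = annotated.replace(speaking_role, f"{speaking_role} [{label}]")
    return annotated

text = 'The wolf said: "I will eat you." The lamb replied: "Please do not."'
print(build_role_text(text, {"The wolf": "man 1", "The lamb": "child 1"}))
# The wolf [man 1] said: "I will eat you." The lamb [child 1] replied: "Please do not."
```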
In summary, the technical solution of the present invention provides a speech synthesis method that can distinguish each quoted part and its speaking role in a sentence text and assign synthesizer parameter sets representing different timbres, so as to distinguish the voice of the narration from the voices of the roles, and the voices of the roles from one another. The output synthesized speech is thereby closer to the speaking habits of people narrating in real life, and the synthesis effect is significantly improved compared with traditional speech synthesis.
While the invention has been described with reference to a preferred embodiment, it will be understood by those skilled in the art that various changes in form and detail may be made therein without departing from the spirit and scope of the invention.

Claims (8)

1. A speech synthesis method is characterized in that a plurality of types of personas are preset and synthesizer parameter sets are preset for each type of persona respectively, and the method further comprises the following steps:
step S1, the obtaining unit obtains the sentence text to be synthesized;
step S2, analyzing, from the sentence text, each quoted part and the speaking role corresponding to each quoted part;
step S3, globally normalizing the speaking roles according to the sentence text to identify multiple occurrences of the same speaking role in the sentence text, matching the speaking roles with the preset character roles, and determining, according to the matching result, the character role corresponding to each speaking role and its synthesizer parameter set;
step S4, according to the synthesizer parameter set of each speaking role, carrying out voice synthesis on the corresponding quoted part, thereby forming and outputting the synthesized voice corresponding to the sentence text;
wherein the quoted part is a statement between two quotation marks;
globally normalizing the speaking roles over the sentence text unifies them so that they correspond to one type of the character roles and its synthesizer parameter set;
and matching the speaking role with the preset character roles finds the type of character role corresponding to the speaking role and its synthesizer parameter set.
2. The speech synthesis method according to claim 1, wherein the step S2 specifically includes:
step S21, decomposing the sentence text into a plurality of independent sentences;
step S22, analyzing, from each sentence, the quoted part and the speaking role corresponding to each quoted part.
3. The speech synthesis method according to claim 2, wherein in step S22, the quote part is analyzed from each sentence by using a text analysis means according to the constraint of punctuation, and the corresponding speaking role is analyzed according to the quote part.
4. The speech synthesis method of claim 1, wherein the synthesizer parameter set includes a plurality of synthesizer parameters;
the synthesizer parameters comprise a formant parameter, and/or a fundamental frequency fluctuation ratio parameter, and/or a speech speed parameter.
5. The speech synthesis method of claim 1, wherein the predetermined plurality of human characters includes a voice-over character for representing voice-over;
in step S3, matching the part of the sentence text excluding the quoted parts and the speaking roles with the voice-over character;
in step S4, the synthesizer parameter set corresponding to the voice-over character is used to perform speech synthesis on the part of the sentence text excluding the quoted parts and the speaking roles.
6. The speech synthesis method of claim 1, wherein each preset type of the human character comprises a plurality of sub-characters;
in step S3, for one of the speaking roles, one of the sub-characters is selected from the corresponding type of character role according to the matching result, and the sub-character is determined as the character role corresponding to the speaking role.
7. The speech synthesis method according to claim 1, wherein in step S3, the matching result is output for a user to view after the corresponding character role is matched for each of the speaking roles, and the process goes to step S4 after the user views and confirms the matching result.
8. The speech synthesis method according to claim 7, wherein a character tag is set in advance for each type of the human character;
in step S3, the output matching result is a character text formed by adding the corresponding character label to the position of each speaking character in the sentence text.
CN201711080122.6A 2017-11-06 2017-11-06 Speech synthesis method Active CN108091321B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711080122.6A CN108091321B (en) 2017-11-06 2017-11-06 Speech synthesis method

Publications (2)

Publication Number Publication Date
CN108091321A CN108091321A (en) 2018-05-29
CN108091321B (en) 2021-07-16

Family

ID=62170675

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711080122.6A Active CN108091321B (en) 2017-11-06 2017-11-06 Speech synthesis method

Country Status (1)

Country Link
CN (1) CN108091321B (en)

Families Citing this family (23)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109036375B (en) 2018-07-25 2023-03-24 腾讯科技(深圳)有限公司 Speech synthesis method, model training device and computer equipment
CN109036372B (en) * 2018-08-24 2021-10-08 科大讯飞股份有限公司 Voice broadcasting method, device and system
CN109273001B (en) * 2018-10-25 2021-06-18 珠海格力电器股份有限公司 Voice broadcasting method and device, computing device and storage medium
CN109523988B (en) * 2018-11-26 2021-11-05 安徽淘云科技股份有限公司 Text deduction method and device
CN109658916B (en) * 2018-12-19 2021-03-09 腾讯科技(深圳)有限公司 Speech synthesis method, speech synthesis device, storage medium and computer equipment
CN109523986B (en) * 2018-12-20 2022-03-08 百度在线网络技术(北京)有限公司 Speech synthesis method, apparatus, device and storage medium
CN110364139B (en) * 2019-06-27 2023-04-18 上海麦克风文化传媒有限公司 Character-to-speech working method for intelligent role matching
CN110399461A (en) * 2019-07-19 2019-11-01 腾讯科技(深圳)有限公司 Data processing method, device, server and storage medium
CN110337030B (en) * 2019-08-08 2020-08-11 腾讯科技(深圳)有限公司 Video playing method, device, terminal and computer readable storage medium
CN110534131A (en) * 2019-08-30 2019-12-03 广州华多网络科技有限公司 A kind of audio frequency playing method and system
CN112837672B (en) * 2019-11-01 2023-05-09 北京字节跳动网络技术有限公司 Method and device for determining conversation attribution, electronic equipment and storage medium
CN112908292B (en) * 2019-11-19 2023-04-07 北京字节跳动网络技术有限公司 Text voice synthesis method and device, electronic equipment and storage medium
CN111158630B (en) * 2019-12-25 2023-06-23 网易(杭州)网络有限公司 Playing control method and device
CN111415650A (en) * 2020-03-25 2020-07-14 广州酷狗计算机科技有限公司 Text-to-speech method, device, equipment and storage medium
CN113539230A (en) * 2020-03-31 2021-10-22 北京奔影网络科技有限公司 Speech synthesis method and device
CN111696518A (en) * 2020-06-05 2020-09-22 四川纵横六合科技股份有限公司 Automatic speech synthesis method based on text
CN112203153B (en) * 2020-09-21 2021-10-08 腾讯科技(深圳)有限公司 Live broadcast interaction method, device, equipment and readable storage medium
CN112270167B (en) 2020-10-14 2022-02-08 北京百度网讯科技有限公司 Role labeling method and device, electronic equipment and storage medium
CN112992147A (en) * 2021-02-26 2021-06-18 平安科技(深圳)有限公司 Voice processing method, device, computer equipment and storage medium
CN112966490A (en) * 2021-03-15 2021-06-15 掌阅科技股份有限公司 Electronic book-based dialog character recognition method, electronic device and storage medium
CN113409766A (en) * 2021-05-31 2021-09-17 北京搜狗科技发展有限公司 Recognition method, device for recognition and voice synthesis method
CN113539234B (en) * 2021-07-13 2024-02-13 标贝(青岛)科技有限公司 Speech synthesis method, device, system and storage medium
CN113539235B (en) * 2021-07-13 2024-02-13 标贝(青岛)科技有限公司 Text analysis and speech synthesis method, device, system and storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN201336138Y (en) * 2008-12-19 2009-10-28 众智瑞德科技(北京)有限公司 Text reading device
US20110166861A1 (en) * 2010-01-04 2011-07-07 Kabushiki Kaisha Toshiba Method and apparatus for synthesizing a speech with information
US20110320198A1 (en) * 2010-06-28 2011-12-29 Threewits Randall Lee Interactive environment for performing arts scripts
CN104508629A (en) * 2012-07-25 2015-04-08 托伊托克有限公司 Artificial intelligence script tool
CN104809923A (en) * 2015-05-13 2015-07-29 苏州清睿信息技术有限公司 Self-complied and self-guided method and system for generating intelligent voice communication

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH0335296A (en) * 1989-06-30 1991-02-15 Sharp Corp Text voice synthesizing device
JPH11109991A (en) * 1997-10-08 1999-04-23 Mitsubishi Electric Corp Man machine interface system
CN1945692B (en) * 2006-10-16 2010-05-12 安徽中科大讯飞信息科技有限公司 Intelligent method for improving prompting voice matching effect in voice synthetic system
CN102486922B (en) * 2010-12-03 2014-12-03 株式会社理光 Speaker recognition method, device and system
CN102237089B (en) * 2011-08-15 2012-11-14 哈尔滨工业大学 Method for reducing error identification rate of text irrelevant speaker identification system
CN105340003B (en) * 2013-06-20 2019-04-05 株式会社东芝 Speech synthesis dictionary creating apparatus and speech synthesis dictionary creating method
CN104485100B (en) * 2014-12-18 2018-06-15 天津讯飞信息科技有限公司 Phonetic synthesis speaker adaptive approach and system
CN106326303B (en) * 2015-06-30 2019-09-13 芋头科技(杭州)有限公司 A kind of spoken semantic analysis system and method
CN106571136A (en) * 2016-10-28 2017-04-19 努比亚技术有限公司 Voice output device and method

Also Published As

Publication number Publication date
CN108091321A (en) 2018-05-29

Similar Documents

Publication Publication Date Title
CN108091321B (en) Speech synthesis method
CN110688911B (en) Video processing method, device, system, terminal equipment and storage medium
CN111048062B (en) Speech synthesis method and apparatus
CN108492817B (en) Song data processing method based on virtual idol and singing interaction system
CN110211563B (en) Chinese speech synthesis method, device and storage medium for scenes and emotion
CN112771607B (en) Electronic apparatus and control method thereof
US20200279553A1 (en) Linguistic style matching agent
Pierre-Yves The production and recognition of emotions in speech: features and algorithms
CN112650831A (en) Virtual image generation method and device, storage medium and electronic equipment
CN112334973B (en) Method and system for creating object-based audio content
JP4745036B2 (en) Speech translation apparatus and speech translation method
CN106653052A (en) Virtual human face animation generation method and device
CN109801349B (en) Sound-driven three-dimensional animation character real-time expression generation method and system
CN111260761B (en) Method and device for generating mouth shape of animation character
CN109949791A (en) Emotional speech synthesizing method, device and storage medium based on HMM
Mori et al. Conversational and Social Laughter Synthesis with WaveNet.
CN110556092A (en) Speech synthesis method and device, storage medium and electronic device
CN117558259A (en) Digital man broadcasting style control method and device
Hill et al. Real-time articulatory speech-synthesis-by-rules
CN111402919B (en) Method for identifying style of playing cavity based on multi-scale and multi-view
JP6222465B2 (en) Animation generating apparatus, animation generating method and program
CN116631434A (en) Video and voice synchronization method and device based on conversion system and electronic equipment
CN114446268B (en) Audio data processing method, device, electronic equipment, medium and program product
WO2023059818A1 (en) Acoustic-based linguistically-driven automated text formatting
JP2020134719A (en) Translation device, translation method, and translation program

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant