CN114373444B - Method, system and equipment for synthesizing voice based on montage - Google Patents
- Publication number
- CN114373444B CN114373444B CN202210285222.7A CN202210285222A CN114373444B CN 114373444 B CN114373444 B CN 114373444B CN 202210285222 A CN202210285222 A CN 202210285222A CN 114373444 B CN114373444 B CN 114373444B
- Authority
- CN
- China
- Prior art keywords
- paragraphs
- text
- processed
- tone
- proportion
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L13/00—Speech synthesis; Text to speech systems
- G10L13/02—Methods for producing synthetic speech; Speech synthesisers
- G10L13/033—Voice editing, e.g. manipulating the voice of the synthesiser
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/10—Text processing
- G06F40/12—Use of codes for handling textual entities
- G06F40/137—Hierarchical processing, e.g. outlines
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L13/00—Speech synthesis; Text to speech systems
- G10L13/02—Methods for producing synthetic speech; Speech synthesisers
- G10L13/033—Voice editing, e.g. manipulating the voice of the synthesiser
- G10L13/0335—Pitch control
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L13/00—Speech synthesis; Text to speech systems
- G10L13/08—Text analysis or generation of parameters for speech synthesis out of text, e.g. grapheme to phoneme translation, prosody generation or stress or intonation determination
Abstract
The application discloses a method, a system, and a device for montage-based speech synthesis. The method comprises the following steps: after paragraph segmentation preprocessing is performed on the existing natural paragraphs of a text to be processed, the text is divided into a plurality of actual paragraphs based on scene type and emotion level type; the correlation between the scenes and emotion levels of adjacent paragraphs among the actual paragraphs is calculated; after the intonation parameters of the text are set, the intonation change ratio and the intonation change direction of the text are calculated according to the correlation; and paragraph speech synthesis is performed on the text according to the intonation change ratio and the intonation change direction. This solves the technical problem that speech synthesized in the prior art sounds stiff.
Description
Technical Field
The present application relates to the field of speech synthesis technologies, and in particular, to a method, a system, and an apparatus for synthesizing speech based on montage.
Background
Montage generally refers to scene transitions in film: through the splitting and reassembly of shots, scenes, and passages, material is selected or discarded so that the expressed content becomes clear and focused, achieving a high degree of summarization and concentration.
Disclosure of Invention
The application provides a method, a system, and a device for montage-based speech synthesis, which address the technical problem that speech synthesized in the prior art sounds stiff.
In view of the above, the present application provides, in a first aspect, a montage-based speech synthesis method, including:
after paragraph segmentation preprocessing is performed on the existing natural paragraphs of a text to be processed, dividing the text to be processed into a plurality of actual paragraphs based on scene type and emotion level type;
calculating the correlation between the scenes and emotion levels of adjacent paragraphs among the plurality of actual paragraphs;
after the intonation parameters of the text to be processed are set, calculating the intonation change ratio and the intonation change direction of the text to be processed according to the correlation;
and performing paragraph speech synthesis on the text to be processed according to the intonation change ratio and the intonation change direction.
Optionally, the paragraph segmentation preprocessing of the existing natural paragraphs of the text to be processed specifically includes: dividing the text to be processed into paragraphs at the line-break characters it already contains.
Optionally, the dividing of the text to be processed into a plurality of actual paragraphs based on scene type and emotion level type specifically includes:
merging different paragraphs that have the same scene type and the same emotion level type into a single paragraph, and correspondingly splitting a paragraph whose sub-passages have different scene types or different emotion level types into a plurality of paragraphs.
Optionally, the calculating of the correlation between the scenes and emotion levels of adjacent paragraphs among the actual paragraphs specifically includes:
manually labelling scenes and emotion levels on the text to be processed and then performing correlation training to obtain a correlation calculation model, and calculating the correlation between the scenes and emotion levels of adjacent paragraphs among the plurality of actual paragraphs based on the correlation calculation model.
Optionally, after the intonation parameters of the text to be processed are set, calculating the intonation change ratio and the intonation change direction of the text to be processed according to the correlation specifically includes:
setting the ratio range of the total intonation change of the text to be processed and the upper and lower limits of the reference intonation and the starting intonation; calculating the intonation change ratio of adjacent paragraphs from the ratio of the total intonation change to the correlation; and determining the rise or fall of the intonation of adjacent paragraphs as the intonation change direction, thereby obtaining the intonation change ratio and the intonation change direction of the text to be processed.
A second aspect of the application provides a montage-based speech synthesis system, the system comprising:
the dividing unit is configured to divide the text to be processed into a plurality of actual paragraphs based on scene type and emotion level type after paragraph segmentation preprocessing is performed on its existing natural paragraphs;
the first calculation unit is configured to calculate the correlation between the scenes and emotion levels of adjacent paragraphs among the plurality of actual paragraphs;
the second calculation unit is configured to calculate the intonation change ratio and the intonation change direction of the text to be processed according to the correlation, after the intonation parameters of the text are set;
and the synthesis unit is configured to perform paragraph speech synthesis on the text to be processed according to the intonation change ratio and the intonation change direction.
Optionally, the dividing unit is specifically configured to:
dividing the existing natural paragraphs of the text to be processed into paragraphs at line-break characters;
merging different paragraphs that have the same scene type and the same emotion level type into a single paragraph, and correspondingly splitting a paragraph whose sub-passages have different scene types or different emotion level types into a plurality of paragraphs.
Optionally, the first computing unit is specifically configured to:
manually labelling scenes and emotion levels on the text to be processed and then performing correlation training to obtain a correlation calculation model, and calculating the correlation between the scenes and emotion levels of adjacent paragraphs among the plurality of actual paragraphs based on the correlation calculation model.
Optionally, the second computing unit is specifically configured to:
setting the ratio range of the total intonation change of the text to be processed and the upper and lower limits of the reference intonation and the starting intonation; calculating the intonation change ratio of adjacent paragraphs from the ratio of the total intonation change to the correlation; and determining the rise or fall of the intonation of adjacent paragraphs as the intonation change direction, thereby obtaining the intonation change ratio and the intonation change direction of the text to be processed.
A third aspect of the application provides a montage-based speech synthesis apparatus, the apparatus comprising a processor and a memory:
the memory is used for storing program codes and transmitting the program codes to the processor;
the processor is adapted to perform the steps of the montage-based speech synthesis method according to the first aspect as described above, according to instructions in the program code.
According to the technical scheme, the method has the following advantages:
the application provides a method for synthesizing voice based on montage, which comprises the following steps: after paragraph segmentation preprocessing is carried out on existing natural paragraphs of a text to be processed, the text to be processed is divided into a plurality of actual paragraphs based on a scene type and an emotion level type; calculating the relevance of scenes and emotion hierarchies of adjacent paragraphs in a plurality of actual paragraphs; after the intonation parameters of the text to be processed are set, the intonation change proportion and the intonation change direction of the text to be processed are calculated according to the correlation; and performing paragraph speech synthesis on the text to be processed according to the tone change proportion and the tone change direction. Compared with the prior art, the method comprises the steps of firstly dividing a text to be processed according to a scene and emotion levels to obtain paragraphs according with an actual scene and emotion, then calculating the correlation between adjacent paragraphs, determining the parameters such as the starting tone of the paragraphs and the reference tone of the actual paragraphs based on the correlation, and thus obtaining the tone change proportion and the tone change direction of the text to be processed, and finally carrying out paragraph speech synthesis according to the determined tone change proportion and the tone change direction, so that the speech synthesis is more vivid and accords with the auditory habits of people. Therefore, the technical problem that the speech synthesis sounds very hard in the prior art is solved.
Drawings
Fig. 1 is a schematic flowchart of an embodiment of a montage-based speech synthesis method provided in an embodiment of the present application;
fig. 2 is a schematic structural diagram of an embodiment of a montage-based speech synthesis system provided in the embodiment of the present application.
Detailed Description
In order to make the technical solutions of the present application better understood, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
Referring to fig. 1, an embodiment of a method for synthesizing a montage-based speech according to the present application includes:
it should be noted that, in this embodiment, first, a line feed key is used to perform paragraph division processing on existing natural paragraphs of a text to be processed, then different paragraphs with the same scene type and the same emotion level type are merged into the same paragraph, and sub-paragraphs with different scene types and different emotion level types in the same paragraph are correspondingly divided into a plurality of paragraphs. It is understood that, for example: 1) although the text to be processed is two paragraphs, the two paragraphs are combined into one paragraph if the text is in the same scene and the same layer; 2) although the text is a paragraph, the text refers to a plurality of scenes and a plurality of emotion hierarchies, but the text is divided into different paragraphs according to the types of the scenes and the types of the emotion hierarchies.
Step 102: calculate the correlation between the scenes and emotion levels of adjacent paragraphs among the plurality of actual paragraphs.
It should be noted that, in this embodiment, scenes and emotion levels are first labelled on the text manually, and correlation training is performed to obtain a correlation calculation model; the correlation between the scenes and emotion levels of adjacent actual paragraphs is then calculated with this model. A large amount of correlation training on manually labelled data is required. For example, the scene correlation between paragraphs A and B might be K = 50%; the correlation ranges from 0 to 100%.
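The patent does not specify the form of the trained correlation model. As a loudly hedged placeholder, the score for a pair of adjacent paragraphs could be derived from their labels alone, with an optional learned scene-pair similarity table standing in for the trained model:

```python
def scene_emotion_correlation(p1, p2, scene_sim=None):
    """Toy correlation in [0, 1] between adjacent paragraphs.
    p1, p2: dicts with 'scene' and 'emotion' labels. In the patent this score
    comes from a model trained on manually labelled text; the 0.5 weights and
    the fallback values here are illustrative assumptions."""
    scene_sim = scene_sim or {}  # hypothetical learned scene-pair similarities
    s = 1.0 if p1["scene"] == p2["scene"] else scene_sim.get(
        (p1["scene"], p2["scene"]), 0.0)
    e = 1.0 if p1["emotion"] == p2["emotion"] else 0.5
    return 0.5 * (s + e)
```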
Step 103: after the intonation parameters of the text to be processed are set, calculate the intonation change ratio and the intonation change direction of the text according to the correlation.
it should be noted that, in this embodiment, a ratio range of a total tone change of the text to be processed, an upper limit and a lower limit of a reference tone and a tone starting limit are set, a tone change ratio of adjacent paragraphs is calculated, and a ratio of the total tone change to the correlation and a rise and a fall of a tone of the adjacent paragraphs are calculated and used as a tone change direction, so that the tone change ratio and the tone change direction of the text to be processed are obtained.
The method comprises the following specific steps:
1) Determine the ratio range of the total intonation change, {0% to H%} (typically H is at most 50); determine the upper limit JDH and lower limit JDL of the reference intonation JD, and the upper limit QDH and lower limit QDL of the starting intonation QD.
2) Determine the ratio of the total intonation change to the correlation: R = H/100.
3) Obtain, from the trained correlation model, the scene correlation Kn of each paragraph with respect to the preceding one.
4) Determine the intonation change ratio between the current paragraph and the previous paragraph: Vn = R × Kn.
5) Determine whether the paragraph's intonation rises or falls relative to the previous paragraph:
a. If, after the rise or fall, JD remains within [JDL, JDH] and QD remains within [QDL, QDH], the direction of the change is chosen at random.
b. If, after the rise or fall, JD would fall outside [JDL, JDH] or QD outside [QDL, QDH], the direction of the change is reversed (if a rise was intended but would exceed the range, the intonation is lowered instead).
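The five steps above can be sketched as follows. The symbol names (H, R, Kn, Vn, JD, JDL, JDH) follow the patent; the numeric defaults are illustrative, and the analogous bound on the starting intonation QD is omitted for brevity:

```python
import random

def intonation_plan(correlations, H=50.0, JD=100.0, JDL=60.0, JDH=120.0,
                    rng=random):
    """Per-paragraph intonation change ratio Vn = R * Kn and a rise/fall
    direction that keeps the reference intonation JD within [JDL, JDH].
    The patent bounds the starting intonation QD the same way (not shown)."""
    R = H / 100.0                         # step 2: ratio of total change
    plan = []
    for Kn in correlations:               # step 3: Kn from the correlation model
        Vn = R * Kn                       # step 4: change ratio vs. previous paragraph
        direction = rng.choice((+1, -1))  # step 5a: random rise or fall
        if not (JDL <= JD * (1 + direction * Vn) <= JDH):
            direction = -direction        # step 5b: reverse if out of range
        JD = JD * (1 + direction * Vn)    # carry the reference intonation forward
        plan.append((Vn, direction))
    return plan
```

Passing a deterministic `rng` makes the reversal rule visible: starting from JD = 100 with Vn = 0.25, a forced rise to 125 exceeds JDH = 120, so the direction flips to a fall.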
Step 104: perform paragraph speech synthesis on the text to be processed according to the intonation change ratio and the intonation change direction.
Finally, paragraph speech synthesis is performed according to the intonation change ratio and intonation change direction determined in step 103.
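Taken together, steps 101 to 104 form a pipeline that might be wired up as below; every callable here is a hypothetical stand-in, not an API defined by the patent:

```python
def montage_speech_synthesis(text, segment, correlate, plan, synthesize):
    """End-to-end sketch of steps 101-104. segment re-paragraphs the text
    (step 101), correlate scores adjacent paragraphs (step 102), plan maps
    those scores to (ratio, direction) pairs (step 103), and synthesize
    renders one paragraph of audio (step 104)."""
    paras = segment(text)                                       # step 101
    ks = [correlate(a, b) for a, b in zip(paras, paras[1:])]    # step 102
    changes = plan(ks)                                          # step 103
    audio = [synthesize(paras[0], 0.0, +1)]                     # first paragraph at base intonation
    audio += [synthesize(p, v, d) for p, (v, d) in zip(paras[1:], changes)]
    return audio                                                # step 104
```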
In the method embodiment above, the text to be processed is first divided according to scene and emotion level to obtain paragraphs that match the actual scenes and emotions; the correlation between adjacent paragraphs is then calculated, and parameters such as each paragraph's starting intonation and reference intonation are determined based on that correlation, yielding the intonation change ratio and the intonation change direction of the text; finally, paragraph speech synthesis is performed according to the determined ratio and direction, so that the synthesized speech is more vivid and better matches human listening habits. The technical problem that prior-art speech synthesis sounds stiff is thereby solved.
The foregoing is an embodiment of a method for synthesizing a voice based on a montage provided in the embodiment of the present application, and the following is an embodiment of a system for synthesizing a voice based on a montage provided in the embodiment of the present application.
Referring to fig. 2, an embodiment of a montage-based speech synthesis system provided in an embodiment of the present application includes:
the dividing unit 201 is configured to divide a text to be processed into a plurality of actual paragraphs based on a scene type and an emotion level type after performing paragraph segmentation preprocessing on existing natural paragraphs of the text to be processed;
a first calculating unit 202, configured to calculate correlations between scenes and emotion hierarchies of adjacent paragraphs in a plurality of actual paragraphs;
the second calculating unit 203 is configured to calculate a tone change ratio and a tone change direction of the text to be processed according to the correlation after setting the tone parameter of the text to be processed;
and the synthesis unit 204 is configured to perform paragraph speech synthesis on the text to be processed according to the tone variation ratio and the tone variation direction.
In this embodiment, the montage-based speech synthesis system divides the text to be processed according to scene and emotion level to obtain paragraphs that match the actual scenes and emotions, calculates the correlation between adjacent paragraphs, determines parameters such as each paragraph's starting intonation and reference intonation based on that correlation to obtain the intonation change ratio and the intonation change direction of the text, and finally performs paragraph speech synthesis according to the determined ratio and direction, making the synthesized speech more vivid and better matched to human listening habits. The technical problem that prior-art speech synthesis sounds stiff is thereby solved.
Further, an embodiment of the present application further provides a montage-based speech synthesis device, where the device includes a processor and a memory:
the memory is used for storing program codes and transmitting the program codes to the processor;
the processor is configured to execute the montage-based speech synthesis method of the above method embodiment according to the instructions in the program code.
It can be clearly understood by those skilled in the art that, for convenience and simplicity of description, the specific working processes of the system, the unit and the device described above may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
The terms "first," "second," "third," "fourth," and the like in the description of the present application and in the above-described drawings are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the application described herein are, for example, capable of operation in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
It should be understood that in the present application, "at least one" means one or more, "a plurality" means two or more. "and/or" for describing an association relationship of associated objects, indicating that there may be three relationships, e.g., "a and/or B" may indicate: only A, only B and both A and B are present, wherein A and B may be singular or plural. The character "/" generally indicates that the former and latter associated objects are in an "or" relationship. "at least one of the following" or similar expressions refer to any combination of these items, including any combination of single item(s) or plural items. For example, at least one (one) of a, b, or c, may represent: a, b, c, "a and b", "a and c", "b and c", or "a and b and c", wherein a, b, c may be single or plural.
In the several embodiments provided in the present application, it should be understood that the disclosed system, apparatus and method may be implemented in other manners. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the units is only one type of logical functional division, and other divisions may be realized in practice, for example, multiple units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present application may be substantially implemented or contributed to by the prior art, or all or part of the technical solution may be embodied in a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present application. And the aforementioned storage medium includes: a U-disk, a portable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
The above embodiments are only used for illustrating the technical solutions of the present application, and not for limiting the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present application.
Claims (10)
1. A method for synthesizing a voice based on a montage is characterized by comprising the following steps:
after paragraph segmentation preprocessing is performed on the existing natural paragraphs of a text to be processed, dividing the text to be processed into a plurality of actual paragraphs based on scene type and emotion level type;
calculating the correlation between the scenes and emotion levels of adjacent paragraphs among the plurality of actual paragraphs;
after the intonation parameters of the text to be processed are set, calculating the intonation change ratio and the intonation change direction of the text to be processed according to the correlation;
and performing paragraph speech synthesis on the text to be processed according to the intonation change ratio and the intonation change direction.
2. The montage-based speech synthesis method according to claim 1, wherein the paragraph segmentation preprocessing of the existing natural paragraphs of the text to be processed specifically comprises: dividing the text to be processed into paragraphs at the line-break characters it contains.
3. The montage-based speech synthesis method according to claim 1, wherein the dividing of the text to be processed into a plurality of actual paragraphs based on the scene type and the emotion level type specifically comprises:
merging different paragraphs that have the same scene type and the same emotion level type into a single paragraph, and correspondingly splitting a paragraph whose sub-passages have different scene types or different emotion level types into a plurality of paragraphs.
4. The montage-based speech synthesis method according to claim 1, wherein the calculating of the correlation between the scene and the emotion level of adjacent paragraphs in the plurality of actual paragraphs specifically comprises:
manually labelling scenes and emotion levels on the text to be processed and then performing correlation training to obtain a correlation calculation model, and calculating the correlation between the scenes and emotion levels of adjacent paragraphs among the plurality of actual paragraphs based on the correlation calculation model.
5. The montage-based speech synthesis method according to claim 1, wherein after the intonation parameters of the text to be processed are set, the intonation change proportion and the intonation change direction of the text to be processed are calculated according to the correlation, and the method specifically comprises the following steps:
setting the ratio range of the total intonation change of the text to be processed and the upper and lower limits of the reference intonation and the starting intonation; calculating the intonation change ratio of adjacent paragraphs from the ratio of the total intonation change to the correlation; and determining the rise or fall of the intonation of adjacent paragraphs as the intonation change direction, thereby obtaining the intonation change ratio and the intonation change direction of the text to be processed.
6. A montage-based speech synthesis system, comprising:
the dividing unit is configured to divide the text to be processed into a plurality of actual paragraphs based on scene type and emotion level type after paragraph segmentation preprocessing is performed on its existing natural paragraphs;
the first calculation unit is configured to calculate the correlation between the scenes and emotion levels of adjacent paragraphs among the plurality of actual paragraphs;
the second calculation unit is configured to calculate the intonation change ratio and the intonation change direction of the text to be processed according to the correlation, after the intonation parameters of the text are set;
and the synthesis unit is configured to perform paragraph speech synthesis on the text to be processed according to the intonation change ratio and the intonation change direction.
7. The montage-based speech synthesis system according to claim 6, wherein the partitioning unit is specifically configured to:
dividing the existing natural paragraphs of the text to be processed into paragraphs at line-break characters;
merging different paragraphs that have the same scene type and the same emotion level type into a single paragraph, and correspondingly splitting a paragraph whose sub-passages have different scene types or different emotion level types into a plurality of paragraphs.
8. The montage-based speech synthesis system of claim 6, wherein the first computing unit is specifically configured to:
manually labelling scenes and emotion levels on the text to be processed and then performing correlation training to obtain a correlation calculation model, and calculating the correlation between the scenes and emotion levels of adjacent paragraphs among the plurality of actual paragraphs based on the correlation calculation model.
9. The montage-based speech synthesis system of claim 6, wherein the second computing unit is specifically configured to:
setting the ratio range of the total intonation change of the text to be processed and the upper and lower limits of the reference intonation and the starting intonation; calculating the intonation change ratio of adjacent paragraphs from the ratio of the total intonation change to the correlation; and determining the rise or fall of the intonation of adjacent paragraphs as the intonation change direction, thereby obtaining the intonation change ratio and the intonation change direction of the text to be processed.
10. A montage-based speech synthesis apparatus, the apparatus comprising a processor and a memory:
the memory is used for storing program codes and transmitting the program codes to the processor;
the processor is configured to perform the montage-based speech synthesis method of any of claims 1-5 according to instructions in the program code.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210285222.7A CN114373444B (en) | 2022-03-23 | 2022-03-23 | Method, system and equipment for synthesizing voice based on montage |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210285222.7A CN114373444B (en) | 2022-03-23 | 2022-03-23 | Method, system and equipment for synthesizing voice based on montage |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114373444A CN114373444A (en) | 2022-04-19 |
CN114373444B (en) | 2022-05-27
Family
ID=81146954
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210285222.7A Active CN114373444B (en) | 2022-03-23 | 2022-03-23 | Method, system and equipment for synthesizing voice based on montage |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114373444B (en) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114678006B (en) * | 2022-05-30 | 2022-08-23 | 广东电网有限责任公司佛山供电局 | Rhythm-based voice synthesis method and system |
CN114783402B (en) * | 2022-06-22 | 2022-09-13 | 广东电网有限责任公司佛山供电局 | Variation method and device for synthetic voice, electronic equipment and storage medium |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110211563A (en) * | 2019-06-19 | 2019-09-06 | 平安科技(深圳)有限公司 | Chinese speech synthesis method, apparatus and storage medium towards scene and emotion |
CN111243571A (en) * | 2020-01-14 | 2020-06-05 | 北京字节跳动网络技术有限公司 | Text processing method, device and equipment and computer readable storage medium |
CN111292715A (en) * | 2020-02-03 | 2020-06-16 | 北京奇艺世纪科技有限公司 | Speech synthesis method, speech synthesis device, electronic equipment and computer-readable storage medium |
CN111681641A (en) * | 2020-05-26 | 2020-09-18 | 微软技术许可有限责任公司 | Phrase-based end-to-end text-to-speech (TTS) synthesis |
WO2021060591A1 (en) * | 2019-09-26 | 2021-04-01 | 미디어젠 주식회사 | Device for changing speech synthesis models according to character utterance contexts |
2022
- 2022-03-23 CN CN202210285222.7A patent/CN114373444B/en active Active
Also Published As
Publication number | Publication date |
---|---|
CN114373444A (en) | 2022-04-19 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN114373444B (en) | Method, system and equipment for synthesizing voice based on montage | |
CN110955786B (en) | Dance action data generation method and device | |
CN109088995B (en) | Method and mobile phone for supporting global language translation | |
CN106448630B (en) | Method and device for generating digital music score file of song | |
CN110889381A (en) | Face changing method and device, electronic equipment and storage medium | |
CN106375780B (en) | A kind of multimedia file producting method and its equipment | |
CN110264993B (en) | Speech synthesis method, device, equipment and computer readable storage medium | |
CN110427809A (en) | Lip reading recognition methods, device, electronic equipment and medium based on deep learning | |
CN111696029A (en) | Virtual image video generation method and device, computer equipment and storage medium | |
CN114255187A (en) | Multi-level and multi-level image optimization method and system based on big data platform | |
CN108764114B (en) | Signal identification method and device, storage medium and terminal thereof | |
CN108170676A (en) | Method, system and the terminal of story creation | |
CN110297897B (en) | Question-answer processing method and related product | |
CN113327576A (en) | Speech synthesis method, apparatus, device and storage medium | |
CN117152308B (en) | Virtual person action expression optimization method and system | |
CN116612759A (en) | Speech recognition method and storage medium | |
CN115333879B (en) | Remote conference method and system | |
CN111785236A (en) | Automatic composition method based on motivational extraction model and neural network | |
CN108717851A (en) | A kind of audio recognition method and device | |
CN110298903B (en) | Curve editing method and device, computing equipment and storage medium | |
CN113312902A (en) | Intelligent auditing and checking method and device for same text | |
CN112381151A (en) | Similar video determination method and device | |
CN113345411B (en) | Sound changing method, device, equipment and storage medium | |
CN117348736B (en) | Digital interaction method, system and medium based on artificial intelligence | |
CN110312040B (en) | Information processing method, device and system |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |