CN114373444B - Method, system and equipment for synthesizing voice based on montage

Info

Publication number
CN114373444B
CN114373444B
Authority
CN
China
Prior art keywords
paragraphs
text
processed
tone
proportion
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210285222.7A
Other languages
Chinese (zh)
Other versions
CN114373444A (en)
Inventor
余勇
钟少恒
陈志刚
王翊
曹小冬
吴启明
蔡勇超
林承勋
吕华良
丁铖
林家树
郭泽豪
符春造
方美明
陈瑾
李鸿盛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Foshan Power Supply Bureau of Guangdong Power Grid Corp
Original Assignee
Foshan Power Supply Bureau of Guangdong Power Grid Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Foshan Power Supply Bureau of Guangdong Power Grid Corp
Priority to CN202210285222.7A
Publication of CN114373444A
Application granted
Publication of CN114373444B
Legal status: Active

Classifications

    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L13/00 - Speech synthesis; Text to speech systems
    • G10L13/02 - Methods for producing synthetic speech; Speech synthesisers
    • G10L13/033 - Voice editing, e.g. manipulating the voice of the synthesiser
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 - Handling natural language data
    • G06F40/10 - Text processing
    • G06F40/12 - Use of codes for handling textual entities
    • G06F40/137 - Hierarchical processing, e.g. outlines
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 - Handling natural language data
    • G06F40/30 - Semantic analysis
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L13/00 - Speech synthesis; Text to speech systems
    • G10L13/02 - Methods for producing synthetic speech; Speech synthesisers
    • G10L13/033 - Voice editing, e.g. manipulating the voice of the synthesiser
    • G10L13/0335 - Pitch control
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L13/00 - Speech synthesis; Text to speech systems
    • G10L13/08 - Text analysis or generation of parameters for speech synthesis out of text, e.g. grapheme to phoneme translation, prosody generation or stress or intonation determination

Abstract

The application discloses a method, a system and a device for montage-based speech synthesis, wherein the method comprises the following steps: after paragraph-segmentation preprocessing is performed on the existing natural paragraphs of a text to be processed, the text is divided into a plurality of actual paragraphs based on scene type and emotion-level type; the scene and emotion-level correlation of adjacent paragraphs among the plurality of actual paragraphs is calculated; after the intonation parameters of the text are set, the intonation-change proportion and intonation-change direction of the text are calculated from the correlation; and paragraph-by-paragraph speech synthesis is performed on the text according to the intonation-change proportion and intonation-change direction. This solves the technical problem that speech synthesized by prior-art systems sounds stiff and unnatural.

Description

Method, system and equipment for synthesizing voice based on montage
Technical Field
The present application relates to the field of speech synthesis, and in particular to a method, a system and a device for montage-based speech synthesis.
Background
Montage generally refers to scene transitions in film: by splitting and reassembling shots, scenes and sequences, material is selected or discarded so that the expressed content is brought out clearly and distinctly, achieving a high degree of summarization and concentration.
Disclosure of Invention
The application provides a method, a system and a device for montage-based speech synthesis, which are used to solve the technical problem that prior-art speech synthesis sounds stiff and unnatural.
In view of the above, the present application provides, in a first aspect, a montage-based speech synthesis method, comprising:
after paragraph-segmentation preprocessing is performed on the existing natural paragraphs of a text to be processed, dividing the text into a plurality of actual paragraphs based on scene type and emotion-level type;
calculating the scene and emotion-level correlation of adjacent paragraphs among the plurality of actual paragraphs;
after the intonation parameters of the text to be processed are set, calculating the intonation-change proportion and intonation-change direction of the text from the correlation; and
performing paragraph-by-paragraph speech synthesis on the text according to the intonation-change proportion and the intonation-change direction.
Optionally, the paragraph-segmentation preprocessing of the existing natural paragraphs of the text to be processed specifically comprises: dividing the text into paragraphs at line breaks (i.e., wherever the line-feed/Enter key was used).
Optionally, dividing the text to be processed into a plurality of actual paragraphs based on scene type and emotion-level type specifically comprises:
merging adjacent paragraphs that share the same scene type and the same emotion-level type into one paragraph, and splitting sub-paragraphs of differing scene type or emotion-level type within one paragraph into a plurality of separate paragraphs.
Optionally, calculating the scene and emotion-level correlation of adjacent paragraphs among the plurality of actual paragraphs specifically comprises:
manually labeling the scenes and emotion levels of the text to be processed and then performing correlation training to obtain a correlation calculation model, and calculating the scene and emotion-level correlation of adjacent paragraphs among the plurality of actual paragraphs based on this model.
Optionally, calculating the intonation-change proportion and intonation-change direction of the text to be processed from the correlation after its intonation parameters are set specifically comprises:
setting the proportion range of the total intonation change of the text together with the upper and lower limits of the reference intonation and the starting intonation; calculating the intonation-change proportion of adjacent paragraphs from the ratio of the total intonation change to the correlation; and determining the rise or fall of intonation between adjacent paragraphs, which serves as the intonation-change direction, thereby obtaining the intonation-change proportion and intonation-change direction of the text to be processed.
A second aspect of the application provides a montage-based speech synthesis system, the system comprising:
a dividing unit, configured to divide a text to be processed into a plurality of actual paragraphs based on scene type and emotion-level type after paragraph-segmentation preprocessing of its existing natural paragraphs;
a first calculation unit, configured to calculate the scene and emotion-level correlation of adjacent paragraphs among the plurality of actual paragraphs;
a second calculation unit, configured to calculate the intonation-change proportion and intonation-change direction of the text from the correlation after its intonation parameters are set; and
a synthesis unit, configured to perform paragraph-by-paragraph speech synthesis on the text according to the intonation-change proportion and the intonation-change direction.
Optionally, the dividing unit is specifically configured to:
divide the existing natural paragraphs of the text to be processed at line breaks; and
merge adjacent paragraphs that share the same scene type and the same emotion-level type into one paragraph, and split sub-paragraphs of differing scene type or emotion-level type within one paragraph into a plurality of separate paragraphs.
Optionally, the first calculation unit is specifically configured to:
obtain a correlation calculation model by performing correlation training on text whose scenes and emotion levels have been manually labeled, and calculate the scene and emotion-level correlation of adjacent paragraphs among the plurality of actual paragraphs based on this model.
Optionally, the second calculation unit is specifically configured to:
set the proportion range of the total intonation change of the text together with the upper and lower limits of the reference intonation and the starting intonation; calculate the intonation-change proportion of adjacent paragraphs from the ratio of the total intonation change to the correlation; and determine the rise or fall of intonation between adjacent paragraphs as the intonation-change direction, thereby obtaining the intonation-change proportion and intonation-change direction of the text to be processed.
A third aspect of the application provides a montage-based speech synthesis apparatus, the apparatus comprising a processor and a memory:
the memory is used for storing program codes and transmitting the program codes to the processor;
the processor is adapted to perform the steps of the montage-based speech synthesis method according to the first aspect as described above, according to instructions in the program code.
From the above technical solutions, the present application has the following advantages:
The application provides a montage-based speech synthesis method comprising: after paragraph-segmentation preprocessing of the existing natural paragraphs of a text to be processed, dividing the text into a plurality of actual paragraphs based on scene type and emotion-level type; calculating the scene and emotion-level correlation of adjacent paragraphs among the plurality of actual paragraphs; after the intonation parameters of the text are set, calculating its intonation-change proportion and intonation-change direction from the correlation; and performing paragraph-by-paragraph speech synthesis accordingly. Compared with the prior art, the method first divides the text by scene and emotion level to obtain paragraphs that match the actual scene and emotion; it then calculates the correlation between adjacent paragraphs and, based on that correlation, determines parameters such as each paragraph's starting intonation and the reference intonation of the actual paragraphs, yielding the intonation-change proportion and intonation-change direction of the text; finally, paragraph speech synthesis is performed with the determined proportion and direction, making the synthesized speech more vivid and better matched to human listening habits. The technical problem that prior-art speech synthesis sounds stiff and unnatural is thereby solved.
Drawings
Fig. 1 is a schematic flowchart of an embodiment of a montage-based speech synthesis method provided in an embodiment of the present application;
fig. 2 is a schematic structural diagram of an embodiment of a montage-based speech synthesis system provided in the embodiment of the present application.
Detailed Description
In order to make the technical solutions of the present application better understood, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
Referring to fig. 1, an embodiment of the montage-based speech synthesis method provided by the present application comprises:
step 101, after paragraph segmentation preprocessing is carried out on existing natural paragraphs of a text to be processed, the text to be processed is divided into a plurality of actual paragraphs based on scene types and emotion level types;
it should be noted that, in this embodiment, first, a line feed key is used to perform paragraph division processing on existing natural paragraphs of a text to be processed, then different paragraphs with the same scene type and the same emotion level type are merged into the same paragraph, and sub-paragraphs with different scene types and different emotion level types in the same paragraph are correspondingly divided into a plurality of paragraphs. It is understood that, for example: 1) although the text to be processed is two paragraphs, the two paragraphs are combined into one paragraph if the text is in the same scene and the same layer; 2) although the text is a paragraph, the text refers to a plurality of scenes and a plurality of emotion hierarchies, but the text is divided into different paragraphs according to the types of the scenes and the types of the emotion hierarchies.
Step 102, calculating the scene and emotion-level correlation of adjacent paragraphs among the plurality of actual paragraphs;
it should be noted that in the embodiment, a correlation calculation model is obtained by performing correlation training after labeling scenes and emotion levels of a text to be processed manually, and the correlation between the scenes and the emotion levels of adjacent paragraphs in a plurality of actual paragraphs is calculated based on the correlation calculation model. It can be understood that a large amount of correlation training needs to be performed through manual labeling, for example, the a and B segment scene correlation K =50%, and the correlation range is: 0 to 100 percent.
Step 103, after the intonation parameters of the text to be processed are set, calculating the intonation-change proportion and intonation-change direction of the text from the correlation;
it should be noted that, in this embodiment, a ratio range of a total tone change of the text to be processed, an upper limit and a lower limit of a reference tone and a tone starting limit are set, a tone change ratio of adjacent paragraphs is calculated, and a ratio of the total tone change to the correlation and a rise and a fall of a tone of the adjacent paragraphs are calculated and used as a tone change direction, so that the tone change ratio and the tone change direction of the text to be processed are obtained.
The specific steps are as follows (a code sketch follows the list):
1) determine the total intonation-change ratio range {0%-H%} (typically 0-50%); determine the upper limit JDH and lower limit JDL of the reference intonation JD, and the upper limit QDH and lower limit QDL of the starting intonation QD;
2) determine the ratio of the total intonation change to the correlation: R = H/100;
3) obtain, from the artificial-intelligence correlation model, the scene correlation Kn of each paragraph relative to the preceding text;
4) determine the intonation-change proportion of the current paragraph relative to the previous one: Vn = R × Kn;
5) determine whether the intonation of this paragraph rises or falls relative to the previous one:
a. if, after the rise or fall, JD ∈ [JDL, JDH] and QD ∈ [QDL, QDH], the rise or fall of the intonation is chosen at random;
b. if, after the change, JD ∉ [JDL, JDH] or QD ∉ [QDL, QDH], the direction of the intonation change is reversed (if a rise was originally intended but would exceed the range, the intonation is lowered instead).
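Taken together, steps 1) to 5) admit a compact implementation. The following sketch continues the earlier ones; the multiplicative treatment of intonation and the default bounds are this editor's assumptions, since the patent fixes neither units nor concrete limits:

```python
import random


def plan_intonation(correlations: list[float], H: float = 50.0,
                    JD: float = 1.0, JDL: float = 0.5, JDH: float = 1.5,
                    QD: float = 1.0, QDL: float = 0.5, QDH: float = 1.5):
    """Steps 1)-5): per-paragraph intonation-change proportion Vn and direction.

    correlations -- Kn of each paragraph with the preceding text (step 3).
    H            -- upper end of the total change-ratio range {0%..H%} (step 1).
    JD, QD       -- reference and starting intonation, kept within
                    [JDL, JDH] and [QDL, QDH] (illustrative values).
    """
    R = H / 100.0                                # step 2: R = H/100
    plan: list[tuple[float, int]] = []
    for Kn in correlations:
        Vn = R * Kn                              # step 4: Vn = R x Kn
        direction = random.choice((1, -1))       # step 5a: random rise or fall
        if not (JDL <= JD * (1 + direction * Vn) <= JDH and
                QDL <= QD * (1 + direction * Vn) <= QDH):
            direction = -direction               # step 5b: reverse when out of range
        JD *= 1 + direction * Vn
        QD *= 1 + direction * Vn
        plan.append((Vn, direction))
    return plan
```

Note that the patent leaves open what happens if both directions would leave the range; the sketch simply keeps the reversed direction.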
Step 104, performing paragraph-by-paragraph speech synthesis on the text to be processed according to the intonation-change proportion and the intonation-change direction.
Finally, paragraph speech synthesis is performed according to the intonation-change proportion and intonation-change direction determined in step 103; an end-to-end usage sketch follows.
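An end-to-end usage example, tying the sketches together with a stub in place of a real TTS back end (the `synthesize` signature is invented for illustration; any engine exposing per-paragraph pitch control would serve):

```python
class StubTTS:
    """Stand-in for a real synthesiser back end; prints what it would receive."""
    def synthesize(self, text: str, pitch_ratio: float) -> None:
        print(f"synthesizing {len(text)} chars at pitch x{pitch_ratio:.2f}")


class ConstantModel:
    """Toy scorer standing in for the trained correlation model."""
    def predict(self, a: Paragraph, b: Paragraph) -> float:
        return 0.5  # e.g. K = 50%, as in the A/B example above


naturals = [Paragraph("It was a quiet morning.", "street", "calm"),
            Paragraph("The storm broke without warning!", "street", "tense")]
actual = to_actual_paragraphs(naturals)
plan = plan_intonation(adjacent_correlations(actual, ConstantModel()))

tts = StubTTS()
tts.synthesize(actual[0].text, 1.0)  # first paragraph at the starting intonation
for p, (Vn, direction) in zip(actual[1:], plan):
    tts.synthesize(p.text, 1 + direction * Vn)
```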
In this method embodiment, the text to be processed is first divided by scene and emotion level to obtain paragraphs that match the actual scene and emotion; the correlation between adjacent paragraphs is then calculated and, based on that correlation, parameters such as each paragraph's starting intonation and the reference intonation of the actual paragraphs are determined, yielding the intonation-change proportion and intonation-change direction of the text; finally, paragraph speech synthesis is performed with the determined proportion and direction, making the synthesized speech more vivid and better matched to human listening habits. The technical problem that prior-art speech synthesis sounds stiff and unnatural is thereby solved.
The foregoing is an embodiment of the montage-based speech synthesis method provided in the embodiments of the present application; the following is an embodiment of the montage-based speech synthesis system provided in the embodiments of the present application.
Referring to fig. 2, an embodiment of a montage-based speech synthesis system provided in an embodiment of the present application includes:
the dividing unit 201 is configured to divide a text to be processed into a plurality of actual paragraphs based on a scene type and an emotion level type after performing paragraph segmentation preprocessing on existing natural paragraphs of the text to be processed;
a first calculating unit 202, configured to calculate correlations between scenes and emotion hierarchies of adjacent paragraphs in a plurality of actual paragraphs;
the second calculating unit 203 is configured to calculate a tone change ratio and a tone change direction of the text to be processed according to the correlation after setting the tone parameter of the text to be processed;
and the synthesis unit 204 is configured to perform paragraph speech synthesis on the text to be processed according to the tone variation ratio and the tone variation direction.
In this embodiment, the montage-based speech synthesis system divides the text to be processed by scene and emotion level to obtain paragraphs that match the actual scene and emotion; it calculates the correlation between adjacent paragraphs and, based on that correlation, determines parameters such as each paragraph's starting intonation and the reference intonation of the actual paragraphs, yielding the intonation-change proportion and intonation-change direction of the text; finally, it performs paragraph speech synthesis with the determined proportion and direction, making the synthesized speech more vivid and better matched to human listening habits. The technical problem that prior-art speech synthesis sounds stiff and unnatural is thereby solved.
Further, an embodiment of the present application provides a montage-based speech synthesis device, the device comprising a processor and a memory:
the memory is used for storing program codes and transmitting the program codes to the processor;
the processor is used for executing the montage-based speech synthesis method of the above method embodiment according to the instructions in the program code
It can be clearly understood by those skilled in the art that, for convenience and simplicity of description, the specific working processes of the system, the unit and the device described above may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
The terms "first," "second," "third," "fourth," and the like in the description of the present application and in the above-described drawings are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the application described herein are, for example, capable of operation in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
It should be understood that in the present application, "at least one" means one or more, "a plurality" means two or more. "and/or" for describing an association relationship of associated objects, indicating that there may be three relationships, e.g., "a and/or B" may indicate: only A, only B and both A and B are present, wherein A and B may be singular or plural. The character "/" generally indicates that the former and latter associated objects are in an "or" relationship. "at least one of the following" or similar expressions refer to any combination of these items, including any combination of single item(s) or plural items. For example, at least one (one) of a, b, or c, may represent: a, b, c, "a and b", "a and c", "b and c", or "a and b and c", wherein a, b, c may be single or plural.
In the several embodiments provided in the present application, it should be understood that the disclosed system, apparatus and method may be implemented in other manners. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the units is only one type of logical functional division, and other divisions may be realized in practice, for example, multiple units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present application, in essence or in the part contributing over the prior art, or in whole or in part, may be embodied in the form of a software product, which is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the methods described in the embodiments of the present application. The aforementioned storage media include: a USB flash drive, a portable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
The above embodiments are only used for illustrating the technical solutions of the present application, and not for limiting the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present application.

Claims (10)

1. A montage-based speech synthesis method, characterized by comprising the following steps:
after paragraph-segmentation preprocessing is performed on the existing natural paragraphs of a text to be processed, dividing the text into a plurality of actual paragraphs based on scene type and emotion-level type;
calculating the scene and emotion-level correlation of adjacent paragraphs among the plurality of actual paragraphs;
after the intonation parameters of the text to be processed are set, calculating the intonation-change proportion and intonation-change direction of the text from the correlation; and
performing paragraph-by-paragraph speech synthesis on the text according to the intonation-change proportion and the intonation-change direction.
2. The montage-based speech synthesis method according to claim 1, wherein the paragraph-segmentation preprocessing of the existing natural paragraphs of the text to be processed specifically comprises: dividing the text into paragraphs at line breaks.
3. The montage-based speech synthesis method according to claim 1, wherein dividing the text to be processed into a plurality of actual paragraphs based on scene type and emotion-level type specifically comprises:
merging adjacent paragraphs that share the same scene type and the same emotion-level type into one paragraph, and splitting sub-paragraphs of differing scene type or emotion-level type within one paragraph into a plurality of separate paragraphs.
4. The montage-based speech synthesis method according to claim 1, wherein calculating the scene and emotion-level correlation of adjacent paragraphs among the plurality of actual paragraphs specifically comprises:
manually labeling the scenes and emotion levels of the text to be processed and then performing correlation training to obtain a correlation calculation model, and calculating the scene and emotion-level correlation of adjacent paragraphs among the plurality of actual paragraphs based on this model.
5. The montage-based speech synthesis method according to claim 1, wherein calculating the intonation-change proportion and intonation-change direction of the text to be processed from the correlation after its intonation parameters are set specifically comprises:
setting the proportion range of the total intonation change of the text together with the upper and lower limits of the reference intonation and the starting intonation; calculating the intonation-change proportion of adjacent paragraphs from the ratio of the total intonation change to the correlation; and determining the rise or fall of intonation between adjacent paragraphs as the intonation-change direction, thereby obtaining the intonation-change proportion and intonation-change direction of the text to be processed.
6. A montage-based speech synthesis system, characterized by comprising:
a dividing unit, configured to divide a text to be processed into a plurality of actual paragraphs based on scene type and emotion-level type after paragraph-segmentation preprocessing of its existing natural paragraphs;
a first calculation unit, configured to calculate the scene and emotion-level correlation of adjacent paragraphs among the plurality of actual paragraphs;
a second calculation unit, configured to calculate the intonation-change proportion and intonation-change direction of the text from the correlation after its intonation parameters are set; and
a synthesis unit, configured to perform paragraph-by-paragraph speech synthesis on the text according to the intonation-change proportion and the intonation-change direction.
7. The montage-based speech synthesis system according to claim 6, wherein the dividing unit is specifically configured to:
divide the existing natural paragraphs of the text to be processed at line breaks; and
merge adjacent paragraphs that share the same scene type and the same emotion-level type into one paragraph, and split sub-paragraphs of differing scene type or emotion-level type within one paragraph into a plurality of separate paragraphs.
8. The montage-based speech synthesis system according to claim 6, wherein the first calculation unit is specifically configured to:
obtain a correlation calculation model by performing correlation training on text whose scenes and emotion levels have been manually labeled, and calculate the scene and emotion-level correlation of adjacent paragraphs among the plurality of actual paragraphs based on this model.
9. The montage-based speech synthesis system according to claim 6, wherein the second calculation unit is specifically configured to:
set the proportion range of the total intonation change of the text together with the upper and lower limits of the reference intonation and the starting intonation; calculate the intonation-change proportion of adjacent paragraphs from the ratio of the total intonation change to the correlation; and determine the rise or fall of intonation between adjacent paragraphs as the intonation-change direction, thereby obtaining the intonation-change proportion and intonation-change direction of the text to be processed.
10. A montage-based speech synthesis apparatus, the apparatus comprising a processor and a memory:
the memory is used for storing program codes and transmitting the program codes to the processor;
the processor is configured to perform the montage-based speech synthesis method of any of claims 1-5 according to instructions in the program code.
CN202210285222.7A 2022-03-23 2022-03-23 Method, system and equipment for synthesizing voice based on montage Active CN114373444B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210285222.7A CN114373444B (en) 2022-03-23 2022-03-23 Method, system and equipment for synthesizing voice based on montage

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210285222.7A CN114373444B (en) 2022-03-23 2022-03-23 Method, system and equipment for synthesizing voice based on montage

Publications (2)

Publication Number Publication Date
CN114373444A CN114373444A (en) 2022-04-19
CN114373444B (en) 2022-05-27

Family

ID=81146954

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210285222.7A Active CN114373444B (en) 2022-03-23 2022-03-23 Method, system and equipment for synthesizing voice based on montage

Country Status (1)

Country Link
CN (1) CN114373444B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114678006B (en) * 2022-05-30 2022-08-23 广东电网有限责任公司佛山供电局 Rhythm-based voice synthesis method and system
CN114783402B (en) * 2022-06-22 2022-09-13 广东电网有限责任公司佛山供电局 Variation method and device for synthetic voice, electronic equipment and storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110211563A (en) * 2019-06-19 2019-09-06 平安科技(深圳)有限公司 Chinese speech synthesis method, apparatus and storage medium towards scene and emotion
CN111243571A (en) * 2020-01-14 2020-06-05 北京字节跳动网络技术有限公司 Text processing method, device and equipment and computer readable storage medium
CN111292715A (en) * 2020-02-03 2020-06-16 北京奇艺世纪科技有限公司 Speech synthesis method, speech synthesis device, electronic equipment and computer-readable storage medium
CN111681641A (en) * 2020-05-26 2020-09-18 微软技术许可有限责任公司 Phrase-based end-to-end text-to-speech (TTS) synthesis
WO2021060591A1 (en) * 2019-09-26 2021-04-01 미디어젠 주식회사 Device for changing speech synthesis models according to character utterance contexts

Also Published As

Publication number Publication date
CN114373444A (en) 2022-04-19

Similar Documents

Publication Publication Date Title
CN114373444B (en) Method, system and equipment for synthesizing voice based on montage
CN110955786B (en) Dance action data generation method and device
CN109088995B (en) Method and mobile phone for supporting global language translation
CN106448630B (en) Method and device for generating digital music score file of song
CN110889381A (en) Face changing method and device, electronic equipment and storage medium
CN106375780B (en) A kind of multimedia file producting method and its equipment
CN110264993B (en) Speech synthesis method, device, equipment and computer readable storage medium
CN110427809A (en) Lip reading recognition methods, device, electronic equipment and medium based on deep learning
CN111696029A (en) Virtual image video generation method and device, computer equipment and storage medium
CN114255187A (en) Multi-level and multi-level image optimization method and system based on big data platform
CN108764114B (en) Signal identification method and device, storage medium and terminal thereof
CN108170676A (en) Method, system and the terminal of story creation
CN110297897B (en) Question-answer processing method and related product
CN113327576A (en) Speech synthesis method, apparatus, device and storage medium
CN117152308B (en) Virtual person action expression optimization method and system
CN116612759A (en) Speech recognition method and storage medium
CN115333879B (en) Remote conference method and system
CN111785236A (en) Automatic composition method based on motivational extraction model and neural network
CN108717851A (en) A kind of audio recognition method and device
CN110298903B (en) Curve editing method and device, computing equipment and storage medium
CN113312902A (en) Intelligent auditing and checking method and device for same text
CN112381151A (en) Similar video determination method and device
CN113345411B (en) Sound changing method, device, equipment and storage medium
CN117348736B (en) Digital interaction method, system and medium based on artificial intelligence
CN110312040B (en) Information processing method, device and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant