CN114863906B - Alias marking method and device for text-to-speech processing

Info

Publication number: CN114863906B
Application number: CN202210791135.9A
Authority: CN
Inventors: 刘丹; 汤跃忠; 田野; 杨静波; 陈龙
Original assignee: Third Research Institute Of China Electronics Technology Group Corp; Beijing Zhongdian Huisheng Technology Co Ltd
Current assignee: Third Research Institute Of China Electronics Technology Group Corp; Beijing Zhongdian Huisheng Technology Co Ltd
Priority date: 2022-07-07
Filing date: 2022-07-07
Publication date: 2022-10-28
Anticipated expiration: 2042-07-07
Also published as: CN114863906A

Abstract

The invention discloses an alias marking method and device for text-to-speech processing. The method comprises: providing a plurality of mark menu items, each mark menu item providing a marking tool for one class of function, the plurality of mark menu items including at least a first menu item for alias marking; after a first target text is selected, adding alias information for the first target text based on the first menu item, the alias information being presented in text form in association with the first target text; and, when the alias information does not meet the requirement, selecting a second target text from the text corresponding to the alias information and adding mark information to the second target text based on a selected mark menu item. Because the added alias information is presented as text in association with the first target text, the alias information can itself be corrected a second time, which improves the availability and usability of alias marking and the accuracy of text-to-speech conversion.

Description

Alias marking method and device for text-to-speech processing

Technical Field

The invention relates to the technical field of voice transcription, and in particular to an alias marking method and device for text-to-speech processing.

Background

In text-to-speech software, the accuracy and naturalness of synthesized speech can be improved by adding pronunciation and prosody marks to the text.

When part of the original text differs from what the synthesized speech should say (spoken language, dialect, kana, abbreviation, etc.) and the original text must be kept unchanged, the user needs to supply the desired pronunciation text in the form of a mark. If the replacement pronunciation text is instead written directly into the original text, it easily introduces wrongly written characters or improper semantics into the article, which harms the readability and intelligibility of the text, lowers speech synthesis efficiency, and hinders retention and tracing of the original text and its reuse in later speech synthesis.

The prior art offers no convenient way for a user to add an alias. For example, when an alias is added as a mark, the mark is treated as a single graphic symbol that displays its content or type; clicking it allows the content to be modified or deleted, but prosody or pronunciation marks cannot be applied to that content.

Disclosure of Invention

The embodiment of the invention provides an alias marking method and device for text-to-speech processing that let a user add an alias and then further correct it, thereby solving the problem of reprocessing text marks.

The embodiment of the invention provides an alias marking method for text-to-speech processing, which comprises the following steps:

providing a plurality of mark menu items, each mark menu item having a marking tool for one class of function, and the plurality of mark menu items including at least a first menu item for alias marking;

after a first target text is selected, adding alias information for the first target text based on the first menu item, wherein the alias information is in a text form and is in associated presentation with the first target text;

and under the condition that the alias information does not meet the requirement, selecting a second target text from the texts corresponding to the alias information, and adding mark information to the second target text based on the selected mark menu item.

Optionally, when the alias information does not meet the requirement, the text corresponding to the alias information is directly modified to modify the alias information.

Optionally, in the case of deleting the first target text, the associated alias information is deleted at the same time.

Optionally, the method further includes: in a text-to-speech process, the desired audio is synthesized based on the alias information.

Optionally, the original text corresponding to the alias information is not pronounced, and the original text corresponding to the alias information and the alias information are displayed at the same time.

Optionally, the alias information added to the first target text is presented in a different color from the first target text.

The embodiment of the present invention further provides an alias marking device for text-to-speech processing, which includes a processor and a memory, where the memory stores a computer program, and the computer program, when executed by the processor, implements the steps of the alias marking method for text-to-speech processing.

The embodiment of the present invention further provides a computer-readable storage medium, where a computer program is stored on the computer-readable storage medium, and when the computer program is executed by a processor, the steps of the alias tagging method for text-to-speech processing are implemented.

According to the alias marking method provided by the embodiment of the invention, the added alias information is presented in text form in association with the first target text, so the alias information can itself be corrected a second time, which improves the availability and usability of alias marking and the accuracy of text-to-speech conversion.

The above description is only an overview of the technical solutions of the present invention. The detailed description that follows is provided so that the technical means of the invention can be understood more clearly and implemented in accordance with the content of the description, and so that the above and other objects, features, and advantages of the invention are easier to understand.

Drawings

Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the invention. Also, like reference numerals are used to refer to like parts throughout the drawings. In the drawings:

FIG. 1 is a basic flow diagram of an alias tagging method according to an embodiment of the present application;

FIG. 2 illustrates a menu item for marking in accordance with an embodiment of the present application;

FIG. 3 is an example of alias addition according to an embodiment of the present application;

FIG. 4 is an example of secondary marking of the added alias information according to an embodiment of the present application.

Detailed Description

Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.

The scheme of the embodiment of the application mainly addresses the following problem: when an added alias is longer than two characters, the machine-synthesized speech for the replacement text is prone to unnatural prosody, and the mark designs commonly used in competing software cannot fix this. Here, the mark icon is kept and the replacement text of the alias is inserted into the original text in text form, so that marks can be added inside a mark. This solves the problem of reprocessing text marks. In addition, deleting the alias mark deletes all marks and content inside it at the same time.

Specifically, an embodiment of the present invention provides an alias marking method for text-to-speech processing, as shown in FIG. 1, including the following steps:

In step S101, a plurality of mark menu items are provided, each having a marking tool for one class of function, and the plurality of mark menu items include at least a first menu item for alias marking. Referring to FIG. 2, in some examples the plurality of mark menu items include at least: pause marks, read-through marks, polyphone marks, local volume marks, reread marks, and alias marks.
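The set of mark menu items in step S101 can be sketched as a simple enumeration. This is only an illustration: the names `MarkType` and `build_menu` and the dictionary layout are assumptions made for the example and do not appear in the patent.

```python
from enum import Enum


class MarkType(Enum):
    """Mark categories named for FIG. 2; identifier names are illustrative."""
    PAUSE = "pause"
    READ_THROUGH = "read-through"
    POLYPHONE = "polyphone"
    LOCAL_VOLUME = "local volume"
    REREAD = "reread"
    ALIAS = "alias"


def build_menu():
    """Return one menu entry per mark type, in declaration order."""
    return [{"label": m.value, "mark_type": m} for m in MarkType]


menu = build_menu()
# The "first menu item" of the method is the entry used for alias marking.
alias_item = next(item for item in menu if item["mark_type"] is MarkType.ALIAS)
```

Keeping one entry per class of function mirrors the requirement that each menu item carries the marking tool for exactly one kind of mark.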

In step S102, after a first target text is selected, alias information is added for the first target text based on the first menu item, and the alias information is presented in text form in association with the first target text. As shown in FIG. 3, in the text segment "futures transaction and derivatives transaction and related activities outside China", the alias information of "China" is "the People's Republic of China"; in this example, the alias information "the People's Republic of China" is displayed in text form in association with the first target text "China".
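One way to picture step S102 is an alias record attached to a span of the unchanged original text. The sketch below is a minimal illustration under assumptions: `AliasMark`, `add_alias`, and `render` are hypothetical names, and showing the alias in square brackets after the target is a display choice made for the example, not something the patent specifies.

```python
from dataclasses import dataclass


@dataclass
class AliasMark:
    start: int       # offset of the first target text in the original text
    end: int         # exclusive end offset
    alias_text: str  # editable replacement text that will be pronounced


def add_alias(text: str, target: str, alias_text: str) -> AliasMark:
    """Attach alias_text to the first occurrence of target; text is not modified."""
    start = text.find(target)
    if start < 0:
        raise ValueError(f"{target!r} not found in text")
    return AliasMark(start, start + len(target), alias_text)


def render(text: str, mark: AliasMark) -> str:
    """Show the alias as plain text in brackets right after its target (assumed style)."""
    return text[:mark.end] + "[" + mark.alias_text + "]" + text[mark.end:]


original = "related activities outside China"
mark = add_alias(original, "China", "the People's Republic of China")
display = render(original, mark)
```

Because the alias is stored as ordinary text rather than as an opaque symbol, it stays selectable and editable, which is what later steps rely on.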

In step S103, when the alias information does not meet the requirement, a second target text is selected from the text corresponding to the alias information, and mark information is added to the second target text based on the selected mark menu item. In some embodiments, the original text corresponding to the alias information is not pronounced, and the original text and the alias information are displayed at the same time. In some embodiments, the method further includes synthesizing the required audio based on the alias information during text-to-speech processing. In the example, the audio is synthesized from the alias information "the People's Republic of China", and the first target text "China" is not pronounced. In some examples, if the user finds a pronunciation or prosody problem in "the People's Republic of China", a mark can be added to the text corresponding to the alias information. As shown in FIG. 4, a pause mark "no pause" can be added to the text segment "People" of the alias information, which further improves the prosody of the alias information and the accuracy of speech synthesis.
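Step S103 can be sketched as a mark nested inside the alias text, with synthesis reading the alias instead of the original span. This is an illustrative sketch only: the class and function names are hypothetical, and the patent does not describe how the synthesizer consumes the inner mark, so the example stops at producing the text to be spoken.

```python
from dataclasses import dataclass, field


@dataclass
class InnerMark:
    start: int  # offset inside the alias text
    end: int    # exclusive end offset inside the alias text
    kind: str   # mark category, e.g. "pause"
    value: str  # mark setting, e.g. "no pause"


@dataclass
class AliasMark:
    start: int
    end: int
    alias_text: str
    inner_marks: list = field(default_factory=list)


def mark_inside_alias(alias: AliasMark, target: str, kind: str, value: str) -> InnerMark:
    """Select a second target text within the alias text and attach a mark to it."""
    start = alias.alias_text.find(target)
    if start < 0:
        raise ValueError(f"{target!r} not found in alias text")
    inner = InnerMark(start, start + len(target), kind, value)
    alias.inner_marks.append(inner)
    return inner


def text_to_synthesize(text: str, alias: AliasMark) -> str:
    """Substitute the alias for the original span, which is therefore not pronounced."""
    return text[:alias.start] + alias.alias_text + text[alias.end:]


original = "related activities outside China"
start = original.find("China")
alias = AliasMark(start, start + len("China"), "the People's Republic of China")
inner = mark_inside_alias(alias, "People", "pause", "no pause")
spoken = text_to_synthesize(original, alias)
```

Here `spoken` carries the alias wording while `original` is left untouched, and the inner mark records where the "no pause" setting applies within the alias.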

According to the alias marking method provided by the embodiment of the invention, the added alias information is presented in text form in association with the first target text, so the alias information can itself be corrected a second time, which improves the availability and usability of alias marking and the accuracy of text-to-speech conversion.

In some embodiments, when the alias information does not meet the requirement, the text corresponding to the alias information is modified directly in order to modify the alias information. Continuing the foregoing example, because the alias information is presented in text form in association with the first target text, a user who finds that the alias information provided by the program differs from the desired alias information can manually edit the text corresponding to the alias information, which further improves the efficiency of adding aliases.

In some embodiments, when the first target text is deleted, the associated alias information is deleted at the same time. For example, in some application scenarios, if the user deletes the first target text, the corresponding alias information is deleted with it, so the user does not need a second operation to delete the alias information, which improves the efficiency of alias addition.
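The cascading deletion can be sketched as follows. The names are hypothetical, and treating any overlap between the deleted range and the target span as deleting the alias is an assumption of this example; the patent only states that deleting the first target text deletes its alias.

```python
from dataclasses import dataclass, field


@dataclass
class AliasMark:
    start: int
    end: int
    alias_text: str
    inner_marks: list = field(default_factory=list)


def delete_range(text, marks, start, end):
    """Delete text[start:end]; drop alias marks it overlaps and shift later ones."""
    removed = end - start
    kept = []
    for m in marks:
        if m.end <= start:
            kept.append(m)  # entirely before the deletion: unchanged
        elif m.start >= end:
            # entirely after the deletion: shift offsets left
            kept.append(AliasMark(m.start - removed, m.end - removed,
                                  m.alias_text, m.inner_marks))
        # otherwise the target was deleted: the alias and its inner marks go too
    return text[:start] + text[end:], kept


text = "outside China today"
marks = [AliasMark(8, 13, "the People's Republic of China")]
new_text, new_marks = delete_range(text, marks, 7, 13)
```

Dropping the alias record in the same operation means no orphaned alias text is left behind for the user to clean up.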

In some embodiments, the alias information added for the first target text is presented in a different color from the first target text. Presenting the alias information in a different color makes it easy for the user to review the added aliases.
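As a rough illustration of the color distinction, the alias could be wrapped in a colored span when the text is rendered. HTML output, the bracket notation, the function name `render_html`, and the default color are all assumptions for this sketch; the patent does not name a rendering technology.

```python
from html import escape


def render_html(text: str, target_end: int, alias_text: str,
                alias_color: str = "#c0392b") -> str:
    """Render the original text with the alias inserted after its target in another color."""
    before = escape(text[:target_end])
    alias = f'<span style="color: {alias_color}">[{escape(alias_text)}]</span>'
    return before + alias + escape(text[target_end:])


text = "outside China today"
html_out = render_html(text, text.find("China") + len("China"), "PRC")
```

The original text keeps its default color, so the alias stands out without changing how the surrounding text is displayed.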

According to this scheme, the alias mark icon is kept and the replacement text of the alias is inserted into the original text in text form, so that a new mark can be added a second time inside the mark. This solves the problem of reprocessing text marks. In addition, deleting the alias mark deletes all marks and content inside it at the same time, which greatly improves the efficiency of adding aliases to text.

It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of another identical element in a process, method, article, or apparatus that comprises the element.

The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.

Through the description of the foregoing embodiments, it is clear to those skilled in the art that the method of the foregoing embodiments may be implemented by software plus a necessary general hardware platform, and certainly may also be implemented by hardware, but in many cases, the former is a better implementation. Based on such understanding, the technical solutions of the present invention may be embodied in the form of a software product, which is stored in a storage medium (such as ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal (such as a mobile phone, a computer, a server, an air conditioner, or a network device) to execute the method according to the embodiments of the present invention.

While the present invention has been described with reference to the embodiments shown in the drawings, the present invention is not limited to the embodiments, which are illustrative and not restrictive, and it will be apparent to those skilled in the art that various changes and modifications can be made therein without departing from the spirit and scope of the invention as defined in the appended claims.

Claims

1. An alias tagging method for text-to-speech processing, comprising:

under the condition that the alias information does not meet the requirement, selecting a second target text from texts corresponding to the alias information, and adding mark information for the second target text based on the selected mark menu item; under the condition that the alias information does not meet the requirement, directly modifying the text corresponding to the alias information to modify the alias information;

synthesizing required audio based on the alias information in a text-to-speech process;

and the original text corresponding to the alias information is not pronounced, and the original text corresponding to the alias information and the alias information are displayed at the same time.

2. The text-to-speech process alias tagging method of claim 1, wherein in the event that the first target text is deleted, the associated alias information is deleted at the same time.

3. The text-to-speech process alias tagging method of claim 1, wherein the alias information added for the first target text is presented in a different color than the first target text.

4. An alias tagging apparatus for text to speech processing, comprising a processor and a memory, the memory storing a computer program which, when executed by the processor, performs the steps of the alias tagging method for text to speech processing as claimed in any one of claims 1 to 3.

5. A computer-readable storage medium, characterized in that a computer program is stored on the computer-readable storage medium, which computer program, when being executed by a processor, carries out the steps of the text-to-speech processed alias tagging method according to any one of claims 1 to 3.