CN114863906B - Method and device for marking alias of text-to-speech processing - Google Patents

Method and device for marking alias of text-to-speech processing

Info

Publication number
CN114863906B
Authority
CN
China
Prior art keywords
alias
text
information
alias information
target text
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210791135.9A
Other languages
Chinese (zh)
Other versions
CN114863906A (en)
Inventor
刘丹
汤跃忠
田野
杨静波
陈龙
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Third Research Institute Of China Electronics Technology Group Corp
Beijing Zhongdian Huisheng Technology Co ltd
Original Assignee
Third Research Institute Of China Electronics Technology Group Corp
Beijing Zhongdian Huisheng Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2022-07-07
Filing date
2022-07-07
Publication date
2022-10-28
Application filed by Third Research Institute Of China Electronics Technology Group Corp and Beijing Zhongdian Huisheng Technology Co ltd
Priority to CN202210791135.9A
Publication of CN114863906A
Application granted
Publication of CN114863906B
Legal status: Active

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L13/00 Speech synthesis; Text to speech systems
    • G10L13/02 Methods for producing synthetic speech; Speech synthesisers
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01 Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/048 Interaction techniques based on graphical user interfaces [GUI]
    • G06F3/0481 Interaction techniques based on graphical user interfaces [GUI] based on specific properties of the displayed interaction object or a metaphor-based environment, e.g. interaction with desktop elements like windows or icons, or assisted by a cursor's changing behaviour or appearance
    • G06F3/0482 Interaction with lists of selectable items, e.g. menus
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01 Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/048 Interaction techniques based on graphical user interfaces [GUI]
    • G06F3/0484 Interaction techniques based on graphical user interfaces [GUI] for the control of specific functions or operations, e.g. selecting or manipulating an object, an image or a displayed text element, setting a parameter value or selecting a range
    • G06F3/04842 Selection of displayed objects or displayed text elements
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/103 Formatting, i.e. changing of presentation of documents
    • G06F40/117 Tagging; Marking up; Designating a block; Setting of attributes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/166 Editing, e.g. inserting or deleting
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L13/00 Speech synthesis; Text to speech systems
    • G10L13/02 Methods for producing synthetic speech; Speech synthesisers
    • G10L13/033 Voice editing, e.g. manipulating the voice of the synthesiser

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • General Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Document Processing Apparatus (AREA)

Abstract

The invention discloses an alias marking method and device for text-to-speech processing. The method comprises: providing a plurality of mark menu items, each mark menu item offering a marking tool for one class of function, the plurality of mark menu items including at least a first menu item for alias marking; after a first target text is selected, adding alias information for the first target text based on the first menu item, the alias information being presented as text in association with the first target text; and, when the alias information does not meet the requirement, selecting a second target text from the text of the alias information and adding mark information to the second target text based on a selected mark menu item. Because the added alias information is presented as text in association with the first target text, the alias information can itself be corrected a second time, which improves the usability and ease of use of alias marking and the accuracy of text-to-speech conversion.

Description

Method and device for marking alias of text-to-speech processing
Technical Field
The invention relates to the technical field of voice transcription, and in particular to an alias marking method and device for text-to-speech processing.
Background
In text-to-speech software, adding pronunciation and prosody marks to the text improves the accuracy and naturalness of the synthesized speech.
When part of the original text differs from the desired pronunciation of the synthesized speech (spoken language, dialect, kana, abbreviations, etc.) and the original text must remain unchanged, the user needs to supply the desired pronunciation text in the form of a mark. If the replacement pronunciation text is written directly into the original text, it easily introduces wrongly written characters or inappropriate semantics. This harms the readability and intelligibility of the text to be synthesized, lowers speech synthesis efficiency, and makes it harder to retain and trace the original text or to reuse it for later synthesis.
The prior art offers no convenient way for a user to add an alias. For example, when an alias is added as a mark, the mark is treated as a single graphic symbol that displays its content or type. Clicking it allows the content to be modified or deleted, but the content cannot itself be given further prosody or pronunciation marks.
Disclosure of Invention
Embodiments of the invention provide an alias marking method and device for text-to-speech processing, which offer a way to add an alias that the user can then correct further, thereby solving the problem of reprocessing text marks.
An embodiment of the invention provides an alias marking method for text-to-speech processing, comprising the following steps:
providing a plurality of mark menu items, each mark menu item offering a marking tool for one class of function, the plurality of mark menu items including at least a first menu item for alias marking;
after a first target text is selected, adding alias information for the first target text based on the first menu item, wherein the alias information is in text form and is presented in association with the first target text;
and, when the alias information does not meet the requirement, selecting a second target text from the text of the alias information, and adding mark information to the second target text based on the selected mark menu item.
Optionally, when the alias information does not meet the requirement, the text of the alias information is modified directly in order to modify the alias information.
Optionally, when the first target text is deleted, the associated alias information is deleted at the same time.
Optionally, the method further includes: in a text-to-speech process, synthesizing the required audio based on the alias information.
Optionally, the original text corresponding to the alias information is not pronounced, and the original text and the alias information are displayed at the same time.
Optionally, the alias information added to the first target text is presented in a different color from the first target text.
The embodiment of the present invention further provides an alias marking device for text-to-speech processing, which includes a processor and a memory, where the memory stores a computer program, and the computer program, when executed by the processor, implements the steps of the alias marking method for text-to-speech processing.
The embodiment of the present invention further provides a computer-readable storage medium, where a computer program is stored on the computer-readable storage medium, and when the computer program is executed by a processor, the steps of the alias tagging method for text-to-speech processing are implemented.
In the alias marking method provided by the embodiments of the invention, the added alias information is presented as text in association with the first target text, so the alias information can itself be corrected a second time. This improves the usability and ease of use of alias marking and the accuracy of text-to-speech conversion.
The above is only an overview of the technical solution of the present invention. So that the technical means of the invention can be understood more clearly and implemented according to the description, and so that the above and other objects, features, and advantages become more apparent, specific embodiments are described below.
Drawings
Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the invention. Also, like reference numerals are used to refer to like parts throughout the drawings. In the drawings:
FIG. 1 is a basic flow diagram of an alias tagging method according to an embodiment of the present application;
FIG. 2 illustrates a menu item for marking in accordance with an embodiment of the present application;
FIG. 3 is an example of alias addition according to an embodiment of the present application;
FIG. 4 is an example of secondary marking of the added alias information according to an embodiment of the present application.
Detailed Description
Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
The scheme of the embodiments of the application mainly addresses the following problem: when an added alias is longer than two characters, the machine-synthesized prosody of the replacement text easily sounds unnatural, and the mark designs commonly used in competing software cannot fix this. While retaining the mark icon, the scheme inserts the replacement text of the alias into the original text as text, so that further marks can be added inside the mark. This solves the problem of reprocessing text marks. In addition, when an alias mark is deleted, all marks and content inside it are deleted with it.
Specifically, an embodiment of the present invention provides an alias tagging method for text-to-speech processing, as shown in fig. 1, including the following steps:
In step S101, a plurality of mark menu items are provided, each offering a marking tool for one class of function, and the plurality of mark menu items include at least a first menu item for alias marking. Referring to FIG. 2, in some examples the plurality of mark menu items include at least: pause marks, read-through marks, polyphone marks, local volume marks, reread marks, and alias marks.
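The menu of step S101 can be pictured as a small registry of mark types. The following is a minimal sketch rather than the patented implementation: the names `MarkKind`, `MarkMenuItem`, `build_mark_menu`, and `first_menu_item` are hypothetical, and only the six mark types come from the description of FIG. 2.

```python
from dataclasses import dataclass
from enum import Enum


class MarkKind(Enum):
    """Mark types named in the description of FIG. 2 (identifiers are hypothetical)."""
    PAUSE = "pause"
    READ_THROUGH = "read-through"
    POLYPHONE = "polyphone"
    LOCAL_VOLUME = "local volume"
    REREAD = "reread"
    ALIAS = "alias"


@dataclass(frozen=True)
class MarkMenuItem:
    kind: MarkKind   # the single class of function this item provides
    label: str       # text shown in the menu (assumed wording)


def build_mark_menu() -> list[MarkMenuItem]:
    """Return one menu item per mark kind, in declaration order."""
    return [MarkMenuItem(kind, f"{kind.value} mark") for kind in MarkKind]


def first_menu_item(menu: list[MarkMenuItem]) -> MarkMenuItem:
    """Find the alias item, which the method calls the 'first menu item'."""
    return next(item for item in menu if item.kind is MarkKind.ALIAS)
```

Keeping one item per kind mirrors the statement that each menu item offers a tool for one class of function.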
In step S102, after a first target text is selected, alias information is added to the first target text based on the first menu item, and the alias information is presented as text in association with the first target text. As shown in FIG. 3, in the text segment "futures transactions and derivatives transactions and related activities outside China", the alias information of "China" is "the People's Republic of China". In this example the alias information "the People's Republic of China" is displayed as text in association with the first target text "China".
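One way to picture step S102 is an annotation that records the selected range and the alias text while leaving the source text untouched. This is a sketch under stated assumptions: `AliasMark`, `add_alias`, and `render` are hypothetical names, and the bracket style used for the associated presentation is invented for illustration (the patent shows the actual presentation only in FIG. 3).

```python
from dataclasses import dataclass


@dataclass
class AliasMark:
    start: int   # index of the first character of the first target text
    end: int     # index one past its last character
    alias: str   # replacement pronunciation, kept as ordinary editable text


def add_alias(text: str, target: str, alias: str) -> AliasMark:
    """Attach alias text to the first occurrence of `target`; `text` is not modified."""
    start = text.find(target)
    if start < 0:
        raise ValueError(f"target text {target!r} not found")
    return AliasMark(start, start + len(target), alias)


def render(text: str, mark: AliasMark) -> str:
    """Show the alias right after the original text (bracket style is an assumption)."""
    return f"{text[:mark.end]}[{mark.alias}]{text[mark.end:]}"


text = "related activities outside China"
mark = add_alias(text, "China", "the People's Republic of China")
print(render(text, mark))
```

Because the alias is stored as a plain string rather than an opaque symbol, it remains available for direct editing and for further marking.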
In step S103, when the alias information does not meet the requirement, a second target text is selected from the text of the alias information, and mark information is added to the second target text based on the selected mark menu item. In some embodiments, the original text corresponding to the alias information is not pronounced, and it is displayed together with the alias information. In some embodiments, the method further includes synthesizing the required audio based on the alias information during text-to-speech processing. In the example, the audio is synthesized from the alias information "the People's Republic of China", and the first target text "China" is not pronounced. In some examples, if the user finds a pronunciation or prosody problem in "the People's Republic of China", a mark can be added to the text of the alias information itself. As shown in FIG. 4, a pause mark "no pause" can be added to the segment "People" of the alias information, which further improves the prosody of the alias information and the accuracy of speech synthesis.
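Step S103 amounts to letting an alias carry its own nested marks, with the alias text (not the original) being what is voiced. The sketch below illustrates that idea only; `InnerMark`, `AliasMark.mark_inside`, and `spoken_text` are hypothetical names, and storing nested offsets relative to the alias text is an assumption, not something the patent specifies.

```python
from dataclasses import dataclass, field


@dataclass
class InnerMark:
    start: int   # offsets are relative to the alias text, not the source text
    end: int
    kind: str    # for example "pause"
    value: str   # for example "no pause"


@dataclass
class AliasMark:
    start: int
    end: int
    alias: str
    inner: list[InnerMark] = field(default_factory=list)

    def mark_inside(self, target: str, kind: str, value: str) -> InnerMark:
        """Add a mark to a second target text selected within the alias text."""
        pos = self.alias.find(target)
        if pos < 0:
            raise ValueError(f"{target!r} is not part of the alias text")
        mark = InnerMark(pos, pos + len(target), kind, value)
        self.inner.append(mark)
        return mark


def spoken_text(text: str, mark: AliasMark) -> str:
    """Text handed to the synthesizer: the alias is voiced, the original span is not."""
    return text[:mark.start] + mark.alias + text[mark.end:]


text = "related activities outside China"
start = text.index("China")
alias = AliasMark(start, start + len("China"), "the People's Republic of China")
alias.mark_inside("People", "pause", "no pause")
print(spoken_text(text, alias))
```

The source text is never rewritten, so the original wording stays available for display and tracing while the synthesizer receives the alias.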
In the alias marking method provided by the embodiments of the invention, the added alias information is presented as text in association with the first target text, so the alias information can itself be corrected a second time. This improves the usability and ease of use of alias marking and the accuracy of text-to-speech conversion.
In some embodiments, when the alias information does not meet the requirement, the text of the alias information is modified directly in order to modify the alias information. Continuing the foregoing example, because the alias information is presented as text in association with the first target text, a user who finds that the alias information provided by the program differs from the desired alias information can edit the alias text manually, which further improves the efficiency of adding aliases.
In some embodiments, when the first target text is deleted, the associated alias information is deleted at the same time. In some application scenarios, for example, if the user deletes the first target text, the corresponding alias information is deleted with it, so the user does not need a second operation to remove the alias information, which improves the efficiency of adding aliases.
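The linked deletion described above can be sketched as a single edit operation that removes text and drops any alias attached to it. This is an illustration under assumptions: `AliasMark` and `delete_span` are hypothetical names, and treating a partial overlap with the target text as a deletion of the alias is a design choice made here, since the patent only describes deleting the whole first target text.

```python
from dataclasses import dataclass


@dataclass
class AliasMark:
    start: int   # range of the first target text in the source text
    end: int
    alias: str


def delete_span(
    text: str, marks: list[AliasMark], start: int, end: int
) -> tuple[str, list[AliasMark]]:
    """Delete text[start:end], drop aliases whose target overlaps it, shift the rest."""
    removed = end - start
    kept = []
    for m in marks:
        if m.end <= start:
            kept.append(m)  # entirely before the deleted range, unchanged
        elif m.start >= end:
            # entirely after the deleted range, so shift it left
            kept.append(AliasMark(m.start - removed, m.end - removed, m.alias))
        # otherwise the target text was deleted (assumption: partial overlap counts)
    return text[:start] + text[end:], kept


text = "outside China and abroad"
marks = [AliasMark(8, 13, "the People's Republic of China"), AliasMark(18, 24, "overseas")]
new_text, kept = delete_span(text, marks, 8, 14)
print(new_text, kept)
```

Handling the alias inside the same operation is what spares the user a second deletion step.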
In some embodiments, the alias information added for the first target text is presented in a different color from the first target text. Presenting the alias information in a different color makes it easy for the user to review what has been added.
In this scheme, while the alias mark icon is retained, the replacement text of the alias is inserted into the original text as text, so that new marks can be added a second time inside the mark. This solves the problem of reprocessing text marks. In addition, deleting the alias mark deletes all marks and content inside it at the same time, which greatly improves the efficiency of adding aliases to text.
The embodiment of the present invention further provides an alias marking device for text-to-speech processing, which includes a processor and a memory, where the memory stores a computer program, and the computer program, when executed by the processor, implements the steps of the alias marking method for text-to-speech processing.
The embodiment of the present invention further provides a computer-readable storage medium, where a computer program is stored on the computer-readable storage medium, and when the computer program is executed by a processor, the steps of the alias tagging method for text-to-speech processing are implemented.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of another identical element in a process, method, article, or apparatus that comprises the element.
The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.
Through the description of the foregoing embodiments, it is clear to those skilled in the art that the method of the foregoing embodiments may be implemented by software plus a necessary general hardware platform, and certainly may also be implemented by hardware, but in many cases, the former is a better implementation. Based on such understanding, the technical solutions of the present invention may be embodied in the form of a software product, which is stored in a storage medium (such as ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal (such as a mobile phone, a computer, a server, an air conditioner, or a network device) to execute the method according to the embodiments of the present invention.
While the present invention has been described with reference to the embodiments shown in the drawings, the present invention is not limited to the embodiments, which are illustrative and not restrictive, and it will be apparent to those skilled in the art that various changes and modifications can be made therein without departing from the spirit and scope of the invention as defined in the appended claims.

Claims (5)

1. An alias tagging method for text-to-speech processing, comprising:
providing a plurality of mark menu items, each mark menu item having a marking tool for one class of function, and the plurality of mark menu items including at least a first menu item for alias marking;
after a first target text is selected, adding alias information for the first target text based on the first menu item, wherein the alias information is in text form and is presented in association with the first target text;
when the alias information does not meet the requirement, selecting a second target text from the text corresponding to the alias information, and adding mark information to the second target text based on the selected mark menu item; when the alias information does not meet the requirement, directly modifying the text corresponding to the alias information to modify the alias information;
synthesizing the required audio based on the alias information in a text-to-speech process;
wherein the original text corresponding to the alias information is not pronounced, and the original text corresponding to the alias information and the alias information are displayed at the same time.
2. The text-to-speech process alias tagging method of claim 1, wherein in the event that the first target text is deleted, the associated alias information is deleted at the same time.
3. The text-to-speech process alias tagging method of claim 1, wherein the alias information added for the first target text is presented in a different color than the first target text.
4. An alias tagging apparatus for text to speech processing, comprising a processor and a memory, the memory storing a computer program which, when executed by the processor, performs the steps of the alias tagging method for text to speech processing as claimed in any one of claims 1 to 3.
5. A computer-readable storage medium, characterized in that a computer program is stored on the computer-readable storage medium, which computer program, when being executed by a processor, carries out the steps of the text-to-speech processed alias tagging method according to any one of claims 1 to 3.
CN202210791135.9A 2022-07-07 2022-07-07 Method and device for marking alias of text-to-speech processing Active CN114863906B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210791135.9A CN114863906B (en) 2022-07-07 2022-07-07 Method and device for marking alias of text-to-speech processing

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210791135.9A CN114863906B (en) 2022-07-07 2022-07-07 Method and device for marking alias of text-to-speech processing

Publications (2)

Publication Number Publication Date
CN114863906A CN114863906A (en) 2022-08-05
CN114863906B CN114863906B (en) 2022-10-28

Family

ID=82625946

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210791135.9A Active CN114863906B (en) 2022-07-07 2022-07-07 Method and device for marking alias of text-to-speech processing

Country Status (1)

Country Link
CN (1) CN114863906B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116092477A (en) * 2023-03-30 2023-05-09 北京中电慧声科技有限公司 Voice synthesis system mark memory library-based audio generation method and device

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6947947B2 (en) * 2001-08-17 2005-09-20 Universal Business Matrix Llc Method for adding metadata to data
JP6415929B2 (en) * 2014-10-30 2018-10-31 株式会社東芝 Speech synthesis apparatus, speech synthesis method and program
JP6728116B2 (en) * 2017-09-21 2020-07-22 株式会社東芝 Speech recognition device, speech recognition method and program
CN108647197B (en) * 2018-05-08 2021-07-27 腾讯科技(深圳)有限公司 Information processing method, device and storage medium
CN111142667A (en) * 2019-12-27 2020-05-12 苏州思必驰信息科技有限公司 System and method for generating voice based on text mark
CN113539235B (en) * 2021-07-13 2024-02-13 标贝(青岛)科技有限公司 Text analysis and speech synthesis method, device, system and storage medium
CN114023302B (en) * 2022-01-10 2022-05-24 北京中电慧声科技有限公司 Text speech processing device and text pronunciation processing method

Also Published As

Publication number Publication date
CN114863906A (en) 2022-08-05

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant