CN116092477A - Voice synthesis system mark memory library-based audio generation method and device - Google Patents

Voice synthesis system mark memory library-based audio generation method and device

Info

Publication number
CN116092477A
Authority
CN
China
Prior art keywords
text
memory
audio file
user
information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310322513.3A
Other languages
Chinese (zh)
Inventor
杨静波
汤跃忠
陈龙
刘丹
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Third Research Institute Of China Electronics Technology Group Corp
Beijing Zhongdian Huisheng Technology Co ltd
Original Assignee
Third Research Institute Of China Electronics Technology Group Corp
Beijing Zhongdian Huisheng Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Third Research Institute Of China Electronics Technology Group Corp, Beijing Zhongdian Huisheng Technology Co ltd filed Critical Third Research Institute Of China Electronics Technology Group Corp
Priority to CN202310322513.3A priority Critical patent/CN116092477A/en
Publication of CN116092477A publication Critical patent/CN116092477A/en
Pending legal-status Critical Current

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L13/00: Speech synthesis; Text to speech systems
    • G10L13/08: Text analysis or generation of parameters for speech synthesis out of text, e.g. grapheme to phoneme translation, prosody generation or stress or intonation determination
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30: Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33: Querying
    • G06F16/3331: Query processing
    • G06F16/334: Query execution
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00: Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01: Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/048: Interaction techniques based on graphical user interfaces [GUI]
    • G06F3/0481: Interaction techniques based on graphical user interfaces [GUI] based on specific properties of the displayed interaction object or a metaphor-based environment, e.g. interaction with desktop elements like windows or icons, or assisted by a cursor's changing behaviour or appearance
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00: Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01: Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/048: Interaction techniques based on graphical user interfaces [GUI]
    • G06F3/0484: Interaction techniques based on graphical user interfaces [GUI] for the control of specific functions or operations, e.g. selecting or manipulating an object, an image or a displayed text element, setting a parameter value or selecting a range
    • G06F3/04842: Selection of displayed objects or displayed text elements
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00: Handling natural language data
    • G06F40/10: Text processing
    • G06F40/103: Formatting, i.e. changing of presentation of documents
    • G06F40/117: Tagging; Marking up; Designating a block; Setting of attributes
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L13/00: Speech synthesis; Text to speech systems
    • G10L13/02: Methods for producing synthetic speech; Speech synthesisers
    • G10L13/04: Details of speech synthesis systems, e.g. synthesiser structure or memory management
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Health & Medical Sciences (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Document Processing Apparatus (AREA)

Abstract

The invention provides an audio generation method and device based on a speech synthesis system mark memory bank, wherein the method comprises the following steps: acquiring a text to be retrieved; retrieving the text to be retrieved against a pre-configured memory-bank text to obtain the marked text in the text to be retrieved that matches the memory-bank text; taking the marking information of the marked text in the memory-bank text as the marking information of the marked text in the text to be retrieved; generating a corresponding audio file based on the marked text with its marking information; and determining, based on an interaction process with the user, whether the audio file meets the user's requirements. Through its memory function the invention automatically retrieves the text content to be synthesized, and once the retrieval result matches the memory-bank content, the text-marking function and scheme stored in the memory bank are invoked, reproducing the speech synthesis effect recorded in the memory bank, sparing the user from repeatedly adding the same marks manually, and greatly reducing the user's speech synthesis workload.

Description

Voice synthesis system mark memory library-based audio generation method and device
Technical Field
The invention relates to the technical field of speech synthesis, in particular to an audio generation method and device based on a speech synthesis system mark memory bank.
Background
Currently, a speech synthesis system marks text content during use according to each user's personalized requirements, for example with pause marks, continuous-reading marks, stress marks, polyphonic-character marks, alias marks, and the like. However, a user's personalized requirements tend to recur: when the next item is synthesized after the speech synthesis of one item is completed, and content that needs the same marks is encountered again, the same text content has to be marked anew. This mode of operation causes repeated labor, and the process is complex and tedious.
Disclosure of Invention
The technical problem the invention aims to solve is to simplify the repeated marking process in speech synthesis. In view of this, the present invention provides an audio generation method and device based on a speech synthesis system mark memory bank.
The technical scheme adopted by the invention is an audio generation method based on a speech synthesis system mark memory bank, comprising the following steps (a minimal sketch of the flow is given after this list):
acquiring a text to be retrieved;
retrieving the text to be retrieved against a pre-configured memory-bank text to obtain the marked text in the text to be retrieved that matches the memory-bank text;
taking the marking information of the marked text in the memory-bank text as the marking information of the marked text in the text to be retrieved;
generating a corresponding audio file based on the marked text with its marking information;
determining, based on an interaction process with the user, whether the audio file meets the user's requirements.
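For illustration only, the following minimal Python sketch strings these steps together end to end. Every name in it (generate_audio, synthesize, the phrase-keyed dictionary format of the memory bank) is an assumption made for this sketch; the disclosure does not prescribe a data format or an engine interface.

```python
def synthesize(text, marked_spans):
    # Stand-in for a real TTS engine, which the disclosure does not specify.
    return f"<audio: {text!r}, {len(marked_spans)} marked span(s)>"

def generate_audio(text, memory_bank):
    """The method in miniature. memory_bank maps phrase -> list of
    (mark_kind, mark_value) tuples; this storage format is assumed here."""
    # Retrieval step: find the memory-bank phrases that occur in the text.
    marked_spans = [(p, marks) for p, marks in memory_bank.items() if p in text]
    # Mark-transfer step: the stored marking information is reused unchanged.
    return synthesize(text, marked_spans)  # synthesis step; user confirmation omitted

bank = {"蹄疾步稳": [("continuous-reading", "")]}
print(generate_audio("蹄疾步稳推进改革", bank))
```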
In one embodiment, the method further comprises:
setting corresponding marking information for part of the text in the memory-bank text.
In one embodiment, the method further comprises:
performing at least one of adding, editing, and deleting operations on the marking information in the memory-bank text.
In one embodiment, the determining, based on the interaction process with the user, whether the audio file meets the user's requirements comprises:
outputting the current audio file in response to the user's confirmation of the audio file.
In one embodiment, the determining, based on the interaction process with the user, whether the audio file meets the user's requirements comprises:
reconfiguring, in response to the user's negative feedback on the audio file, the marked text corresponding to the audio file; and
synthesizing the reconfigured marked text into an audio file for further interaction with the user.
In one embodiment, the reconfiguring, in response to the user's negative feedback on the audio file, the marked text corresponding to the audio file comprises:
reconfiguring, in response to the user's negative feedback on the audio file, the marking information in the marked text corresponding to the audio file.
In one embodiment, the reconfiguring, in response to the user's negative feedback on the audio file, the marking information in the marked text corresponding to the audio file comprises:
performing, in response to the user's negative feedback on the audio file, at least one of adding, deleting, and modifying operations on the marking information in the marked text corresponding to the audio file.
The invention also provides an audio generation device based on a speech synthesis system mark memory bank, comprising:
an acquisition unit configured to acquire a text to be retrieved;
a retrieval unit configured to retrieve the text to be retrieved against a pre-configured memory-bank text, so as to obtain the marked text in the text to be retrieved that matches the memory-bank text;
a calling unit configured to take the marking information of the marked text in the memory-bank text as the marking information of the marked text in the text to be retrieved;
an audio synthesis unit configured to generate a corresponding audio file based on the marked text with its marking information; and
an interaction unit configured to determine, based on an interaction process with the user, whether the audio file meets the user's requirements.
Another aspect of the present invention also provides an electronic device, comprising: a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the program, implements the steps of the audio generation method based on the speech synthesis system mark memory bank described in any of the above.
Another aspect of the present invention also provides a computer storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the audio generation method based on the speech synthesis system mark memory bank described in any of the above.
By adopting the above technical scheme, the audio generation method based on the speech synthesis system mark memory bank provided by the invention, owing to its built-in memory-bank function, can automatically retrieve the text content to be synthesized; once the retrieval result matches the memory-bank content, the text-marking function and scheme stored in the memory bank are invoked, reproducing the speech synthesis effect recorded in the memory bank and sparing the user from repeatedly adding the same marks manually. The user's speech synthesis workload is greatly reduced.
Drawings
FIG. 1 is a flowchart of an audio generation method based on a speech synthesis system mark memory bank according to an embodiment of the invention;
FIG. 2 is a flowchart of another audio generation method based on a speech synthesis system mark memory bank according to an embodiment of the present invention;
FIG. 3 is a screenshot of the page before any marks are edited, in an application example of the audio generation method based on a speech synthesis system mark memory bank according to an embodiment of the present invention;
FIG. 4 is a screenshot of the page after a "continuous reading" mark is added, in an application example of the method according to an embodiment of the present invention;
FIG. 5 is a screenshot of the page for adding an entry to the memory bank, in an application example of the method according to an embodiment of the present invention;
FIG. 6 is a screenshot of the page where a memory-bank marking scheme is invoked, in an application example of the method according to an embodiment of the present invention;
FIG. 7 is a screenshot of the memory-bank management function, in an application example of the method according to an embodiment of the present invention;
FIG. 8 is a screenshot of the re-editing page of the speech synthesis system mark memory bank, in an application example of the method according to an embodiment of the present invention;
FIG. 9 is a block diagram of the structure of an audio generation device based on a speech synthesis system mark memory bank according to an embodiment of the present invention;
FIG. 10 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
In order to further describe the technical means and effects adopted by the present invention for achieving the intended purpose, the following detailed description of the present invention is given with reference to the accompanying drawings and preferred embodiments.
In the drawings, the thickness, size, and shape of objects may be slightly exaggerated for convenience of explanation. The figures are merely examples and are not drawn to scale.
It will be further understood that the terms "comprises," "comprising," "includes," "including," "has," "having," and/or "containing," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. Furthermore, when a statement such as "at least one of" appears after a list of features, it modifies the entire list rather than an individual element in the list. Furthermore, when describing embodiments of the present application, the use of "may" means "one or more embodiments of the present application." Also, the term "exemplary" is intended to refer to an example or illustration.
As used herein, the terms "substantially," "about," and the like are used as terms of approximation rather than terms of degree, and are intended to account for the inherent deviations in measured or calculated values that would be recognized by those of ordinary skill in the art.
Unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. It will be further understood that terms, such as those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the relevant art and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.
It should be noted that, in the case of no conflict, the embodiments and features in the embodiments may be combined with each other. The present application will be described in detail below with reference to the accompanying drawings in conjunction with embodiments.
In the prior art, the speech synthesis system supports editing with a variety of marks, including pause marks, continuous-reading marks, stress marks, polyphonic-character marks, number-reading marks, English-reading marks, alias marks, local speed marks, and local volume marks. When using the speech synthesis system, a user can personalize the marked content according to actual needs. For example, in the phrase "蹄疾步稳推进" ("advance with swift hooves and steady steps"), the trailing "稳推进" can also be read as an independent phrase, so the machine cannot accurately judge the prosody of the expression during speech synthesis, and word segmentation and phrase breaking go wrong: the synthesized "蹄疾步|稳推进" places a slight pause between "步" and "稳", which is inconsistent with the intended grouping "蹄疾步稳|推进". In this case the user can manually add a continuous-reading mark so that "蹄疾步稳" is read as one continuous unit. After this manual intervention the synthesized speech pauses slightly between "蹄疾步稳" and "推进", meeting the user's requirement. However, when the same expression appears later in the same synthesis task, or in a subsequent task, the same problem recurs and the user must again intervene manually and add the same marks. Such operations impose significant labor costs on the user.
In a first embodiment of the present invention, as shown in FIG. 1, an audio generation method based on a speech synthesis system mark memory bank comprises the following steps:
Step S1: acquiring a text to be retrieved;
Step S2: retrieving the text to be retrieved against a pre-configured memory-bank text to obtain the marked text in the text to be retrieved that matches the memory-bank text;
Step S3: taking the marking information of the marked text in the memory-bank text as the marking information of the marked text in the text to be retrieved;
Step S4: generating a corresponding audio file based on the marked text with its marking information;
Step S5: determining, based on an interaction process with the user, whether the audio file meets the user's requirements.
The method provided in this embodiment will be described in detail below with reference to FIG. 1 and FIG. 2.
Step S1: acquiring a text to be retrieved.
In this embodiment, the text to be retrieved may be obtained directly by copying, importing, or similar means, or may be manually edited and input. The text to be retrieved may include Chinese characters, English characters, punctuation marks, numeric characters, or any other character information that can exist in text form, which is not limited here.
Step S2: retrieving the text to be retrieved against the pre-configured memory-bank text, so as to obtain the marked text in the text to be retrieved that matches the memory-bank text.
In this embodiment, corresponding marking information may be set in advance for part of the text in the memory-bank text.
Specifically, the configuration process for the memory bank may include performing at least one of adding, editing, and deleting operations on the marking information in the memory-bank text.
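As a sketch only, the configuration operations named above (add, edit, delete) might look as follows; the class name MemoryBankConfig and the phrase-keyed dictionary of (mark_kind, mark_value) tuples are illustrative assumptions, not structures defined by the disclosure.

```python
class MemoryBankConfig:
    """Illustrative add/edit/delete operations on memory-bank mark entries."""

    def __init__(self):
        self.entries = {}  # phrase -> list of (mark_kind, mark_value) tuples

    def add(self, phrase, mark):
        self.entries.setdefault(phrase, []).append(mark)

    def edit(self, phrase, index, new_mark):
        self.entries[phrase][index] = new_mark

    def delete(self, phrase, mark=None):
        if mark is None:
            self.entries.pop(phrase, None)     # remove the whole entry
        elif mark in self.entries.get(phrase, []):
            self.entries[phrase].remove(mark)  # remove a single mark

bank = MemoryBankConfig()
bank.add("蹄疾步稳", ("continuous-reading", ""))
bank.edit("蹄疾步稳", 0, ("continuous-reading", "strict"))
```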
In this embodiment, the marked text in the text to be retrieved that matches the memory-bank text may be the text information that the text to be retrieved and the memory-bank text have in common.
Step S3: taking the marking information of the marked text in the memory-bank text as the marking information of the marked text in the text to be retrieved.
That is, when text information that overlaps or matches the memory-bank text is found in the text to be retrieved, the marking information of the corresponding memory-bank text can be applied directly to the corresponding text information in the text to be retrieved.
Step S4: generating a corresponding audio file based on the marked text with its marking information.
It will be appreciated that the audio file generated based on the marked text with its marking information is an audio file generated from the marked text that realizes the continuous-reading and/or pause effects (the marking information) during synthesis.
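The disclosure does not name a markup format for the synthesis engine. Purely as an assumption, the marks could be rendered as SSML-like tags before synthesis; the mapping of mark types onto the break, s, and sub tags below is illustrative only, and the spans are assumed sorted and non-overlapping as produced by transfer_marks above.

```python
def to_ssml(text, spans):
    """Render the marked text in SSML-like form for the TTS engine."""
    out, cursor = [], 0
    for start, end, marks in spans:
        out.append(text[cursor:start])
        segment = text[start:end]
        for kind, value in marks:
            if kind == "pause":
                segment += f'<break time="{value or "200ms"}"/>'
            elif kind == "continuous-reading":
                segment = f"<s>{segment}</s>"  # one prosodic unit, no internal break
            elif kind == "alias":
                segment = f'<sub alias="{value}">{segment}</sub>'
        out.append(segment)
        cursor = end
    out.append(text[cursor:])
    return "<speak>" + "".join(out) + "</speak>"
```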
Step S5: determining, based on an interaction process with the user, whether the audio file meets the user's requirements.
In one embodiment, determining whether the audio file meets the user's requirements based on an interaction process with the user comprises: outputting the current audio file in response to the user's confirmation of the audio file.
In one embodiment, determining whether the audio file meets the user's requirements based on an interaction process with the user comprises: reconfiguring, in response to the user's negative feedback on the audio file, the marked text corresponding to the audio file; and synthesizing the reconfigured marked text into an audio file for further interaction with the user.
Specifically, reconfiguring the marked text corresponding to the audio file in response to the user's negative feedback comprises: reconfiguring the marking information in the marked text corresponding to the audio file.
Illustratively, reconfiguring the marking information in the marked text corresponding to the audio file in response to the user's negative feedback comprises: performing at least one of adding, deleting, and modifying operations on the marking information in the marked text corresponding to the audio file.
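Putting step S5 together with the earlier sketches, the interaction can be viewed as a confirm-or-revise loop. Here confirm and revise are placeholders for the UI actions described above (confirmation, or add/delete/modify of marks); their signatures are assumptions, and the helpers transfer_marks and to_ssml are the ones sketched earlier.

```python
def interactive_loop(text, entries, confirm, revise):
    """Resynthesize until the user confirms the audio (step S5)."""
    spans = transfer_marks(text, entries)   # steps S2 + S3
    while True:
        audio = to_ssml(text, spans)        # step S4: synthesize from marked text
        if confirm(audio):                  # user accepts: output this file
            return audio
        spans = revise(spans)               # user rejects: reconfigure the marks

# Example run that auto-accepts on the first pass.
result = interactive_loop("蹄疾步稳推进", {"蹄疾步稳": [("continuous-reading", "")]},
                          confirm=lambda a: True, revise=lambda s: s)
```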
In this embodiment, after a user of the speech synthesis system finishes marking text content according to actual needs, the marks can be added to the memory bank. When the user performs a speech synthesis task again, the system automatically searches the text content, and if text matching the memory bank appears, the same marks from the memory bank are added automatically, so that a whole class of problems is solved once. The categories of speech synthesis tasks in a given user's workplace are roughly consistent, for example politics, culture, or military affairs, and the problems encountered are also roughly the same. The design and implementation of the mark memory bank function can therefore effectively reduce the user's workload, increase the degree of automation of the system, and improve the user's perception of the product.
That is, the present embodiment has at least the following advantages:
the system implements a memory-bank function that automatically retrieves the text content to be synthesized; once the retrieval result matches the memory-bank content, the text-marking function and scheme stored in the memory bank are invoked, reproducing the speech synthesis effect recorded in the memory bank and sparing the user from repeatedly adding the same marks manually. The user's speech synthesis workload is greatly reduced.
A second embodiment of the present invention, corresponding to the first embodiment, presents an application example of the audio generation method based on the speech synthesis system mark memory bank provided in the first embodiment.
In this embodiment, after logging into the system successfully, the user clicks the [Real-time Synthesis] function menu to enter the real-time synthesis page, edits the text content, and then clicks the [Audition] button to listen to the synthesized audio, as shown in FIG. 3. It should be understood that the system mentioned in this embodiment is a system implementing the method provided in the first embodiment; it may be implemented on a computer in software form, and related designs such as the page appearance are merely exemplary in this embodiment and are not intended to limit the scope of the present invention.
After auditioning the audio file, the user finds, for example, a slight pause between "步" and "稳" in "蹄疾步|稳推进", which does not fit the intended reading. The user can slide to select the text "蹄疾步稳" in the system and click the [Continuous Reading] function button. A continuous-reading mark is then added to "蹄疾步稳", as shown in FIG. 4, and on auditioning again the result meets the user's requirements.
Further, the user slides to select the phrase "蹄疾步稳推进" carrying the continuous-reading mark, clicks the [Memory] function button, and adds it to the memory bank, as shown in FIG. 5.
Whenever the phrase "蹄疾步稳推进" appears in a new synthesis task, the system automatically retrieves the matching memory entry and automatically invokes the memory-bank text-marking scheme to add the continuous-reading mark to "蹄疾步稳", as shown in FIG. 6.
In addition, the memory-bank management function supports re-editing of stored entries, as shown in FIG. 7.
Specifically, clicking the [Edit] button on the memory-bank management page reopens the detailed editing page, where the marking scheme stored in the memory bank can be modified, as shown in FIG. 8.
A third embodiment of the present invention, corresponding to the first embodiment, presents an audio generation device based on a speech synthesis system mark memory bank, as shown in FIG. 9, comprising the following components:
an acquisition unit configured to acquire a text to be retrieved;
a retrieval unit configured to retrieve the text to be retrieved against the pre-configured memory-bank text, so as to obtain the marked text in the text to be retrieved that matches the memory-bank text;
a calling unit configured to take the marking information of the marked text in the memory-bank text as the marking information of the marked text in the text to be retrieved;
an audio synthesis unit configured to generate a corresponding audio file based on the marked text with its marking information; and
an interaction unit configured to determine, based on an interaction process with the user, whether the audio file meets the user's requirements.
In one embodiment, the device further comprises a configuration module configured to set corresponding marking information for part of the text in the memory-bank text.
In one embodiment, the configuration module is further configured to perform at least one of adding, editing, and deleting operations on the marking information in the memory-bank text.
In one embodiment, the interaction unit is further configured to output the current audio file in response to the user's confirmation of the audio file.
In one embodiment, the interaction unit is further configured to: reconfigure, in response to the user's negative feedback on the audio file, the marked text corresponding to the audio file; and synthesize the reconfigured marked text into an audio file for further interaction with the user.
In one embodiment, the interaction unit is further configured to reconfigure, in response to the user's negative feedback on the audio file, the marking information in the marked text corresponding to the audio file.
In one embodiment, the interaction unit is further configured to perform, in response to the user's negative feedback on the audio file, at least one of adding, deleting, and modifying operations on the marking information in the marked text corresponding to the audio file.
A fourth embodiment of the present invention, as shown in FIG. 10, can be understood as a physical device comprising a processor and a memory storing processor-executable instructions which, when executed by the processor, perform the following operations:
Step S1: acquiring a text to be retrieved;
Step S2: retrieving the text to be retrieved against a pre-configured memory-bank text to obtain the marked text in the text to be retrieved that matches the memory-bank text;
Step S3: taking the marking information of the marked text in the memory-bank text as the marking information of the marked text in the text to be retrieved;
Step S4: generating a corresponding audio file based on the marked text with its marking information;
Step S5: determining, based on an interaction process with the user, whether the audio file meets the user's requirements.
In a fifth embodiment of the present invention, the flow of the audio generation method based on the speech synthesis system mark memory bank is the same as in the first, second, or third embodiment. The difference is that, in engineering terms, this embodiment may be implemented by means of software plus a necessary general-purpose hardware platform, or of course by hardware, although in many cases the former is the preferred implementation. Based on this understanding, the method of the present invention may be embodied in the form of a computer software product stored on a storage medium (e.g., ROM/RAM, magnetic disk, or optical disk) and comprising instructions for causing a device to perform the method of the embodiments of the present invention.
While the invention has been described in connection with specific embodiments and drawings thereof, it is to be understood that the invention is not limited thereto, and that modifications falling within the spirit and scope of the invention are intended to be included.

Claims (10)

1. An audio generation method based on a speech synthesis system mark memory bank, characterized by comprising the following steps:
acquiring a text to be retrieved;
retrieving the text to be retrieved against a pre-configured memory-bank text to obtain the marked text in the text to be retrieved that matches the memory-bank text;
taking the marking information of the marked text in the memory-bank text as the marking information of the marked text in the text to be retrieved;
generating a corresponding audio file based on the marked text with its marking information; and
determining, based on an interaction process with the user, whether the audio file meets the user's requirements.
2. The audio generation method based on a speech synthesis system mark memory bank of claim 1, further comprising:
setting corresponding marking information for part of the text in the memory-bank text.
3. The audio generation method based on a speech synthesis system mark memory bank of claim 2, further comprising:
performing at least one of adding, editing, and deleting operations on the marking information in the memory-bank text.
4. The audio generation method based on a speech synthesis system mark memory bank of claim 1, wherein the determining, based on the interaction process with the user, whether the audio file meets the user's requirements comprises:
outputting the current audio file in response to the user's confirmation of the audio file.
5. The audio generation method based on a speech synthesis system mark memory bank of claim 1, wherein the determining, based on the interaction process with the user, whether the audio file meets the user's requirements comprises:
reconfiguring, in response to the user's negative feedback on the audio file, the marked text corresponding to the audio file; and
synthesizing the reconfigured marked text into an audio file for further interaction with the user.
6. The audio generation method based on a speech synthesis system mark memory bank of claim 5, wherein the reconfiguring, in response to the user's negative feedback on the audio file, the marked text corresponding to the audio file comprises:
reconfiguring, in response to the user's negative feedback on the audio file, the marking information in the marked text corresponding to the audio file.
7. The audio generation method based on a speech synthesis system mark memory bank of claim 6, wherein the reconfiguring, in response to the user's negative feedback on the audio file, the marking information in the marked text corresponding to the audio file comprises:
performing, in response to the user's negative feedback on the audio file, at least one of adding, deleting, and modifying operations on the marking information in the marked text corresponding to the audio file.
8. An audio generation device based on a speech synthesis system mark memory bank, characterized by comprising:
an acquisition unit configured to acquire a text to be retrieved;
a retrieval unit configured to retrieve the text to be retrieved against a pre-configured memory-bank text, so as to obtain the marked text in the text to be retrieved that matches the memory-bank text;
a calling unit configured to take the marking information of the marked text in the memory-bank text as the marking information of the marked text in the text to be retrieved;
an audio synthesis unit configured to generate a corresponding audio file based on the marked text with its marking information; and
an interaction unit configured to determine, based on an interaction process with the user, whether the audio file meets the user's requirements.
9. An electronic device, comprising: a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the program, implements the steps of the audio generation method based on a speech synthesis system mark memory bank of any one of claims 1 to 7.
10. A computer storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the audio generation method based on a speech synthesis system mark memory bank of any one of claims 1 to 7.
CN202310322513.3A 2023-03-30 2023-03-30 Voice synthesis system mark memory library-based audio generation method and device Pending CN116092477A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310322513.3A CN116092477A (en) 2023-03-30 2023-03-30 Voice synthesis system mark memory library-based audio generation method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310322513.3A CN116092477A (en) 2023-03-30 2023-03-30 Voice synthesis system mark memory library-based audio generation method and device

Publications (1)

Publication Number Publication Date
CN116092477A true CN116092477A (en) 2023-05-09

Family

ID=86204824

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310322513.3A Pending CN116092477A (en) 2023-03-30 2023-03-30 Voice synthesis system mark memory library-based audio generation method and device

Country Status (1)

Country Link
CN (1) CN116092477A (en)

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160147853A1 (en) * 2013-06-23 2016-05-26 T-Jat Systems (2006) Ltd. Method and system for consolidating data retrieved from different sources
US20170004821A1 (en) * 2014-10-30 2017-01-05 Kabushiki Kaisha Toshiba Voice synthesizer, voice synthesis method, and computer program product
US10140973B1 (en) * 2016-09-15 2018-11-27 Amazon Technologies, Inc. Text-to-speech processing using previously speech processed data
US20210110811A1 (en) * 2019-10-11 2021-04-15 Samsung Electronics Company, Ltd. Automatically generating speech markup language tags for text
CN113516963A (en) * 2020-04-09 2021-10-19 菜鸟智能物流控股有限公司 Audio data generation method and device, server and intelligent loudspeaker box
CN112634858A (en) * 2020-12-16 2021-04-09 平安科技(深圳)有限公司 Speech synthesis method, speech synthesis device, computer equipment and storage medium
CN114999438A (en) * 2021-05-08 2022-09-02 中移互联网有限公司 Audio playing method and device
CN114863906A (en) * 2022-07-07 2022-08-05 北京中电慧声科技有限公司 Method and device for marking alias of text-to-speech processing

Similar Documents

Publication Publication Date Title
US7788590B2 (en) Lightweight reference user interface
JP4651613B2 (en) Voice activated message input method and apparatus using multimedia and text editor
CN109597976B (en) Document editing method and device
CN107798123B (en) Knowledge base and establishing, modifying and intelligent question and answer methods, devices and equipment thereof
JP6165913B1 (en) Information processing apparatus, information processing method, and program
US20150024351A1 (en) System and Method for the Relevance-Based Categorizing and Near-Time Learning of Words
US11295069B2 (en) Speech to text enhanced media editing
WO2006046523A1 (en) Document analysis system and document adaptation system
CN111142667A (en) System and method for generating voice based on text mark
EP2682931B1 (en) Method and apparatus for recording and playing user voice in mobile terminal
JP4094777B2 (en) Image communication system
JP2009140466A (en) Method and system for providing conversation dictionary services based on user created dialog data
CN112084756A (en) Conference file generation method and device and electronic equipment
US20240169972A1 (en) Synchronization method and apparatus for audio and text, device, and medium
CN102323858B (en) Identify the input method of modification item in input, terminal and system
US11119727B1 (en) Digital tutorial generation system
CN110297965B (en) Courseware page display and page set construction method, device, equipment and medium
KR102643902B1 (en) Apparatus for managing minutes and method thereof
KR20000024318A (en) The TTS(text-to-speech) system and the service method of TTS through internet
CN116092477A (en) Voice synthesis system mark memory library-based audio generation method and device
JP2005173999A (en) Device, system and method for searching electronic file, program, and recording media
CN114841178A (en) Method, device, electronic equipment and storage medium for realizing session translation
CN112578965A (en) Processing method and device and electronic equipment
JP2002073662A (en) Information presenting device and recording medium with information presenting program recorded thereon
KR20210132115A (en) Editing Support Programs, Editing Support Methods, and Editing Support Devices

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination