CN111667815B - Method, apparatus, chip circuit and medium for text-to-speech conversion - Google Patents

Publication number
CN111667815B
CN111667815B (application CN202010498289.XA)
Authority
CN
China
Prior art keywords: text, audio data, converted, stored, unique identifier
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number: CN202010498289.XA
Other languages: Chinese (zh)
Other versions: CN111667815A (en)
Inventors: 封宣阳, 蔡海蛟, 冯歆鹏, 周骥
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
NextVPU Shanghai Co Ltd
Original Assignee
NextVPU Shanghai Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Application filed by NextVPU Shanghai Co Ltd filed Critical NextVPU Shanghai Co Ltd
Priority to CN202010498289.XA priority Critical patent/CN111667815B/en
Publication of CN111667815A publication Critical patent/CN111667815A/en
Application granted granted Critical
Publication of CN111667815B publication Critical patent/CN111667815B/en

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L 13/00: Speech synthesis; Text to speech systems
    • G10L 13/08: Text analysis or generation of parameters for speech synthesis out of text, e.g. grapheme to phoneme translation, prosody generation or stress or intonation determination

Abstract

The application provides a method, a device, a computer-readable storage medium, and a chip circuit for text-to-speech conversion. The method comprises the following steps: receiving a text to be converted; generating a unique identifier for the text to be converted; determining whether the unique identifier is stored; and, if the unique identifier is determined to be stored, acquiring the audio data corresponding to the unique identifier as the output audio of the text to be converted.

Description

Method, apparatus, chip circuit and medium for text-to-speech conversion
Technical Field
The present application relates to the field of speech synthesis, and more particularly to a method for text-to-speech conversion, a device implementing such a method, a chip circuit and a computer readable storage medium.
Background
Speech recognition and speech synthesis are the two key technologies required to implement human-machine speech communication. Speech synthesis is the technique of generating artificial speech by mechanical or electronic means. TTS (Text To Speech) is an important speech synthesis technique that converts an input text file into natural-language speech output. TTS technology is currently applied in many fields, such as voice navigation, audio books, online translation, and online education, where it converts input or built-in text content into audio data and plays that audio. A typical TTS process is as follows: text content is input to a TTS engine, the engine converts the text content into audio data, and the audio data is then played through a speaker. The TTS engine repeats this process indiscriminately, even when it later encounters the same text content. In many current application scenarios, audio output is only an auxiliary output mode: it is used infrequently and its timeliness requirements are loose, so the processing burden caused by the repetition is acceptable.
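The conventional flow described above can be sketched as follows. Here `synthesize` is a hypothetical placeholder standing in for a real TTS engine, not any particular vendor's API:

```python
def synthesize(text: str) -> bytes:
    """Placeholder for a real TTS engine; returns fake audio bytes."""
    return ("<audio:" + text + ">").encode("utf-8")

def naive_tts_play(text: str) -> bytes:
    audio = synthesize(text)  # the engine always runs the full conversion
    # ... play `audio` through a speaker ...
    return audio

# The same text triggers a complete re-synthesis every time it appears:
first = naive_tts_play("turn left in 100 meters")
second = naive_tts_play("turn left in 100 meters")
assert first == second  # identical output, but the work was done twice
```

It is exactly this duplicated synthesis work that becomes a problem in conversion-heavy scenarios.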
For some applications, however, such as those dedicated to helping vision-impaired people read, text-to-speech conversion is performed frequently. If the above processing manner is still used, the processing load and power consumption of the device increase greatly, resources are wasted, and real-time performance is difficult to guarantee.
Disclosure of Invention
To address these problems, the present application provides a text-to-speech scheme that accelerates conventional text-to-speech conversion and lightens the processing load of the speech synthesis module, making it well suited to application scenarios with heavy text-to-speech workloads and strict timeliness requirements.
According to one aspect of the present application, a method for text-to-speech conversion is provided. The method comprises the following steps: receiving a text to be converted; generating a unique identifier for the text to be converted; determining whether the unique identifier is stored; and if the unique identifier is determined to be stored, acquiring audio data corresponding to the unique identifier as output audio of the text to be converted.
According to another aspect of the present application, there is provided an apparatus for text-to-speech conversion. The apparatus includes: a memory having computer program code stored thereon; and a processor configured to run the computer program code to perform the method as described above.
According to yet another aspect of the present application, a computer-readable storage medium is provided. The computer readable storage medium has stored thereon computer program code which, when executed, performs the method as described above.
According to yet another aspect of the present application, there is provided a chip circuit comprising a circuit unit configured to perform the method as described above upon power-up.
Drawings
FIG. 1 illustrates a flow chart of a method for text-to-speech conversion according to some embodiments of the application;
FIG. 2 illustrates a flow chart of a method for text-to-speech conversion according to further embodiments of the application;
FIG. 3 illustrates a flow chart of a method for text-to-speech conversion according to still other embodiments of the present application; and
FIG. 4 shows a schematic block diagram of an example device that may be used to implement an embodiment of the application.
Detailed Description
The following detailed description of various embodiments of the present application will be provided in connection with the accompanying drawings to provide a clearer understanding of the objects, features and advantages of the present application. It should be understood that the embodiments shown in the drawings are not intended to limit the scope of the application, but rather are merely illustrative of the true spirit of the application.
In the following description, for the purposes of explanation of various inventive embodiments, certain specific details are set forth in order to provide a thorough understanding of the various inventive embodiments. One skilled in the relevant art will recognize, however, that an embodiment may be practiced without one or more of the specific details. In other instances, well-known devices, structures, and techniques associated with the present application may not be shown or described in detail to avoid unnecessarily obscuring the description of the embodiments.
Throughout the specification and claims, unless the context requires otherwise, the word "comprise" and variations such as "comprises" and "comprising" will be understood to be open-ended and inclusive, i.e., to be interpreted as meaning "including, but not limited to".
Reference throughout this specification to "one embodiment" or "some embodiments" means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment. Thus, appearances of the phrases "in one embodiment" or "in some embodiments" in various places throughout this specification are not necessarily all referring to the same embodiment. Furthermore, the particular features, structures, or characteristics may be combined in any suitable manner in one or more embodiments.
Furthermore, the terms first, second and the like in the description and in the claims, are used for descriptive purposes only and not for limiting the size or other order of the objects described.
Fig. 1 illustrates a flow chart of a method 100 for text-to-speech conversion according to some embodiments of the application. The method 100 may be performed, for example, by the device 400 described below, a schematic block diagram of which is shown in fig. 4.
As shown in fig. 1, the method 100 includes a step 110 of receiving text to be converted. Here, the text to be converted may be received from the outside through various I/O interfaces (such as the I/O interface 450 described below). For example, the text to be converted may be input by a user through an input device such as a keyboard or mouse. Alternatively, the text to be converted may be text generated after a user captures an image through an input device such as a camera and image recognition (e.g., Optical Character Recognition (OCR)) is performed on the image. In some other embodiments, the text to be converted may also be predetermined text preset in the device 400, such as text for voice prompts or navigation.
Next, at step 120, a unique Identifier (ID) is generated for the text to be converted. The unique identifier is used to uniquely identify the text to be converted.
In some embodiments, the unique identifier may include a field indicating the content of the text to be converted. For example, the field may contain an encoding of the text to be converted itself, or some transformation of that encoding (e.g., a hash value).
In other embodiments, the unique identifier may also include other information about the text to be converted. For example, in addition to the first field indicating the content of the text to be converted as described above, the unique identifier may include a second field indicating at least one of: the type of speech synthesis module (e.g., TTS engine) used by the device 400, the user's preferred speech rate, and the user's preferred timbre. The speech synthesis module may be any type of speech synthesis module currently existing or developed in the future, and its type may include the model (manufacturer), version, and so on of the module. The speech synthesis module may be implemented, for example, by the CPU 410 in the device 400 running a computer program for speech synthesis; alternatively, it may be a separate dedicated chip. As examples of speech synthesis modules, many vendors (e.g., Google, Baidu, and others) have developed TTS engines, and any of these may serve as the speech synthesis module described herein. The user's preferred speech rate may be set by the user at initialization or set automatically by the device 400 based on the user's previous behavior, such as the user's speech rate when making voice input. For example, the speech rate may be set in several levels, such as fast, medium, and slow, or as a specific rate, such as 60-80 words/min. The user's preferred timbre may likewise be set by the user at initialization or automatically by the device 400 according to the user's previous behavior. For example, the timbre may be male, female, child, etc. Alternatively, the timbre may be customized by the user, such as the voice of the user or of a specific other person.
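One plausible way to build such a two-field identifier is to hash the text content for the first field and append the synthesis settings as the second field. This is a sketch under assumptions, not the patent's implementation: the field layout, the separators, and the default setting values shown here are all illustrative:

```python
import hashlib

def make_unique_id(text: str, engine: str = "engineA",
                   rate: str = "medium", timbre: str = "female") -> str:
    # First field: a digest of the text content itself.
    content_field = hashlib.sha256(text.encode("utf-8")).hexdigest()
    # Second field: the synthesis settings (engine type, speech rate, timbre).
    config_field = "|".join([engine, rate, timbre])
    return content_field + ":" + config_field

# Identical text with identical settings always yields the same identifier;
# changing only a setting changes the second field but not the first.
assert make_unique_id("hello") == make_unique_id("hello")
assert make_unique_id("hello", rate="fast") != make_unique_id("hello")
```

Any deterministic digest would do; SHA-256 is used here only because it makes collisions between different texts practically impossible.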
Next, at step 130, a determination is made as to whether the unique identifier is stored. In one embodiment, a database is utilized in device 400 to store unique identifiers corresponding to different text. Here, the database may be a database integrated in the device 400 as described below (e.g., in ROM 420, RAM 430, storage unit 480, or other flash memory) or in a separate storage device independent of the device 400. In one embodiment, to ensure real-time output audio, the database is provided in a form that facilitates quick access by the processor of the device 400, such as in a cache.
If it is determined that the unique identifier is stored (yes in step 130), then in step 140 the audio data corresponding to the unique identifier is acquired as the output audio of the text to be converted, and may be played to the user at step 170. In one embodiment, the audio data corresponding to the unique identifiers of different texts, together with related information for that audio data, may be stored in a cache of the device 400. Here, the cache may be located in the RAM 430 (e.g., NVRAM) or the storage unit 480 of the device 400 described below, or in a flash memory, disk, or hard disk separate from the device 400.
In some embodiments, the information associated with the unique identifier may include the starting position, in the buffer, of the audio data corresponding to the unique identifier, and the length of that audio data.
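A toy model of this buffer-plus-database arrangement may clarify the split. The names and structure are illustrative: an in-memory `bytearray` stands in for the audio buffer, and a dictionary stands in for the database:

```python
class AudioStore:
    """Raw audio bytes live in one flat buffer; the database maps a
    unique identifier to the (start, length) of its audio data."""

    def __init__(self) -> None:
        self.buffer = bytearray()  # stands in for the audio buffer/cache
        self.db = {}               # unique_id -> (start, length)

    def store(self, uid: str, audio: bytes) -> None:
        start = len(self.buffer)
        self.buffer.extend(audio)  # append audio at the end of the buffer
        self.db[uid] = (start, len(audio))

    def fetch(self, uid: str):
        entry = self.db.get(uid)
        if entry is None:
            return None            # identifier not stored
        start, length = entry
        return bytes(self.buffer[start:start + length])
```

Storing only (start, length) in the database keeps the lookup structure small, while the bulk audio bytes stay in one contiguous region.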
On the other hand, if it is determined that the unique identifier is not stored (no in step 130), then in step 150 the text to be converted may be converted into audio data (e.g., by inputting it into a speech synthesis module). The audio data may be played at step 170 as the output audio corresponding to the text to be converted.
In addition, between steps 150 and 170, the method 100 may further include a step 160, in which the audio data obtained in step 150 is stored along with related information about that audio data (such as its starting position in the buffer and its length). In one embodiment, the audio data resulting from step 150 may be stored in the buffer and its related information stored in the database, so that both are available to subsequent text-to-speech conversions.
Note that although step 160 is shown in fig. 1 as being located between steps 150 and 170, those skilled in the art will appreciate that step 160 may also be performed after step 170 or in parallel with step 170 without departing from the scope of the present disclosure.
With the scheme shown in fig. 1, for text to be converted whose corresponding audio data has already been stored in the device 400, the method 100 can acquire that audio data directly, without performing speech synthesis, thereby greatly improving the real-time performance of audio data acquisition. Text to be converted whose corresponding audio data is not yet stored in the device 400 is still converted by the speech synthesis module, so the conversion quality is preserved.
Fig. 2 illustrates a flow chart of a method 200 for text-to-speech conversion in accordance with further embodiments of the present application. The method 200 may be performed, for example, by the apparatus 400 as described below.
In the embodiment shown in fig. 2, steps 210, 220, 230, 240, 250, 260, and 270 are similar to steps 110, 120, 130, 140, 150, 160, and 170 in the embodiment shown in fig. 1. In the embodiment shown in fig. 2, in addition to the first field indicating the content of the text to be converted, the unique identifier of the text to be converted may include a second field indicating at least one of the type of speech synthesis module used by the device 400, the user's preferred speech rate, and the user's preferred timbre.
Unlike the embodiment shown in fig. 1, in the method 200, if it is determined after step 230 that the unique identifier of the text to be converted is not stored (no in step 230), then in step 235 it is determined whether a second identifier matching the first field of the unique identifier is stored. In one embodiment, the first field of the matched second identifier is identical to the first field of the unique identifier of the text to be converted. In another embodiment, the first field of the matched second identifier is semantically identical or similar to the first field of the unique identifier of the text to be converted. In this case, in step 235, the text to be converted may be semantically analyzed and a second identifier having the same or similar semantics found in the database. Alternatively, the database may store in advance a table of correspondences between texts with identical or similar semantics; in that case, in step 235, a second identifier whose semantics are identical or similar to those of the text to be converted may be looked up in the correspondence table.
That is, step 235 looks for the unique identifier (i.e., a second identifier) of a text that is identical or similar in content to the text to be converted but differs in the other information (e.g., the type of speech synthesis module used, or the user's preferred speech rate and timbre). If such a second identifier is present, audio data for a text whose content matches that of the text to be converted, though the other information differs, has already been stored in the buffer.
In this case, if it is determined that the second identifier is stored (yes in step 235), then in step 245 the audio data corresponding to the second identifier may be retrieved as the output audio (e.g., based on the information about the second identifier stored in the database) and played at step 270 as the audio corresponding to the text to be converted. That is, the output audio conveys the content of the text to be converted but does not fully conform to the user's preferences (type of speech synthesis module, preferred speech rate and timbre, etc.).
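Assuming, for illustration, identifiers of the form `<content-field>:<settings-field>` (a format chosen only for this sketch, not prescribed by the method), the exact-match variant of the step-235 lookup could be:

```python
def find_second_identifier(db: dict, uid: str):
    """Return stored audio whose identifier shares the first (content)
    field with `uid`, even if the second (settings) field differs."""
    content_field = uid.split(":", 1)[0]
    for stored_uid, audio in db.items():
        if stored_uid.split(":", 1)[0] == content_field:
            return audio  # content matches; settings may differ
    return None           # no matching second identifier stored
```

A linear scan is shown for clarity; a real store could index the database by the content field so that this lookup is a single hash access.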
On the other hand, if it is determined that the second identifier is not stored (no in step 235), the method 200 may proceed to step 250 to convert the text to be converted into audio data as output audio, similar to that in step 150.
With the scheme shown in fig. 2, when no audio data corresponding to the unique identifier of the text to be converted is stored, but audio data for identical or similar text content is stored, the efficiency of the overall text-to-speech conversion can be improved at the cost of only partially meeting the user's preferences, thereby improving the user experience.
Fig. 3 illustrates a flow chart of a method 300 for text-to-speech conversion according to further embodiments of the application. The method 300 may be performed, for example, by the apparatus 400 as described below.
In the embodiment shown in fig. 3, steps 310, 320, 330, 340, 350, 360, and 370 are similar to steps 110, 120, 130, 140, 150, 160, and 170 in the embodiment shown in fig. 1 and steps 210, 220, 230, 240, 250, 260, and 270 in the embodiment shown in fig. 2.
Unlike the embodiments shown in figs. 1 and 2, the method 300 may further include, after converting the text to be converted into audio data in step 350, a step 355 in which it is determined whether the conversion of step 350 was successful. If the conversion of step 350 is determined to be successful (yes in step 355), step 370 is executed to play the converted audio data as the output audio. Conversely, if the conversion of step 350 is determined to be unsuccessful, step 380 may be performed to obtain preset specific audio data (e.g., from the buffer) as the output audio and play it in step 370. Here, the preset specific audio data may indicate to the user that the conversion was unsuccessful and/or guide the user to perform a specific operation. For example, the specific audio data may be a fault-handling guidance voice or a customer-service guidance voice preset in the device 400 for guiding the user to troubleshoot on their own or to seek customer-service assistance.
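A minimal sketch of this success check and fallback follows. The `synthesize` callable and the fallback message are placeholders, and treating an empty result or a raised exception as a failed conversion is an assumption of this sketch:

```python
FALLBACK_AUDIO = b"<audio:conversion failed, please contact customer service>"

def convert_with_fallback(text: str, synthesize) -> bytes:
    try:
        audio = synthesize(text)   # step 350: attempt the conversion
        if audio:                  # step 355: did the conversion succeed?
            return audio           # step 370 would play this audio
    except Exception:
        pass                       # a raised error also counts as failure
    return FALLBACK_AUDIO          # step 380: preset guidance audio
```

Because the fallback audio is preset, the user always hears something meaningful even when the synthesis module fails entirely.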
The methods 100 through 300 illustrated in figs. 1 through 3 are examples of aspects of embodiments of the present application; those skilled in the art will appreciate that they do not limit the scope of the present application and may be combined or modified in various ways. For example, in one combination of the embodiments of figs. 2 and 3, steps 235 and 245 as shown in fig. 2 may be performed between steps 330 and 350. In another combination of the embodiments of figs. 2 and 3, steps 235 and 245 as shown in fig. 2 may be performed in place of step 380 when the determination in step 355 is no.
Fig. 4 shows a schematic block diagram of an example device 400 that may be used to implement an embodiment of the application. The device 400 may be, for example, a desktop or laptop computer or another electronic device for text-to-speech conversion. As shown, the device 400 may include one or more central processing units (CPUs) 410 (only one is shown schematically) that can perform various appropriate actions and processes (such as running a TTS engine for TTS conversion) according to computer program instructions stored in a read-only memory (ROM) 420 or loaded from a storage unit 480 into a random access memory (RAM) 430. The RAM 430 may also store various programs and data required for the operation of the device 400. The CPU 410, the ROM 420, and the RAM 430 are connected to each other by a bus 440. An input/output (I/O) interface 450 is also connected to the bus 440.
Various components in device 400 are connected to I/O interface 450, including: an input unit 460 such as a keyboard, a mouse, etc.; an output unit 470 such as various types of displays, speakers, and the like; a storage unit 480 such as a magnetic disk, an optical disk, or the like; and a communication unit 490, such as a network card, modem, wireless communication transceiver, etc. The communication unit 490 allows the device 400 to exchange information/data with other devices via a computer network, such as the internet, and/or various telecommunications networks.
The methods 100 to 300 described above may be performed, for example, by the processing unit 410 of the device 400. For example, in some embodiments, methods 100-300 may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as storage unit 480. In some embodiments, part or all of the computer program may be loaded and/or installed onto device 400 via ROM 420 and/or communication unit 490. When the computer program is loaded into RAM 430 and executed by CPU 410, one or more operations of methods 100 through 300 described above may be performed. In addition, the communication unit 490 may support wired or wireless communication functions.
Methods 100 through 300 for text-to-speech conversion and apparatus 400 according to the present application are described above with reference to the accompanying drawings. Those skilled in the art will appreciate, however, that device 400 need not include all of the components shown in fig. 4, but may include only some of the components necessary to perform the functions described herein, and that the manner in which these components are connected is not limited to the form shown in the figures. For example, where the device 400 is a portable device such as a cell phone, the device 400 may have a different structure than in fig. 4.
The present application may be embodied as methods, apparatus, chip circuits and/or computer program products. The computer program product may include a computer readable storage medium having computer readable program instructions embodied thereon for performing various aspects of the present application. The chip circuitry may include circuit elements for performing various aspects of the application.
The computer readable storage medium may be a tangible device that can hold and store instructions for use by an instruction execution device. The computer readable storage medium may be, for example, but not limited to, an electronic storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer-readable storage medium include the following: a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a static random access memory (SRAM), a portable compact disc read-only memory (CD-ROM), a digital versatile disc (DVD), a memory stick, a floppy disk, a mechanically encoded device such as punch cards or raised structures in a groove having instructions recorded thereon, and any suitable combination of the foregoing. A computer-readable storage medium, as used herein, is not to be construed as a transitory signal per se, such as a radio wave or other freely propagating electromagnetic wave, an electromagnetic wave propagating through a waveguide or other transmission medium (e.g., a light pulse through a fiber optic cable), or an electrical signal transmitted through a wire.
The computer readable program instructions described herein may be downloaded from a computer readable storage medium to a respective computing/processing device or to an external computer or external storage device over a network, such as the internet, a local area network, a wide area network, and/or a wireless network. The network may include copper transmission cables, fiber optic transmissions, wireless transmissions, routers, firewalls, switches, gateway computers and/or edge servers. The network interface card or network interface in each computing/processing device receives computer readable program instructions from the network and forwards the computer readable program instructions for storage in a computer readable storage medium in the respective computing/processing device.
Computer program instructions for carrying out operations of the present application may be assembly instructions, instruction set architecture (ISA) instructions, machine instructions, microcode, firmware instructions, state-setting data, or source or object code written in any combination of one or more programming languages, including object-oriented programming languages such as Smalltalk or C++ and conventional procedural programming languages such as the "C" programming language or similar programming languages. The computer readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. Where a remote computer is involved, it may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (e.g., through the internet using an internet service provider). In some embodiments, aspects of the present application are implemented by personalizing electronic circuitry, such as programmable logic circuitry, field-programmable gate arrays (FPGAs), or programmable logic arrays (PLAs), with state information of the computer readable program instructions, the electronic circuitry executing the computer readable program instructions.
Various aspects of the present application are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the application. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer-readable program instructions.
These computer readable program instructions may be provided to a processing unit of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processing unit of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks. These computer readable program instructions may also be stored in a computer readable storage medium that can direct a computer, programmable data processing apparatus, and/or other devices to function in a particular manner, such that the computer readable medium having the instructions stored therein includes an article of manufacture including instructions which implement the function/act specified in the flowchart and/or block diagram block or blocks.
The computer readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatus or other devices to produce a computer implemented process such that the instructions which execute on the computer, other programmable apparatus or other devices implement the functions/acts specified in the flowchart and/or block diagram block or blocks.
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s). In some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
According to some embodiments of the present application, a method for text-to-speech conversion is provided. The method comprises the following steps: receiving a text to be converted; generating a unique identifier for the text to be converted; determining whether the unique identifier is stored; and if the unique identifier is determined to be stored, acquiring the audio data corresponding to the unique identifier as the output audio of the text to be converted.
According to some embodiments of the application, wherein determining whether the unique identifier is stored comprises: determining whether the unique identifier is stored in a database, and wherein obtaining audio data corresponding to the unique identifier as output audio of the text to be converted comprises: and acquiring audio data corresponding to the unique identifier from a cache based on the related information of the unique identifier stored in the database as output audio of the text to be converted.
According to some embodiments of the application, the method further comprises: if the unique identifier is not stored, converting the text to be converted into audio data as the output audio; and storing the audio data and related information of the audio data.
According to some embodiments of the application, converting the text to be converted into audio data as the output audio comprises: inputting the text to be converted into a speech synthesis module, which converts it into the audio data serving as the output audio.
According to some embodiments of the application, the speech synthesis module comprises a TTS engine.
According to some embodiments of the application, storing the audio data and the related information of the audio data comprises: storing the audio data in a cache and storing the related information of the audio data in a database.
According to some embodiments of the application, the unique identifier comprises: a first field indicating the content of the text to be converted, and a second field indicating at least one of a type of a speech synthesis module, a user-preferred speech rate, and a timbre.
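One possible way to build the two-field identifier described above: the first field fingerprints the text content, and the second field encodes the synthesis settings (engine type, preferred speech rate, timbre). The patent does not specify the encoding; the SHA-256 hash, the `|` joining, and the `#` separator are illustrative choices.

```python
import hashlib

def make_identifier(text: str, engine: str, rate: float, timbre: str) -> str:
    """Build a two-field identifier: content fingerprint + synthesis settings."""
    first = hashlib.sha256(text.encode("utf-8")).hexdigest()  # first field: text content
    second = f"{engine}|{rate}|{timbre}"                      # second field: settings
    return f"{first}#{second}"

# The same text under different settings yields different identifiers,
# while the first field (before '#') stays the same.
a = make_identifier("hello", "tts-a", 1.0, "female")
b = make_identifier("tts", "tts-b", 1.5, "male")
```

Separating the two fields is what makes the later fallback possible: a lookup can still match on text content alone when no entry with the exact settings exists.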
According to some embodiments of the application, the method further comprises: if it is determined that the unique identifier is not stored, determining whether a second identifier matching the first field of the unique identifier is stored; and if it is determined that the second identifier is stored, retrieving the audio data corresponding to the second identifier as the output audio.
According to some embodiments of the application, the method further comprises: if it is determined that the second identifier is not stored, converting the text to be converted into audio data as the output audio.
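The two-stage lookup in the preceding paragraphs can be sketched as one function: exact match first, then a match on the first field only, and finally synthesis on a full miss. Here the store is assumed to map identifiers of the form `"<first-field>#<second-field>"` to audio bytes; the names and the `#` separator are illustrative assumptions, not details from the patent.

```python
def resolve_audio(identifier: str, store: dict, synthesize) -> bytes:
    """Resolve audio for `identifier`: exact hit, first-field hit, or synthesize."""
    if identifier in store:
        return store[identifier]              # exact match on the unique identifier
    first_field = identifier.split("#", 1)[0]
    for stored_id, audio in store.items():
        if stored_id.split("#", 1)[0] == first_field:
            return audio                      # same text, different synthesis settings
    return synthesize()                       # full miss: run the speech synthesis module
```

The first-field fallback trades exactness for latency: already-synthesized audio of the same text is reused even when the stored settings differ, avoiding a fresh TTS run.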
According to some embodiments of the application, the method further comprises: determining whether the text to be converted is successfully converted into audio data; if it is determined that the text to be converted is successfully converted into the audio data, playing the output audio; and if it is determined that the text to be converted is not successfully converted into the audio data, acquiring preset specific audio data as the output audio, wherein the specific audio data indicates that the conversion is unsuccessful and/or guides a user to perform a specific operation.
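The success check with a preset fallback clip can be illustrated as below. This is a sketch under the assumption that the synthesis engine either returns audio bytes or returns nothing/raises on failure; `convert_with_fallback` and its parameters are hypothetical names.

```python
def convert_with_fallback(text: str, engine, fallback_audio: bytes):
    """Try to synthesize `text`; on failure, return the preset fallback audio."""
    try:
        audio = engine(text)
    except Exception:
        audio = None
    if audio:
        return audio, True            # conversion succeeded: play the synthesized output
    return fallback_audio, False      # preset clip, e.g. "conversion failed, please retry"
```

Returning a success flag alongside the audio lets the caller both play something immediately and decide whether to re-queue the text for a later conversion attempt.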
According to some embodiments of the application, the related information comprises: a starting position of the audio data in the cache and a length of the audio data.
There is also provided, in accordance with some embodiments of the present application, an apparatus for text-to-speech conversion. The apparatus includes: a memory having computer program code stored thereon; and a processor configured to run the computer program code to perform the method as described above.
According to some embodiments of the present application, there is also provided a computer readable storage medium having stored thereon computer program code which, when executed, performs a method as described above.
According to some embodiments of the application, there is also provided a chip circuit comprising a circuit unit configured to perform, upon power-up, the method as described above.
The foregoing description of embodiments of the application has been presented for purposes of illustration and description, and is not intended to be exhaustive or limited to the embodiments disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the various embodiments described. The terminology used herein was chosen to best explain the principles of the embodiments, the practical application, or the technical improvement over technologies found in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.

Claims (10)

1. A method for text-to-speech conversion, comprising:
receiving a text to be converted;
generating a unique identifier for the text to be converted, wherein the unique identifier comprises:
a first field indicating the content of the text to be converted, and
a second field indicating at least one of a type of a speech synthesis module, a user-preferred speech rate, and a timbre;
determining whether the unique identifier is stored;
if the unique identifier is determined to be stored, acquiring audio data corresponding to the unique identifier as output audio of the text to be converted;
if it is determined that the unique identifier is not stored, determining whether a second identifier matching the first field of the unique identifier is stored;
if it is determined that the second identifier is stored, retrieving the audio data corresponding to the second identifier as the output audio;
if the second identifier is not stored, converting the text to be converted into audio data serving as the output audio; and
storing the audio data and related information of the audio data.
2. The method of claim 1, wherein determining whether the unique identifier is stored comprises:
determining whether the unique identifier is stored in a database, and wherein
obtaining the audio data corresponding to the unique identifier as the output audio of the text to be converted comprises:
acquiring, from a cache, the audio data corresponding to the unique identifier as the output audio of the text to be converted, based on the related information of the unique identifier stored in the database.
3. The method of claim 1, wherein converting the text to be converted into audio data as the output audio comprises:
the text to be converted is input into a speech synthesis module to be converted into audio data as the output audio.
4. The method of claim 3, wherein the speech synthesis module comprises a TTS engine.
5. The method of claim 1, wherein storing the audio data and related information for the audio data comprises:
the audio data is stored in a buffer and relevant information of the audio data is stored in a database.
6. The method of claim 1, further comprising:
determining whether the text to be converted is successfully converted into audio data;
if it is determined that the text to be converted is successfully converted into the audio data, playing the output audio; and
if it is determined that the text to be converted is not successfully converted into the audio data, acquiring preset specific audio data as the output audio, wherein the specific audio data indicates that the conversion is unsuccessful and/or guides a user to perform a specific operation.
7. The method of claim 2 or 5, wherein the related information comprises: a starting position of the audio data in the cache and a length of the audio data.
8. An apparatus for text-to-speech conversion, comprising:
a memory having computer program code stored thereon; and
a processor configured to run the computer program code to perform the method of any of claims 1 to 7.
9. A computer readable storage medium having stored thereon computer program code which, when executed, performs the method of any of claims 1 to 7.
10. A chip circuit comprising a circuit unit configured to perform the method of any of claims 1 to 7 upon power up.
CN202010498289.XA 2020-06-04 2020-06-04 Method, apparatus, chip circuit and medium for text-to-speech conversion Active CN111667815B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010498289.XA CN111667815B (en) 2020-06-04 2020-06-04 Method, apparatus, chip circuit and medium for text-to-speech conversion

Publications (2)

Publication Number Publication Date
CN111667815A (en) 2020-09-15
CN111667815B (en) 2023-09-01

Family

ID=72385997

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010498289.XA Active CN111667815B (en) 2020-06-04 2020-06-04 Method, apparatus, chip circuit and medium for text-to-speech conversion

Country Status (1)

Country Link
CN (1) CN111667815B (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1788305A (en) * 2003-06-19 2006-06-14 国际商业机器公司 System and method for configuring voice readers using semantic analysis
CN101354840A (en) * 2008-09-08 2009-01-28 众智瑞德科技(北京)有限公司 Method and apparatus for performing voice reading control of electronic book
US9646601B1 (en) * 2013-07-26 2017-05-09 Amazon Technologies, Inc. Reduced latency text-to-speech system
CN107480159A (en) * 2016-12-02 2017-12-15 广东小天才科技有限公司 The input method and device of a kind of speech data
CN109658917A (en) * 2019-01-17 2019-04-19 深圳壹账通智能科技有限公司 E-book chants method, apparatus, computer equipment and storage medium
CN110503991A (en) * 2019-08-07 2019-11-26 Oppo广东移动通信有限公司 Voice broadcast method, device, electronic equipment and storage medium
CN110853616A (en) * 2019-10-22 2020-02-28 武汉水象电子科技有限公司 Speech synthesis method, system and storage medium based on neural network

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7653542B2 (en) * 2004-05-26 2010-01-26 Verizon Business Global Llc Method and system for providing synthesized speech
US8751562B2 (en) * 2009-04-24 2014-06-10 Voxx International Corporation Systems and methods for pre-rendering an audio representation of textual content for subsequent playback
US9240180B2 (en) * 2011-12-01 2016-01-19 At&T Intellectual Property I, L.P. System and method for low-latency web-based text-to-speech without plugins
US9378727B2 (en) * 2013-04-27 2016-06-28 Tencent Technology (Shenzhen) Company Limited Method and apparatus for audio playing

Also Published As

Publication number Publication date
CN111667815A (en) 2020-09-15

Similar Documents

Publication Publication Date Title
CN111667816B (en) Model training method, speech synthesis method, device, equipment and storage medium
CN107016994B (en) Voice recognition method and device
CN111369971B (en) Speech synthesis method, device, storage medium and electronic equipment
US20190096402A1 (en) Method and apparatus for extracting information
JP2020505643A (en) Voice recognition method, electronic device, and computer storage medium
US11586689B2 (en) Electronic apparatus and controlling method thereof
CN111489735B (en) Voice recognition model training method and device
CN104573099A (en) Topic searching method and device
CN109616096A (en) Construction method, device, server and the medium of multilingual tone decoding figure
KR102140391B1 (en) Search method and electronic device using the method
US20240029709A1 (en) Voice generation method and apparatus, device, and computer readable medium
KR20210060897A (en) Method and apparatus for processing speech
CN112259089A (en) Voice recognition method and device
JP2023027749A (en) Method and apparatus for determining broadcasting style, equipment, and computer storage medium
CN112133285B (en) Speech recognition method, device, storage medium and electronic equipment
CN113053362A (en) Method, device, equipment and computer readable medium for speech recognition
CN113012683A (en) Speech recognition method and device, equipment and computer readable storage medium
CN111667815B (en) Method, apparatus, chip circuit and medium for text-to-speech conversion
KR102621436B1 (en) Voice synthesizing method, device, electronic equipment and storage medium
CN111783433A (en) Text retrieval error correction method and device
CN112836476B (en) Summary generation method, device, equipment and medium
CN111353035B (en) Man-machine conversation method and device, readable storage medium and electronic equipment
US10726211B1 (en) Automated system for dynamically generating comprehensible linguistic constituents
CN112820280A (en) Generation method and device of regular language model
JPWO2009041220A1 (en) Abbreviation generation apparatus and program, and abbreviation generation method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant