WO2022111241A1

WO2022111241A1 - Data generation method and apparatus, readable medium and electronic device

Info

Publication number: WO2022111241A1
Application number: PCT/CN2021/128308
Authority: WO
Inventors: 顾宇
Original assignee: 北京有竹居网络技术有限公司
Priority date: 2020-11-26
Filing date: 2021-11-03
Publication date: 2022-06-02
Also published as: CN112487797A; CN112487797B

Abstract

A data generation method and apparatus, a readable medium, and an electronic device. The method comprises: acquiring, from graphemes included in an initial pronunciation dictionary, grapheme sets conforming to target parts of speech (11); for each target part of speech, determining, from the grapheme set conforming to the target part of speech, at least one keyword corresponding to the target part of speech (12); combining keywords according to a preset grapheme combination mode, so as to obtain a plurality of combined graphemes, wherein the preset grapheme combination mode comprises combining the keywords belonging to the same target part of speech and combining keywords belonging to different target parts of speech (13); and determining a phoneme sequence corresponding to each combined grapheme, so as to generate a mapping relationship between the combined grapheme and the phoneme sequence (14). Therefore, a new combined grapheme can be automatically generated, a phoneme sequence capable of representing the pronunciation of the combined grapheme can be automatically obtained, and manual construction is not needed. In addition, the generated combined grapheme and the phoneme sequence thereof can also be used in augmentation training of a model, so as to improve the generalization capability of the model.

Description

Data Generation Method, Apparatus, Readable Medium, and Electronic Device

This application claims priority to the Chinese patent application filed on November 26, 2020, with application number 202011355899.0 and titled "Data Generation Method, Device, Readable Medium, and Electronic Device", the entire contents of which are incorporated herein by reference.

FIELD OF THE DISCLOSURE

The present disclosure relates to the field of data processing, and in particular, to a data generation method, apparatus, readable medium, and electronic device.

BACKGROUND

In a speech synthesis scenario, it is usually necessary to determine the phonemes of a piece of text and then produce the pronunciation according to those phonemes. This step is an important part of the speech synthesis front end and is referred to as G2P (Grapheme-to-Phoneme). In the related art, a pronunciation dictionary (also called a pronunciation lexicon) is generally used to look up the phonemes that represent the pronunciation of a word. The pronunciation dictionary contains the set of words that a speech synthesis system can process, together with their pronunciations.

SUMMARY

This Summary section is provided to introduce, in a simplified form, concepts that are described in detail in the Detailed Description section that follows. This Summary section is not intended to identify key features or essential features of the claimed technical solution, nor is it intended to limit the scope of the claimed technical solution.
In a first aspect, the present disclosure provides a data generation method, the method comprising: obtaining, from words contained in an initial pronunciation dictionary, a word set that matches a target part of speech; for each target part of speech, determining, from the word set that matches the target part of speech, at least one keyword corresponding to the target part of speech; combining the keywords according to a preset word combination mode to obtain a plurality of combined words, wherein the preset word combination mode includes combining keywords belonging to the same target part of speech and combining keywords belonging to different target parts of speech; and determining the phoneme sequence corresponding to each combined word, so as to generate a mapping relationship between the combined word and the phoneme sequence.

In a second aspect, the present disclosure provides a data generation apparatus, the apparatus comprising: a first acquisition module, configured to obtain, from words contained in an initial pronunciation dictionary, a word set that matches a target part of speech; a first determination module, configured to determine, for each target part of speech and from the word set that matches the target part of speech, at least one keyword corresponding to the target part of speech; a combination module, configured to combine the keywords according to a preset word combination mode to obtain a plurality of combined words, wherein the preset word combination mode includes combining keywords belonging to the same target part of speech and combining keywords belonging to different target parts of speech; and a second determination module, configured to determine the phoneme sequence corresponding to each combined word, so as to generate a mapping relationship between the combined word and the phoneme sequence.
In a third aspect, the present disclosure provides a computer-readable medium on which a computer program is stored, wherein the program, when executed by a processing apparatus, implements the steps of the data generation method described in the first aspect of the present disclosure.

In a fourth aspect, the present disclosure provides an electronic device, comprising: a storage device on which a computer program is stored; and a processing device for executing the computer program in the storage device to implement the steps of the data generation method described in the first aspect of the present disclosure.

In a fifth aspect, the present disclosure provides a computer program comprising instructions that, when executed by a processor, cause the processor to perform the steps of the data generation method provided in the first aspect of the present disclosure.

In a sixth aspect, the present disclosure provides a computer program product comprising instructions that, when executed by a processor, cause the processor to perform the steps of the data generation method provided in the first aspect of the present disclosure.

BRIEF DESCRIPTION OF THE DRAWINGS

The above and other features, advantages, and aspects of various embodiments of the present disclosure will become more apparent with reference to the following detailed description taken in conjunction with the accompanying drawings. Throughout the drawings, the same or similar reference numbers refer to the same or similar elements. It should be understood that the drawings are schematic and that components and elements are not necessarily drawn to scale. In the accompanying drawings:

FIG. 1 is a flowchart of a data generation method provided according to an embodiment of the present disclosure;

FIG. 2 is an exemplary flowchart of the step, in the data generation method provided according to the present disclosure, of determining, for each target part of speech, at least one keyword corresponding to the target part of speech from the word set that matches the target part of speech;

FIG. 3 is a flowchart of a data generation method provided by another embodiment of the present disclosure;

FIG. 4 is a block diagram of a data generation apparatus provided according to an embodiment of the present disclosure;

FIG. 5 is a schematic structural diagram of an electronic device suitable for implementing an embodiment of the present disclosure.

DETAILED DESCRIPTION

Embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While certain embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be construed as limited to the embodiments set forth herein; rather, these embodiments are provided for a more thorough and complete understanding of the present disclosure. It should be understood that the drawings and embodiments of the present disclosure are only used for exemplary purposes and are not intended to limit the protection scope of the present disclosure.

It should be understood that the various steps described in the method embodiments of the present disclosure may be performed in different orders and/or in parallel. Furthermore, method embodiments may include additional steps and/or omit the illustrated steps. The scope of the present disclosure is not limited in this regard.

As used herein, the term "including" and variations thereof are open-ended inclusions, i.e., "including but not limited to". The term "based on" means "based at least in part on."
The term "one embodiment" means "at least one embodiment"; the term "another embodiment" means "at least one additional embodiment"; the term "some embodiments" means "at least some embodiments". Relevant definitions of other terms will be given in the description below.

It should be noted that concepts such as "first" and "second" mentioned in the present disclosure are only used to distinguish different devices, modules, or units, and are not used to limit the order or interdependence of the functions performed by these devices, modules, or units.

It should be noted that the modifiers "one" and "a plurality of" mentioned in the present disclosure are illustrative rather than restrictive, and those skilled in the art should understand that, unless the context clearly indicates otherwise, they should be understood as "one or more".

The names of messages or information exchanged between multiple devices in the embodiments of the present disclosure are only used for illustrative purposes and are not used to limit the scope of these messages or information.

The words covered by an existing pronunciation dictionary are limited, so the phonemes corresponding to a word often cannot be found, and the pronunciation of the word then cannot be determined. This frequently causes G2P errors, which in turn prevent speech synthesis from producing the pronunciation of certain words. Words whose pronunciation cannot be obtained from the pronunciation dictionary may be abbreviated as OOV (Out of Vocabulary, unregistered words).

The present disclosure provides a data generation method, apparatus, readable medium, and electronic device to construct a mapping relationship between OOV words and phonemes. Further, when the constructed mapping relationship is used as training data for model training, it can effectively improve the generalization ability of the model.
Through the technical solution of the present disclosure, a word set that matches a target part of speech is obtained from the words contained in the initial pronunciation dictionary; then, for each target part of speech, at least one keyword corresponding to the target part of speech is determined from the word set that matches the target part of speech; the keywords are combined according to a preset word combination mode to obtain a plurality of combined words; and the phoneme sequence corresponding to each combined word is determined, so as to generate a mapping relationship between the combined word and the phoneme sequence. As a result, new combined words can be automatically generated based on the words in the initial pronunciation dictionary, and a phoneme sequence that characterizes the pronunciation of each combined word can be automatically obtained, without manual participation in the construction process. In addition, the generated combined words and their phoneme sequences can also be used in augmentation training of a model to improve the generalization ability of the model.

Other features and advantages of the present disclosure will be described in detail later.

FIG. 1 is a flowchart of a data generation method provided according to an embodiment of the present disclosure.
As shown in FIG. 1, the method may include the following steps:

In step 11, from the words contained in the initial pronunciation dictionary, a word set that matches the target part of speech is obtained;

In step 12, for each target part of speech, at least one keyword corresponding to the target part of speech is determined from the word set that matches the target part of speech;

In step 13, the keywords are combined according to a preset word combination mode to obtain a plurality of combined words;

In step 14, the phoneme sequence corresponding to each combined word is determined, so as to generate a mapping relationship between the combined word and the phoneme sequence.

The initial pronunciation dictionary contains the words that the dictionary can handle and their pronunciations (represented as phonemes). The target part of speech may include, but is not limited to, at least one of the following: noun, verb, adjective.

Therefore, obtaining in step 11 a word set that matches the target part of speech is equivalent to extracting, for each target part of speech, the words of that part of speech from the words contained in the initial pronunciation dictionary and forming them into a word set corresponding to that part of speech. For example, if the target parts of speech include nouns, verbs, and adjectives, step 11 is equivalent to extracting nouns from the words contained in the initial pronunciation dictionary to form a noun set, extracting verbs to form a verb set, and extracting adjectives to form an adjective set.

Then, in step 12, for each target part of speech, at least one keyword corresponding to the target part of speech is determined from the word set that matches the target part of speech.
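As an illustration of step 11 described above, the grouping of dictionary words by target part of speech might be sketched as follows. This is only a minimal sketch under stated assumptions: the disclosure does not say how the part of speech of a dictionary word is obtained, so the lookup table `POS_TAGS`, the toy dictionary `INITIAL_DICT`, its phoneme symbols, and the function name `build_word_sets` are all hypothetical.

```python
# Hypothetical initial pronunciation dictionary: word -> phoneme list.
# The phoneme symbols are illustrative only.
INITIAL_DICT = {
    "cat": ["K", "AE", "T"],
    "dog": ["D", "AO", "G"],
    "run": ["R", "AH", "N"],
    "red": ["R", "EH", "D"],
    "the": ["DH", "AH"],
}

# Hypothetical part-of-speech labels. The source does not specify where
# these labels come from; a tagger or an annotated lexicon is assumed.
POS_TAGS = {
    "cat": "noun",
    "dog": "noun",
    "run": "verb",
    "red": "adjective",
    "the": "determiner",
}

TARGET_POS = ("noun", "verb", "adjective")


def build_word_sets(pron_dict, pos_tags, target_pos):
    """Step 11: collect, for each target part of speech, the dictionary
    words carrying that part of speech."""
    word_sets = {pos: set() for pos in target_pos}
    for word in pron_dict:
        pos = pos_tags.get(word)
        if pos in word_sets:  # words with other parts of speech are skipped
            word_sets[pos].add(word)
    return word_sets


word_sets = build_word_sets(INITIAL_DICT, POS_TAGS, TARGET_POS)
print(word_sets)
```

In this toy data, "the" is left out because its part of speech is not among the target parts of speech.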
In one possible implementation, several words may be randomly selected from the word set that matches the target part of speech as the at least one keyword corresponding to the target part of speech.

In another possible implementation, step 12 may include the following steps, as shown in FIG. 2:

In step 21, for each word in the word set that matches the target part of speech, the word frequency of the word in a target corpus is determined;

In step 22, the words corresponding to the N largest word frequencies are determined as the keywords corresponding to the target part of speech, where N is a positive integer.

For example, the word frequency of a word in the target corpus can be obtained as the ratio of the number of times the word appears in the target corpus to the total number of words in the target corpus.

For another example, the word frequency of a word in the target corpus can be calculated by TF-IDF (Term Frequency-Inverse Document Frequency), where the calculation formula can be as follows:

Word frequency in the target corpus = (TF of the word) * (IDF of the word) = (number of times the word appears in the target corpus / total number of words in the target corpus) * lg(total number of articles contained in the target corpus / number of articles in the target corpus in which the word appears)

After the word frequency corresponding to each word is calculated, the words corresponding to the N largest word frequencies may be determined as the keywords corresponding to the target part of speech.

In the above manner, the words with higher word frequency in the target corpus are used as keywords. On the one hand, such keywords more effectively represent the words of the target part of speech; on the other hand, the resources consumed by subsequent data processing can be saved.

Returning to FIG. 1, in step 13, the keywords are combined according to a preset word combination mode to obtain a plurality of combined words.

The preset word combination mode includes at least combining keywords belonging to the same target part of speech and combining keywords belonging to different target parts of speech.

For example, suppose that after the processing in step 12, the keywords corresponding to the target part of speech S1 are V1, V2, and V3, the keywords corresponding to the target part of speech S2 are V4 and V5, and the keyword corresponding to the target part of speech S3 is V6. Combining keywords belonging to the same target part of speech means, taking the target part of speech S1 as an example, combining the keywords of S1 with one another, for example into V1V2, V3V2V1, and so on. Combining keywords belonging to different target parts of speech means, taking the target parts of speech S2 and S3 as an example, combining keywords of S2 with keywords of S3, for example into V4V6, V5V6, and so on. In addition, the preset word combination mode may also mix the two, combining keywords belonging to the same target part of speech together with keywords belonging to different target parts of speech. For example, for the parts of speech S1, S2, and S3 in the above example, the keywords can be combined into V1V2V4V6 and so on.

In a possible implementation, step 13 may include at least one of the following: combining a first preset number of keywords belonging to different target parts of speech to obtain a combined word; combining a second preset number of keywords belonging to the same target part of speech to obtain a combined word.

For example, two keywords whose part of speech is noun can be combined to obtain a combined word. In this example, the second preset number is 2, and the target part of speech is noun. For another example, one keyword can be selected from the nouns and one from the adjectives, and the two can be combined to obtain a combined word.
In this example, the first preset number is 2, and the target parts of speech are noun and adjective. In addition, different orderings of the keywords during combination yield different combined words. For example, combining keyword A and keyword B can yield two combined words, AB and BA.

In another possible implementation, at least one of a word prefix or a word suffix may also be obtained, and in this implementation, step 13 may include at least one of the following: combining the word prefix and the keyword, in that order, to obtain a combined word; combining the keyword and the word suffix, in that order, to obtain a combined word.

For example, word prefixes and word suffixes may be summarized by relevant personnel from the words contained in the initial pronunciation dictionary, and the pronunciations of these word prefixes and word suffixes may also be obtained from the initial pronunciation dictionary. For another example, word prefixes and word suffixes can also be obtained directly from sources that provide word prefix and word suffix information; in this case, the corresponding pronunciations can be obtained together with the word prefixes and word suffixes.

In general, a word prefix is located at the beginning of a word. Therefore, when obtaining a combined word, the word prefix must be placed before the keyword. For example, the word prefix C and the keyword D can be combined into the combined word CD. Likewise, a word suffix is generally located at the end of a word. Therefore, when obtaining a combined word, the keyword must be placed before the word suffix. For example, the keyword E and the word suffix F can be combined into the combined word EF.
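The keyword selection of step 12 (using the TF-IDF formula given above, with lg read as the base-10 logarithm) and the combination modes of step 13 might be sketched as follows. This is an illustrative sketch, not the applicant's implementation: all function names are hypothetical, the corpus is assumed to be a list of tokenized articles, a word that never appears in the corpus is assumed to score 0, and ties are broken alphabetically only to make the result deterministic. Combined words are kept as tuples of their parts so that the constituents remain recoverable later.

```python
import math
from collections import Counter
from itertools import permutations, product


def tf_idf_scores(words, corpus):
    """corpus: list of articles, each article a list of tokens."""
    counts = Counter(tok for article in corpus for tok in article)
    total_words = sum(counts.values())
    scores = {}
    for w in words:
        doc_freq = sum(1 for article in corpus if w in article)
        if doc_freq == 0:
            scores[w] = 0.0  # assumption: unseen words get the lowest score
            continue
        tf = counts[w] / total_words
        idf = math.log10(len(corpus) / doc_freq)  # "lg" in the formula
        scores[w] = tf * idf
    return scores


def top_n_keywords(words, corpus, n):
    """Steps 21-22: keep the N words with the largest score."""
    scores = tf_idf_scores(words, corpus)
    return sorted(words, key=lambda w: (-scores[w], w))[:n]


def combine_same_pos(keywords, k):
    """Ordered combinations of k keywords of one part of speech (AB and BA)."""
    return list(permutations(keywords, k))


def combine_cross_pos(keyword_lists):
    """One keyword from each part of speech, in every order."""
    combos = []
    for pick in product(*keyword_lists):
        combos.extend(permutations(pick))
    return combos


def add_affixes(keywords, prefixes=(), suffixes=()):
    """Prefix goes before the keyword, suffix after it."""
    with_prefix = [(p, k) for p in prefixes for k in keywords]
    with_suffix = [(k, s) for k in keywords for s in suffixes]
    return with_prefix + with_suffix


print(combine_same_pos(["A", "B"], 2))            # [('A', 'B'), ('B', 'A')]
print(combine_cross_pos([["V4", "V5"], ["V6"]]))  # includes ('V4', 'V6') and ('V5', 'V6')
print(add_affixes(["D"], prefixes=["C"]))         # [('C', 'D')]
```

Joining each tuple with `"".join(parts)` gives the written form of the combined word, such as AB or CD in the examples above.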
In addition, after step 13, the method provided by the present disclosure may further include the following step: if there is a combined word that cannot form a syllable, deleting the combined word that cannot form a syllable from the plurality of combined words.

Among the combined words formed in step 13, there may be combined words that cannot form syllables, and such combined words are meaningless for subsequent data processing. Therefore, such combined words can be deleted from the plurality of combined words and are not processed in the subsequent step 14. There are many ways to judge whether a syllable can be formed, so some judgment conditions can be preset to judge whether a combined word can form a syllable. For example, in general, two consonants appearing together cannot be pronounced. Therefore, the judgment condition can be set as whether there are adjacent consonants in the combined word. If there are adjacent consonants, it can be determined that the combined word cannot form a syllable, and the combined word is then removed from the plurality of combined words.

In the above manner, unpronounceable combined words are deleted from the plurality of combined words, which saves subsequent data processing overhead and avoids wasting computing resources.

In step 14, the phoneme sequence corresponding to each combined word is determined, so as to generate a mapping relationship between the combined word and the phoneme sequence.

Exemplarily, step 14 may include performing the following operations for each combined word: obtaining, from the initial pronunciation dictionary, the initial phonemes corresponding to each word constituting the combined word; and combining the initial phonemes according to the arrangement order of the words in the combined word to obtain the phoneme sequence corresponding to the combined word, so as to generate the correspondence between the combined word and the phoneme sequence.
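The optional syllable filter and the phoneme concatenation of step 14 might be sketched as follows. This is a sketch under stated assumptions: the adjacent-consonant condition is read literally and applied to letters, with the vowels assumed to be a, e, i, o, u (a deliberately crude rule that a real system would likely refine); the toy dictionary entries, phoneme symbols, and function names are hypothetical; and combined words are assumed to be given as tuples of their constituent words.

```python
VOWELS = set("aeiou")  # assumption: every other letter counts as a consonant


def has_adjacent_consonants(word):
    """Literal reading of the example condition: two consonant letters
    next to each other mean the combined word cannot form a syllable."""
    letters = [c for c in word.lower() if c.isalpha()]
    return any(a not in VOWELS and b not in VOWELS
               for a, b in zip(letters, letters[1:]))


def build_mapping(combined_words, pron_dict, reject=None):
    """Step 14: concatenate the phonemes of the constituent words in order.

    combined_words: iterable of tuples of dictionary words.
    reject: optional predicate on the written form; matching words are
    dropped before any phoneme lookup takes place.
    """
    mapping = {}
    for parts in combined_words:
        surface = "".join(parts)
        if reject is not None and reject(surface):
            continue
        phonemes = []
        for part in parts:
            phonemes.extend(pron_dict[part])  # initial phonemes of each part
        mapping[surface] = phonemes
    return mapping


# Toy entries; the phoneme symbols are illustrative only.
toy_dict = {
    "be": ["B", "IY"],
    "so": ["S", "OW"],
    "try": ["T", "R", "AY"],
}
candidates = [("be", "so"), ("so", "be"), ("be", "try")]
mapping = build_mapping(candidates, toy_dict, reject=has_adjacent_consonants)
print(mapping)
```

Here "betry" is dropped by the filter because "t" and "r" are adjacent, while "beso" and "sobe" each receive the concatenated phonemes of their parts.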
For each combined word, since the combined word is composed of words contained in the initial pronunciation dictionary and the pronunciations of those words are known, the initial phonemes corresponding to each word constituting the combined word can be obtained from the initial pronunciation dictionary. The obtained initial phonemes are then combined according to the arrangement order of the words in the combined word to obtain the phoneme sequence corresponding to the combined word, and the correspondence between the combined word and the phoneme sequence is generated. For example, for the combined word W1W2W3, where the pronunciation phoneme corresponding to W1 is P1, the pronunciation phoneme corresponding to W2 is P2, and the pronunciation phoneme corresponding to W3 is P3, the phoneme sequence corresponding to the combined word W1W2W3 is P1P2P3.

Through the above technical solution, a word set that matches a target part of speech is obtained from the words contained in the initial pronunciation dictionary; then, for each target part of speech, at least one keyword corresponding to the target part of speech is determined from the word set that matches the target part of speech; the keywords are combined according to a preset word combination mode to obtain a plurality of combined words; and the phoneme sequence corresponding to each combined word is determined, so as to generate a mapping relationship between the combined word and the phoneme sequence. Thus, new combined words can be automatically generated based on the words in the initial pronunciation dictionary, and a phoneme sequence that characterizes the pronunciation of each combined word can be automatically obtained, without manual participation in the construction process. In addition, the generated combined words and their phoneme sequences can also be used in augmentation training of a model to improve the generalization ability of the model.

Optionally, the method provided by the present disclosure may further include the following step, as shown in FIG. 3.

In step 31, the generated mapping relationship between the combined word and the phoneme sequence is added to the initial pronunciation dictionary to generate a target pronunciation dictionary.

That is to say, the generated mapping relationship between the combined word and the phoneme sequence can be added to the initial pronunciation dictionary to update the initial pronunciation dictionary into the target pronunciation dictionary, and the target pronunciation dictionary can be used directly in subsequent data processing. For example, using the target pronunciation dictionary for training a speech synthesis model can improve the generalization ability of the model. For another example, after the initial pronunciation dictionary has been used to train the speech synthesis model, the target pronunciation dictionary may be used to perform augmentation training on the model so as to fine-tune it, which helps obtain a better-performing model.

FIG. 4 is a block diagram of a data generation apparatus provided according to an embodiment of the present disclosure. As shown in FIG. 4, the apparatus 40 includes: a first acquisition module 41, configured to obtain, from the words contained in the initial pronunciation dictionary, a word set that matches the target part of speech; a first determination module 42, configured to determine, for each target part of speech and from the word set that matches the target part of speech, at least one keyword corresponding to the target part of speech; a combination module 43, configured to combine the keywords according to a preset word combination mode to obtain a plurality of combined words, wherein the preset word combination mode includes combining keywords belonging to the same target part of speech and combining keywords belonging to different target parts of speech; and a second determination module 44, configured to determine the phoneme sequence corresponding to each combined word, so as to generate the mapping relationship between the combined word and the phoneme sequence.

Optionally, the first determination module 42 includes: a first determination sub-module, configured to determine, for each word in the word set that matches the target part of speech, the word frequency of the word in the target corpus; and a second determination sub-module, configured to determine the words corresponding to the N largest word frequencies as the keywords corresponding to the target part of speech, where N is a positive integer.

Optionally, the combination module 43 includes at least one of the following: a first combination sub-module, configured to combine a first preset number of keywords belonging to different target parts of speech to obtain a combined word; a second combination sub-module, configured to combine a second preset number of keywords belonging to the same target part of speech to obtain a combined word.

Optionally, the apparatus 40 further includes: a second acquisition module, configured to obtain at least one of a word prefix or a word suffix; and the combination module 43 includes at least one of the following: a third combination sub-module, configured to combine the word prefix and the keyword, in that order, to obtain a combined word; a fourth combination sub-module, configured to combine the keyword and the word suffix, in that order, to obtain a combined word.
Optionally, the apparatus 40 further includes: a module configured to, after the combination module combines the keywords according to the preset word combination mode to obtain a plurality of combined words, delete any combined word that cannot form a syllable from the plurality of combined words.

Optionally, the second determination module 44 is configured to perform the following operations for each combined word: obtaining, from the initial pronunciation dictionary, the initial phonemes corresponding to the words constituting the combined word; and combining the initial phonemes according to the arrangement order of the words in the combined word to obtain the phoneme sequence corresponding to the combined word, so as to generate the correspondence between the combined word and the phoneme sequence.

Optionally, the apparatus 40 further includes: a dictionary generation module, configured to add the generated mapping relationship between the combined word and the phoneme sequence to the initial pronunciation dictionary, so as to generate a target pronunciation dictionary.

Regarding the apparatus in the above embodiment, the specific manner in which each module performs its operations has been described in detail in the method embodiments and will not be repeated here.

Referring now to FIG. 5, a schematic structural diagram of an electronic device 600 suitable for implementing an embodiment of the present disclosure is shown. Terminal devices in the embodiments of the present disclosure may include, but are not limited to, mobile terminals such as mobile phones, notebook computers, digital broadcast receivers, PDAs (personal digital assistants), PADs (tablets), PMPs (portable multimedia players), and vehicle-mounted terminals (e.g., car navigation terminals), and stationary terminals such as digital TVs and desktop computers. The electronic device shown in FIG. 5 is only an example and should not impose any limitation on the function and scope of use of the embodiments of the present disclosure.

As shown in FIG. 5, the electronic device 600 may include a processing device (e.g., a central processing unit, a graphics processor, etc.) 601, which may perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) 602 or a program loaded from a storage device 608 into a random access memory (RAM) 603. The RAM 603 also stores various programs and data necessary for the operation of the electronic device 600. The processing device 601, the ROM 602, and the RAM 603 are connected to each other through a bus 604. An input/output (I/O) interface 605 is also connected to the bus 604.

Typically, the following devices can be connected to the I/O interface 605: input devices 606 including, for example, a touch screen, touchpad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, etc.; output devices 607 including, for example, a liquid crystal display (LCD), speaker, vibrator, etc.; storage devices 608 including, for example, a magnetic tape, hard disk, etc.; and a communication device 609. The communication device 609 may allow the electronic device 600 to communicate wirelessly or by wire with other devices to exchange data. While FIG. 5 shows the electronic device 600 having various devices, it should be understood that not all of the illustrated devices are required to be implemented or provided. More or fewer devices may alternatively be implemented or provided.

In particular, according to embodiments of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs.
For example, embodiments of the present disclosure include a computer program product comprising a computer program carried on a non-transitory computer-readable medium, the computer program containing program code for performing the method illustrated in the flowchart. In such an embodiment, the computer program may be downloaded and installed from a network via the communication device 609, or installed from the storage device 608, or installed from the ROM 602. When the computer program is executed by the processing device 601, the above-mentioned functions defined in the methods of the embodiments of the present disclosure are executed.

It should be noted that the computer-readable medium mentioned above in the present disclosure may be a computer-readable signal medium or a computer-readable storage medium, or any combination of the two. The computer-readable storage medium can be, for example, but is not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples of computer-readable storage media may include, but are not limited to: an electrical connection with one or more wires, a portable computer disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In the present disclosure, a computer-readable storage medium may be any tangible medium that contains or stores a program that can be used by or in conjunction with an instruction execution system, apparatus, or device. In the present disclosure, by contrast, a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, carrying computer-readable program code therein.
Such propagated data signals may take a variety of forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination of the foregoing. A computer-readable signal medium can also be any computer-readable medium other than a computer-readable storage medium that can send, propagate, or transport a program for use by or in conjunction with an instruction execution system, apparatus, or device. Program code embodied on a computer-readable medium may be transmitted using any suitable medium, including but not limited to: wire, optical fiber cable, RF (radio frequency), etc., or any suitable combination of the foregoing. In some embodiments, the server can communicate using any currently known or future developed network protocol such as HTTP (HyperText Transfer Protocol), and can be interconnected with digital data communication in any form or medium (e.g., a communication network). Examples of communication networks include local area networks ("LAN"), wide area networks ("WAN"), internetworks (e.g., the Internet), and peer-to-peer networks (e.g., ad hoc peer-to-peer networks), as well as any currently known or future developed networks. The above-mentioned computer-readable medium may be included in the above-mentioned electronic device, or may exist alone without being assembled into the electronic device. 
The above-mentioned computer-readable medium carries one or more programs, and when the one or more programs are executed by the electronic device, they cause the electronic device to: obtain, from the words contained in the initial pronunciation dictionary, a set of words that match a target part of speech; for each target part of speech, determine at least one keyword corresponding to the target part of speech from the set of words that match the target part of speech; combine the keywords according to a preset word combination mode to obtain a plurality of combined words, wherein the preset word combination mode includes combining keywords belonging to the same target part of speech and combining keywords belonging to different target parts of speech; and determine the phoneme sequence corresponding to each combined word to generate a mapping relationship between the combined word and the phoneme sequence. Computer program code for carrying out operations of the present disclosure may be written in one or more programming languages or a combination thereof, including but not limited to object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as "C" or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (e.g., through the Internet using an Internet service provider). 
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to various embodiments of the present disclosure. In this regard, each block in a flowchart or block diagram may represent a module, program segment, or part of code that contains one or more executable instructions for implementing the specified logical functions. It should also be noted that, in some alternative implementations, the functions noted in the blocks may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It should also be noted that each block in the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, can be implemented with dedicated hardware-based systems that perform the specified functions or operations, or can be implemented using a combination of dedicated hardware and computer instructions. The modules involved in the embodiments of the present disclosure may be implemented in software or hardware. The name of a module does not, under certain circumstances, constitute a limitation of the module itself; for example, the first acquisition module can also be described as "a module that acquires, from the words contained in the initial pronunciation dictionary, a set of words that match the target part of speech". The functions described herein above may be performed, at least in part, by one or more hardware logic components. 
For example, without limitation, exemplary types of hardware logic components that may be used include: Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems on Chip (SOCs), Complex Programmable Logic Devices (CPLDs), and so on. In the context of the present disclosure, a machine-readable medium may be a tangible medium that may contain or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. Machine-readable media may include, but are not limited to, electronic, magnetic, optical, electromagnetic, infrared, or semiconductor systems, apparatuses, or devices, or any suitable combination of the foregoing. More specific examples of machine-readable storage media would include one or more wire-based electrical connections, portable computer disks, hard disks, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fibers, portable compact disk read-only memory (CD-ROM), optical storage devices, magnetic storage devices, or any suitable combination of the foregoing. According to one or more embodiments of the present disclosure, a data generation method is provided. 
The method includes: obtaining, from words contained in an initial pronunciation dictionary, a set of words that match a target part of speech; for each target part of speech, determining at least one keyword corresponding to the target part of speech from the set of words that match the target part of speech; combining the keywords according to a preset word combination mode to obtain a plurality of combined words, wherein the preset word combination mode includes combining keywords belonging to the same target part of speech and combining keywords belonging to different target parts of speech; and determining the phoneme sequence corresponding to each combined word to generate a mapping relationship between the combined word and the phoneme sequence. According to one or more embodiments of the present disclosure, a data generation method is provided, wherein determining at least one keyword corresponding to the target part of speech from the set of words that match the target part of speech includes: for each word in the set of words that match the target part of speech, determining the word frequency of the word in a target corpus; and determining the words corresponding to the N largest word frequencies as the keywords corresponding to the target part of speech, where N is a positive integer. According to one or more embodiments of the present disclosure, a data generation method is provided, wherein combining the keywords according to a preset word combination mode to obtain a plurality of combined words includes at least one of the following: combining a first preset number of keywords belonging to different target parts of speech to obtain a combined word; and combining a second preset number of keywords belonging to the same target part of speech to obtain a combined word. 
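As a non-limiting illustration, the keyword selection and keyword combination described above can be sketched in Python as follows. The function names `top_n_keywords` and `combine_keywords`, the token-list representation of the target corpus, and the choice of 2 for both the first and the second preset number are assumptions made for this example only and are not taken from the disclosure.

```python
from collections import Counter
from itertools import permutations, product


def top_n_keywords(word_set, corpus_tokens, n):
    """Return the n words of word_set with the highest frequency in the corpus."""
    # Count only tokens that belong to the word set for this part of speech.
    counts = Counter(token for token in corpus_tokens if token in word_set)
    # Sort by descending frequency; ties are broken alphabetically (an assumption).
    ranked = sorted(word_set, key=lambda word: (-counts[word], word))
    return ranked[:n]


def combine_keywords(keywords_by_pos):
    """Build ordered keyword pairs within one part of speech and across two."""
    combined = set()
    # Same target part of speech: ordered pairs of distinct keywords.
    for words in keywords_by_pos.values():
        combined.update(permutations(words, 2))
    # Different target parts of speech: pairs in both orders.
    tags = list(keywords_by_pos)
    for i, first in enumerate(tags):
        for second in tags[i + 1:]:
            combined.update(product(keywords_by_pos[first], keywords_by_pos[second]))
            combined.update(product(keywords_by_pos[second], keywords_by_pos[first]))
    return sorted(combined)


# Hypothetical toy data: a tokenized corpus and two part-of-speech word sets.
corpus = "sun sun moon run run run walk".split()
keywords = {
    "noun": top_n_keywords({"sun", "moon", "star"}, corpus, 2),
    "verb": top_n_keywords({"run", "walk"}, corpus, 1),
}
combined_words = combine_keywords(keywords)
```

Each combined word is kept here as a tuple of its constituent words so that a later step can look up the pronunciation of each constituent separately.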
According to one or more embodiments of the present disclosure, a data generation method is provided, the method further comprising: obtaining at least one of a word prefix or a word suffix; wherein combining the keywords according to a preset word combination mode to obtain a plurality of combined words includes at least one of the following: combining the word prefix and the keyword in front-to-back order to obtain a combined word; and combining the keyword and the word suffix in front-to-back order to obtain a combined word. According to one or more embodiments of the present disclosure, a data generation method is provided, wherein, after the step of combining the keywords according to a preset word combination mode to obtain a plurality of combined words, the method further includes: if there is a combined word that cannot form a syllable, deleting the combined word that cannot form a syllable from the plurality of combined words. According to one or more embodiments of the present disclosure, a data generation method is provided, wherein determining the phoneme sequence corresponding to each combined word to generate a mapping relationship between the combined word and the phoneme sequence includes, for each combined word, performing the following operations: obtaining, from the initial pronunciation dictionary, the initial phonemes corresponding to each word that constitutes the combined word; and combining the initial phonemes according to the arrangement order of the words in the combined word to obtain the phoneme sequence corresponding to the combined word, so as to generate the correspondence between the combined word and the phoneme sequence. 
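As a non-limiting illustration, the prefix and suffix combination and the phoneme-sequence determination described above can be sketched in Python as follows. The function names, the dictionary-of-lists representation of the initial pronunciation dictionary, the sample phoneme symbols, and the assumption that prefixes and suffixes have their own entries in that dictionary are all hypothetical. Skipping a combined word whose constituents lack an entry, and joining the constituents without a separator, are likewise choices made only for this example. The deletion of combined words that cannot form a syllable is omitted here.

```python
def add_affixes(keywords, prefixes=(), suffixes=()):
    """Combine prefix + keyword and keyword + suffix in front-to-back order."""
    combined = [(prefix, keyword) for prefix in prefixes for keyword in keywords]
    combined += [(keyword, suffix) for keyword in keywords for suffix in suffixes]
    return combined


def phoneme_sequence(parts, lexicon):
    """Concatenate the initial phonemes of each constituent, in order."""
    sequence = []
    for part in parts:
        sequence.extend(lexicon[part])
    return sequence


def build_mapping(combined_words, lexicon):
    """Map each combined word to its phoneme sequence."""
    mapping = {}
    for parts in combined_words:
        # Assumption: skip combined words with a constituent missing from the lexicon.
        if all(part in lexicon for part in parts):
            mapping["".join(parts)] = phoneme_sequence(parts, lexicon)
    return mapping


# Hypothetical initial pronunciation dictionary with illustrative phoneme symbols.
initial_lexicon = {
    "sun": ["S", "AH", "N"],
    "light": ["L", "AY", "T"],
    "un": ["AH", "N"],
}
candidates = [("sun", "light")] + add_affixes(["light"], prefixes=["un"])
mapping = build_mapping(candidates, initial_lexicon)
```

A mapping built this way could then be merged into a copy of the initial pronunciation dictionary to obtain a target pronunciation dictionary.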
According to one or more embodiments of the present disclosure, a data generation method is provided, the method further comprising: adding the generated mapping relationship between the combined word and the phoneme sequence to the initial pronunciation dictionary to generate a target pronunciation dictionary. According to one or more embodiments of the present disclosure, a data generation apparatus is provided, the apparatus including: a first acquisition module, configured to acquire, from words contained in an initial pronunciation dictionary, a set of words that match a target part of speech; a first determination module, configured to determine, for each target part of speech, at least one keyword corresponding to the target part of speech from the set of words that match the target part of speech; a combination module, configured to combine the keywords according to a preset word combination mode to obtain a plurality of combined words, wherein the preset word combination mode includes combining keywords belonging to the same target part of speech and combining keywords belonging to different target parts of speech; and a second determination module, configured to determine the phoneme sequence corresponding to each combined word, so as to generate the mapping relationship between the combined word and the phoneme sequence. According to one or more embodiments of the present disclosure, there is provided a computer-readable medium on which a computer program is stored, wherein the computer program, when executed by a processing device, implements the steps of the data generation method described in any embodiment of the present disclosure. According to one or more embodiments of the present disclosure, an electronic device is provided, including: a storage device on which a computer program is stored; and a processing device, configured to execute the computer program in the storage device, so as to implement the steps of the data generation method described in any embodiment of the present disclosure. 
According to one or more embodiments of the present disclosure, there is provided a computer program comprising instructions that, when executed by a processor, cause the processor to perform the steps of the data generation method according to any embodiment of the present disclosure. According to one or more embodiments of the present disclosure, there is provided a computer program product comprising instructions that, when executed by a processor, cause the processor to perform the steps of the data generation method described in any embodiment of the present disclosure. The above description is merely a description of preferred embodiments of the present disclosure and an illustration of the technical principles employed. Those skilled in the art should understand that the scope of disclosure involved in the present disclosure is not limited to the technical solutions formed by the specific combination of the above-mentioned technical features, and should also cover other technical solutions formed by any combination of the above-mentioned technical features or their equivalent features without departing from the above-mentioned disclosed concept, for example, a technical solution formed by replacing the above features with technical features having similar functions disclosed in (but not limited to) the present disclosure. Additionally, although operations are depicted in a particular order, this should not be construed as requiring that the operations be performed in the particular order shown or in sequential order. Under certain circumstances, multitasking and parallel processing may be advantageous. Likewise, although the above discussion contains several implementation-specific details, these should not be construed as limitations on the scope of the present disclosure. Certain features that are described in the context of separate embodiments can also be implemented in combination in a single embodiment. 
Conversely, various features that are described in the context of a single embodiment can also be implemented in multiple embodiments separately or in any suitable subcombination. Although the subject matter has been described in language specific to structural features and/or logical acts of method, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are merely example forms of implementing the claims. Regarding the apparatus in the above embodiment, the specific manner in which each module performs operations has been described in detail in the embodiment of the method, and will not be described in detail here.

Claims


1. A data generation method, comprising: obtaining, from words contained in an initial pronunciation dictionary, a set of words that match a target part of speech; for each target part of speech, determining at least one keyword corresponding to the target part of speech from the set of words that match the target part of speech; combining the keywords according to a preset word combination mode to obtain a plurality of combined words, wherein the preset word combination mode includes combining keywords belonging to the same target part of speech and combining keywords belonging to different target parts of speech; and determining the phoneme sequence corresponding to each combined word to generate a mapping relationship between the combined word and the phoneme sequence.

2. The data generation method according to claim 1, wherein the determining at least one keyword corresponding to the target part of speech from the set of words that match the target part of speech comprises: for each word in the set of words that match the target part of speech, determining the word frequency of the word in a target corpus; and determining the words corresponding to the N largest word frequencies as the keywords corresponding to the target part of speech, where N is a positive integer.

3. The data generation method according to claim 1, wherein the combining the keywords according to a preset word combination mode to obtain a plurality of combined words comprises at least one of the following: combining a first preset number of keywords belonging to different target parts of speech to obtain a combined word; and combining a second preset number of keywords belonging to the same target part of speech to obtain a combined word.

4. The data generation method according to claim 1, further comprising: obtaining at least one of a word prefix or a word suffix; wherein the combining the keywords according to a preset word combination mode to obtain a plurality of combined words comprises at least one of the following: combining the word prefix and the keyword in front-to-back order to obtain a combined word; and combining the keyword and the word suffix in front-to-back order to obtain a combined word.

5. The data generation method according to claim 1, wherein, after the step of combining the keywords according to a preset word combination mode to obtain a plurality of combined words, the method further comprises: if there is a combined word that cannot form a syllable, deleting the combined word that cannot form a syllable from the plurality of combined words.

6. The data generation method according to claim 1, wherein the determining the phoneme sequence corresponding to each combined word to generate a mapping relationship between the combined word and the phoneme sequence comprises, for each of the combined words, performing the following operations: obtaining, from the initial pronunciation dictionary, the initial phonemes corresponding to the words constituting the combined word; and combining the initial phonemes according to the arrangement order of the words in the combined word to obtain the phoneme sequence corresponding to the combined word, so as to generate the correspondence between the combined word and the phoneme sequence.

7. The data generation method according to claim 1, further comprising: adding the generated mapping relationship between the combined word and the phoneme sequence to the initial pronunciation dictionary to generate a target pronunciation dictionary.

8. A data generation device, comprising: a first acquisition module, configured to acquire, from words contained in an initial pronunciation dictionary, a set of words that match a target part of speech; a first determination module, configured to determine, for each target part of speech, at least one keyword corresponding to the target part of speech from the set of words that match the target part of speech; a combination module, configured to combine the keywords according to a preset word combination mode to obtain a plurality of combined words, wherein the preset word combination mode includes combining keywords belonging to the same target part of speech and combining keywords belonging to different target parts of speech; and a second determination module, configured to determine the phoneme sequence corresponding to each combined word, so as to generate the mapping relationship between the combined word and the phoneme sequence.

9. A computer-readable medium on which a computer program is stored, wherein the computer program, when executed by a processing device, implements the steps of the data generation method according to any one of claims 1-7.

10. An electronic device, comprising: a storage device on which a computer program is stored; and a processing device, configured to execute the computer program in the storage device to implement the steps of the data generation method according to any one of claims 1-7.

11. A computer program comprising instructions which, when executed by a processor, cause the processor to perform the steps of the data generation method according to any one of claims 1-7.

12. A computer program product comprising instructions which, when executed by a processor, cause the processor to perform the steps of the data generation method according to any of claims 1-7.