WO2021127987A1 - Polyphonic character prediction method and disambiguation method, device, equipment, and computer-readable storage medium - Google Patents


Info

Publication number
WO2021127987A1
Authority
WO
WIPO (PCT)
Prior art keywords: text, polyphonic, word, character, segmentation result
Prior art date
Application number
PCT/CN2019/127956
Other languages
English (en)
French (fr)
Inventor
白洛玉
李贤�
张皓
黄东延
丁万
熊友军
Original Assignee
深圳市优必选科技股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳市优必选科技股份有限公司
Priority to PCT/CN2019/127956
Priority to CN201980003196.0A
Publication of WO2021127987A1

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L13/00 Speech synthesis; Text to speech systems
    • G10L13/08 Text analysis or generation of parameters for speech synthesis out of text, e.g. grapheme to phoneme translation, prosody generation or stress or intonation determination
    • G10L13/10 Prosody rules derived from text; Stress or intonation

Definitions

  • This application relates to the technical field of speech synthesis, and in particular to a polyphonic word prediction method, a polyphonic word disambiguation method, a polyphonic word prediction device, a polyphonic word disambiguation device, computer equipment, and a computer-readable storage medium.
  • Polyphonic characters refer to basic language units, such as Chinese characters and words, that have two or more pronunciations, for example heteronyms (characters or words written identically but read differently). Different pronunciations usually express different semantics and usages.
  • The phenomenon of polyphones is common in real corpora: there are many polyphones, their causes are varied, and the coverage of existing corpora is limited.
  • Because the pronunciation of a polyphone directly affects how the text containing it is understood, accurate prediction and disambiguation of polyphones is particularly important.
  • Prediction and disambiguation of polyphonic characters are techniques to predict and obtain the correct pronunciation of polyphonic characters when determining the pronunciation of a text.
  • Existing polyphone prediction and disambiguation methods have the following problems:
  • they take non-polyphones as prediction categories and output a multi-element sequence as the prediction result;
  • including non-polyphones is likely to cause classification interference and complicates encoding and decoding.
  • To address this, this application develops a polyphonic character prediction method and a disambiguation method that can use long-distance contextual information of polyphonic characters and map a multi-element sequence to a unique prediction result. It also provides a polyphonic character prediction device and a disambiguation device, as well as computer equipment and a computer-readable storage medium capable of implementing the above methods.
  • One of the technical means adopted in this application is to provide a method for predicting polyphonic characters, including:
  • the feature vector of the above text, the feature vector of the polyphonic text, and the feature vector of the following text are input into a polyphonic word prediction model to obtain a polyphonic word prediction result;
  • The polyphone prediction model includes a first neural network module, a second neural network module, and a third neural network module; the first neural network module inputs the feature vector of the above text and obtains a first output vector, the second neural network module inputs the feature vector of the polyphone text and obtains a second output vector, and the third neural network module inputs the feature vector of the following text and obtains a third output vector;
  • the polyphone prediction result is obtained by splicing the first output vector, the second output vector, and the third output vector;
  • the polyphonic character prediction result includes the pronunciation probability of each pronunciation of the polyphonic character; the pronunciation of the polyphonic character in the text to be predicted is determined based on the pronunciation probability of each pronunciation of the polyphonic character.
  • Another technical means adopted in this application is to provide a method for disambiguation of polyphonic characters, including:
  • The polyphonic segmentation result refers to a word segmentation result containing a polyphonic character;
  • the polyphonic segmentation result is used as the text to be predicted, and the above polyphonic character prediction method is used to predict its pronunciation.
  • Another technical means adopted in this application is to provide a polyphonic character prediction device, including:
  • a text acquisition module for acquiring the polyphone text in the text to be predicted, and the above text and/or the following text of the polyphone text in the text to be predicted;
  • the vector construction module is used to construct one or more feature vectors corresponding to each of the polyphonic text, the above text, and the below text;
  • the model prediction module is used to input the feature vector of the above text, the feature vector of the polyphone text, and the feature vector of the text below into a polyphone prediction model to obtain a polyphone prediction result;
  • The polyphone prediction model includes a first neural network module, a second neural network module, and a third neural network module;
  • the first neural network module inputs the feature vector of the above text and obtains a first output vector, and the second neural network module inputs the feature vector of the polyphone text and obtains a second output vector;
  • the third neural network module inputs the feature vector of the following text and obtains a third output vector;
  • the polyphone prediction result includes the pronunciation probability of each pronunciation of the polyphonic character, and is obtained by concatenating the first output vector, the second output vector, and the third output vector;
  • the pronunciation determining module is configured to determine the pronunciation of the polyphonic character in the text to be predicted based on the pronunciation probability of each pronunciation of the polyphonic character.
  • Another technical means adopted in this application is to provide a polyphonic character disambiguation device, including:
  • the text segmentation module is used to segment the text to be disambiguated to obtain multiple segmentation results
  • the polyphonic character judgment module is used to judge whether each of the word segmentation results contains polyphonic characters
  • the word length determination module is used to determine whether the word length of the polyphonic segmentation result is greater than a preset word length; the polyphonic segmentation result refers to a word segmentation result containing a polyphonic character;
  • the dictionary query module is configured to query a preset dictionary, when the word length of the polyphonic segmentation result is greater than the preset word length, and determine whether the polyphonic segmentation result exists in the preset dictionary;
  • the rule library verification module is used to search the preset rule library, in the case that the polyphonic segmentation result does not exist in the preset dictionary, for a result that matches the feature information of the polyphonic segmentation result;
  • the polyphonic character prediction device is used, when no result in the preset rule library matches the feature information of the polyphonic segmentation result, to take the polyphonic segmentation result as the text to be predicted and predict its pronunciation.
  • Another technical means adopted in this application is to provide a computer device, including a processor and a memory; the memory stores a computer program, and when the computer program is executed by the processor, the processor executes the steps of the foregoing polyphone prediction method.
  • Another technical means adopted in this application is to provide a computer device, including a processor and a memory; the memory stores a computer program, and when the computer program is executed by the processor, the processor executes the steps of the foregoing polyphone disambiguation method.
  • Another technical means adopted in this application is to provide a computer-readable storage medium storing a computer program that, when executed by a processor, causes the processor to execute the steps of the above polyphonic character prediction method.
  • Another technical means adopted in this application is to provide a computer-readable storage medium storing a computer program that, when executed by a processor, causes the processor to execute the steps of the above polyphonic character disambiguation method.
  • The polyphonic character prediction method and disambiguation method, device, equipment, and computer-readable storage medium provided by this application enable the acquisition, feature utilization, and model-based prediction of long-distance contextual information of polyphonic characters, which helps improve the accuracy of pronunciation prediction for polyphonic characters.
  • the prediction result is the probability of each pronunciation of the polyphonic character.
  • the non-polyphonic character is not used as the prediction category, which can effectively avoid the interference of classification, and the coding and decoding are easy to implement.
  • Fig. 1 is a schematic diagram of the implementation process of a polyphone prediction method in an embodiment of the present application
  • FIG. 2 is a diagram of an implementation example of a polyphone prediction method in an embodiment of the present application
  • FIG. 3 is a schematic diagram of the implementation flow of step S30 in an embodiment of the present application.
  • FIG. 4 is a schematic diagram of the implementation flow of step S302 in an embodiment of the present application.
  • FIG. 5 is a diagram of an implementation example of step S302 in an embodiment of the present application.
  • FIG. 6 is a diagram of an implementation example of a training step of a polyphone prediction model in an embodiment of the present application
  • FIG. 7 is a schematic diagram of the implementation process of a method for disambiguation of polyphonic characters in an embodiment of the present application.
  • FIG. 8 is a structural block diagram of a polyphone word prediction device in an embodiment of the present application.
  • FIG. 9 is a structural block diagram of a polyphone word disambiguation device in an embodiment of the present application.
  • Figure 10 is a structural block diagram of a computer device in an embodiment of the present application.
  • Fig. 11 is an example diagram of output vectors in an embodiment of the present application.
  • a method for predicting polyphones is provided.
  • The method for predicting polyphones is executed by a device capable of implementing it, which may include, but is not limited to, a terminal or a server.
  • Terminals include, but are not limited to, desktop terminals and mobile terminals.
  • Desktop terminals include desktop computers; mobile terminals include, but are not limited to, mobile phones, tablets, and notebook computers; servers include high-performance computers and high-performance computer clusters.
  • the polyphonic word prediction method, as shown in FIG. 1, may specifically include the following steps:
  • Step S20 Obtain the polyphone text in the text to be predicted, and the above text and/or the following text of the polyphone text in the text to be predicted.
  • the text to be predicted refers to a text containing one or more polyphones, and the polyphones may have two or more pronunciations.
  • The polyphonic character can be a single Chinese character, such as 传, read "chuán" in 传说 ("legend") and "zhuàn" in 传记 ("biography"); or a word, such as 重创, read "zhòng chuāng" when it means "severe injury or damage" and "chóng chuàng" when it means "founding again". It can also be an English word, or vocabulary, sentences, etc. in other languages with two or more pronunciations.
  • The polyphonic text refers to the polyphonic character itself. For example, in "小明舍(shě)不得离开深圳" ("Xiao Ming can't bear to leave Shenzhen"), the polyphonic text is "舍"; the above text refers to the text in front of the polyphonic text in the text to be predicted, here "小明" (Xiao Ming); the following text refers to the text behind the polyphonic text in the text to be predicted, here "不得离开深圳".
  • If the polyphonic text is located at the beginning of the text to be predicted, it has only a following text and no above text, and step S20 obtains the polyphonic text and the following text of the polyphonic text.
  • If the polyphonic text is located at the end of the text to be predicted, it has only an above text and no following text, and step S20 obtains the polyphonic text and the above text of the polyphonic text.
  • If the polyphonic text is located in the middle of the text to be predicted, it has an above text in front and a following text behind, and step S20 obtains the polyphonic text together with its above text and following text.
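  • As an illustrative sketch (the function name and the assumption that the polyphone's position and length are already known are ours, not the application's), step S20's splitting into above text, polyphonic text, and following text can be expressed as:

```python
def split_context(text: str, idx: int, length: int = 1):
    """Split `text` into (above, polyphone, below) around the polyphonic
    span starting at character index `idx` and spanning `length` characters.
    Either context may be empty when the polyphone sits at an edge."""
    above = text[:idx]
    polyphone = text[idx:idx + length]
    below = text[idx + length:]
    return above, polyphone, below

# Polyphone in the middle: both contexts are non-empty.
above, poly, below = split_context("小明舍不得离开深圳", 2)
# Polyphone at the start: only a following text exists.
a2, p2, b2 = split_context("舍不得走", 0)
```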
  • Step S30 construct one or more feature vectors corresponding to each of the polyphone text, the above text, and the following text.
  • The polyphonic text is processed character by character to obtain the feature vector of each character;
  • the above text is processed character by character to obtain the feature vector of each character;
  • and the following text is processed character by character to obtain the feature vector of each character.
  • The polyphonic text, the above text, or the following text can each contain one character or multiple characters. For example, in "小明舍(shě)不得离开深圳", the polyphonic text "舍" contains one character, so constructing the feature vector corresponding to the polyphonic text means constructing the feature vector of "舍"; the above text "小明" contains two characters, so constructing its feature vectors means constructing the feature vector of the character "小" and the feature vector of the character "明".
  • Constructing the feature vectors corresponding to the following text means constructing the feature vectors of the characters "不", "得", "离", "开", "深", and "圳".
  • When multiple feature vectors correspond to one character, those feature vectors are combined into a composite vector, and the composite vectors of the characters of the polyphonic text, the above text, and the following text can be input into the polyphone prediction model in the form of a vector matrix, following the order in which they appear in the text to be predicted.
  • The feature vector can be a character vector, the part-of-speech vector of the character, the part-of-speech vector of the preceding character or word, the part-of-speech vector of the following character or word, the position vector of the character, and so on; of course, it can also be another feature vector of the polyphonic text, the above text, or the following text.
  • the word vector may be a word vector of each character contained in a polyphonic character text, the above text or the following text.
  • The part-of-speech vector encodes the part of speech of the character or word, such as noun, adjective, or verb.
  • the position vector of the word may be the relative position of the text where the word is located in the text to be predicted, etc.
  • the step 30 may include:
  • Step S301 Obtain the character feature information of the polyphonic character text, the above text, and the following text respectively;
  • The character feature information includes at least one of: the character information, the part-of-speech information of the character, the part-of-speech information of the preceding character or word, the part-of-speech information of the following character or word, and the position information of the character;
  • "Left pos" represents the left part-of-speech information, that is, the part of speech of the preceding character or word, and can be represented by values such as na_l, n, and v.
  • "Right pos" represents the right part-of-speech information, that is, the part of speech of the following character or word, and can be represented by values such as v, a, and na_r.
  • "Loc" represents the position information of the character, and can be represented by values such as left, mid, and right.
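  • A minimal sketch of assembling this character feature information (the helper name and the toy part-of-speech tags are hypothetical):

```python
def char_features(chars, pos_tags):
    """For each character, collect (char, pos, left pos, right pos, loc).
    na_l / na_r mark a missing left/right neighbour, and loc takes the
    values left / mid / right, mirroring the markers described above."""
    n = len(chars)
    feats = []
    for i in range(n):
        left = pos_tags[i - 1] if i > 0 else "na_l"
        right = pos_tags[i + 1] if i < n - 1 else "na_r"
        loc = "left" if i == 0 else ("right" if i == n - 1 else "mid")
        feats.append((chars[i], pos_tags[i], left, right, loc))
    return feats

feats = char_features(["优", "中", "选"], ["n", "f", "v"])
```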
  • Step S302 Convert the character feature information of the polyphone text, the above text, and the below text into corresponding ID information, respectively.
  • For example, the character feature information of the character "you" contained in the above text includes: the character information "you", the part-of-speech information "n" of the character, the part-of-speech information "na_l" of the preceding character (indicating that there is no preceding character), the part-of-speech information "v" of the following character, and the position information "left" of the character in the text to be predicted.
  • the word2idx, pose2idx, and loc2idx shown in FIG. 5 represent the conversion of feature information to ID information.
  • the step of converting the character feature information of the polyphonic character text, the above text, and the following text into corresponding ID information, respectively may include:
  • step S302A a mapping dictionary between the character feature information and the ID information is established in advance.
  • The mapping dictionary stores the correspondence between character feature information and ID information; when the feature information of a character is input into the mapping dictionary, the ID information corresponding to that feature information can be obtained from the mapping dictionary.
  • Step S302B Obtain ID information corresponding to different character feature information based on the mapping dictionary. Different said character feature information has different ID information, which can all be obtained through said mapping dictionary.
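  • A minimal sketch of such a mapping dictionary (word2idx-style; the class name is ours, and a real implementation would more likely be built once from the training vocabulary rather than grown on lookup):

```python
class MappingDictionary:
    """Maps feature values (characters, part-of-speech tags, positions)
    to integer IDs, in the spirit of word2idx / pose2idx / loc2idx."""
    def __init__(self):
        self.table = {}

    def to_id(self, feature):
        # Assign a fresh ID the first time a feature value is seen.
        if feature not in self.table:
            self.table[feature] = len(self.table)
        return self.table[feature]

word2idx = MappingDictionary()
ids = [word2idx.to_id(ch) for ch in "优中选优"]  # repeated 优 reuses its ID
```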
  • the ID information is vectorized to obtain one or more feature vectors corresponding to the polyphone text, the above text, and the following text, respectively.
  • The step of vectorizing the ID information may include: converting the ID information corresponding to the character information into a character vector through Word2Vec (a method for converting a word into a vector); and converting the ID information corresponding to the part-of-speech information of the character, the part-of-speech information of the preceding character or word, the part-of-speech information of the following character or word, and the position information of the character into feature vectors through one-hot encoding. One-hot encoding is the "One-Hot" shown in Figure 5, an encoding method that converts feature information into vectors.
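  • One-hot encoding itself is straightforward; a minimal sketch (the loc vocabulary below is illustrative):

```python
def one_hot(idx: int, size: int):
    """Encode an integer ID as a vector with a single 1.0 at position idx."""
    vec = [0.0] * size
    vec[idx] = 1.0
    return vec

loc2idx = {"left": 0, "mid": 1, "right": 2}
loc_vec = one_hot(loc2idx["mid"], len(loc2idx))
```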
  • Step S40 Input the feature vector of the above text, the feature vector of the polyphone text, and the feature vector of the text below into a polyphone prediction model to obtain a polyphone prediction result.
  • The polyphone prediction model includes a first neural network module, a second neural network module, and a third neural network module. The first neural network module takes the feature vector of the above text as input and obtains a first output vector; the second neural network module takes the feature vector of the polyphone text and obtains a second output vector; the third neural network module takes the feature vector of the following text and obtains a third output vector. The polyphone prediction result is obtained by splicing the first output vector, the second output vector, and the third output vector.
  • The first neural network module and the third neural network module may be long short-term memory (LSTM) modules, and the second neural network module may be a deep neural network (DNN) module.
  • Figure 2 shows an example diagram of the implementation of the polyphonic word prediction method in an embodiment of the present application.
  • The text to be predicted ("excellent must be selected well") is split into the polyphonic text, the above text, and the following text; after the feature information is extracted and expressed, the corresponding feature vectors are generated and input into the polyphone prediction model.
  • The polyphone prediction model includes a forward LSTM, a DNN, and a backward LSTM, and outputs the polyphone prediction result.
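  • A toy sketch of this forward pass (mean-pooling stands in for the real LSTM/DNN modules, and the weight matrix is arbitrary; only the splice-then-classify shape follows the description):

```python
import math

def mean_pool(vectors):
    """Stand-in for an LSTM/DNN module: reduce a sequence of feature
    vectors to a single fixed-size output vector."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def predict(above_vecs, poly_vecs, below_vecs, weights):
    """Splice the three module outputs into one joint vector, then map it
    to a probability for each candidate pronunciation of the polyphone."""
    joint = mean_pool(above_vecs) + mean_pool(poly_vecs) + mean_pool(below_vecs)
    logits = [sum(w * x for w, x in zip(row, joint)) for row in weights]
    return softmax(logits)

# Two candidate pronunciations, 2-dim features -> 6-dim joint vector.
W = [[0.5, -0.2, 1.0, 0.3, -0.4, 0.1],
     [-0.3, 0.8, -0.5, 0.2, 0.6, -0.1]]
probs = predict([[1.0, 0.0]], [[0.5, 0.5]], [[0.0, 1.0]], W)
```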
  • FIG. 11 shows an example diagram of the prediction result of a polyphone in an embodiment of the present application.
  • This embodiment adopts a long-distance, low-interference network structure: the contextual information of the polyphonic character is spliced with its own information, the whole-sentence information of the text to be predicted is fully utilized, and a network mapping a multi-element sequence to a unique prediction result is constructed.
  • The prediction result covers only the pronunciations of the polyphonic character, which not only guarantees a unique output but also avoids the classification interference of non-polyphones and the complexity of encoding and decoding.
  • the prediction model of polyphonic characters is simplified and efficient.
  • the neural network model is used as a unified general classifier, which avoids the problems of large models and high decoding complexity caused by using too many classifiers.
  • multiple training texts containing polyphonic characters may be used as input, and the correct pronunciation of the polyphonic characters contained in the training text may be used as output to train the polyphonic word prediction model.
  • The polyphone prediction model, which includes the forward LSTM, DNN, and backward LSTM, can be obtained by training on a large number of training samples with clear pronunciation annotations. During training, the polyphone prediction model is first initialized; the training text containing the polyphone is input into the model to obtain a polyphone prediction result, which is then compared with the correct pronunciation of the polyphone contained in the training text.
  • The difference between the prediction result and the correct pronunciation can be calculated by cross-entropy;
  • the correct pronunciation of the polyphone contained in the training text can be encoded by the One-Hot method;
  • the gradient descent method is then used to readjust the parameters in the polyphone prediction model, and training is repeated until the prediction result is consistent with the correct pronunciation of the polyphone contained in the training text.
  • the cross-entropy calculation method, One-Hot method, and gradient descent method here can all be replaced by other methods related to neural network model training.
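  • The training loss named above can be sketched directly; this is a generic cross-entropy against a one-hot label, not the application's exact implementation:

```python
import math

def cross_entropy(pred_probs, one_hot_label):
    """Cross-entropy between predicted pronunciation probabilities and the
    one-hot encoding of the correct pronunciation; lower is better."""
    eps = 1e-12  # guard against log(0)
    return -sum(y * math.log(p + eps)
                for y, p in zip(one_hot_label, pred_probs))

# The loss shrinks as probability mass moves onto the correct pronunciation.
confident = cross_entropy([0.9, 0.1], [1.0, 0.0])
wrong = cross_entropy([0.1, 0.9], [1.0, 0.0])
```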
  • the training step of the polyphone prediction model may include:
  • Step 1: extract the text and construct the feature vector data of each training text. Step 2: cluster the feature vector data of each training text according to data length; adjust the data lengths of the feature vector data within each cluster to be consistent; and input the feature vector data of each cluster into the polyphone prediction model in batches.
  • Step 1 and Step 2 are performed in parallel, so different training texts can be processed in parallel.
  • the "feature vector data item" shown in FIG. 6 represents the feature vector data of each training text
  • The bucketing operation shown in FIG. 6 represents clustering the feature vector data of each training text according to data length: shorter sequences are grouped together and longer sequences are grouped together, that is, feature vector data of training texts whose data lengths do not differ significantly are grouped together. Specifically, the divided feature vector data of the training texts are added to a preset feature queue; when the feature queue is full, the data lengths of the feature vector data in each cluster are adjusted to be consistent and then batch input into the polyphone prediction model.
  • The padding in FIG. 6 refers to the data length adjustment operation, and the packing refers to the batch input operation.
  • Operations such as text extraction and vector construction are processed in parallel with the batch input of vectors into the polyphone prediction model, which can effectively improve efficiency, is suitable for large-scale sample training, and helps shorten the model training cycle.
  • the reliability and efficiency of model training in this embodiment are relatively high.
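  • A toy sketch of the bucketing / padding / packing pipeline described above (the bucket granularity and batch size are arbitrary choices for illustration):

```python
def bucket_pad_pack(sequences, batch_size, pad=0, bucket_width=4):
    """Group feature sequences of similar length (bucketing), pad each
    batch to a common length (padding), and emit batches (packing)."""
    buckets = {}
    for seq in sorted(sequences, key=len):
        buckets.setdefault(len(seq) // bucket_width, []).append(seq)
    batches = []
    for bucket in buckets.values():
        for i in range(0, len(bucket), batch_size):
            batch = bucket[i:i + batch_size]
            width = max(len(s) for s in batch)
            batches.append([s + [pad] * (width - len(s)) for s in batch])
    return batches

batches = bucket_pad_pack([[1, 2], [3], [4, 5, 6, 7, 8]], batch_size=2)
```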
  • a method for disambiguation of polyphonic characters is also provided, which may include the following steps:
  • step S1 word segmentation is performed on the text to be disambiguated to obtain multiple word segmentation results; the text to be disambiguated may or may not contain polyphonic characters, and may be a sentence, a language text, etc.
  • In the case that a word segmentation result does not contain a polyphonic character, step S3 is executed to query a preset dictionary to obtain the pronunciation of the word segmentation result;
  • The preset dictionary may be a dictionary, word library, or similar mapping between characters, words, phrases, etc. and pronunciations; that is, the pronunciation of a character, word, or phrase can be directly looked up and determined in the preset dictionary;
  • Step S4: In the case that a word segmentation result contains a polyphonic character, the word length of the polyphonic segmentation result is determined and compared with a preset word length.
  • The polyphonic segmentation result refers to a word segmentation result that contains a polyphonic character. The preset word length can be 1, which distinguishes whether the polyphonic segmentation result is a single character or a multi-character word:
  • a polyphonic segmentation result longer than the preset word length is a multi-character word, and one equal to the preset word length is a single character.
  • The preset word length can also be set to other lengths according to specific needs.
  • Step S8 In the case that the polyphonic word segmentation result does not exist in the preset dictionary, search for a result that matches the feature information of the polyphonic word segmentation result in the preset rule library.
  • The preset rule library refers to a library of rules establishing the correspondence between the feature information of polyphonic characters and their pronunciations; specifically, features can be extracted from polyphonic texts through statistics, and the corresponding rules established based on the correct pronunciations of those polyphonic texts.
  • the feature information of polyphonic characters may include: polyphonic characters, part of speech of polyphonic characters, part of speech of preceding and following characters or words, relative position of polyphonic characters in the text, length of polyphonic characters, etc.
  • The rules may be established, for example, with a support vector machine (SVM).
  • The polyphonic character pronunciation may be directly used to label the polyphonic segmentation result.
  • step S11 is performed to compare the feature information of the preset rule library with the feature information of the polyphonic word segmentation result.
  • If there is a matching result, it is used as the pronunciation of the polyphonic segmentation result;
  • Step S12: In the case that there is no result matching the feature information of the polyphonic segmentation result in the preset rule library, meaning the rule library has not established a rule for this segmentation result,
  • the polyphonic segmentation result is used as the text to be predicted, and its pronunciation is predicted by the polyphonic character prediction method of any one of the above embodiments.
  • At least three polyphone prediction and disambiguation methods (dictionary query, rule library verification, and deep-learning neural network prediction) are combined with effective logic, which avoids the limitations a single method may have in predicting specific words in some cases.
  • The combined prediction of a dictionary, a rule library, and a neural network forms a polyphone disambiguation method with high accuracy and easy maintenance.
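  • The cascade logic can be sketched as follows (the data shapes — a Seg tuple, a feature-keyed rule table, a callable model — are our simplifications of the modules described above):

```python
from collections import namedtuple

Seg = namedtuple("Seg", "word features")

def disambiguate(seg, dictionary, rules, model, preset_word_length=1):
    """Dictionary lookup first, then the rule library, then fall back to
    the neural prediction model, mirroring the cascade described above."""
    if len(seg.word) > preset_word_length and seg.word in dictionary:
        return dictionary[seg.word]      # multi-character dictionary hit
    pron = rules.get(seg.features)
    if pron is not None:
        return pron                      # rule-library hit
    return model(seg.word)               # neural-model fallback

dictionary = {"重创": "zhòng chuāng"}
rules = {("得", "v", "mid"): "de"}
model = lambda word: "model:" + word

hit_dict = disambiguate(Seg("重创", None), dictionary, rules, model)
hit_rule = disambiguate(Seg("得", ("得", "v", "mid")), dictionary, rules, model)
fallback = disambiguate(Seg("舍", ("舍", "v", "left")), dictionary, rules, model)
```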
  • step S6 is executed to search the preset rule library for a result that matches the feature information of the polyphonic segmentation result;
  • step S9 is executed to compare the feature information in the preset rule library with the feature information of the polyphonic word segmentation result.
  • The matching result is used as the pronunciation of the polyphonic segmentation result.
  • This embodiment addresses the prediction process for polyphonic segmentation results whose word length is less than or equal to the preset word length, that is, the prediction of single-character (monosyllabic) polyphones.
  • the correct pronunciation corresponding to the word segmentation result of the polyphonic character is supplemented to the preset dictionary and the preset rule library.
  • the correct pronunciation corresponding to the word segmentation result of the polyphonic character is used as a sample to train the polyphonic character prediction model.
  • The correct pronunciation corresponding to the polyphonic segmentation result can be used as a new polyphone sample and preferentially supplemented into the preset dictionary and the preset rule library for quick maintenance.
  • The correct pronunciation corresponding to the polyphonic segmentation result can also be used as a new polyphone sample to iteratively train the polyphone prediction model, achieving a steady improvement of the model.
  • a device for predicting polyphones may include: a text acquisition module, a vector construction module, a model prediction module, and a pronunciation determination module. The text acquisition module is used to acquire the polyphone text in the text to be predicted, as well as the preceding text and/or following text of the polyphone text in the text to be predicted; the vector construction module is used to construct one or more feature vectors corresponding to each of the polyphone text, the preceding text, and the following text; the model prediction module is used to input the feature vector of the preceding text, the feature vector of the polyphone text, and the feature vector of the following text into the polyphone prediction model to obtain the polyphone prediction result. The polyphone prediction model includes a first neural network module, a second neural network module, and a third neural network module; the first neural network module receives the feature vector of the preceding text and obtains a first output vector, the second neural network module receives the feature vector of the polyphone text and obtains a second output vector, and the third neural network module receives the feature vector of the following text and obtains a third output vector; the polyphone prediction result includes the pronunciation probability of each reading of the polyphone and is obtained by concatenating the first, second, and third output vectors; the pronunciation determination module is used to determine the reading of the polyphone in the text to be predicted based on the pronunciation probability of each reading.
  • a device for disambiguation of polyphonic characters may include: a text segmentation module, a polyphonic character judgment module, a word length determination module, a dictionary query module, a rule base verification module, and The polyphonic word prediction device according to any one of the above embodiments;
  • the text word segmentation module is used to segment the text to be disambiguated to obtain multiple word segmentation results;
  • the polyphone judgment module is used to judge whether each of the word segmentation results contains a polyphone; when a word segmentation result contains no polyphone, the dictionary query module can be used to query a preset dictionary to obtain the reading of the word segmentation result;
  • the word length determination module determines the word length of the polyphone segmentation result when a word segmentation result contains a polyphone; a polyphone segmentation result is a word segmentation result that contains a polyphone;
  • the dictionary query module is used to query the preset dictionary when the word length of the polyphone segmentation result is greater than the preset word length and to judge whether the preset dictionary contains the polyphone segmentation result;
  • a computer device including a memory and a processor; the memory stores a computer program which, when executed by the processor, causes the processor to execute the polyphone prediction method of any of the above embodiments, and can also implement the polyphone disambiguation method described in any of the above embodiments.
  • Fig. 10 shows an internal structure diagram of a computer device in an embodiment.
  • the computer device may specifically be a terminal or a server.
  • the computer device includes a processor, a memory, and a network interface connected through a system bus.
  • the memory includes a non-volatile storage medium and an internal memory.
  • the non-volatile storage medium of the computer device stores an operating system and may also store a computer program.
  • This application can be applied to a text to speech system.
  • Non-volatile memory may include read only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory.
  • Volatile memory may include random access memory (RAM) or external cache memory.
  • RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchronous link (Synchlink) DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM), etc.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Machine Translation (AREA)

Abstract

A polyphone prediction method, a polyphone disambiguation method, apparatuses, a device, and a computer-readable storage medium. The polyphone prediction method comprises the following steps: acquiring the polyphone text in a text to be predicted, as well as the preceding text and/or following text of the polyphone text in the text to be predicted (S20); constructing one or more feature vectors corresponding to each of the polyphone text, the preceding text, and the following text (S30); inputting the feature vector of the preceding text, the feature vector of the polyphone text, and the feature vector of the following text into a polyphone prediction model to obtain a polyphone prediction result, wherein the polyphone prediction model comprises a first neural network module, a second neural network module, and a third neural network module; the first neural network module receives the feature vector of the preceding text and produces a first output vector, the second neural network module receives the feature vector of the polyphone text and produces a second output vector, and the third neural network module receives the feature vector of the following text and produces a third output vector; the polyphone prediction result comprises the pronunciation probability of each reading of the polyphone and is obtained by concatenating the first output vector, the second output vector, and the third output vector (S40); and determining the reading of the polyphone in the text to be predicted on the basis of the pronunciation probability of each reading of the polyphone (S50). The method helps improve the accuracy of polyphone pronunciation prediction, effectively avoids classification interference, and is easy to implement in encoding and decoding.

Description

Polyphone Prediction Method and Disambiguation Method, Apparatus, Device, and Computer-Readable Storage Medium

Technical Field

This application relates to the technical field of speech synthesis, and in particular to a polyphone prediction method, a polyphone disambiguation method, a polyphone prediction apparatus, a polyphone disambiguation apparatus, a computer device, and a computer-readable storage medium.

Background

A polyphone is a basic linguistic unit, such as a Chinese character or a word, that has two or more readings, for example homographs and homograph words with different pronunciations. Different readings usually express different semantics and usages. Polyphones are common in corpora, with numerous pronunciation entries and diverse causes, and coverage by existing corpora is limited; meanwhile, differences in polyphone readings directly affect the intelligibility and accuracy of polyphone text. Polyphone prediction and disambiguation are therefore particularly important. Polyphone prediction and disambiguation is the technique of predicting and obtaining the correct reading of a polyphone when determining the pronunciation of a text.

Technical Problem

Existing polyphone prediction and disambiguation approaches have the following problems:

1. Only one or two characters or words before and after the polyphone are collected as fragment features for prediction; long-distance context information cannot be fully exploited, so polyphone readings are easily predicted inaccurately.

2. Non-polyphones are treated as prediction categories, and the output is a prediction result over a multi-element sequence; non-polyphones easily cause classification interference, and encoding and decoding are complex.

Technical Solution

In view of the above problems, this application develops a polyphone prediction method and disambiguation method that can exploit long-distance polyphone context information and can map a multi-element sequence to a unique prediction result, and also provides a polyphone prediction apparatus and disambiguation apparatus, as well as a computer device and computer-readable storage medium capable of implementing the above polyphone disambiguation method.
One technical means adopted by this application is to provide a polyphone prediction method, comprising:

acquiring the polyphone text in a text to be predicted, as well as the preceding text and/or following text of the polyphone text in the text to be predicted;

constructing one or more feature vectors corresponding to each of the polyphone text, the preceding text, and the following text;

inputting the feature vector of the preceding text, the feature vector of the polyphone text, and the feature vector of the following text into a polyphone prediction model to obtain a polyphone prediction result; the polyphone prediction model comprises a first neural network module, a second neural network module, and a third neural network module; the first neural network module receives the feature vector of the preceding text and produces a first output vector, the second neural network module receives the feature vector of the polyphone text and produces a second output vector, and the third neural network module receives the feature vector of the following text and produces a third output vector; the polyphone prediction result is obtained by concatenating the first output vector, the second output vector, and the third output vector;

the polyphone prediction result comprises the pronunciation probability of each reading of the polyphone; the reading of the polyphone in the text to be predicted is determined on the basis of the pronunciation probability of each reading of the polyphone.
Another technical means adopted by this application is to provide a polyphone disambiguation method, comprising:

segmenting a text to be disambiguated to obtain multiple word segmentation results;

judging whether each of the word segmentation results contains a polyphone;

determining whether the word length of a polyphone segmentation result is greater than a preset word length, a polyphone segmentation result being a word segmentation result containing a polyphone;

when the word length of the polyphone segmentation result is greater than the preset word length, querying a preset dictionary and judging whether the preset dictionary contains the polyphone segmentation result;

when the polyphone segmentation result does not exist in the preset dictionary, searching a preset rule base for a result matching the feature information of the polyphone segmentation result;

when no result matching the feature information of the polyphone segmentation result exists in the preset rule base, taking the polyphone segmentation result as the text to be predicted and predicting the polyphone segmentation result via said polyphone prediction method.
Another technical means adopted by this application is to provide a polyphone prediction apparatus, comprising:

a text acquisition module for acquiring the polyphone text in a text to be predicted, as well as the preceding text and/or following text of the polyphone text in the text to be predicted;

a vector construction module for constructing one or more feature vectors corresponding to each of the polyphone text, the preceding text, and the following text;

a model prediction module for inputting the feature vector of the preceding text, the feature vector of the polyphone text, and the feature vector of the following text into a polyphone prediction model to obtain a polyphone prediction result; the polyphone prediction model comprises a first neural network module, a second neural network module, and a third neural network module; the first neural network module receives the feature vector of the preceding text and produces a first output vector, the second neural network module receives the feature vector of the polyphone text and produces a second output vector, and the third neural network module receives the feature vector of the following text and produces a third output vector; the polyphone prediction result comprises the pronunciation probability of each reading of the polyphone and is obtained by concatenating the first output vector, the second output vector, and the third output vector; and

a reading determination module for determining the reading of the polyphone in the text to be predicted on the basis of the pronunciation probability of each reading of the polyphone.
Another technical means adopted by this application is to provide a polyphone disambiguation apparatus, comprising:

a text segmentation module for segmenting a text to be disambiguated to obtain multiple word segmentation results;

a polyphone judgment module for judging whether each of the word segmentation results contains a polyphone;

a word length determination module for determining whether the word length of a polyphone segmentation result is greater than a preset word length, a polyphone segmentation result being a word segmentation result containing a polyphone;

a dictionary query module for querying a preset dictionary and judging whether the preset dictionary contains the polyphone segmentation result when the word length of the polyphone segmentation result is greater than the preset word length;

a rule base verification module for searching a preset rule base for a result matching the feature information of the polyphone segmentation result when the polyphone segmentation result does not exist in the preset dictionary; and

said polyphone prediction apparatus, which is configured to take the polyphone segmentation result as the text to be predicted and predict it when no result matching the feature information of the polyphone segmentation result exists in the preset rule base.
Another technical means adopted by this application is to provide a computer device, comprising a processor and a memory, the memory storing a computer program which, when executed by the processor, causes the processor to perform the steps of the above polyphone prediction method.

Another technical means adopted by this application is to provide a computer device, comprising a processor and a memory, the memory storing a computer program which, when executed by the processor, causes the processor to perform the steps of the above polyphone disambiguation method.

Another technical means adopted by this application is to provide a computer-readable storage medium storing a computer program which, when executed by a processor, causes the processor to perform the steps of the above polyphone prediction method.

Another technical means adopted by this application is to provide a computer-readable storage medium storing a computer program which, when executed by a processor, causes the processor to perform the steps of the above polyphone disambiguation method.
Beneficial Effects

Implementing the embodiments of this application has the following beneficial effects:

In the polyphone prediction method and disambiguation method, apparatus, device, and computer-readable storage medium provided by this application, the polyphone prediction method can acquire, featurize, and model long-distance context information of polyphones, which helps improve the accuracy of polyphone pronunciation prediction. The prediction result is the probability of each reading of the polyphone; non-polyphones are not treated as prediction categories, which effectively avoids classification interference and makes encoding and decoding easy to implement.
Brief Description of the Drawings

To illustrate the technical solutions in the embodiments of this application or the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below are only some embodiments of this application; a person of ordinary skill in the art could obtain other drawings from these drawings without creative effort.

In the drawings:

Fig. 1 is a schematic flowchart of a polyphone prediction method in an embodiment of this application;

Fig. 2 is an implementation example diagram of a polyphone prediction method in an embodiment of this application;

Fig. 3 is a schematic flowchart of step S30 in an embodiment of this application;

Fig. 4 is a schematic flowchart of step S302 in an embodiment of this application;

Fig. 5 is an implementation example diagram of step S302 in an embodiment of this application;

Fig. 6 is an implementation example diagram of the training steps of a polyphone prediction model in an embodiment of this application;

Fig. 7 is a schematic flowchart of a polyphone disambiguation method in an embodiment of this application;

Fig. 8 is a structural block diagram of a polyphone prediction apparatus in an embodiment of this application;

Fig. 9 is a structural block diagram of a polyphone disambiguation apparatus in an embodiment of this application;

Fig. 10 is a structural block diagram of a computer device in an embodiment of this application;

Fig. 11 is an example diagram of an output vector in an embodiment of this application.
Detailed Description of the Embodiments

To make the purposes, technical solutions, and technical effects of this application clearer, this application is described in further detail below with reference to the drawings and specific embodiments. It should be understood that the specific embodiments described in this specification are intended only to explain this application, not to limit it. Where no conflict arises, the embodiments of this application and the features in the embodiments may be combined with one another.

In one embodiment, a polyphone prediction method is provided. The method is executed by a device capable of implementing it, which may include but is not limited to terminals and servers; terminals may include but are not limited to desktop terminals and mobile terminals, desktop terminals include desktop computers, and mobile terminals include but are not limited to mobile phones, tablets, and laptops; servers include high-performance computers and high-performance computer clusters. As shown in Fig. 1, the polyphone prediction method may specifically include the following steps:

Step S20: acquire the polyphone text in a text to be predicted, as well as the preceding text and/or following text of the polyphone text in the text to be predicted.

The text to be predicted is a text containing one or more polyphones, where a polyphone may have two or more readings. A polyphone may be a single Chinese character such as "传", which reads "chuán" in "传说" (legend) and "zhuàn" in "传记" (biography); it may be a word, such as "重创", which reads "zhòng chuāng" when meaning "to inflict heavy casualties or damage" and "chóng chuàng" when meaning "to re-establish"; it may also be an English word with two or more pronunciations, or a unit, word, or sentence of another language.

The polyphone text is the polyphone itself. For example, in "小明舍(shě)不得离开深圳" ("Xiao Ming is reluctant to leave Shenzhen"), the polyphone text is "舍"; the preceding text is the text before the polyphone text in the text to be predicted, here "小明"; the following text is the text after the polyphone text in the text to be predicted, here "不得离开深圳".

Prediction and disambiguation of polyphone text requires the contextual linguistic knowledge around the polyphone text. Prior-art polyphone prediction and disambiguation approaches usually consider only one or two words before and after the polyphone text, whereas this embodiment can exploit the long-distance preceding text and/or following text of the polyphone text. If the polyphone text is at the beginning of the text to be predicted, it has only following text and no preceding text; in that case step S20 acquires the polyphone text and its following text. If the polyphone text is at the end of the text to be predicted, it has only preceding text and no following text; in that case step S20 acquires the polyphone text and its preceding text. And if the polyphone text is in the middle of the text to be predicted, it has both preceding text and following text; in that case step S20 acquires the polyphone text, its preceding text, and its following text.
Step S30: construct one or more feature vectors corresponding to each of the polyphone text, the preceding text, and the following text. Specifically, feature vectors are obtained character by character for the polyphone text, for the preceding text, and for the following text. The polyphone text, preceding text, or following text may contain one character or several characters. For example, in "小明舍(shě)不得离开深圳", the polyphone text "舍" contains one character, so constructing the feature vector of the polyphone text means constructing the feature vector of "舍"; the preceding text "小明" contains two characters, so constructing its feature vectors means constructing the feature vectors of "小" and "明"; the following text "不得离开深圳" contains six characters, so constructing its feature vectors means constructing the feature vectors of "不", "得", "离", "开", "深", and "圳". As shown in Figs. 2 and 5, when a character has multiple feature vectors, they are combined into one composite vector; when the polyphone text, preceding text, or following text contains multiple characters, the composite vectors of the characters can be input into the polyphone prediction model in the form of a vector matrix, in the order in which they appear in the text to be predicted. The feature vectors may be character vectors, the character's part-of-speech vector, the part-of-speech vector of the preceding character or word, the part-of-speech vector of the following character or word, the character's position vector, and so on; they may of course also be other feature vectors of the polyphone text, preceding text, or following text. The character vectors may be the character vectors of each character contained in the polyphone text, preceding text, or following text. The part of speech may be noun, adjective, verb, etc. The position vector of a character may be the relative position, within the text to be predicted, of the text in which the character is located.
In one embodiment, as shown in Figs. 3 and 5, step S30 may include:

Step S301: obtain the character feature information of the polyphone text, the preceding text, and the following text respectively; the character feature information includes at least one of character information, the character's part-of-speech information, the part-of-speech information of the character or word preceding the character, the part-of-speech information of the character or word following the character, and the character's position information.

For example, "words" in Fig. 5 denotes character or word information, such as "优必选", "好", and "厉害". Since "优必选" contains three characters, it is processed character by character as "优", "必", "选" when constructing feature vectors; "厉害" contains two characters and is processed character by character as "厉", "害". "poses" denotes part-of-speech information, illustratively represented by n, v, a. "Left poses" denotes left part-of-speech information, i.e., the part of speech of the character or word preceding the current one, illustratively represented by na_l, n, v. "right poses" denotes right part-of-speech information, i.e., the part of speech of the character or word following the current one, illustratively represented by v, a, na_r. "loc" denotes the character's position information, illustratively represented by left, mid, right.

Step S302: convert the character feature information of the polyphone text, the preceding text, and the following text into corresponding ID information. For example, as shown in Fig. 5, the feature information of the character "优" in the preceding text includes: the character information "优", its part-of-speech information "n", the part-of-speech information of its preceding character "na_l" (indicating there is no preceding character), the part-of-speech information of its following character "v", and its position information in the text to be predicted, "left". The word2idx, pose2idx, and loc2idx shown in Fig. 5 denote the conversion of feature information into ID information.

In one embodiment, as shown in Figs. 4 and 5, the step of converting the character feature information of the polyphone text, the preceding text, and the following text into corresponding ID information may include:

Step S302A: pre-establish a mapping dictionary between the character feature information and the ID information.

The mapping dictionary stores the correspondence and mapping between a character's feature information and the ID information; when the character's feature information is input into the mapping dictionary, the corresponding ID information can be obtained from it.

Step S302B: obtain, on the basis of the mapping dictionary, the ID information corresponding to each different piece of character feature information. Different pieces of character feature information have different ID information, all obtainable through the mapping dictionary.

Step S303: vectorize the ID information to obtain one or more feature vectors corresponding to each of the polyphone text, the preceding text, and the following text. Further, the step of vectorizing the ID information may include: converting the ID information corresponding to the character information into character vectors via Word2Vec, a means of converting characters into vectors; and converting the ID information corresponding to the character's part-of-speech information, the part-of-speech information of the preceding character or word, the part-of-speech information of the following character or word, and the character's position information into feature vectors via one-hot encoding, the "One-Hot" shown in Fig. 5, an encoding means that converts feature information into vectors.
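The feature construction above (feature → ID via a mapping dictionary, then an embedding lookup for the character ID and one-hot encoding for the remaining features) can be sketched as follows. This is a minimal illustration: the vocabularies are toy examples taken from Fig. 5, and the two-dimensional embedding table is a stand-in for a trained Word2Vec lookup, not the application's actual data.

```python
# Toy mapping dictionaries (word2idx / pose2idx / loc2idx in Fig. 5).
word2idx = {"优": 0, "必": 1, "选": 2, "好": 3, "厉": 4, "害": 5}
pose2idx = {"n": 0, "v": 1, "a": 2, "na_l": 3, "na_r": 4}
loc2idx = {"left": 0, "mid": 1, "right": 2}

# Stand-in for a Word2Vec lookup: one embedding row per character ID.
embedding = {i: [0.1 * i, 0.2 * i] for i in range(len(word2idx))}

def one_hot(idx, size):
    """Encode an ID as a one-hot vector."""
    vec = [0.0] * size
    vec[idx] = 1.0
    return vec

def char_features(char, pos, left_pos, right_pos, loc):
    """Concatenate the character embedding with one-hot POS and location
    features into one composite vector for this character."""
    return (embedding[word2idx[char]]
            + one_hot(pose2idx[pos], len(pose2idx))
            + one_hot(pose2idx[left_pos], len(pose2idx))
            + one_hot(pose2idx[right_pos], len(pose2idx))
            + one_hot(loc2idx[loc], len(loc2idx)))

# The character "优" from Fig. 5: POS "n", no preceding character ("na_l"),
# following POS "v", located in the preceding ("left") text.
vec = char_features("优", "n", "na_l", "v", "left")
# 2 embedding dims + 3 * 5 POS one-hots + 3 location one-hots = 20 dims
```

Stacking such composite vectors for consecutive characters, in text order, yields the vector matrix fed to the model.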
Step S40: input the feature vector of the preceding text, the feature vector of the polyphone text, and the feature vector of the following text into the polyphone prediction model to obtain the polyphone prediction result. The polyphone prediction model comprises a first neural network module, a second neural network module, and a third neural network module; the first neural network module receives the feature vector of the preceding text and produces a first output vector, the second neural network module receives the feature vector of the polyphone text and produces a second output vector, and the third neural network module receives the feature vector of the following text and produces a third output vector; the polyphone prediction result is obtained by concatenating the first output vector, the second output vector, and the third output vector. In one embodiment, the first and third neural network modules may be long short-term memory (LSTM) modules, and the second neural network module may be a deep neural network (DNN) module.

Step S50: determine the reading of the polyphone in the text to be predicted on the basis of the pronunciation probability of each reading of the polyphone.

Fig. 2 shows an implementation example of the polyphone prediction method in one embodiment of this application. As shown in Fig. 2, the text to be predicted, "优必选好厉害", passes in turn through the acquisition and feature representation of the polyphone text, the preceding text, and the following text; the resulting feature vectors are input into the polyphone prediction model, which comprises a forward LSTM, a DNN, and a backward LSTM, yielding the polyphone prediction result. Fig. 11 shows an example of the polyphone prediction result in one embodiment of this application. As shown in Fig. 11, the polyphone prediction result represents the probabilities of the polyphone's different readings: for example, the probability that "好" reads "hǎo" is 0.8 and the probability that it reads "hào" is 0.2. The reading with the highest pronunciation probability can then be selected, i.e., the reading "hao3" for the polyphone "好", as the polyphone's pronunciation annotation; since that reading has the higher probability, it is used to annotate the polyphone.

This embodiment adopts a long-distance, low-interference network structure. By concatenating the polyphone's context information with its own information, the full sentence information of the text to be predicted is exploited, and a network mapping a multi-element sequence to a unique prediction result is constructed. The prediction result contains only polyphone readings, which both guarantees a unique output and avoids classification interference from non-polyphones as well as encoding/decoding complexity. The polyphone prediction model is simplified and efficient. This embodiment uses the neural network model as a unified general-purpose classifier, avoiding the bloated models and high decoding complexity caused by using too many classifiers.

In one embodiment, the polyphone prediction model may be trained by taking multiple training texts containing polyphones as input and the correct readings of the polyphones contained in the training texts as output. The polyphone prediction model, comprising a forward LSTM, a DNN, and a backward LSTM, can be trained on a large number of training samples with explicit pronunciation annotations. During training, the polyphone prediction model is first given an initialization; a training text containing a polyphone is input into the model to obtain a polyphone prediction result, and the error between the prediction result and the correct reading of the polyphone in the training text is computed. The prediction result can be evaluated via cross-entropy, and the correct reading of the polyphone in the training text can be labeled via the One-Hot method; gradient descent is then used to re-adjust the parameters of the polyphone prediction model, and training is repeated until the prediction results converge to the correct readings of the polyphones in the training texts. The cross-entropy computation, One-Hot labeling, and gradient descent methods here can each be replaced by other methods related to neural network model training.
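The training loss described above, cross-entropy between the predicted pronunciation probabilities and a one-hot label for the correct reading, can be sketched as follows. The probabilities below reuse the Fig. 11 example and are illustrative placeholders, not outputs of the trained model.

```python
import math

def cross_entropy(probs, one_hot_label):
    """Cross-entropy between predicted pronunciation probabilities and a
    one-hot label for the correct reading."""
    eps = 1e-12  # guard against log(0)
    return -sum(y * math.log(p + eps) for p, y in zip(probs, one_hot_label))

# "好" predicted as hǎo with probability 0.8 and hào with 0.2;
# the correct reading here is hǎo, so the one-hot label is [1, 0].
loss = cross_entropy([0.8, 0.2], [1, 0])  # equals -log(0.8), about 0.223
```

Gradient descent would then adjust the model parameters to drive this loss down until predictions agree with the labels.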
After the first neural network module produces the first output vector, the second neural network module produces the second output vector, and the third neural network module produces the third output vector, the three output vectors are first concatenated into a single vector; the concatenated vector is then normalized, after which the argmax function is used to decode it (other vector decoding methods may of course be used instead); the position with the highest probability in the vector corresponds to the correct reading. The argmax function returns the index of the maximum value in a vector.
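The concatenate-normalize-argmax decoding described above can be sketched as follows. The three output vectors here are illustrative stand-ins for the outputs of the forward LSTM, the DNN, and the backward LSTM, not values from a trained network:

```python
import math

def softmax(xs):
    """Normalize a vector into a probability distribution."""
    m = max(xs)                                # subtract max for stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

# Stand-ins for the three module outputs.
first_out = [0.2, 0.1]    # forward LSTM over the preceding text
second_out = [1.5, 0.3]   # DNN over the polyphone text itself
third_out = [0.4, 0.2]    # backward LSTM over the following text

concat = first_out + second_out + third_out            # concatenation
probs = softmax(concat)                                # normalization
best = max(range(len(probs)), key=probs.__getitem__)   # argmax decode
```

The index `best` of the highest-probability position then selects the predicted reading.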
In one embodiment, as shown in Fig. 6, the training steps of the polyphone prediction model may include:

(1) Obtain multiple training texts via a text iterator; treat each training text as a text to be predicted and execute in turn the acquisition steps for the polyphone text, the preceding text, and the following text, as well as the construction steps for their corresponding feature vectors, to obtain the feature vector data of each training text;

(2) Cluster the feature vector data of the training texts by data length; adjust the feature vector data of the training texts within each cluster to a uniform data length; and input the feature vector data of the training texts in each cluster into the polyphone prediction model in batches;

Steps (1) and (2) are performed in parallel; the items processed in parallel may be different training texts.

For example, the "feature vector data item" shown in Fig. 6 denotes the feature vector data of each training text, and the bucketing operation shown in Fig. 6 denotes clustering the feature vector data of the training texts by data length. Specifically, items of shorter data length are grouped together and items of longer data length are grouped together, i.e., feature vector data of training texts whose lengths differ little are grouped together. The grouped feature vector data is added to a preset feature queue; when the queue is full, the feature vector data of the training texts in each cluster is adjusted to a uniform data length and then input into the polyphone prediction model in batches. "Padding" in Fig. 6 refers to the length adjustment operation, and "packing" refers to the batch input operation.
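The bucketing-and-padding scheme described above can be sketched as follows. This is a simplified illustration: the bucket width, pad value, and integer "sequences" standing in for feature vector data are all arbitrary choices for the example, not the application's actual configuration.

```python
from collections import defaultdict

def bucket_and_pad(items, bucket_width=2, pad_value=0):
    """Group feature sequences into buckets of similar length (the
    "bucketing" of Fig. 6), then pad each bucket to a uniform length
    ("padding") so it can be batched ("packing")."""
    buckets = defaultdict(list)
    for seq in items:
        buckets[(len(seq) - 1) // bucket_width].append(seq)
    batches = []
    for key in sorted(buckets):
        group = buckets[key]
        max_len = max(len(s) for s in group)
        # Pad every sequence in the bucket to the bucket's maximum length.
        batches.append([s + [pad_value] * (max_len - len(s)) for s in group])
    return batches

batches = bucket_and_pad([[1, 2], [3], [4, 5, 6, 7], [8, 9, 10]])
# Two buckets: short sequences padded to length 2, long ones to length 4.
```

Because similar lengths share a bucket, far less padding is wasted than if every sequence were padded to the global maximum.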
This embodiment processes text extraction, vector construction, and similar operations in parallel with the batch input of vectors into the polyphone prediction model, which effectively improves efficiency, suits large-scale sample training, and helps shorten the model training cycle. Both the reliability and the efficiency of model training in this embodiment are high.
As shown in Fig. 7, in one embodiment a polyphone disambiguation method is also provided, which may include the following steps:

Step S1: segment the text to be disambiguated to obtain multiple word segmentation results. The text to be disambiguated may or may not contain polyphones, and may be a sentence, a language text, etc.

Step S2: judge whether each of the word segmentation results contains a polyphone.

When a word segmentation result contains no polyphone, execute step S3: query a preset dictionary to obtain the reading of the word segmentation result. The preset dictionary may be a dictionary or word bank mapping characters, words, phrases, etc. to readings, i.e., the reading of the character, word, or phrase can be found and determined directly in the preset dictionary.

Step S4: when the word segmentation result contains a polyphone, determine the word length of the polyphone segmentation result and compare it with a preset word length. A polyphone segmentation result is a word segmentation result that contains a polyphone. The preset word length may be 1, which distinguishes monosyllabic from polysyllabic polyphone segmentation results: results longer than the preset word length are polysyllabic, and results equal to the preset word length are monosyllabic. The preset word length may of course be set to other values as needed.

Step S5: when the word length of the polyphone segmentation result is greater than the preset word length, i.e., the polyphone segmentation result is polysyllabic, query the preset dictionary and judge whether it contains the polyphone segmentation result.

When the polyphone segmentation result exists in the preset dictionary, execute step S7: query the preset dictionary to obtain the reading of the polyphone segmentation result. That is, first check whether the preset dictionary already stores the reading of the polyphone segmentation result; if it can be found there, use that reading directly to annotate the polyphone segmentation result.

Step S8: when the polyphone segmentation result does not exist in the preset dictionary, search the preset rule base for a result matching the feature information of the polyphone segmentation result.

The preset rule base is a base of rules establishing the correspondence between polyphone feature information and polyphone readings. Specifically, features can be statistically extracted from polyphone texts and the corresponding rules established on the basis of the correct readings of those texts. Polyphone feature information may include the polyphone's characters, its part of speech, the parts of speech of the preceding and following characters or words, the polyphone's relative position in the text, the polyphone's length, and so on. When the preset rule base contains too many rules, a support vector machine (SVM) can be used to resolve conflicts among them. If the preset rule base contains a polyphone reading matching the feature information of the polyphone segmentation result, that reading can be used directly to annotate the polyphone segmentation result.

When a result matching the feature information of the polyphone segmentation result exists in the preset rule base, execute step S11: take the matching result in the preset rule base as the reading of the polyphone segmentation result.

Step S12: when no result matching the feature information of the polyphone segmentation result exists in the preset rule base, meaning the rule base has established no rule for this polyphone segmentation result, take the polyphone segmentation result as the text to be predicted and predict it via the polyphone prediction method of any of the above embodiments.

This embodiment fuses at least three polyphone prediction and disambiguation approaches, namely dictionary query, rule base verification, and deep-learning/neural-network prediction, with effective combining logic, avoiding the limitations of using any single approach when predicting certain specific characters. Through the combined prediction of a dictionary, a rule base, and a neural network, this embodiment forms a polyphone disambiguation method that is highly accurate and easy to maintain.
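The dictionary → rule base → neural network cascade above can be sketched as follows. The dictionary entries, the single rule, and the model stub are illustrative placeholders (the real rule base and prediction model are far richer), and the sketch keeps only the core branching: the dictionary is consulted only for results longer than the preset word length, the rule base is tried next, and the neural model is the final fallback.

```python
# Toy preset dictionary: word -> reading.
PRESET_DICT = {"传说": "chuan2 shuo1", "传记": "zhuan4 ji4"}

# Toy preset rule base: (predicate over (word, pos), reading).
PRESET_RULES = [
    (lambda word, pos: word == "重创" and pos == "v", "chong2 chuang4"),
]

def model_predict(word, pos):
    """Stand-in for the neural polyphone prediction model."""
    return "model:" + word

def disambiguate(word, pos, preset_word_len=1):
    # Polysyllabic results are first looked up in the preset dictionary.
    if len(word) > preset_word_len and word in PRESET_DICT:
        return PRESET_DICT[word]              # dictionary hit
    # Next, try to match a rule on the result's feature information.
    for predicate, reading in PRESET_RULES:
        if predicate(word, pos):
            return reading                    # rule base hit
    # Otherwise fall back to the prediction model.
    return model_predict(word, pos)           # neural fallback
```

For example, `disambiguate("传说", "n")` resolves via the dictionary, `disambiguate("重创", "v")` via the rule base, and `disambiguate("好", "a")` falls through to the model stub.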
In one embodiment, when the word length of the polyphone segmentation result is less than or equal to the preset word length, i.e., the polyphone segmentation result is monosyllabic, execute step S6: search the preset rule base for a result matching the feature information of the polyphone segmentation result.

When a result matching the feature information of the polyphone segmentation result exists in the preset rule base, execute step S9: take the matching result in the preset rule base as the reading of the polyphone segmentation result.

When no result matching the feature information of the polyphone segmentation result exists in the preset rule base, execute step S10: take the polyphone segmentation result as the text to be predicted and predict it via the polyphone prediction method.

This embodiment implements prediction for polyphone segmentation results whose word length is less than or equal to the preset word length, i.e., monosyllabic polyphone prediction.

In one embodiment, when the polyphone prediction method fails to yield the correct reading, the correct reading corresponding to the polyphone segmentation result is added to the preset dictionary and the preset rule base.

In one embodiment, when the polyphone prediction method fails to yield the correct reading, the correct reading corresponding to the polyphone segmentation result is used as a sample to train the polyphone prediction model.

When the polyphone prediction method cannot effectively predict the correct reading of a polyphone, the correct reading corresponding to the polyphone segmentation result can first be added to the preset dictionary and the preset rule base as a new polyphone sample, enabling quick maintenance. Meanwhile, the correct reading corresponding to the polyphone segmentation result is used as a new polyphone sample for iteration and training of the polyphone prediction model, achieving stable improvement of the polyphone prediction model.
As shown in Fig. 8, in one embodiment a polyphone prediction apparatus is also provided, which may include: a text acquisition module, a vector construction module, a model prediction module, and a reading determination module. The text acquisition module is used to acquire the polyphone text in a text to be predicted, as well as the preceding text and/or following text of the polyphone text in the text to be predicted; the vector construction module is used to construct one or more feature vectors corresponding to each of the polyphone text, the preceding text, and the following text; the model prediction module is used to input the feature vector of the preceding text, the feature vector of the polyphone text, and the feature vector of the following text into the polyphone prediction model to obtain the polyphone prediction result. The polyphone prediction model comprises a first neural network module, a second neural network module, and a third neural network module; the first neural network module receives the feature vector of the preceding text and produces a first output vector, the second neural network module receives the feature vector of the polyphone text and produces a second output vector, and the third neural network module receives the feature vector of the following text and produces a third output vector; the polyphone prediction result comprises the pronunciation probability of each reading of the polyphone and is obtained by concatenating the first output vector, the second output vector, and the third output vector. The reading determination module is used to determine the reading of the polyphone in the text to be predicted on the basis of the pronunciation probability of each reading of the polyphone.
As shown in Fig. 9, in one embodiment a polyphone disambiguation apparatus is also provided, which may include: a text segmentation module, a polyphone judgment module, a word length determination module, a dictionary query module, a rule base verification module, and the polyphone prediction apparatus of any of the above embodiments. The text segmentation module is used to segment the text to be disambiguated to obtain multiple word segmentation results. The polyphone judgment module is used to judge whether each word segmentation result contains a polyphone; when a word segmentation result contains no polyphone, the dictionary query module can be used to query a preset dictionary to obtain the reading of the word segmentation result. The word length determination module determines the word length of the polyphone segmentation result when a word segmentation result contains a polyphone, a polyphone segmentation result being a word segmentation result that contains a polyphone. The dictionary query module is used to query the preset dictionary and judge whether it contains the polyphone segmentation result when the word length of the polyphone segmentation result is greater than the preset word length; when the polyphone segmentation result exists in the preset dictionary, the dictionary query module can be used to query the preset dictionary to obtain the reading of the polyphone segmentation result. The rule base verification module is used to search the preset rule base for a result matching the feature information of the polyphone segmentation result when the polyphone segmentation result does not exist in the preset dictionary; when a matching result exists in the preset rule base, the rule base verification module can take the matching result in the preset rule base as the reading of the polyphone segmentation result. The rule base verification module is also used to search the preset rule base for a matching result when the word length of the polyphone segmentation result is less than or equal to the preset word length. The polyphone prediction apparatus is used to take the polyphone segmentation result as the text to be predicted and predict it when no result matching the feature information of the polyphone segmentation result exists in the preset rule base.
In one embodiment, a computer device is proposed, comprising a memory and a processor, the memory storing a computer program which, when executed by the processor, causes the processor to execute the polyphone prediction method of any of the above embodiments, or the polyphone disambiguation method of any of the above embodiments. Fig. 10 shows an internal structure diagram of a computer device in one embodiment. The computer device may specifically be a terminal or a server. As shown in Fig. 10, the computer device includes a processor, a memory, and a network interface connected through a system bus. The memory includes a non-volatile storage medium and internal memory. The non-volatile storage medium of the computer device stores an operating system and may also store a computer program which, when executed by the processor, enables the processor to implement the polyphone prediction method and/or the polyphone disambiguation method. The internal memory may also store a computer program which, when executed by the processor, causes the processor to execute the polyphone prediction method and/or the polyphone disambiguation method. Those skilled in the art can understand that the structure shown in Fig. 10 is only a block diagram of part of the structure related to the solution of this application and does not limit the computer device to which the solution of this application is applied; a specific computer device may include more or fewer components than shown in the figure, combine certain components, or have a different arrangement of components.

In one embodiment, a computer-readable storage medium is proposed, storing a computer program which, when executed by a processor, causes the processor to execute the steps of the polyphone prediction method of any of the above embodiments, or the steps of the polyphone disambiguation method of any of the above embodiments. The polyphone prediction method and/or polyphone disambiguation method provided by this application can be implemented in the form of a computer program, which can run on a computer device as shown in Fig. 10. The memory of the computer device can store the program templates composing the polyphone prediction apparatus and/or the polyphone disambiguation apparatus, for example the text acquisition module, the vector construction module, the text segmentation module, the dictionary query module, the rule base verification module, and so on.

This application can be applied to a text-to-speech system.

It should be noted that the above polyphone prediction method, polyphone disambiguation method, polyphone prediction apparatus, polyphone disambiguation apparatus, computer device, and computer-readable storage medium belong to one overall inventive concept, and the contents of their respective embodiments are mutually applicable.

A person of ordinary skill in the art can understand that all or part of the processes in the methods of the above embodiments can be completed by instructing the relevant hardware through a computer program, which can be stored in a non-volatile computer-readable storage medium; when executed, the program may include the processes of the embodiments of the above methods. Any reference to memory, storage, databases, or other media used in the embodiments provided by this application may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchronous link (Synchlink) DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).

The above are only preferred specific embodiments of this application, but the scope of protection of this application is not limited thereto. Any equivalent substitution or change made, within the technical scope disclosed by this application, by a person familiar with this technical field according to the technical solution of this application and its inventive concept shall be covered by the scope of protection of this application. In addition, although some specific terms are used in this specification, they are used only for convenience of description and do not constitute any limitation on this application.

Claims (18)

  1. A polyphone prediction method, characterized in that the polyphone prediction method comprises the following steps:
    acquiring the polyphone text in a text to be predicted, as well as the preceding text and/or following text of the polyphone text in the text to be predicted;
    constructing one or more feature vectors corresponding to each of the polyphone text, the preceding text, and the following text;
    inputting the feature vector of the preceding text, the feature vector of the polyphone text, and the feature vector of the following text into a polyphone prediction model to obtain a polyphone prediction result; the polyphone prediction model comprises a first neural network module, a second neural network module, and a third neural network module; the first neural network module receives the feature vector of the preceding text and produces a first output vector, the second neural network module receives the feature vector of the polyphone text and produces a second output vector, and the third neural network module receives the feature vector of the following text and produces a third output vector; the polyphone prediction result comprises the pronunciation probability of each reading of the polyphone and is obtained by concatenating the first output vector, the second output vector, and the third output vector; and
    determining the reading of the polyphone in the text to be predicted on the basis of the pronunciation probability of each reading of the polyphone.
  2. The polyphone prediction method according to claim 1, characterized in that the first neural network module and the third neural network module are long short-term memory neural network modules, and the second neural network module is a deep neural network module.
  3. The polyphone prediction method according to claim 1, characterized in that the step of constructing one or more feature vectors corresponding to each of the polyphone text, the preceding text, and the following text comprises:
    obtaining the character feature information of the polyphone text, the preceding text, and the following text respectively; the character feature information comprises at least one of character information, the character's part-of-speech information, the part-of-speech information of the character or word preceding the character, the part-of-speech information of the character or word following the character, and the character's position information;
    converting the character feature information of the polyphone text, the preceding text, and the following text into corresponding ID information; and
    vectorizing the ID information to obtain one or more feature vectors corresponding to each of the polyphone text, the preceding text, and the following text; when the polyphone text, the preceding text, or the following text corresponds to multiple feature vectors, concatenating the multiple feature vectors to obtain a composite feature vector.
  4. The polyphone prediction method according to claim 3, characterized in that the step of converting the character feature information of the polyphone text, the preceding text, and the following text into corresponding ID information comprises:
    pre-establishing a mapping dictionary between the character feature information and the ID information; and
    obtaining, on the basis of the mapping dictionary, the ID information corresponding to each different piece of character feature information.
  5. The polyphone prediction method according to claim 3, characterized in that the step of vectorizing the ID information comprises:
    converting the ID information corresponding to the character information into character vectors via Word2Vec; and
    converting the ID information corresponding to the character's part-of-speech information, the part-of-speech information of the character or word preceding the character, the part-of-speech information of the character or word following the character, and the character's position information into feature vectors via one-hot encoding.
  6. The polyphone prediction method according to claim 1, characterized in that the polyphone prediction model is trained by taking multiple training texts containing polyphones as input and the correct readings of the polyphones contained in the training texts as output.
  7. The polyphone prediction method according to claim 6, characterized in that the training steps of the polyphone prediction model comprise:
    (1) obtaining multiple training texts via a text iterator, treating each training text as a text to be predicted, and executing in turn the acquisition steps for the polyphone text, the preceding text, and the following text, as well as the construction steps for the feature vectors corresponding to the polyphone text, the preceding text, and the following text, to obtain the feature vector data of each training text; and
    (2) clustering the feature vector data of the training texts by data length, adjusting the feature vector data of the training texts in each cluster to a uniform data length, and inputting the feature vector data of the training texts in each cluster into the polyphone prediction model in batches;
    wherein steps (1) and (2) are performed in parallel.
  8. A polyphone disambiguation method, characterized in that the polyphone disambiguation method comprises:
    segmenting a text to be disambiguated to obtain multiple word segmentation results;
    judging whether each of the word segmentation results contains a polyphone;
    determining whether the word length of a polyphone segmentation result is greater than a preset word length, a polyphone segmentation result being a word segmentation result containing a polyphone;
    when the word length of the polyphone segmentation result is greater than the preset word length, querying a preset dictionary and judging whether the preset dictionary contains the polyphone segmentation result;
    when the polyphone segmentation result does not exist in the preset dictionary, searching a preset rule base for a result matching the feature information of the polyphone segmentation result; and
    when no result matching the feature information of the polyphone segmentation result exists in the preset rule base, taking the polyphone segmentation result as the text to be predicted and predicting the polyphone segmentation result via the polyphone prediction method according to any one of claims 1 to 7.
  9. The polyphone disambiguation method according to claim 8, characterized in that, when a word segmentation result contains no polyphone, a preset dictionary is queried to obtain the reading of the word segmentation result.
  10. The polyphone disambiguation method according to claim 8, characterized in that:
    when the word length of the polyphone segmentation result is less than or equal to the preset word length, the preset rule base is searched for a result matching the feature information of the polyphone segmentation result; and
    when no result matching the feature information of the polyphone segmentation result exists in the preset rule base, the polyphone segmentation result is taken as the text to be predicted and predicted via the polyphone prediction method.
  11. The polyphone disambiguation method according to claim 8 or 10, characterized in that, when the polyphone prediction method fails to yield the correct reading, the correct reading corresponding to the polyphone segmentation result is added to the preset dictionary and the preset rule base.
  12. The polyphone disambiguation method according to claim 8 or 10, characterized in that, when the polyphone prediction method fails to yield the correct reading, the correct reading corresponding to the polyphone segmentation result is used as a sample to train the polyphone prediction model.
  13. A polyphone prediction apparatus, characterized in that the polyphone prediction apparatus comprises:
    a text acquisition module for acquiring the polyphone text in a text to be predicted, as well as the preceding text and/or following text of the polyphone text in the text to be predicted;
    a vector construction module for constructing one or more feature vectors corresponding to each of the polyphone text, the preceding text, and the following text;
    a model prediction module for inputting the feature vector of the preceding text, the feature vector of the polyphone text, and the feature vector of the following text into a polyphone prediction model to obtain a polyphone prediction result; the polyphone prediction model comprises a first neural network module, a second neural network module, and a third neural network module; the first neural network module receives the feature vector of the preceding text and produces a first output vector, the second neural network module receives the feature vector of the polyphone text and produces a second output vector, and the third neural network module receives the feature vector of the following text and produces a third output vector; the polyphone prediction result comprises the pronunciation probability of each reading of the polyphone and is obtained by concatenating the first output vector, the second output vector, and the third output vector; and
    a reading determination module for determining the reading of the polyphone in the text to be predicted on the basis of the pronunciation probability of each reading of the polyphone.
  14. A polyphone disambiguation apparatus, characterized in that the polyphone disambiguation apparatus comprises:
    a text segmentation module for segmenting a text to be disambiguated to obtain multiple word segmentation results;
    a polyphone judgment module for judging whether each of the word segmentation results contains a polyphone;
    a word length determination module for determining whether the word length of a polyphone segmentation result is greater than a preset word length, a polyphone segmentation result being a word segmentation result containing a polyphone;
    a dictionary query module for querying a preset dictionary and judging whether the preset dictionary contains the polyphone segmentation result when the word length of the polyphone segmentation result is greater than the preset word length;
    a rule base verification module for searching a preset rule base for a result matching the feature information of the polyphone segmentation result when the polyphone segmentation result does not exist in the preset dictionary; and
    the polyphone prediction apparatus according to claim 13, which is configured to take the polyphone segmentation result as the text to be predicted and predict it when no result matching the feature information of the polyphone segmentation result exists in the preset rule base.
  15. A computer device, characterized in that the computer device comprises a processor and a memory, the memory storing a computer program which, when executed by the processor, causes the processor to perform the steps of the polyphone prediction method according to any one of claims 1 to 7.
  16. A computer device, characterized in that the computer device comprises a processor and a memory, the memory storing a computer program which, when executed by the processor, causes the processor to perform the steps of the polyphone disambiguation method according to any one of claims 8 to 12.
  17. A computer-readable storage medium, characterized in that it stores a computer program which, when executed by a processor, causes the processor to perform the steps of the polyphone prediction method according to any one of claims 1 to 7.
  18. A computer-readable storage medium, characterized in that it stores a computer program which, when executed by a processor, causes the processor to perform the steps of the polyphone disambiguation method according to any one of claims 8 to 12.
PCT/CN2019/127956 2019-12-24 2019-12-24 Polyphone prediction method and disambiguation method, apparatus, device, and computer-readable storage medium WO2021127987A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
PCT/CN2019/127956 WO2021127987A1 (zh) 2019-12-24 2019-12-24 Polyphone prediction method and disambiguation method, apparatus, device, and computer-readable storage medium
CN201980003196.0A CN113302683B (zh) 2019-12-24 2019-12-24 Polyphone prediction method and disambiguation method, apparatus, device, and computer-readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2019/127956 WO2021127987A1 (zh) 2019-12-24 2019-12-24 Polyphone prediction method and disambiguation method, apparatus, device, and computer-readable storage medium

Publications (1)

Publication Number Publication Date
WO2021127987A1 true WO2021127987A1 (zh) 2021-07-01

Family

ID=76573435

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/127956 WO2021127987A1 (zh) 2019-12-24 2019-12-24 Polyphone prediction method and disambiguation method, apparatus, device, and computer-readable storage medium

Country Status (2)

Country Link
CN (1) CN113302683B (zh)
WO (1) WO2021127987A1 (zh)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113486672A (zh) * 2021-07-27 2021-10-08 腾讯音乐娱乐科技(深圳)有限公司 Polyphone disambiguation method, electronic device, and computer-readable storage medium
CN114417832A (zh) * 2021-12-08 2022-04-29 马上消费金融股份有限公司 Disambiguation method, and training method and apparatus for a disambiguation model
CN114662478A (zh) * 2022-03-23 2022-06-24 京东科技信息技术有限公司 Pronunciation prediction method, apparatus, device, and storage medium

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102915299A (zh) * 2012-10-23 2013-02-06 海信集团有限公司 Word segmentation method and device
CN105336322A (zh) * 2015-09-30 2016-02-17 百度在线网络技术(北京)有限公司 Polyphone model training method, and speech synthesis method and device
JP5936698B2 (ja) * 2012-08-27 2016-06-22 株式会社日立製作所 Word semantic relation extraction device
CN106803422A (zh) * 2015-11-26 2017-06-06 中国科学院声学研究所 Language model re-estimation method based on long short-term memory networks
CN107402933A (zh) * 2016-05-20 2017-11-28 富士通株式会社 Entity polyphone disambiguation method and entity polyphone disambiguation device
CN107464559A (zh) * 2017-07-11 2017-12-12 中国科学院自动化研究所 Joint prediction model construction method and system based on Chinese prosodic structure and stress
CN107515850A (zh) * 2016-06-15 2017-12-26 阿里巴巴集团控股有限公司 Method, apparatus, and system for determining polyphone pronunciation
US20180293228A1 (en) * 2017-04-11 2018-10-11 Samsung Electronics Co., Ltd. Device and method for converting dialect into standard language
CN109117480A (zh) * 2018-08-17 2019-01-01 腾讯科技(深圳)有限公司 Word prediction method and apparatus, computer device, and storage medium
CN110277085A (zh) * 2019-06-25 2019-09-24 腾讯科技(深圳)有限公司 Method and apparatus for determining polyphone pronunciation

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105185372B (zh) * 2015-10-20 2017-03-22 百度在线网络技术(北京)有限公司 Training method for personalized multiple acoustic models, and speech synthesis method and device
CN106910497B (zh) * 2015-12-22 2021-04-16 阿里巴巴集团控股有限公司 Chinese word pronunciation prediction method and apparatus
CN107729313B (zh) * 2017-09-25 2021-09-17 百度在线网络技术(北京)有限公司 Deep-neural-network-based method and apparatus for discriminating polyphone pronunciation
CN108804512B (zh) * 2018-04-20 2020-11-24 平安科技(深圳)有限公司 Apparatus and method for generating a text classification model, and computer-readable storage medium

Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP5936698B2 (ja) * 2012-08-27 2016-06-22 株式会社日立製作所 Word semantic relation extraction device
CN102915299A (zh) * 2012-10-23 2013-02-06 海信集团有限公司 Word segmentation method and device
CN105336322A (zh) * 2015-09-30 2016-02-17 百度在线网络技术(北京)有限公司 Polyphone model training method, and speech synthesis method and device
CN106803422A (zh) * 2015-11-26 2017-06-06 中国科学院声学研究所 Language model re-estimation method based on long short-term memory networks
CN107402933A (zh) * 2016-05-20 2017-11-28 富士通株式会社 Entity polyphone disambiguation method and entity polyphone disambiguation device
CN107515850A (zh) * 2016-06-15 2017-12-26 阿里巴巴集团控股有限公司 Method, apparatus, and system for determining polyphone pronunciation
US20180293228A1 (en) * 2017-04-11 2018-10-11 Samsung Electronics Co., Ltd. Device and method for converting dialect into standard language
CN107464559A (zh) * 2017-07-11 2017-12-12 中国科学院自动化研究所 Joint prediction model construction method and system based on Chinese prosodic structure and stress
CN109117480A (zh) * 2018-08-17 2019-01-01 腾讯科技(深圳)有限公司 Word prediction method and apparatus, computer device, and storage medium
CN110277085A (zh) * 2019-06-25 2019-09-24 腾讯科技(深圳)有限公司 Method and apparatus for determining polyphone pronunciation

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113486672A (zh) * 2021-07-27 2021-10-08 腾讯音乐娱乐科技(深圳)有限公司 Polyphone disambiguation method, electronic device, and computer-readable storage medium
CN114417832A (zh) * 2021-12-08 2022-04-29 马上消费金融股份有限公司 Disambiguation method, and training method and apparatus for a disambiguation model
CN114417832B (zh) * 2021-12-08 2023-05-05 马上消费金融股份有限公司 Disambiguation method, and training method and apparatus for a disambiguation model
CN114662478A (zh) * 2022-03-23 2022-06-24 京东科技信息技术有限公司 Pronunciation prediction method, apparatus, device, and storage medium

Also Published As

Publication number Publication date
CN113302683A (zh) 2021-08-24
CN113302683B (zh) 2023-08-04

Similar Documents

Publication Publication Date Title
JP5901001B1 (ja) Method and device for acoustic language model training
US9508341B1 (en) Active learning for lexical annotations
Mansfield et al. Neural text normalization with subword units
KR102375115B1 (ko) Phoneme-based contextualization for cross-lingual speech recognition in end-to-end models
JP2022534390A (ja) Large-scale multilingual speech recognition using a streaming end-to-end model
WO2020062680A1 (zh) Waveform concatenation method, apparatus, device, and storage medium based on disyllable mashup
WO2021127987A1 (zh) Polyphone prediction method and disambiguation method, apparatus, device, and computer-readable storage medium
Scharenborg et al. Building an ASR system for a low-research language through the adaptation of a high-resource language ASR system: preliminary results
CN110010136B (zh) Training of a prosody prediction model and text analysis method, apparatus, medium, and device
US10109274B2 (en) Generation device, recognition device, generation method, and computer program product
Khare et al. Low Resource ASR: The Surprising Effectiveness of High Resource Transliteration.
Milde et al. Multitask sequence-to-sequence models for grapheme-to-phoneme conversion.
TW202020854A (zh) Speech recognition system and method thereof, and computer program product
KR101735195B1 (ko) Method and system for grapheme-sequence-to-phoneme-sequence conversion based on prosodic information, and recording medium
US20150178274A1 (en) Speech translation apparatus and speech translation method
US10614170B2 (en) Method of translating speech signal and electronic device employing the same
US9658999B2 (en) Language processing method and electronic device
CN105895076B (zh) Speech synthesis method and system
CN103823795B (zh) Machine translation system, machine translation method, and decoder for use therewith
Manghat et al. Hybrid sub-word segmentation for handling long tail in morphologically rich low resource languages
CN113362809B (zh) Speech recognition method, apparatus, and electronic device
Zia et al. PronouncUR: An urdu pronunciation lexicon generator
CN114783405A (zh) Speech synthesis method, apparatus, electronic device, and storage medium
KR20230064304A (ko) Automatic labeling apparatus and method for labeling spoken sentences using the same
CN113673247A (zh) Deep-learning-based entity recognition method, apparatus, medium, and electronic device

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19957211

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19957211

Country of ref document: EP

Kind code of ref document: A1