WO2020228175A1 - 多音字预测方法、装置、设备及计算机可读存储介质 (Polyphone prediction method, apparatus, device, and computer-readable storage medium) - Google Patents
- Publication number: WO2020228175A1 (PCT application PCT/CN2019/102446)
- Authority: WIPO (PCT)
- Prior art keywords
- text
- polyphone
- polyphonic
- converted
- prediction model
- Prior art date
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L13/00—Speech synthesis; Text to speech systems
- G10L13/02—Methods for producing synthetic speech; Speech synthesisers
- G10L13/027—Concept to speech synthesisers; Generation of natural phrases from machine-based concepts
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L13/00—Speech synthesis; Text to speech systems
- G10L13/08—Text analysis or generation of parameters for speech synthesis out of text, e.g. grapheme to phoneme translation, prosody generation or stress or intonation determination
Definitions
- This application relates to the field of artificial intelligence technology, and in particular to a method, device, equipment and computer-readable storage medium for predicting polyphones.
- Speech synthesis, also known as text-to-speech (TTS) technology, can convert arbitrary text information into standard, fluent speech in real time, which amounts to fitting a machine with an artificial mouth. It involves acoustics, linguistics, digital signal processing, computer science, and other disciplines, and is a cutting-edge technology in the field of Chinese information processing. The main problem it solves is how to convert text information into audible sound information, that is, how to make a machine speak like a human.
- Chinese contains roughly one thousand polyphonic characters, of which roughly 200 to 300 are common. The inventors realized that because a polyphone is pronounced differently in different contexts, text-to-speech conversion of Chinese sentences containing polyphones often converts the polyphones incorrectly, which greatly impairs the listener's understanding of the synthesized speech.
- The main purpose of this application is therefore to provide a polyphone prediction method, apparatus, device, and computer-readable storage medium, aiming to solve the prior-art technical problem of low accuracy when performing text-to-speech conversion of Chinese sentences that involve polyphones.
- To achieve the above purpose, the present application provides a polyphone prediction method, which includes the following steps: acquiring training text containing polyphones and the original pronunciations of the polyphones; training a preset polyphone prediction model iteratively with the training text and the original pronunciations to obtain a target polyphone prediction model; acquiring the text to be converted, and detecting whether the text to be converted contains polyphones; if it does, obtaining the feature information of the text to be converted; and inputting the feature information into the target polyphone prediction model to output the target pronunciation of the polyphones in the text to be converted.
- Optionally, the step of acquiring the text to be converted and detecting whether it contains polyphones includes: acquiring the text to be converted, and detecting whether it contains target characters belonging to a preset polyphone dictionary; if such target characters exist, determining that the text to be converted contains polyphones.
- Optionally, the step of obtaining the feature information of the text to be converted includes: when the text to be converted contains polyphones, using an attention mechanism to obtain the feature information of the text to be converted in parallel.
- Optionally, the target polyphone prediction model includes an encoder and a decoder, and the step of inputting the feature information into the target polyphone prediction model and outputting the target pronunciation of the polyphones in the text to be converted includes: encoding the feature information with the encoder to obtain a content vector; and decoding the content vector with the decoder to output the target pronunciation of the polyphones in the text to be converted.
- Optionally, the step of training the preset polyphone prediction model iteratively with the training text and its corresponding original pronunciations to obtain the target polyphone prediction model includes: using an attention mechanism to obtain the feature information of the training text in parallel; inputting the feature information into the preset polyphone prediction model to obtain prediction results for the polyphones in the training text; judging whether each prediction result matches its corresponding original pronunciation, and computing a map value from the judgment results; detecting whether the map value is greater than or equal to a preset threshold; if the map value is greater than or equal to the preset threshold, taking the preset polyphone prediction model as the target polyphone prediction model; if the map value is less than the preset threshold, adjusting the parameters of the preset polyphone prediction model to obtain a new polyphone prediction model; and taking the new polyphone prediction model as the preset polyphone prediction model and re-executing the step of inputting the feature information into the preset polyphone prediction model to obtain a prediction result for each polyphone in the training text.
- the feature information includes one or more of word vectors, character vectors, and part-of-speech feature vectors.
- To achieve the above purpose, the present application also provides a polyphone prediction apparatus, which includes:
- an acquiring module for acquiring training text containing polyphones and the original pronunciations of the polyphones;
- a training module for training a preset polyphone prediction model iteratively with the training text and the original pronunciations to obtain a target polyphone prediction model;
- a detection module for acquiring the text to be converted and detecting whether it contains polyphones;
- a feature information obtaining module for obtaining the feature information of the text to be converted if it contains polyphones; and
- a prediction module for inputting the feature information into the target polyphone prediction model and outputting the target pronunciation of the polyphones in the text to be converted.
- The present application also provides a polyphone prediction device, which includes a memory, a processor, and a polyphone prediction program stored in the memory and runnable on the processor; when the polyphone prediction program is executed by the processor, the steps of the polyphone prediction method described above are implemented.
- The present application also provides a non-volatile computer-readable storage medium storing a polyphone prediction program; when the polyphone prediction program is executed by a processor, the steps of the polyphone prediction method described above are implemented.
- In this application, training text containing polyphones and the original pronunciations of the polyphones are acquired; a preset polyphone prediction model is trained iteratively with the training text and the original pronunciations to obtain a target polyphone prediction model; the text to be converted is acquired and checked for polyphones; if polyphones exist, the feature information of the text to be converted is obtained; and the feature information is input into the target polyphone prediction model, which outputs the target pronunciation of the polyphones in the text to be converted. The pronunciation of a polyphone in the text to be converted is thus predicted by the target polyphone prediction model from the feature information of that text, improving the accuracy of polyphone prediction.
- FIG. 1 is a schematic structural diagram of the polyphone prediction device in the hardware operating environment involved in an embodiment of this application;
- FIG. 2 is a schematic flowchart of a first embodiment of a method for predicting polyphones according to this application;
- FIG. 3 is a schematic diagram of the results of a sequence-to-sequence model in an embodiment of the method for predicting polyphones according to the present application;
- FIG. 4 is a schematic diagram of functional modules of a first embodiment of a polyphone prediction device according to this application.
- FIG. 1 is a schematic diagram of the structure of a polyphone prediction device in a hardware operating environment involved in a solution of an embodiment of the application.
- the polyphonic word prediction device in the embodiment of this application may be a PC, or a terminal device such as a smart phone, a tablet computer, or a portable computer.
- the polyphone prediction device may include: a processor 1001, such as a CPU, a network interface 1004, a user interface 1003, a memory 1005, and a communication bus 1002.
- the communication bus 1002 is used to implement connection and communication between these components.
- the user interface 1003 may include a display screen (Display) and an input unit such as a keyboard (Keyboard); optionally, the user interface 1003 may also include a standard wired interface and a wireless interface.
- the network interface 1004 may optionally include a standard wired interface and a wireless interface (such as a WI-FI interface).
- the memory 1005 may be a high-speed RAM memory, or a non-volatile memory (non-volatile memory), such as a magnetic disk memory.
- the memory 1005 may also be a storage device independent of the aforementioned processor 1001.
- Those skilled in the art will understand that the structure of the polyphone prediction device shown in FIG. 1 does not constitute a limitation on the polyphone prediction device; the device may include more or fewer components than shown, combine certain components, or use a different arrangement of components.
- As shown in FIG. 1, the memory 1005, as a computer storage medium, may include an operating system, a network communication module, a user interface module, and a polyphone prediction program.
- In the polyphone prediction device shown in FIG. 1, the network interface 1004 is mainly used to connect to a back-end server and exchange data with it; the user interface 1003 is mainly used to connect to a client (user side) and exchange data with it; and the processor 1001 may be used to call the polyphone prediction program stored in the memory 1005 and execute the steps of the embodiments of the polyphone prediction method described below.
- Referring to FIG. 2, FIG. 2 is a schematic flowchart of the first embodiment of the polyphone prediction method of the present application.
- the polyphonic character prediction method of this application includes:
- Step S10: acquire training text containing polyphones and the original pronunciations of the polyphones;
- In this embodiment, the preset polyphone prediction model must first be trained to obtain the target polyphone prediction model. One group of training data consists of a sentence of 10 to 15 characters containing one or more polyphones (the training text) together with the original pronunciation (i.e., the correct pronunciation) of the polyphones in that sentence. To improve the performance of the target polyphone prediction model, as much training data as possible should be used; for example, 1000 groups of training data may be acquired.
- Step S20: train the preset polyphone prediction model iteratively with the training text and the original pronunciations of the polyphones to obtain the target polyphone prediction model.
- In this embodiment, the preset polyphone prediction model is a sequence-to-sequence model. The sequence-to-sequence model is an extension of the recurrent neural network that joins two recurrent neural networks: one network (the encoder) receives the feature information of the source sentence, and the other recurrent network (the decoder) outputs the corresponding pinyin of the sentence. In the selected sequence-to-sequence model, the parameter values of the encoder and decoder are all initial values; the training process is the process of adjusting these parameter values.
- In one embodiment, the iterative training process is as follows: use an attention mechanism to obtain the feature information of the training text in parallel; input the feature information into the preset polyphone prediction model to obtain prediction results for the polyphones in the training text; judge whether each prediction result matches its corresponding original pronunciation, and compute a map value from the judgment results; detect whether the map value is greater than or equal to a preset threshold; if so, take the preset polyphone prediction model as the target polyphone prediction model; if the map value is less than the preset threshold, adjust the parameters of the preset polyphone prediction model to obtain a new polyphone prediction model; then take the new polyphone prediction model as the preset polyphone prediction model and re-execute the step of inputting the feature information into the preset polyphone prediction model to obtain a prediction result for each polyphone in the training text.
- In this embodiment, suppose there are 1000 groups of training data: training data group 1 is training text 1 and original pronunciation 1 of its polyphones (i.e., the correct pronunciation of the polyphones in training text 1); training data group 2 is training text 2 and original pronunciation 2; and so on up to training data group 1000, which is training text 1000 and original pronunciation 1000. The attention mechanism is used to obtain, in parallel, the feature information of training text 1 through training text 1000, yielding feature information 1 through feature information 1000. Feature information 1 through 1000 are then input into the preset polyphone prediction model, producing prediction result 1 corresponding to feature information 1, prediction result 2 corresponding to feature information 2, and so on up to prediction result 1000; each prediction result is then compared with its original pronunciation (prediction result 1 with original pronunciation 1, prediction result 2 with original pronunciation 2, ..., prediction result 1000 with original pronunciation 1000). If X of the 1000 predictions match, the current map value is 0.001X. The map value reflects the quality of the polyphone prediction model: the higher the map value, the more accurate the current model's predictions.
- To obtain a better target polyphone prediction model, a relatively high threshold may be set, such as 90%. If the map value computed by the above steps is greater than or equal to 90%, the current polyphone prediction model is taken as the target polyphone prediction model; otherwise, the parameter values of the encoder (recurrent neural network 1) and decoder (recurrent neural network 2) of the sequence-to-sequence model are adjusted. (The implementation of parameter adjustment may follow existing techniques: a neural network is essentially a computation pipeline that receives an input signal at the front end, passes it through layers of complex computation, and outputs a result at the end; the computed result is compared with the correct result to obtain an error, and the network's internal parameters are then updated according to that error so that, when the network next receives the same data, the error between its output and the correct result becomes smaller and smaller.) This yields a new sequence-to-sequence model, into which feature information 1 through 1000 are input again, producing prediction results 1' through 1000', which are again compared with original pronunciations 1 through 1000. If Y predictions match, the current map value is 0.001Y; if 0.001Y is greater than or equal to 90%, the current polyphone prediction model is taken as the target polyphone prediction model; otherwise, the above steps are repeated until the map value is greater than or equal to the preset threshold, at which point the corresponding polyphone prediction model is taken as the target polyphone prediction model.
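The iterative predict/score/adjust loop described above can be sketched in a few lines of Python. The model, the parameter-adjustment step, and the training pairs below are stand-ins (the patent's model is a sequence-to-sequence network); the sketch only shows the control flow of computing the map value and looping until it reaches the preset threshold.

```python
def map_value(model, training_pairs):
    """map value: the fraction of training items whose predicted
    pronunciation matches the original pronunciation
    (X matches out of 1000 groups -> 0.001 * X)."""
    hits = sum(model(feats) == pron for feats, pron in training_pairs)
    return hits / len(training_pairs)

def train_until_threshold(model, adjust, training_pairs, threshold=0.90):
    """Repeat predict -> score -> adjust-parameters until the map
    value reaches the preset threshold, then return the model."""
    while map_value(model, training_pairs) < threshold:
        model = adjust(model)  # parameter-adjustment step (stubbed here)
    return model

# Toy run: a constant "model" and ten training groups, nine of which
# it already gets right, so the 90% threshold is met immediately.
pairs = [("feats%d" % i, "kan4") for i in range(9)] + [("feats9", "kan1")]
always_kan4 = lambda feats: "kan4"
target_model = train_until_threshold(always_kan4, lambda m: m, pairs)
```

In the patent's setting the `adjust` step would be a gradient-based parameter update of the encoder and decoder; here it is an identity stub so the sketch stays self-contained.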
- Step S30: acquire the text to be converted, and detect whether the text to be converted contains polyphones;
- In this embodiment, a character that has two or more pronunciations is called a polyphone. After the text to be converted is acquired, it is detected whether any character in the text has two or more pronunciations; if such a character exists, it is a polyphone, i.e., a polyphone is detected in the text to be converted. For example, if the text to be converted is "我看见了一棵树" ("I saw a tree"), the character "看" has the two pronunciations kan (first tone) and kan (fourth tone), so the polyphone "看" exists in the text to be converted. If the text to be converted is "春节是传统节日" ("the Spring Festival is a traditional holiday"), the character "传" has the two pronunciations chuan (third tone) and zhuan (fourth tone), so the polyphone "传" exists in the text to be converted.
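As a minimal illustration of this detection step, the following sketch checks each character of the text against a small polyphone dictionary; the dictionary contents here are hypothetical and only cover the two examples above.

```python
# Hypothetical mini polyphone dictionary: each polyphonic character
# maps to its candidate readings (readings as given in the examples).
POLYPHONE_DICT = {
    "看": ["kan1", "kan4"],
    "传": ["chuan", "zhuan"],
}

def find_polyphones(text):
    """Return every character of `text` found in the polyphone
    dictionary, i.e. every character with two or more readings."""
    return [ch for ch in text if ch in POLYPHONE_DICT]

find_polyphones("我看见了一棵树")  # detects the polyphone "看"
```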
- Step S40: if the text to be converted contains polyphones, obtain the feature information of the text to be converted;
- In this embodiment, if the text to be converted contains polyphones, the feature information of the text to be converted is obtained. Feature information refers to information that can be used for machine recognition. Specifically, it includes one or more of the word vectors or character vectors, part-of-speech feature vectors, and word-boundary feature vectors obtained at word or character granularity; when multiple feature vectors are obtained, they are concatenated to form the feature information. The word vector may be an n-dimensional word vector, and the character vector may be a one-hot encoded vector. The one-hot encoding may be constructed as follows: if the size of the text is m, the vector for each word or character is m-dimensional, and the vector for the i-th word has its i-th dimension set to 1 and all other dimensions set to 0. Taking "我看见了一棵树" as the text to be converted, if only character one-hot vectors are used as feature information, the feature data of the text comprises 7 feature vectors in total, each with the same dimensionality, namely the size of the character table; for each character, the dimension corresponding to its position in the character table is 1 and all other dimensions are 0, so the character vector of "看" is "0100000" and the character vector of "了" is "0001000".
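The one-hot construction described above can be sketched as follows; for simplicity the character table is assumed to be the characters of the text itself, in order, as in the seven-character example.

```python
def one_hot_features(text):
    """Build the character-granularity one-hot vectors described
    above: with a character table of size m (here simply the text's
    own characters, in order), the i-th character's vector is
    m-dimensional with dimension i set to 1 and all others 0."""
    m = len(text)
    feats = []
    for i in range(m):
        vec = [0] * m
        vec[i] = 1
        feats.append(vec)
    return feats

feats = one_hot_features("我看见了一棵树")  # 7 characters -> 7 vectors
```

A real system would use a global character table rather than the text itself, so that vectors are comparable across texts; the sketch only mirrors the example in the paragraph above.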
- In this embodiment, an attention mechanism is used to obtain the feature information of the text to be converted in parallel. Compared with a CNN or an RNN, the attention mechanism uses fewer computing resources and better captures both the short-range and long-range dependencies between the words in the text, thereby improving prediction efficiency and accuracy.
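As a rough sketch of why attention operates in parallel, the following minimal scaled dot-product self-attention (in NumPy, with the usual learned query/key/value projections omitted as a simplifying assumption) mixes every character's features with every other character's in a single matrix operation, rather than scanning the sequence step by step as an RNN does:

```python
import numpy as np

def self_attention(X):
    """Scaled dot-product self-attention over a character feature
    matrix X (one row per character). Every position attends to every
    other position at once, which is what captures both short- and
    long-range dependencies in one parallel step."""
    d = X.shape[-1]
    scores = X @ X.T / np.sqrt(d)                      # pairwise scores
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w = w / w.sum(axis=-1, keepdims=True)              # row-wise softmax
    return w @ X                                       # context-mixed features

rng = np.random.default_rng(0)
X = rng.normal(size=(7, 8))  # 7 characters, 8-dim embeddings
H = self_attention(X)        # same shape, now context-aware
```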
- Step S50: input the feature information into the target polyphone prediction model, and output the target pronunciation of the polyphones in the text to be converted.
- In this embodiment, the feature information of the text to be converted is input into the trained target polyphone prediction model; through the computation procedure preset in the model, a prediction result for each polyphone is computed from the feature information, and that prediction result is taken as the polyphone's target pronunciation. For non-polyphones, the fixed pinyin is used directly as the conversion result, so that the pinyin corresponding to the entire text to be converted is obtained.
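A minimal sketch of this assembly step, with a hypothetical fixed-pinyin table and a stubbed-out prediction model standing in for the trained target model:

```python
# Hypothetical fixed-pinyin table for the non-polyphones of the
# example sentence (tone digits, 5 = neutral tone) and a stub model.
FIXED_PINYIN = {"我": "wo3", "见": "jian4", "了": "le5",
                "一": "yi1", "棵": "ke1", "树": "shu4"}
POLYPHONES = {"看"}

def predict_polyphone(ch, text):
    # Stand-in for the trained target polyphone prediction model,
    # which would predict the reading from the text's feature info.
    return "kan4"

def text_to_pinyin(text):
    """Polyphones are routed through the prediction model; every
    other character keeps its fixed pinyin."""
    out = []
    for ch in text:
        if ch in POLYPHONES:
            out.append(predict_polyphone(ch, text))
        else:
            out.append(FIXED_PINYIN.get(ch, ch))
    return " ".join(out)
```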
- In this embodiment, training text containing polyphones and the original pronunciations of the polyphones are acquired; a preset polyphone prediction model is trained iteratively with the training text and the original pronunciations to obtain a target polyphone prediction model; the text to be converted is acquired and checked for polyphones; if polyphones exist, the feature information of the text to be converted is obtained; and the feature information is input into the target polyphone prediction model, which outputs the target pronunciation of the polyphones in the text to be converted. Through this embodiment, the pronunciation of a polyphone in the text to be converted is predicted by the target polyphone prediction model from the feature information of that text, improving the accuracy of polyphone prediction.
- Further, in an embodiment of the polyphone prediction method of this application, step S30 includes: acquiring the text to be converted, and detecting whether the text contains target characters belonging to a preset polyphone dictionary; if target characters belonging to the preset polyphone dictionary exist, determining that the text to be converted contains polyphones.
- In this embodiment, a polyphone dictionary may be preset; the polyphone dictionary contains the polyphones among Chinese characters (or the commonly used polyphones among Chinese characters). After the text to be converted is acquired, each character of the text is looked up in the preset polyphone dictionary. For example, for the text "我看见了一棵树", each of the seven characters "我", "看", "见", "了", "一", "棵", and "树" is checked against the dictionary; "看" is found to belong to the preset polyphone dictionary, so "看" is a polyphone, i.e., the text to be converted contains a polyphone.
- Further, in an embodiment of the polyphone prediction method of this application, the target polyphone prediction model includes an encoder and a decoder, and step S50 includes: encoding the feature information with the encoder to obtain a content vector; and decoding the content vector with the decoder to output the target pronunciation of the polyphone in the text to be converted.
- In this embodiment, referring to FIG. 3, FIG. 3 is a schematic diagram of the sequence-to-sequence model in an embodiment of the polyphone prediction method of this application. As shown in FIG. 3, the sequence-to-sequence model is an extension of the recurrent neural network that joins two recurrent neural networks: one network (the encoder) receives the feature information of the source sentence, and the other recurrent network (the decoder) outputs the translated sentence. These two stages are called the encoding and decoding processes. The encoding process uses the memory of the recurrent network: following the sequential relationship of the context, the word vectors are fed into the network one by one. The recurrent network produces an output at every step, but the encoder differs in that it keeps only the final hidden state, which in effect condenses the whole sentence and stores it as a content vector for the decoder to use. The decoder's network structure is almost identical; the only difference is that during decoding each result is obtained from the previous one. During encoding a sentence is input as a sequence in which every word is known, whereas the decoder starts knowing nothing: feeding the content into the network yields a first output, taken as the first word; that word is then used as the network's next input to produce the second word, and so on in a loop, until the network outputs the final pinyin (the prediction result). In this embodiment, if only character one-hot vectors are used as feature information and the training text or text to be converted has four pieces of feature information, feature information 1 through feature information 4 are input to the encoder in order to obtain the content vector C; C is then passed to the decoder for decoding to obtain the prediction result, which is the pinyin of the polyphone in the training text or the text to be converted.
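The encode-then-decode flow described above can be sketched with toy recurrent cells in NumPy. The weights here are random stand-ins rather than trained parameters, so the emitted pinyin is arbitrary, but the structure matches the description: the encoder keeps only its final hidden state as the content vector C, and the decoder feeds each step back into the next.

```python
import numpy as np

rng = np.random.default_rng(1)
D = 8                                    # hidden/feature dimension
We = rng.normal(scale=0.1, size=(D, D))  # encoder recurrence weights
Wd = rng.normal(scale=0.1, size=(D, D))  # decoder recurrence weights
Wo = rng.normal(scale=0.1, size=(D, D))  # output projection

def encode(features):
    """Feed the feature vectors in order through a recurrent cell and
    keep only the final hidden state: the content vector C that
    condenses the whole input for the decoder."""
    h = np.zeros(D)
    for x in features:
        h = np.tanh(We @ h + x)
    return h

def decode(C, steps, vocab):
    """Starting from C, emit one pinyin token per step, feeding each
    step's hidden state back in as the next input."""
    h, y, out = C, np.zeros(D), []
    for _ in range(steps):
        h = np.tanh(Wd @ h + y)
        idx = int(np.argmax((Wo @ h)[: len(vocab)]))
        out.append(vocab[idx])
        y = h
    return out

feats = [rng.normal(size=D) for _ in range(4)]  # feature info 1..4
C = encode(feats)                               # content vector C
pinyin = decode(C, steps=1, vocab=["kan1", "kan4"])
```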
- FIG. 4 is a schematic diagram of functional modules of a first embodiment of a polyphone prediction apparatus according to the present application.
- the polyphone prediction device includes:
- the obtaining module 10 is used to obtain training text containing polyphonic characters and the original pronunciation of the polyphonic characters;
- the training module 20 is configured to train a preset polyphonic word prediction model based on the iterative training method through the training text and the original pronunciation of the polyphonic word to obtain a target polyphonic word prediction model;
- the detection module 30 is configured to obtain the text to be converted and detect whether there are polyphonic characters in the text to be converted;
- the feature information obtaining module 40 is configured to obtain feature information of the text to be converted if there are polyphonic characters in the text to be converted;
- the prediction module 50 is configured to input the feature information into a target polyphonic character prediction model, and output the target pronunciation of the polyphonic character in the text to be converted.
- In this embodiment, training text containing polyphones and the original pronunciations of the polyphones are acquired; a preset polyphone prediction model is trained iteratively with the training text and the original pronunciations to obtain a target polyphone prediction model; the text to be converted is acquired and checked for polyphones; if polyphones exist, the feature information of the text to be converted is obtained; and the feature information is input into the target polyphone prediction model, which outputs the target pronunciation of the polyphones in the text to be converted. Through this embodiment, the pronunciation of a polyphone in the text to be converted is predicted by the target polyphone prediction model from the feature information of that text, improving the accuracy of polyphone prediction.
- An embodiment of the present application also proposes a non-volatile computer-readable storage medium. The computer-readable storage medium stores a polyphone prediction program; when the polyphone prediction program is executed by a processor, the steps of the embodiments of the polyphone prediction method described above are implemented.
Abstract
This application relates to the field of artificial intelligence and discloses a polyphone prediction method, apparatus, device, and computer-readable storage medium. The polyphone prediction method includes: acquiring training text containing polyphones and the original pronunciations of the polyphones; training a preset polyphone prediction model iteratively with the training text and the original pronunciations to obtain a target polyphone prediction model; acquiring the text to be converted and detecting whether it contains polyphones; if the text to be converted contains polyphones, obtaining the feature information of the text to be converted; and inputting the feature information into the target polyphone prediction model to output the target pronunciation of the polyphones in the text to be converted. With this application, the pronunciation of a polyphone in the text to be converted is predicted by the target polyphone prediction model from the feature information of that text, improving the accuracy of polyphone prediction.
Description
This application claims priority to the Chinese patent application filed with the Chinese Patent Office on May 16, 2019, with application number 201910407702.4 and the invention title "多音字预测方法、装置、设备及计算机可读存储介质" (polyphone prediction method, apparatus, device, and computer-readable storage medium), the entire contents of which are incorporated herein by reference.
本申请涉及人工智能技术领域,尤其涉及一种多音字预测方法、装置、设备及计算机可读存储介质。
语音合成,又称文语转换(Text to Speech)技术,能将任意文字信息实时转化为标准流畅的语音朗读出来,相当于给机器装上了人工嘴巴。它涉及声学、语言学、数字信号处理、计算机科学等多个学科技术,是中文信息处理领域的一项前沿技术,解决的主要问题就是如何将文字信息转化为可听的声音信息,也即让机器像人一样开口说话。
对于汉字而言,汉字中的多音字约有一千个,其中常见多音字约200~300个。发明人意识到由于多音字在不用的语境下发音不同,导致在对包含多音字的汉字语句进行文语转换时,往往无法对多音字进行正确的转换,从而极大的影响了听者对合成声音语义的理解情况。
发明内容
本申请的主要目的在于提供一种多音字预测方法、装置、设备及计算机可读存储介质,旨在解决现有技术中对涉及多音字的汉字语句进行文语转换的准确度较低的技术问题。
为实现上述目的,本申请提供一种多音字预测方法,所述多音字预测方法包括以下步骤:
获取包含多音字的训练文本以及所述多音字的原始发音;
通过所述训练文本以及所述多音字的原始发音,基于迭代训练的方式对预置多音字预测模型进行训练,得到目标多音字预测模型;
获取待转换文本,并检测所述待转换文本中是否存在多音字;
若所述待转换文本中存在多音字,则获取所述待转换文本的特征信息;
将所述特征信息输入目标多音字预测模型,输出所述多音字在所述待转换文本中的目标发音。
可选地,所述获取待转换文本,并检测所述待转换文本中是否存在多音字的步骤包括:
获取待转换文本,并检测所述待转换文本中是否存在归属于预置的多音字字典的目标文字;
若存在归属于预置的多音字字典的目标文字,则确定所述待转换文本中存在多音字。
可选地,所述若所述待转换文本中存在多音字,则获取所述待转换文本的特征信息的步骤包括:
当所述待转换文本中存在多音字时,采用注意力机制并行式获取所述待转换文本的特征信息。
可选地,所述目标多音字预测模型包括编码器和解码器,将所述特征信息输入目标多音字预测模型,输出所述多音字在所述待转换文本中的目标发音的步骤包括:
通过所述编码器对所述特征信息进行编码,得到内容向量;
通过所述解码器对所述内容向量进行解码,输出所述多音字在所述待转换文本中的目标发音。
可选地,所述通过所述训练文本以及所述训练文本对应的原始发音,基于迭代训练的方式对预置多音字预测模型进行训练,得到目标多音字预测模型的步骤包括:
采用注意力机制并行式获取所述训练文本的特征信息;
将所述特征信息输入预置多音字预测模型,得到所述训练文本中多音字的预测结果;
判断所述多音字的预测结果与其对应的原始发音是否一致,并根据判断结果,得到map值;
检测所述map值是否大于或等于预设阈值;
若所述map值大于或等于预设阈值,则以所述预置多音字预测模型作为目标多音字预测模型;
若所述map值小于预设阈值,则对所述预置多音字预测模型进行参数调 整,得到新的多音字预测模型;
将所述新的多音字预测模型作为预置多音字预测模型,并执行将所述特征信息输入预置多音字预测模型,得到所述训练文本中每个多音字的预测结果的步骤。
可选地,所述特征信息包括词向量、字向量、词性特征向量中的一种或多种。
此外,为实现上述目的,本申请还提供一种多音字预测装置,所述多音字预测装置包括:
获取模块,用于获取包含多音字的训练文本以及所述多音字的原始发音;
训练模块,用于通过所述训练文本以及所述多音字的原始发音,基于迭代训练的方式对预置多音字预测模型进行训练,得到目标多音字预测模型;
检测模块,用于获取待转换文本,并检测所述待转换文本中是否存在多音字;
特征信息获取模块,用于若所述待转换文本中存在多音字,则获取所述待转换文本的特征信息;
预测模块,用于将所述特征信息输入目标多音字预测模型,输出所述多音字在所述待转换文本中的目标发音。
此外,为实现上述目的,本申请还提供一种多音字预测设备,所述多音字预测设备包括:存储器、处理器及存储在所述存储器上并可在所述处理器上运行的多音字预测程序,所述多音字预测程序被所述处理器执行时实现如上所述的多音字预测方法的步骤。
此外,为实现上述目的,本申请还提供一种非易失性计算机可读存储介质,所述计算机可读存储介质上存储有多音字预测程序,所述多音字预测程序被处理器执行时实现如上所述的多音字预测方法的步骤。
本申请中,获取包含多音字的训练文本以及所述多音字的原始发音;通过所述训练文本以及所述多音字的原始发音,基于迭代训练的方式对预置多 音字预测模型进行训练,得到目标多音字预测模型;获取待转换文本,并检测所述待转换文本中是否存在多音字;若所述待转换文本中存在多音字,则获取所述待转换文本的特征信息;将所述特征信息输入目标多音字预测模型,输出所述多音字在所述待转换文本中的目标发音。通过本申请,根据待转换文本的特征信息,通过目标多音字预测模型预测多音字在待转换文本中的读音,提高了对多音字进行预测的准确度。
图1为本申请实施例方案涉及的硬件运行环境的多音字预测设备结构示意图;
图2为本申请多音字预测方法第一实施例的流程示意图;
图3为本申请多音字预测方法一实施例中序列到序列模型的结果示意图;
图4为本申请多音字预测装置第一实施例的功能模块示意图。
本申请目的的实现、功能特点及优点将结合实施例,参照附图做进一步说明。
应当理解,此处所描述的具体实施例仅仅用以解释本申请,并不用于限定本申请。
如图1所示,图1为本申请实施例方案涉及的硬件运行环境的多音字预测设备结构示意图。
本申请实施例多音字预测设备可以是PC,也可以是智能手机、平板电脑、便携计算机等终端设备。
如图1所示,该多音字预测设备可以包括:处理器1001,例如CPU,网络接口1004,用户接口1003,存储器1005,通信总线1002。其中,通信总线1002用于实现这些组件之间的连接通信。用户接口1003可以包括显示屏(Display)、输入单元比如键盘(Keyboard),可选用户接口1003还可以包括标准的有线接口、无线接口。网络接口1004可选的可以包括标准的有线接口、无线接口(如WI-FI接口)。存储器1005可以是高速RAM存储器,也可以是稳定的存储器(non-volatile memory),例如磁盘存储器。存储器1005可选的还可以是独立 于前述处理器1001的存储装置。
本领域技术人员可以理解,图1中示出的多音字预测设备结构并不构成对多音字预测设备的限定,可以包括比图示更多或更少的部件,或者组合某些部件,或者不同的部件布置。
如图1所示,作为一种计算机存储介质的存储器1005中可以包括操作系统、网络通信模块、用户接口模块以及多音字预测程序。
在图1所示的多音字预测设备中,网络接口1004主要用于连接后台服务器,与后台服务器进行数据通信;用户接口1003主要用于连接客户端(用户端),与客户端进行数据通信;而处理器1001可以用于调用存储器1005中存储的多音字预测程序,并执行以下多音字预测方法的各个实施例的步骤。
参照图2,图2为本申请多音字预测方法第一实施例的流程示意图。
在本申请多音字预测方法第一实施例中,本申请多音字预测方法包括:
步骤S10,获取包含多音字的训练文本以及所述多音字的原始发音;
本实施例中,首先需要对预置多音字预测模型进行训练,从而得到目标多音字预测模型,包括:以一句包含一个或多个多音字的字数在10至15个字的句子(训练文本)以及该多音字在句子中的原始发音(即正确发音)为一组训练数据。为了提高目标多音字预测模型的性能,使用的训练数据可以尽可能多一些,例如获取1000组训练数据。
步骤S20,通过所述训练文本以及所述多音字的原始发音,基于迭代训练的方式对预置多音字预测模型进行训练,得到目标多音字预测模型。
本实施例中,预置多音字预测模型选取序列到序列模型,序列到序列模型是循环神经网络的升级版,其联合了两个循环神经网络。一个神经网络(编码器)负责接收源句子的特征信息;另一个循环神经网络(解码器)负责将句子输出成对应的拼音。本实施例中,选取的序列到序列模型中编码器和解码器的参数值均为初始值。训练过程即调整参数值的过程。
一实施例中,迭代训练的过程为:采用注意力机制并行式获取所述训练文本的特征信息;将所述特征信息输入预置多音字预测模型,得到所述训练文本中多音字的预测结果;判断所述多音字的预测结果与其对应的原始发音是否一致,并根据判断结果,得到map值;检测所述map值是否大于或等于 预设阈值;若所述map值大于或等于预设阈值,则以所述预置多音字预测模型作为目标多音字预测模型;若所述map值小于预设阈值,则对所述预置多音字预测模型进行参数调整,得到新的多音字预测模型;将所述新的多音字预测模型作为预置多音字预测模型,并执行将所述特征信息输入预置多音字预测模型,得到所述训练文本中每个多音字的预测结果的步骤。
本实施例中,若用于训练的数据有1000组,其中,训练数据组1为训练文本1以及其中多音字的原始发音1(即训练文本1中的多音字在训练文本1中的正确发音),训练数据组2为训练文本2以及其中多音字的原始发音2(即训练文本2中的多音字在训练文本2中的正确发音)......训练数据组1000为训练文本1000以及其中多音字的原始发音1000(即训练文本1000中的多音字在训练文本1000中的正确发音)。则分别获取训练文本1~训练文本1000的特征信息,得到特征信息1至特征信息1000。本实施例中,采用注意力机制并行式获取训练文本1~训练文本1000的特征信息,得到特征信息1至特征信息1000。然后,分别将特征信息1至特征信息1000输入预置多音字预测模型,得到特征信息1对应的预测结果1、特征信息2对应的预测结果2......特征信息1000对应的预测结果1000,然后比较预测结果1与原始发音1是否一致、预测结果2与原始发音2是否一致......预测结果1000与原始发音1000是否一致。若一致的情况出现X次,则当前的map值为0.001X。map反映了多音字预测模型的优劣,map越高,说明当前的多音字预测模型的预测结果越准确。本实施例中,为了使得训练得到的目标多音字预测模型更优秀,可设置一较高的阈值,例如90%。若根据上述步骤,计算得到的map值大于或等于90%,则以当前的多音字预测模型作为目标多音字预测模型,否则,对序列到序列模型中编码器(循环神经网络1)和解码器(循环神经网络2)的参数值进行调整(参数调整的实施方式可参考现有技术,神经网络本质上是一个计算流程,在前端接收输入信号后,经过一层层复杂的运算,在最末端输出结果。然后将计算结果和正确结果相比较,得到误差,再根据误差通过相应计算方法改进网络内部的相关参数,使得网络下次再接收到同样的数据时,最终计算输出得到的结果与正确结果之间的误差能越来越小),得到新的序列到序列模型,然后再次分别将特征信息1至特征信息1000输入预置多音字预测模型,得到特征信息1对应的预测结果1`、特征信息2对应的预测结果2`...... 特征信息1000对应的预测结果1000`,然后比较预测结果1`与原始发音1是否一致、预测结果2`与原始发音2是否一致......预测结果1000`与原始发音1000是否一致。若一致的情况出现Y次,则当前的map值为0.001Y,若0.001Y大于或等于90%,则以当前的多音字预测模型作为目标多音字预测模型,否则重复上述步骤,直至map值大于或等于预设阈值时,将对应的多音字预测模型作为目标多音字预测模型。
Step S30: obtaining text to be converted, and detecting whether the text to be converted contains a polyphonic character;
In this embodiment, a character that has two or more pronunciations is called a polyphonic character. After the text to be converted is obtained, it is checked whether one or more of its characters have two or more pronunciations; if so, such a character is a polyphonic character, i.e., a polyphonic character is detected in the text to be converted. For example, if the text to be converted is "我看见了一棵树" ("I saw a tree"), the character "看" has the two pronunciations kan (first tone) and kan (fourth tone), so the text contains the polyphonic character "看". If the text to be converted is "春节是传统节日" ("Spring Festival is a traditional holiday"), the character "传" has the two pronunciations chuan (second tone) and zhuan (fourth tone), so the text contains the polyphonic character "传".
Step S40: if the text to be converted contains a polyphonic character, obtaining the feature information of the text to be converted;
In this embodiment, if the text to be converted contains a polyphonic character, the feature information of the text is obtained. Feature information is information usable for machine recognition; specifically, it includes one or more of word vectors or character vectors at word or character granularity, part-of-speech feature vectors, and word-boundary feature vectors. When multiple feature vectors are obtained, they are concatenated to form the feature information. A word vector may be an n-dimensional vector, and a character vector may be a one-hot vector. A one-hot encoding can be constructed as follows: if the vocabulary of the text has size m, each word or character is represented by an m-dimensional vector, and the vector for the i-th word of the text has a 1 in the i-th dimension and 0 in all other dimensions. Taking the text to be converted "我看见了一棵树" as an example, if only character one-hot vectors are used as feature information, the feature data of this text comprises seven feature vectors of equal dimension (the size of the character table), each with a 1 in the dimension corresponding to the character's position in the table and 0 in all other dimensions; the character vector for "看" is "0100000" and the character vector for "了" is "0001000". In this embodiment, the feature information of the text to be converted is obtained in parallel using an attention mechanism; compared with a CNN or an RNN, this uses fewer computing resources and better captures both the short-range and long-range dependencies among the words and characters of the text, thereby improving prediction efficiency and accuracy.
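The one-hot construction above can be sketched as follows. For simplicity the character table is built from the text itself (an assumption for illustration; a real system would use a fixed global character table of size m):

```python
def one_hot_features(text):
    """One one-hot vector per character: each vector has one dimension per
    entry of the character table, with a 1 at the character's position in
    the table and 0 elsewhere."""
    table = list(dict.fromkeys(text))      # unique characters, first-seen order
    dim = len(table)
    vectors = []
    for ch in text:
        vec = [0] * dim
        vec[table.index(ch)] = 1           # 1 at the character's table position
        vectors.append(vec)
    return vectors

vectors = one_hot_features("我看见了一棵树")
# Seven characters, all distinct, so each vector has seven dimensions;
# "看" (2nd character) -> [0, 1, 0, 0, 0, 0, 0], matching "0100000" above.
```

The seven vectors produced here are exactly the feature data described for "我看见了一棵树".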
Step S50: inputting the feature information into the target polyphone prediction model, and outputting the target pronunciation of the polyphonic character in the text to be converted.
In this embodiment, the feature information of the text to be converted is input into the trained target polyphone prediction model, which computes on the feature information through its preset computation pipeline to obtain a prediction result for the polyphonic character; this prediction result is taken as the target pronunciation of the polyphonic character. Non-polyphonic characters are converted directly to their fixed pinyin, yielding the pinyin corresponding to the full text to be converted.
In this embodiment, training text containing polyphonic characters and the original pronunciations of those characters are obtained; the preset polyphone prediction model is trained iteratively on the training text and the original pronunciations to obtain the target polyphone prediction model; text to be converted is obtained and checked for polyphonic characters; if a polyphonic character is present, the feature information of the text to be converted is obtained; and the feature information is input into the target polyphone prediction model, which outputs the target pronunciation of the polyphonic character in the text. Through this embodiment, the pronunciation of a polyphonic character in the text to be converted is predicted by the target polyphone prediction model from the text's feature information, improving the accuracy of polyphone prediction.
Further, in an embodiment of the polyphone prediction method of the present application, step S30 includes:
obtaining text to be converted, and detecting whether the text to be converted contains a target character belonging to a preset polyphone dictionary;
In this embodiment, a polyphone dictionary can be set up in advance; it records the polyphonic characters of Chinese (or the commonly used polyphonic characters). After the text to be converted is obtained, each of its characters is looked up in the preset polyphone dictionary. For example, if the text to be converted is "我看见了一棵树", each of the seven characters "我", "看", "见", "了", "一", "棵", "树" is checked for presence in the preset polyphone dictionary.
If a target character belonging to the preset polyphone dictionary exists, it is determined that the text to be converted contains a polyphonic character.
In this embodiment, taking the text to be converted "我看见了一棵树" as an example, the check finds that "看" belongs to the preset polyphone dictionary, so "看" is a polyphonic character, i.e., the text to be converted contains a polyphonic character.
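The dictionary lookup can be sketched as follows (the dictionary below is an illustrative subset; a real preset dictionary would record all, or all commonly used, polyphonic Chinese characters):

```python
# Illustrative subset of a preset polyphone dictionary.
POLYPHONE_DICT = {"看", "传", "行", "重", "长"}

def find_polyphones(text):
    """Look up each character of the text to be converted in the preset
    polyphone dictionary; a non-empty result means the text contains
    polyphonic characters."""
    return [ch for ch in text if ch in POLYPHONE_DICT]

find_polyphones("我看见了一棵树")   # -> ["看"]
```

A set gives O(1) membership tests per character, so the whole text is scanned in a single linear pass.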
Further, in an embodiment of the polyphone prediction method of the present application, the target polyphone prediction model includes an encoder and a decoder, and step S50 includes:
encoding the feature information with the encoder to obtain a content vector; and decoding the content vector with the decoder to output the target pronunciation of the polyphonic character in the text to be converted.
In this embodiment, referring to FIG. 3, FIG. 3 is a schematic structural diagram of the sequence-to-sequence model in an embodiment of the polyphone prediction method of the present application. As shown in FIG. 3, a sequence-to-sequence model is an extension of the recurrent neural network that joins two recurrent networks: one network (the encoder) receives the feature information of the source sentence, and the other recurrent network (the decoder) outputs the sentence in the target language. These two stages are called encoding and decoding. Encoding exploits the memory of the recurrent network: following the contextual sequence, the word vectors are fed into the network one by one. A recurrent network produces an output at every step, but encoding differs in that only the final hidden state is kept, which effectively condenses the whole sentence into a single content vector stored for the decoder to use later. The decoder's network structure is almost identical to the encoder's; the only difference is that during decoding each result is derived from the preceding one. During encoding, a sentence is input as a sequence in which every word is known; during decoding, nothing is known in advance: the first output of the network is taken as the first word of the sentence, that first word is fed back as the network's next input to obtain the second word, and the loop continues in this way until the network outputs the final pinyin (i.e., the prediction result). In this embodiment, if only character one-hot vectors are used as feature information and the training text or text to be converted has four feature vectors, then feature vectors 1 through 4 are fed into the encoder in sequence to obtain the content vector C; C is then passed to the decoder for decoding to obtain the prediction result, i.e., the pinyin of the polyphonic character in the training text or text to be converted.
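The encode-then-decode data flow described above can be sketched with plain NumPy. This is an illustration of the data flow only, with randomly initialized (untrained) weights: the encoder keeps just its final hidden state as the content vector C, and the decoder feeds each emitted token back in as its next input.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_h, d_out = 7, 16, 5      # one-hot size, hidden size, pinyin classes

# Encoder: a plain RNN that consumes the feature vectors in sequence and
# keeps only its final hidden state -- the content vector C.
W_xh = rng.normal(scale=0.1, size=(d_h, d_in))
W_hh = rng.normal(scale=0.1, size=(d_h, d_h))

def encode(features):
    h = np.zeros(d_h)
    for x in features:
        h = np.tanh(W_xh @ x + W_hh @ h)
    return h                      # content vector C

# Decoder: starts from C and feeds each emitted token back in as the next
# input, looping until the required number of outputs is produced.
U_yh = rng.normal(scale=0.1, size=(d_h, d_out))
U_hh = rng.normal(scale=0.1, size=(d_h, d_h))
U_hy = rng.normal(scale=0.1, size=(d_out, d_h))

def decode(context, steps=2):
    h, y, outputs = context, np.zeros(d_out), []
    for _ in range(steps):
        h = np.tanh(U_yh @ y + U_hh @ h)
        token = int(np.argmax(U_hy @ h))   # pick the most likely pinyin class
        y = np.eye(d_out)[token]           # previous output -> next input
        outputs.append(token)
    return outputs

features = [np.eye(d_in)[i] for i in range(4)]   # four one-hot feature vectors
C = encode(features)
pinyin_ids = decode(C)
```

With training, `pinyin_ids` would index into a table of candidate pinyin strings; here the indices are meaningless because the weights are random.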
Referring to FIG. 4, FIG. 4 is a schematic diagram of the functional modules of a first embodiment of the polyphone prediction apparatus of the present application.
In the first embodiment of the polyphone prediction apparatus of the present application, the polyphone prediction apparatus includes:
an obtaining module 10, configured to obtain training text containing polyphonic characters and the original pronunciations of the polyphonic characters;
a training module 20, configured to train a preset polyphone prediction model iteratively using the training text and the original pronunciations of the polyphonic characters, to obtain a target polyphone prediction model;
a detection module 30, configured to obtain text to be converted and detect whether the text to be converted contains a polyphonic character;
a feature information obtaining module 40, configured to obtain feature information of the text to be converted if the text to be converted contains a polyphonic character; and
a prediction module 50, configured to input the feature information into the target polyphone prediction model and output the target pronunciation of the polyphonic character in the text to be converted.
In this embodiment, training text containing polyphonic characters and the original pronunciations of those characters are obtained; the preset polyphone prediction model is trained iteratively on the training text and the original pronunciations to obtain the target polyphone prediction model; text to be converted is obtained and checked for polyphonic characters; if a polyphonic character is present, the feature information of the text to be converted is obtained; and the feature information is input into the target polyphone prediction model, which outputs the target pronunciation of the polyphonic character in the text. Through this embodiment, the pronunciation of a polyphonic character in the text to be converted is predicted by the target polyphone prediction model from the text's feature information, improving the accuracy of polyphone prediction.
In addition, an embodiment of the present application further provides a non-volatile computer-readable storage medium storing a polyphone prediction program which, when executed by a processor, implements the steps of the embodiments of the polyphone prediction method above.
The specific embodiments of the computer-readable storage medium of the present application are substantially the same as the embodiments of the polyphone prediction method above and are not repeated here.
Optionally, in a specific embodiment, the polyphone prediction program, when executed by a processor, implements the following steps of the polyphone prediction method:
obtaining training text containing polyphonic characters and the original pronunciations of the polyphonic characters;
training a preset polyphone prediction model iteratively using the training text and the original pronunciations of the polyphonic characters, to obtain a target polyphone prediction model;
obtaining text to be converted, and detecting whether the text to be converted contains a polyphonic character;
if the text to be converted contains a polyphonic character, obtaining feature information of the text to be converted;
inputting the feature information into the target polyphone prediction model, and outputting the target pronunciation of the polyphonic character in the text to be converted.
Optionally, in a specific embodiment, the polyphone prediction program, when executed by a processor, further implements the following steps of the polyphone prediction method:
obtaining text to be converted, and detecting whether the text to be converted contains a target character belonging to a preset polyphone dictionary;
if a target character belonging to the preset polyphone dictionary exists, determining that the text to be converted contains a polyphonic character.
Optionally, in a specific embodiment, the polyphone prediction program, when executed by a processor, further implements the following steps of the polyphone prediction method:
encoding the feature information with the encoder to obtain a content vector;
decoding the content vector with the decoder to output the target pronunciation of the polyphonic character in the text to be converted.
Optionally, in a specific embodiment, the polyphone prediction program, when executed by a processor, further implements the following steps of the polyphone prediction method:
obtaining the feature information of the training text in parallel using an attention mechanism;
inputting the feature information into the preset polyphone prediction model to obtain prediction results for the polyphonic characters in the training text;
determining whether each polyphone prediction result is consistent with its corresponding original pronunciation, and obtaining a map value from the determination results;
detecting whether the map value is greater than or equal to a preset threshold;
if the map value is greater than or equal to the preset threshold, taking the preset polyphone prediction model as the target polyphone prediction model;
if the map value is less than the preset threshold, adjusting the parameters of the preset polyphone prediction model to obtain a new polyphone prediction model;
taking the new polyphone prediction model as the preset polyphone prediction model, and executing the step of inputting the feature information into the preset polyphone prediction model to obtain a prediction result for each polyphonic character in the training text.
It should be noted that, as used herein, the terms "comprise", "include", or any variant thereof are intended to cover a non-exclusive inclusion, so that a process, method, article, or system that includes a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or system. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article, or system that includes the element.
The serial numbers of the above embodiments of the present application are for description only and do not represent the relative merits of the embodiments.
From the description of the above embodiments, those skilled in the art can clearly understand that the methods of the above embodiments can be implemented by means of software plus a necessary general-purpose hardware platform, or of course by hardware, though in many cases the former is the better implementation. Based on this understanding, the technical solution of the present application, in essence or in the part that contributes over the prior art, can be embodied in the form of a software product stored in a storage medium as described above (such as a ROM/RAM, magnetic disk, or optical disc), including several instructions for causing a terminal device (which may be a mobile phone, computer, server, air conditioner, network device, or the like) to execute the methods described in the embodiments of the present application.
The above are only preferred embodiments of the present application and do not thereby limit its patent scope; any equivalent structural or process transformation made using the contents of the specification and drawings of the present application, whether applied directly or indirectly in other related technical fields, is likewise included within the patent protection scope of the present application.
Claims (20)
- A polyphone prediction method, comprising the following steps: obtaining training text containing polyphonic characters and the original pronunciations of the polyphonic characters; training a preset polyphone prediction model iteratively using the training text and the original pronunciations of the polyphonic characters, to obtain a target polyphone prediction model; obtaining text to be converted, and detecting whether the text to be converted contains a polyphonic character; if the text to be converted contains a polyphonic character, obtaining feature information of the text to be converted; and inputting the feature information into the target polyphone prediction model, and outputting a target pronunciation of the polyphonic character in the text to be converted.
- The polyphone prediction method of claim 1, wherein the step of obtaining text to be converted and detecting whether the text to be converted contains a polyphonic character comprises: obtaining text to be converted, and detecting whether the text to be converted contains a target character belonging to a preset polyphone dictionary; and if a target character belonging to the preset polyphone dictionary exists, determining that the text to be converted contains a polyphonic character.
- The polyphone prediction method of claim 1, wherein the step of obtaining feature information of the text to be converted if the text to be converted contains a polyphonic character comprises: when the text to be converted contains a polyphonic character, obtaining the feature information of the text to be converted in parallel using an attention mechanism.
- The polyphone prediction method of claim 1, wherein the target polyphone prediction model comprises an encoder and a decoder, and the step of inputting the feature information into the target polyphone prediction model and outputting the target pronunciation of the polyphonic character in the text to be converted comprises: encoding the feature information with the encoder to obtain a content vector; and decoding the content vector with the decoder to output the target pronunciation of the polyphonic character in the text to be converted.
- The polyphone prediction method of claim 1, wherein the step of training the preset polyphone prediction model iteratively using the training text and the original pronunciations corresponding to the training text, to obtain the target polyphone prediction model, comprises: obtaining the feature information of the training text in parallel using an attention mechanism; inputting the feature information into the preset polyphone prediction model to obtain prediction results for the polyphonic characters in the training text; determining whether each polyphone prediction result is consistent with its corresponding original pronunciation, and obtaining a map value from the determination results; detecting whether the map value is greater than or equal to a preset threshold; if the map value is greater than or equal to the preset threshold, taking the preset polyphone prediction model as the target polyphone prediction model; if the map value is less than the preset threshold, adjusting the parameters of the preset polyphone prediction model to obtain a new polyphone prediction model; and taking the new polyphone prediction model as the preset polyphone prediction model, and executing the step of inputting the feature information into the preset polyphone prediction model to obtain a prediction result for each polyphonic character in the training text.
- The polyphone prediction method of claim 1, wherein the feature information comprises one or more of word vectors, character vectors, and part-of-speech feature vectors.
- A polyphone prediction apparatus, comprising: an obtaining module, configured to obtain training text containing polyphonic characters and the original pronunciations of the polyphonic characters; a training module, configured to train a preset polyphone prediction model iteratively using the training text and the original pronunciations of the polyphonic characters, to obtain a target polyphone prediction model; a detection module, configured to obtain text to be converted and detect whether the text to be converted contains a polyphonic character; a feature information obtaining module, configured to obtain feature information of the text to be converted if the text to be converted contains a polyphonic character; and a prediction module, configured to input the feature information into the target polyphone prediction model and output a target pronunciation of the polyphonic character in the text to be converted.
- The polyphone prediction apparatus of claim 7, wherein the detection module comprises: a detection unit, configured to obtain text to be converted and detect whether the text to be converted contains a target character belonging to a preset polyphone dictionary; and a determination unit, configured to determine that the text to be converted contains a polyphonic character if a target character belonging to the preset polyphone dictionary exists.
- The polyphone prediction apparatus of claim 7, wherein the feature information obtaining module comprises: a feature information obtaining unit, configured to obtain, when the text to be converted contains a polyphonic character, the feature information of the text to be converted in parallel using an attention mechanism.
- The polyphone prediction apparatus of claim 7, wherein the prediction module comprises: an encoding unit, configured to encode the feature information with the encoder to obtain a content vector; and a prediction unit, configured to decode the content vector with the decoder to output the target pronunciation of the polyphonic character in the text to be converted.
- The polyphone prediction apparatus of claim 7, wherein the training module comprises: an obtaining unit, configured to obtain the feature information of the training text in parallel using an attention mechanism; a prediction unit, configured to input the feature information into the preset polyphone prediction model to obtain prediction results for the polyphonic characters in the training text; a map value obtaining unit, configured to determine whether each polyphone prediction result is consistent with its corresponding original pronunciation and obtain a map value from the determination results; a value detection unit, configured to detect whether the map value is greater than or equal to a preset threshold; a confirmation unit, configured to take the preset polyphone prediction model as the target polyphone prediction model if the map value is greater than or equal to the preset threshold; an adjustment unit, configured to adjust the parameters of the preset polyphone prediction model to obtain a new polyphone prediction model if the map value is less than the preset threshold; and a step jump unit, configured to take the new polyphone prediction model as the preset polyphone prediction model and execute the step of inputting the feature information into the preset polyphone prediction model to obtain a prediction result for each polyphonic character in the training text.
- A polyphone prediction device, comprising: a memory, a processor, and a polyphone prediction program stored in the memory and executable on the processor, the polyphone prediction program, when executed by the processor, implementing the following steps of a polyphone prediction method: obtaining training text containing polyphonic characters and the original pronunciations of the polyphonic characters; training a preset polyphone prediction model iteratively using the training text and the original pronunciations of the polyphonic characters, to obtain a target polyphone prediction model; obtaining text to be converted, and detecting whether the text to be converted contains a polyphonic character; if the text to be converted contains a polyphonic character, obtaining feature information of the text to be converted; and inputting the feature information into the target polyphone prediction model, and outputting a target pronunciation of the polyphonic character in the text to be converted.
- The polyphone prediction device of claim 12, wherein the polyphone prediction program, when executed by the processor, further implements the following steps of the polyphone prediction method: obtaining text to be converted, and detecting whether the text to be converted contains a target character belonging to a preset polyphone dictionary; and if a target character belonging to the preset polyphone dictionary exists, determining that the text to be converted contains a polyphonic character.
- The polyphone prediction device of claim 12, wherein the polyphone prediction program, when executed by the processor, further implements the following step of the polyphone prediction method: when the text to be converted contains a polyphonic character, obtaining the feature information of the text to be converted in parallel using an attention mechanism.
- The polyphone prediction device of claim 12, wherein the polyphone prediction program, when executed by the processor, further implements the following steps of the polyphone prediction method: encoding the feature information with the encoder to obtain a content vector; and decoding the content vector with the decoder to output the target pronunciation of the polyphonic character in the text to be converted.
- The polyphone prediction device of claim 12, wherein the polyphone prediction program, when executed by the processor, further implements the following steps of the polyphone prediction method: obtaining the feature information of the training text in parallel using an attention mechanism; inputting the feature information into the preset polyphone prediction model to obtain prediction results for the polyphonic characters in the training text; determining whether each polyphone prediction result is consistent with its corresponding original pronunciation, and obtaining a map value from the determination results; detecting whether the map value is greater than or equal to a preset threshold; if the map value is greater than or equal to the preset threshold, taking the preset polyphone prediction model as the target polyphone prediction model; if the map value is less than the preset threshold, adjusting the parameters of the preset polyphone prediction model to obtain a new polyphone prediction model; and taking the new polyphone prediction model as the preset polyphone prediction model, and executing the step of inputting the feature information into the preset polyphone prediction model to obtain a prediction result for each polyphonic character in the training text.
- A non-volatile computer-readable storage medium, storing a polyphone prediction program which, when executed by a processor, implements the following steps of a polyphone prediction method: obtaining training text containing polyphonic characters and the original pronunciations of the polyphonic characters; training a preset polyphone prediction model iteratively using the training text and the original pronunciations of the polyphonic characters, to obtain a target polyphone prediction model; obtaining text to be converted, and detecting whether the text to be converted contains a polyphonic character; if the text to be converted contains a polyphonic character, obtaining feature information of the text to be converted; and inputting the feature information into the target polyphone prediction model, and outputting a target pronunciation of the polyphonic character in the text to be converted.
- The non-volatile computer-readable storage medium of claim 17, wherein the polyphone prediction program, when executed by a processor, further implements the following steps of the polyphone prediction method: obtaining text to be converted, and detecting whether the text to be converted contains a target character belonging to a preset polyphone dictionary; and if a target character belonging to the preset polyphone dictionary exists, determining that the text to be converted contains a polyphonic character.
- The non-volatile computer-readable storage medium of claim 17, wherein the polyphone prediction program, when executed by a processor, further implements the following steps of the polyphone prediction method: encoding the feature information with the encoder to obtain a content vector; and decoding the content vector with the decoder to output the target pronunciation of the polyphonic character in the text to be converted.
- The non-volatile computer-readable storage medium of claim 17, wherein the polyphone prediction program, when executed by a processor, further implements the following steps of the polyphone prediction method: obtaining the feature information of the training text in parallel using an attention mechanism; inputting the feature information into the preset polyphone prediction model to obtain prediction results for the polyphonic characters in the training text; determining whether each polyphone prediction result is consistent with its corresponding original pronunciation, and obtaining a map value from the determination results; detecting whether the map value is greater than or equal to a preset threshold; if the map value is greater than or equal to the preset threshold, taking the preset polyphone prediction model as the target polyphone prediction model; if the map value is less than the preset threshold, adjusting the parameters of the preset polyphone prediction model to obtain a new polyphone prediction model; and taking the new polyphone prediction model as the preset polyphone prediction model, and executing the step of inputting the feature information into the preset polyphone prediction model to obtain a prediction result for each polyphonic character in the training text.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title
---|---|---|---
CN201910407702.4 | 2019-05-16 | |
CN201910407702.4A (CN110310619A) | 2019-05-16 | 2019-05-16 | 多音字预测方法、装置、设备及计算机可读存储介质
Publications (1)
Publication Number | Publication Date
---|---
WO2020228175A1 | 2020-11-19
Family ID: 68075447
Family Applications (1)
Application Number | Title | Priority Date | Filing Date
---|---|---|---
PCT/CN2019/102446 (WO2020228175A1) | 多音字预测方法、装置、设备及计算机可读存储介质 | 2019-05-16 | 2019-08-26
Country Status (2): CN 110310619 A (zh); WO 2020228175 A1 (zh)
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title
---|---|---|---|---
CN105336322A | 2015-09-30 | 2016-02-17 | 百度在线网络技术(北京)有限公司 | 多音字模型训练方法、语音合成方法及装置
US20160358596A1 | 2015-06-08 | 2016-12-08 | Nuance Communications, Inc. | Process for improving pronunciation of proper nouns foreign to a target language text-to-speech system
CN106935239A | 2015-12-29 | 2017-07-07 | 阿里巴巴集团控股有限公司 | 一种发音词典的构建方法及装置
CN107515850A | 2016-06-15 | 2017-12-26 | 阿里巴巴集团控股有限公司 | 确定多音字发音的方法、装置和系统
Also Published As
Publication number | Publication date
---|---
CN110310619A | 2019-10-08
Legal Events
Code | Title | Description
---|---|---
121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 19928438; Country of ref document: EP; Kind code of ref document: A1
NENP | Non-entry into the national phase | Ref country code: DE
122 | Ep: pct application non-entry in european phase | Ref document number: 19928438; Country of ref document: EP; Kind code of ref document: A1