CN112580335A - Method and device for disambiguating polyphone - Google Patents

Method and device for disambiguating polyphone

Publication number
CN112580335A
CN112580335A CN202011581165.4A CN202011581165A CN112580335A CN 112580335 A CN112580335 A CN 112580335A CN 202011581165 A CN202011581165 A CN 202011581165A CN 112580335 A CN112580335 A CN 112580335A
Authority
CN
China
Prior art keywords
data
polyphone
text
sentence
detected
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202011581165.4A
Other languages
Chinese (zh)
Other versions
CN112580335B (en)
Inventor
庞帅
袁晟君
李宸
杨辰雨
庄磊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
CCB Finetech Co Ltd
Original Assignee
CCB Finetech Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by CCB Finetech Co Ltd
Priority to CN202011581165.4A
Publication of CN112580335A
Application granted
Publication of CN112580335B
Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/232 Orthographic correction, e.g. spell checking or vowelisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • Databases & Information Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Electrically Operated Instructional Devices (AREA)

Abstract

The invention provides a method and a device for polyphone disambiguation. The method comprises the following steps: acquiring text data of a sentence to be detected containing polyphones; querying a pre-constructed four-level polyphone word list, according to the sentence text data to be detected, for the four-level word list corresponding to the polyphones contained in the sentence text data to be detected; obtaining a candidate pronunciation data set corresponding to the polyphones contained in the sentence text data to be detected; combining the sentence text data to be detected with the candidate pronunciation data set and inputting the result into a text matching model; and determining the pronunciation of the polyphones contained in the sentence text data to be detected according to the output result of the text matching model. Because the sentence text data to be detected is matched against the candidate pronunciations of a polyphone one by one, recognition accuracy on unconventional pronunciations can be improved compared with a classification method. Because all polyphones and their corresponding pronunciation data are imported, even a rare pronunciation that the text matching model has not seen can be identified accurately.

Description

Method and device for disambiguating polyphone
Technical Field
The invention relates to the field of voice recognition, in particular to a method and a device for disambiguating polyphone.
Background
In a speech recognition system, character-to-pronunciation conversion is one of the essential modules, and its accuracy directly affects the intelligibility of the recognized speech. In a Mandarin Chinese speech synthesis system, the task of character-to-pronunciation conversion is to convert a character sequence into the corresponding pinyin sequence. In most cases, the conversion looks up the current character in a dictionary and matches it with the corresponding pinyin. However, some characters in Mandarin correspond to multiple pinyin readings. For example, the character "好" ("good") is read "hao" (third tone) in "good score" and "hao" (fourth tone) in "好客" ("hospitable", literally "good guest"). The key difficulty of character-to-pronunciation conversion is how to resolve a single character with multiple pronunciations.
About 200 polyphones are in common use in Mandarin, and each polyphone has a conventional pronunciation and an unconventional pronunciation. Polyphone disambiguation is therefore required; in a speech synthesis system, polyphone disambiguation means predicting the correct pinyin for each polyphone in the data. In the prior art, polyphone disambiguation uses a model based on a classification method. In real scenarios, however, the pronunciations of polyphones are very unevenly distributed, as shown by the pronunciation distribution statistics in Table 1:
[Table 1: pronunciation distribution statistics (an image in the original publication, not reproduced here)]
Because data for unconventional pronunciations is far scarcer than data for conventional pronunciations, a model based on the classification method performs poorly on the small samples of unconventional pronunciations. It also fails on pronunciations outside the data set that the model has never seen: in a real scenario, a polyphone may have a rare pronunciation that the training corpus does not cover. For example, a model based on the classification method cannot recognize the character "折" ("discount") pronounced "zhe" (first tone).
Therefore, the existing polyphone disambiguation method has low accuracy in polyphone recognition.
Disclosure of Invention
The embodiment of the invention provides a polyphone disambiguation method which is used for improving the accuracy rate of identifying polyphones and comprises the following steps:
acquiring text data of a sentence to be detected containing polyphones;
according to the sentence text data to be detected, inquiring a four-level word list corresponding to polyphones contained in the sentence text data to be detected in a pre-constructed four-level polyphone word list; the four-level polyphone word list records the association relationship between a plurality of polyphones and the four-level word list corresponding to each polyphone;
obtaining a candidate pronunciation data set corresponding to polyphones contained in the sentence text data to be detected according to a four-level word list corresponding to the polyphones contained in the sentence text data to be detected;
combining the sentence text data to be detected and the candidate pronunciation data set, and inputting a text matching model; the text matching model is a pre-constructed BERT model used for determining the adaptation degree between the text data of the sentence to be detected and the candidate pronunciation of the polyphone;
and determining the pronunciation of polyphones contained in the text data of the sentence to be detected according to the output result of the text matching model.
In specific implementation, the four-level word list corresponding to each polyphone comprises:
the text of each polyphone, different pronunciations of each polyphone, paraphrase information corresponding to the different pronunciations of each polyphone, and common phrases corresponding to the different pronunciations of each polyphone.
In an embodiment of the present invention, the polyphone disambiguation method further comprises:
iteratively executing the following steps until the output efficiency of the text matching model and/or the accuracy of the output result meet preset requirements or the iteration number exceeds a preset value:
updating the four-level polyphone word list according to the output efficiency of the text matching model and/or the accuracy of the output result;
obtaining a candidate pronunciation data set corresponding to polyphones contained in the text data of the sentence to be detected again by using the updated four-level polyphone word list;
and re-determining the output result of the text matching model according to the newly obtained candidate pronunciation data set.
In the specific embodiment of the present invention, updating the four-level polyphonic vocabulary according to the output efficiency of the text matching model and/or the accuracy of the output result includes:
and adjusting, adding or deleting paraphrase information corresponding to different pronunciations of each polyphone and common phrases corresponding to different pronunciations of each polyphone in the four-level polyphone word list according to the output efficiency of the text matching model and/or the accuracy of an output result.
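The iterative update described above can be sketched as a simple loop. This is a minimal illustration under stated assumptions, not the patent's implementation: `refine_word_list`, `evaluate`, `update`, `target_accuracy` and `max_iters` are hypothetical names, and accuracy alone is used as the stopping metric, although the text also allows output efficiency.

```python
def refine_word_list(word_list, evaluate, update,
                     target_accuracy=0.95, max_iters=10):
    """Update the word list until accuracy is good enough or the
    iteration budget is spent.

    evaluate(word_list) -> accuracy in [0, 1]        (hypothetical callback)
    update(word_list, accuracy) -> new word list     (hypothetical callback)
    """
    iters = 0
    accuracy = evaluate(word_list)
    while accuracy < target_accuracy and iters < max_iters:
        # Adjust, add or delete paraphrases and common phrases.
        word_list = update(word_list, accuracy)
        # Rebuild candidates and re-run the matching model.
        accuracy = evaluate(word_list)
        iters += 1
    return word_list, accuracy, iters
```

The two callbacks keep the loop independent of how the word list is stored and of how the text matching model is invoked.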
In specific implementation, the obtaining of a candidate pronunciation data set corresponding to polyphones contained in the sentence text data to be detected according to a four-level word list corresponding to polyphones contained in the sentence text data to be detected includes:
determining a plurality of candidate pronunciations corresponding to polyphones contained in the sentence text data to be detected, paraphrase information corresponding to each candidate pronunciation and a common phrase according to a four-level word list corresponding to the polyphones contained in the sentence text data to be detected;
determining each candidate pronunciation data subset according to a plurality of candidate pronunciations corresponding to polyphones contained in the sentence text data to be detected and paraphrase information and common phrases corresponding to each candidate pronunciation;
and combining a plurality of candidate pronunciation data subsets to obtain the candidate pronunciation data set.
In specific implementation, combining the sentence text data to be detected with the candidate pronunciation data set and inputting the result into the text matching model includes:
splicing the sentence text data to be detected with each candidate pronunciation data subset one by one, and inputting each spliced result into the text matching model.
In a specific embodiment, the process of establishing the text matching model includes:
acquiring a plurality of training data and correct pronunciations corresponding to the training data; the training data includes: sentence text data containing polyphones and a plurality of candidate pronunciation data sets corresponding to the contained polyphones;
determining the adaptation degree between the sentence text data and each candidate pronunciation data set according to the correct pronunciation corresponding to each training data;
and training the BERT model by deep machine learning, with the training data as the input of the BERT model and the adaptation degree between the sentence text data and each candidate pronunciation data set as the output of the BERT model, to construct the text matching model.
In a specific implementation process, the process of establishing the text matching model further includes:
inputting a plurality of training data into the constructed text matching model to obtain the output results of the text matching model corresponding to the plurality of training data;
and adjusting the constructed text matching model according to the correct pronunciation corresponding to the plurality of training data and the text matching model output result corresponding to the plurality of training data.
In a specific embodiment, determining the pronunciation of the polyphone included in the text data of the sentence to be detected according to the output result of the text matching model includes:
sorting the adaptation degrees, output by the text matching model, between the sentence text data to be detected and each candidate pronunciation of the polyphone;
and determining the candidate pronunciation with the top ranking as the pronunciation of the polyphone contained in the text data of the sentence to be detected.
The embodiment of the invention also provides a polyphone disambiguation device, which is used for improving the accuracy rate of identifying polyphones and comprises the following components:
the data acquisition module is used for acquiring the text data of the sentence to be detected containing the polyphone;
the four-level word list determining module is used for inquiring a pre-constructed four-level polyphone word list according to the sentence text data to be detected to obtain a four-level word list corresponding to polyphones contained in the sentence text data to be detected; the four-level polyphone word list records the association relationship between a plurality of polyphones and the four-level word list corresponding to each polyphone;
the candidate pronunciation data set determining module is used for obtaining a candidate pronunciation data set corresponding to polyphones contained in the sentence text data to be detected according to a four-level word list corresponding to the polyphones contained in the sentence text data to be detected;
the text matching module is used for inputting a text matching model after combining the text data of the sentence to be detected and the candidate pronunciation data set; the text matching model is a pre-constructed BERT model used for determining the adaptation degree between the text data of the sentence to be detected and the candidate pronunciation of the polyphone;
and the pronunciation determining module is used for determining the pronunciation of the polyphone contained in the text data of the sentence to be detected according to the output result of the text matching model.
In the embodiment of the present invention, the four-level word list corresponding to each polyphone includes:
the text of each polyphone, different pronunciations of each polyphone, paraphrase information corresponding to the different pronunciations of each polyphone, and common phrases corresponding to the different pronunciations of each polyphone.
In a specific embodiment, the device further comprises a four-level polyphone word list iterative update module, which is used for:
iteratively executing the following steps until the output efficiency of the text matching model and/or the accuracy of the output result meet preset requirements or the iteration number exceeds a preset value:
updating the four-level polyphone word list according to the output efficiency of the text matching model and/or the accuracy of the output result;
obtaining a candidate pronunciation data set corresponding to polyphones contained in the text data of the sentence to be detected again by using the updated four-level polyphone word list;
and re-determining the output result of the text matching model according to the newly obtained candidate pronunciation data set.
In specific implementation, the updating of the four-level polyphone vocabulary according to the output efficiency of the text matching model and/or the accuracy of the output result comprises:
and adjusting, adding or deleting paraphrase information corresponding to different pronunciations of each polyphone and common phrases corresponding to different pronunciations of each polyphone in the four-level polyphone word list according to the output efficiency of the text matching model and/or the accuracy of an output result.
In an embodiment of the present invention, the candidate pronunciation data set determining module is specifically configured to:
determining a plurality of candidate pronunciations corresponding to polyphones contained in the sentence text data to be detected, paraphrase information corresponding to each candidate pronunciation and a common phrase according to a four-level word list corresponding to the polyphones contained in the sentence text data to be detected;
determining each candidate pronunciation data subset according to a plurality of candidate pronunciations corresponding to polyphones contained in the sentence text data to be detected and paraphrase information and common phrases corresponding to each candidate pronunciation;
and combining a plurality of candidate pronunciation data subsets to obtain the candidate pronunciation data set.
In a specific embodiment of the present invention, the text matching module is specifically configured to:
and splicing the sentence text data to be detected and each candidate pronunciation data subset one by one, and inputting a text matching model.
In a specific embodiment of the present invention, the text matching module includes: a text matching model construction unit for:
acquiring a plurality of training data and correct pronunciations corresponding to the training data; the training data includes: sentence text data containing polyphones and a plurality of candidate pronunciation data sets corresponding to the contained polyphones;
determining the adaptation degree between the sentence text data and each candidate pronunciation data set according to the correct pronunciation corresponding to each training data;
and training the BERT model by deep machine learning, with the training data as the input of the BERT model and the adaptation degree between the sentence text data and each candidate pronunciation data set as the output of the BERT model, to construct the text matching model.
In specific implementation, the text matching model construction unit is further configured to:
inputting a plurality of training data into the constructed text matching model to obtain the output results of the text matching model corresponding to the plurality of training data;
and adjusting the constructed text matching model according to the correct pronunciation corresponding to the plurality of training data and the text matching model output result corresponding to the plurality of training data.
In a specific implementation, the pronunciation determining module is specifically configured to:
sorting the adaptation degrees, output by the text matching model, between the sentence text data to be detected and each candidate pronunciation of the polyphone;
and determining the candidate pronunciation with the top ranking as the pronunciation of the polyphone contained in the text data of the sentence to be detected.
The embodiment of the invention also provides computer equipment which comprises a memory, a processor and a computer program which is stored on the memory and can run on the processor, wherein the processor realizes the polyphonic disambiguation method when executing the computer program.
Embodiments of the present invention also provide a computer-readable storage medium storing a computer program for executing the above-described polyphonic disambiguation method.
In the embodiment of the invention, text data of a sentence to be detected containing polyphones is acquired; a pre-constructed four-level polyphone word list is queried, according to the sentence text data to be detected, for the four-level word list corresponding to the polyphones contained in the sentence text data to be detected, wherein the four-level polyphone word list records the association relationship between a plurality of polyphones and the four-level word list corresponding to each polyphone; a candidate pronunciation data set corresponding to the polyphones contained in the sentence text data to be detected is obtained according to that four-level word list; the sentence text data to be detected and the candidate pronunciation data set are combined and input into a text matching model, the text matching model being a pre-constructed BERT model used for determining the adaptation degree between the sentence text data to be detected and the candidate pronunciations of the polyphone; and the pronunciation of the polyphones contained in the sentence text data to be detected is determined according to the output result of the text matching model. Because the text matching model is built on the BERT model, the sentence text data to be detected can be matched against the candidate pronunciations of a polyphone one by one, which can improve recognition accuracy on unconventional pronunciations compared with a classification method. Because the four-level polyphone word list is constructed in advance and imports all polyphones and their corresponding pronunciation data, even a rare pronunciation that the text matching model has not seen can be identified accurately, thereby improving the accuracy of polyphone recognition.
Drawings
In order to illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings needed in the description of the embodiments are briefly introduced below. The drawings in the following description show only some embodiments of the present invention, and those skilled in the art can obtain other drawings based on these drawings without creative effort.
FIG. 1 is a diagram illustrating a method for disambiguating polyphonic characters in an embodiment of the present invention.
Fig. 2 is a schematic diagram of a specific implementation method of step 103 in an embodiment of the present invention.
Fig. 3 is a schematic diagram of a process of establishing a text matching model according to an embodiment of the present invention.
FIG. 4 is a diagram illustrating another process of building a text matching model according to an embodiment of the present invention.
Fig. 5 is a schematic diagram of a specific implementation method of step 105 in an embodiment of the present invention.
FIG. 6 is a diagram illustrating a method for disambiguating polyphonic characters in an embodiment of the present invention.
FIG. 7 is a block diagram of a system for polyphonic disambiguation in an implementation of the present invention.
FIG. 8 is a diagram illustrating a polyphonic disambiguation apparatus according to an embodiment of the present invention.
FIG. 9 is a schematic diagram of a polyphonic disambiguation apparatus according to an embodiment of the present invention.
FIG. 10 is a diagram of an electronic device for disambiguation of polyphonic characters in an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The embodiment of the invention provides a polyphone disambiguation method for improving the identification accuracy of polyphones, and as shown in figure 1, the method comprises the following steps:
step 101: acquiring text data of a sentence to be detected containing polyphones;
step 102: according to the sentence text data to be detected, inquiring a four-level word list corresponding to polyphones contained in the sentence text data to be detected in a pre-constructed four-level polyphone word list;
step 103: obtaining a candidate pronunciation data set corresponding to polyphones contained in the sentence text data to be detected according to a four-level word list corresponding to the polyphones contained in the sentence text data to be detected;
step 104: combining the sentence text data to be detected and the candidate pronunciation data set, and inputting a text matching model;
step 105: and determining the pronunciation of polyphones contained in the text data of the sentence to be detected according to the output result of the text matching model.
Wherein, the four-level polyphone word list records the association relationship between a plurality of polyphones and the four-level word list corresponding to each polyphone; the text matching model is a pre-constructed BERT model used for determining the adaptation degree between the text data of the sentence to be detected and the candidate pronunciation of the polyphone.
As can be seen from the flow shown in fig. 1, in the embodiment of the present invention, text data of a sentence to be detected containing polyphones is acquired; a pre-constructed four-level polyphone word list is queried, according to the sentence text data to be detected, for the four-level word list corresponding to the polyphones contained in the sentence text data to be detected, wherein the four-level polyphone word list records the association relationship between a plurality of polyphones and the four-level word list corresponding to each polyphone; a candidate pronunciation data set corresponding to the polyphones contained in the sentence text data to be detected is obtained according to that four-level word list; the sentence text data to be detected and the candidate pronunciation data set are combined and input into a text matching model, the text matching model being a pre-constructed BERT model used for determining the adaptation degree between the sentence text data to be detected and the candidate pronunciations of the polyphone; and the pronunciation of the polyphones contained in the sentence text data to be detected is determined according to the output result of the text matching model. Because the text matching model is built on the BERT model, the sentence text data to be detected can be matched against the candidate pronunciations of a polyphone one by one, which can improve recognition accuracy on unconventional pronunciations compared with a classification method. Because the four-level polyphone word list is constructed in advance and imports all polyphones and their corresponding pronunciation data, even a rare pronunciation that the text matching model has not seen can be identified accurately, thereby improving the accuracy of polyphone recognition.
When the method is specifically implemented, firstly, the text data of the sentence to be detected containing the polyphone is obtained.
After obtaining the sentence text data to be detected containing polyphones, according to the sentence text data to be detected, inquiring in a pre-constructed four-level polyphone word list to obtain a four-level word list corresponding to the polyphones contained in the sentence text data to be detected. In a specific embodiment, the four-level polyphone word list records the association relationship between a plurality of polyphones and the four-level word list corresponding to each polyphone. Wherein, the four-level word list corresponding to each polyphone comprises:
the text of each polyphone, different pronunciations of each polyphone, paraphrase information corresponding to the different pronunciations of each polyphone, and common phrases corresponding to the different pronunciations of each polyphone.
By constructing a four-level polyphone word list that associates polyphone, pronunciation, paraphrase and phrase, prior knowledge about polyphones is introduced, which solves the prior-art problem that classification-based models have low recognition accuracy on data outside the model's data set.
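One way to picture the four-level word list is as a nested record: character, pronunciations, paraphrase per pronunciation, and common phrases per pronunciation. The sketch below is illustrative only. The class names, the `FOUR_LEVEL_VOCAB` dictionary, the `lookup_polyphones` helper, the tone-number pinyin notation and the sample entry for "好" are assumptions, not structures given in the patent.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Pronunciation:
    pinyin: str                                        # level 2: one reading
    paraphrase: str                                    # level 3: meaning of that reading
    phrases: List[str] = field(default_factory=list)   # level 4: common phrases


@dataclass
class PolyphoneEntry:
    char: str                                          # level 1: the polyphone itself
    pronunciations: List[Pronunciation]


# Hypothetical sample content for a single polyphone.
FOUR_LEVEL_VOCAB: Dict[str, PolyphoneEntry] = {
    "好": PolyphoneEntry("好", [
        Pronunciation("hao3", "good; fine", ["好成绩", "好人"]),
        Pronunciation("hao4", "to be fond of", ["好客", "爱好"]),
    ]),
}


def lookup_polyphones(sentence: str,
                      vocab: Dict[str, PolyphoneEntry]) -> List[PolyphoneEntry]:
    """Return the four-level entry of every polyphone found in the sentence."""
    return [vocab[ch] for ch in sentence if ch in vocab]
```

Keying the table by character makes the lookup of step 102 a direct dictionary access for each character of the sentence.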
After obtaining the four-level word list corresponding to the polyphones included in the sentence text data to be detected, according to the four-level word list corresponding to the polyphones included in the sentence text data to be detected, a candidate pronunciation data set corresponding to the polyphones included in the sentence text data to be detected is obtained, and a specific process is shown in fig. 2 and includes:
step 201: determining a plurality of candidate pronunciations corresponding to polyphones contained in the sentence text data to be detected, paraphrase information corresponding to each candidate pronunciation and a common phrase according to a four-level word list corresponding to the polyphones contained in the sentence text data to be detected;
step 202: determining each candidate pronunciation data subset according to a plurality of candidate pronunciations corresponding to polyphones contained in the sentence text data to be detected and paraphrase information and common phrases corresponding to each candidate pronunciation;
step 203: and combining the plurality of candidate pronunciation data subsets to obtain a candidate pronunciation data set.
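Steps 201 to 203 can be sketched as follows. The patent does not specify how a candidate pronunciation data subset is serialized, so the text layout produced here, the function name `build_candidate_set` and the sample entry `HAO_ENTRY` are assumptions made for illustration.

```python
def build_candidate_set(char, entry):
    """Build one candidate pronunciation data subset per pronunciation.

    `entry` maps pinyin -> {"paraphrase": str, "phrases": [str, ...]}
    (an assumed, simplified form of the four-level word list).
    """
    subsets = []
    for pinyin, info in entry.items():
        # Steps 201-202: gather the pronunciation, its paraphrase and its
        # common phrases into a single text description.
        text = f"{char} {pinyin}: {info['paraphrase']}; {', '.join(info['phrases'])}"
        subsets.append({"pinyin": pinyin, "text": text})
    # Step 203: the list of subsets is the candidate pronunciation data set.
    return subsets


# Hypothetical sample entry for the polyphone "好".
HAO_ENTRY = {
    "hao3": {"paraphrase": "good; fine", "phrases": ["好成绩", "好人"]},
    "hao4": {"paraphrase": "to be fond of", "phrases": ["好客", "爱好"]},
}
```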
After the candidate pronunciation data set is obtained, the sentence text data to be detected and the candidate pronunciation data set are combined and input into the text matching model. The text matching model is a pre-constructed BERT model used for determining the adaptation degree between the sentence text data to be detected and the candidate pronunciations of the polyphone. BERT stands for Bidirectional Encoder Representations from Transformers, i.e., the encoder of a bidirectional Transformer; the encoder is used because a decoder cannot obtain the information to be predicted. The main innovation of the model lies in its pre-training method, which uses two tasks, Masked LM and Next Sentence Prediction, to capture word-level and sentence-level representations respectively.
In a specific embodiment, the process of establishing the text matching model, as shown in fig. 3, includes:
step 301: acquiring a plurality of training data and correct pronunciations corresponding to the training data;
step 302: determining the adaptation degree between the sentence text data and each candidate pronunciation data set according to the correct pronunciation corresponding to each training data;
step 303: training the BERT model by deep machine learning, with the training data as the input of the BERT model and the adaptation degree between the sentence text data and each candidate pronunciation data set as the output of the BERT model, to construct the text matching model.
Wherein the training data comprises: sentence text data containing polyphones and a plurality of candidate pronunciation data sets corresponding to the contained polyphones.
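The labelling in step 302 can be sketched as below. The patent does not state the numeric form of the adaptation degree, so the binary labels (1.0 for the correct pronunciation, 0.0 otherwise) are an assumption, as are the names `make_training_pairs` and `CANDIDATES`.

```python
def make_training_pairs(sentence, candidates, correct_pinyin):
    """Turn one annotated sentence into (sentence, candidate_text, label) triples.

    Assumption: the target adaptation degree is 1.0 for the correct
    pronunciation and 0.0 for every other candidate.
    """
    pairs = []
    for cand in candidates:
        label = 1.0 if cand["pinyin"] == correct_pinyin else 0.0
        pairs.append((sentence, cand["text"], label))
    return pairs


# Hypothetical candidate pronunciation data set for "好".
CANDIDATES = [
    {"pinyin": "hao3", "text": "好 hao3: good; fine; 好成绩, 好人"},
    {"pinyin": "hao4", "text": "好 hao4: to be fond of; 好客, 爱好"},
]
```

Because every pronunciation in the word list yields a training pair, a rare pronunciation still appears as a candidate even when few sentences use it.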
Because the BERT model can realize one-to-one matching, the text data of the sentence to be detected and the candidate pronunciations of the polyphones can be matched one by one, and compared with a classification method, the identification accuracy rate on the unconventional pronunciations can be improved.
In order to improve the prediction accuracy of the established text matching model, the establishing process of the text matching model shown in fig. 4 further includes, on the basis of fig. 3:
step 401: inputting a plurality of training data into the constructed text matching model to obtain the output results of the text matching model corresponding to the plurality of training data;
step 402: and adjusting the constructed text matching model according to the correct pronunciation corresponding to the plurality of training data and the text matching model output result corresponding to the plurality of training data.
Combining the sentence text data to be detected with the candidate pronunciation data set and inputting the result into the text matching model comprises: splicing the sentence text data to be detected with each candidate pronunciation data subset one by one, and inputting each spliced result into the text matching model.
And after the text matching model is input, determining the pronunciation of the polyphone contained in the text data of the sentence to be detected according to the output result of the text matching model. The specific implementation process, as shown in fig. 5, includes:
step 501: sorting the adaptation degrees, output by the text matching model, between the sentence text data to be detected and each candidate pronunciation of the polyphone;
step 502: and determining the candidate pronunciation with the top ranking as the pronunciation of the polyphone contained in the text data of the sentence to be detected.
For example, suppose the sentence text data to be detected containing the polyphone is A and the candidate pronunciation data set is {B, C, D}, i.e., the candidate pronunciation data subsets are B, C and D. A is spliced with each subset to form AB, AC and AD, which are input into the text matching model respectively. The model outputs an adaptation degree score for each of AB, AC and AD, and the candidate with the highest score is taken as the pronunciation of the polyphone.
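The score-and-rank procedure of steps 501 and 502 can be sketched as below. The scoring function here is a toy character-overlap measure standing in for the BERT text matching model, purely so the example runs on its own; `pick_pronunciation`, `overlap_score` and the sample data are assumptions.

```python
from typing import Callable, Dict, List, Tuple


def overlap_score(sentence: str, candidate_text: str) -> float:
    """Toy stand-in for the matcher: share of candidate characters found in the sentence."""
    chars = set(candidate_text)
    return len(chars & set(sentence)) / len(chars) if chars else 0.0


def pick_pronunciation(
    sentence: str,
    candidates: Dict[str, str],
    score_fn: Callable[[str, str], float],
) -> Tuple[str, List[Tuple[str, float]]]:
    """Score every (sentence, candidate) pair, sort descending, return the top one."""
    scores = {pron: score_fn(sentence, text) for pron, text in candidates.items()}
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return ranked[0][0], ranked


# A is the sentence; the dictionary values play the role of subsets B and C.
best, ranked = pick_pronunciation(
    "我去银行取钱",
    {"xing2": "行走旅行", "hang2": "银行行业"},
    overlap_score,
)
```

Because each candidate is scored independently, swapping `overlap_score` for a call into a trained model would leave the ranking logic unchanged.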
The pre-constructed four-level polyphone word list is derived from a dictionary, so its data is relatively comprehensive. In practical applications, however, the paraphrases or phrases corresponding to some pronunciations are rarely used, and an overly complex four-level polyphone word list slows down the determination of polyphone pronunciations. Therefore, an embodiment of the present invention further provides a polyphone disambiguation method which, as shown in fig. 6, further comprises, on the basis of fig. 1:
step 601: iteratively executing the following steps until the output efficiency of the text matching model and/or the accuracy of the output result meet preset requirements or the iteration number exceeds a preset value:
step 602: updating the four-level polyphone word list according to the output efficiency of the text matching model and/or the accuracy of the output result;
step 603: obtaining a candidate pronunciation data set corresponding to polyphones contained in the text data of the sentence to be detected again by using the updated four-level polyphone word list;
step 604: and re-determining the output result of the text matching model according to the newly obtained candidate pronunciation data set.
The output efficiency of the text matching model refers to the speed of obtaining an output result after the text matching model operates, and in specific implementation, if the output efficiency is too low and/or the accuracy of the output result is low, the four-level polyphone vocabulary is updated in an iterative manner. The specific judgment criteria of too low output efficiency and/or low accuracy of the output result are specifically set according to the actual situation, and are not described herein again. Updating the four-level polyphone word list according to the output efficiency of the text matching model and/or the accuracy of the output result, which specifically comprises the following steps:
and adjusting, adding or deleting paraphrase information corresponding to different pronunciations of each polyphone and common phrases corresponding to different pronunciations of each polyphone in the four-level polyphone word list according to the output efficiency of the text matching model and/or the accuracy of an output result.
Through the updating of the four-level polyphone word list, the text matching model does not need to be trained and deployed again, and the problem that the iterative upgrading of the text matching model is slow is solved.
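The iterative loop of steps 601 to 604 can be sketched as follows. Only the accuracy criterion is modelled; output efficiency is left out for brevity. The phrase-counting predictor, the update rule and all data are hypothetical and serve only to show that the word list changes while the model stays fixed.

```python
from typing import Callable, Dict, List, Tuple

Vocab = Dict[str, List[str]]  # pronunciation -> common phrases (simplified word list)

# Illustrative evaluation data: (sentence, correct pronunciation of "行").
TEST_CASES = [("我去银行取钱", "hang2"), ("他在路上行走", "xing2")]


def predict(sentence: str, vocab: Vocab) -> str:
    """Toy matcher: pick the pronunciation with the most phrases found in the sentence."""
    return max(vocab, key=lambda pron: sum(phrase in sentence for phrase in vocab[pron]))


def evaluate(vocab: Vocab) -> float:
    return sum(predict(s, vocab) == gold for s, gold in TEST_CASES) / len(TEST_CASES)


def add_bank_phrase(vocab: Vocab) -> Vocab:
    """Hypothetical update step: add a missing common phrase without mutating the input."""
    updated = {pron: list(phrases) for pron, phrases in vocab.items()}
    if "银行" not in updated["hang2"]:
        updated["hang2"].append("银行")
    return updated


def refine_vocabulary(
    vocab: Vocab,
    evaluate_fn: Callable[[Vocab], float],
    update_fn: Callable[[Vocab], Vocab],
    target_accuracy: float,
    max_iterations: int,
) -> Tuple[Vocab, float, int]:
    """Update the word list until accuracy is met or the iteration cap is reached."""
    accuracy, iterations = evaluate_fn(vocab), 0
    while accuracy < target_accuracy and iterations < max_iterations:
        vocab = update_fn(vocab)          # step 602: update the word list
        accuracy = evaluate_fn(vocab)     # steps 603-604: rebuild candidates and re-score
        iterations += 1
    return vocab, accuracy, iterations


final_vocab, final_accuracy, used_iterations = refine_vocabulary(
    {"xing2": ["行走"], "hang2": ["行业"]}, evaluate, add_bank_phrase, 1.0, 5
)
```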
To better illustrate the method of disambiguating polyphonic characters provided by embodiments of the present invention, a detailed description is provided with a specific implementation.
The specific implementation constructs a polyphone disambiguation system according to the polyphone disambiguation method provided by the embodiment of the invention. The framework of the system is shown in fig. 7 and comprises three major parts:
First, a four-level polyphone word list associating polyphone, pronunciation, explanation and phrases is constructed iteratively; by updating the four-level polyphone word list, the model does not need to be retrained and redeployed, which solves the problem of slow iterative upgrading of the model.
Specifically, 190 common polyphones are selected, their corresponding explanations and phrases are collated, and the four-level polyphone word list is constructed.
For example, the constructed four-level vocabulary of the polyphone "Buddha" is shown in table 2:
TABLE 2 four-level word list of Buddha
[Table 2: image in the original publication, reference BDA0002865066270000111]
Second, sentences containing polyphones are matched one by one against the entries of the polyphone word list, and the polyphone pinyin with the highest matching degree is selected as the pronunciation of the polyphone. Introducing external prior knowledge through matching addresses the poor accuracy on small-sample data sets.
Compared with the traditional method, the text matching model based on deep learning can automatically extract the relation between words from a large number of samples, and can describe the text matching problem more finely by combining the structural information in phrase matching and the hierarchical characteristic of text matching.
Although a text matching model based on deep learning can greatly improve matching accuracy, it matches each candidate one by one, so a large candidate set usually means a long waiting time. In the polyphone disambiguation scenario, however, one polyphone usually has only 2-4 pronunciations, i.e., only 2-4 candidates, so the time cost is not significantly affected. A text matching model based on deep learning is therefore well suited to this scenario.
Third, a BERT-based text matching model is constructed, and the pre-trained BERT model is fine-tuned on the text matching task.
In order to verify that the polyphone disambiguation method provided by the invention can effectively improve the accuracy of determining the polyphone pronunciation, the specific implementation is also experimentally verified. The test data set used for experimental verification is from a data set containing polyphones in a real business scenario.
To avoid chance results, three different random seeds are set, and the mean of the three experimental results is taken as the final result. The evaluation index is the precision P, defined as Precision = |A ∩ B| / |A|, where A denotes the polyphone pronunciations recognized by the polyphone disambiguation model and B denotes the true polyphone pronunciations in the text corpus.
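The evaluation index, Precision = |A ∩ B| / |A| averaged over three seeds, can be computed as in the sketch below. Representing A and B as sets of (sentence id, pronunciation) pairs is an assumption, and the three runs are made-up numbers for illustration, not the patent's experimental results.

```python
from statistics import mean
from typing import Set, Tuple

Item = Tuple[str, str]  # (sentence id, pronunciation)


def precision(predicted: Set[Item], gold: Set[Item]) -> float:
    """Precision = |A ∩ B| / |A|, with A the predictions and B the true pronunciations."""
    if not predicted:
        return 0.0
    return len(predicted & gold) / len(predicted)


# Made-up gold labels and predictions from three hypothetical random seeds.
gold = {("s1", "hang2"), ("s2", "xing2"), ("s3", "fo2"), ("s4", "fu2")}
runs = [
    {("s1", "hang2"), ("s2", "xing2"), ("s3", "fo2"), ("s4", "fu2")},  # 4 of 4 correct
    {("s1", "hang2"), ("s2", "xing2"), ("s3", "fo2"), ("s4", "fo2")},  # 3 of 4 correct
    {("s1", "xing2"), ("s2", "xing2"), ("s3", "fo2"), ("s4", "fo2")},  # 2 of 4 correct
]

per_seed = [precision(run, gold) for run in runs]
mean_precision = mean(per_seed)
```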
The text matching model uses a BERT model, and the environment is Python 3.6; the batch size used in training is 64; dropout is set to 0.1; the number of training epochs is 3.
The following comparative experiments were designed:
The method is compared with the existing classification-based method on a conventional polyphone test set and on an unconventional pronunciation test data set respectively.
The experimental results are as follows: in the unconventional polyphone test data set, the performance of the polyphone disambiguation method provided by the invention is far superior to that of the existing method based on classification.
The experimental data and conclusions are shown in table 3:
TABLE 3 comparison of the results
[Table 3: image in the original publication, reference BDA0002865066270000121]
As can be seen from the above table, the polyphone disambiguation method provided by the embodiment of the invention addresses the low recognition accuracy of the model on unseen rare pronunciations by constructing a four-level polyphone word list and introducing prior knowledge of polyphones. Because only the four-level polyphone word list information is updated, the model does not need to be retrained and redeployed, which solves the problem of slow iterative upgrading of deep-learning-based models. Constructing a text matching model and matching candidates one by one addresses the low recognition accuracy of the model on small-sample data sets of unconventional pronunciations.
Based on the same inventive concept, embodiments of the present invention further provide a polyphonic disambiguation apparatus, and since the principle of the problem solved by the polyphonic disambiguation apparatus is similar to that of the polyphonic disambiguation method, the implementation of the polyphonic disambiguation apparatus can refer to the implementation of the polyphonic disambiguation method, and repeated parts are not repeated, and the specific structure is shown in fig. 8:
a data obtaining module 801, configured to obtain text data of a sentence to be detected, where the sentence includes polyphones;
a four-level word list determining module 802, configured to query, according to the sentence text data to be detected, a pre-constructed four-level polyphone word list for the four-level word list corresponding to the polyphones contained in the sentence text data to be detected; wherein the four-level polyphone word list records the association relationship between a plurality of polyphones and the four-level word list corresponding to each polyphone;
a candidate pronunciation data set determining module 803, configured to obtain a candidate pronunciation data set corresponding to the polyphones included in the sentence text data to be detected according to the four-level word list corresponding to the polyphones included in the sentence text data to be detected;
the text matching module 804 is used for inputting a text matching model after combining the text data of the sentence to be detected and the candidate pronunciation data set; the text matching model is a pre-constructed BERT model used for determining the adaptation degree between the text data of the sentence to be detected and the candidate pronunciation of the polyphone;
and a pronunciation determining module 805, configured to determine, according to an output result of the text matching model, a pronunciation of a polyphone included in the text data of the sentence to be detected.
In the embodiment of the present invention, the four-level word list corresponding to each polyphone includes:
the text of each polyphone, different pronunciations of each polyphone, paraphrase information corresponding to the different pronunciations of each polyphone, and common phrases corresponding to the different pronunciations of each polyphone.
In an embodiment of the present invention, the candidate pronunciation data set determining module 803 is specifically configured to:
determining a plurality of candidate pronunciations corresponding to polyphones contained in the sentence text data to be detected, paraphrase information corresponding to each candidate pronunciation and a common phrase according to a four-level word list corresponding to the polyphones contained in the sentence text data to be detected;
determining each candidate pronunciation data subset according to a plurality of candidate pronunciations corresponding to polyphones contained in the sentence text data to be detected and paraphrase information and common phrases corresponding to each candidate pronunciation;
and combining the plurality of candidate pronunciation data subsets to obtain a candidate pronunciation data set.
In an embodiment of the present invention, the text matching module 804 is specifically configured to:
and splicing the sentence text data to be detected and each candidate pronunciation data subset one by one, and inputting a text matching model.
In an embodiment of the present invention, the text matching module 804 includes: a text matching model construction unit for:
acquiring a plurality of training data and correct pronunciations corresponding to the training data;
wherein the training data comprises: sentence text data containing polyphones and a plurality of candidate pronunciation data sets corresponding to the contained polyphones;
determining the adaptation degree between the sentence text data and each candidate pronunciation data set according to the correct pronunciation corresponding to each training data;
and performing deep machine learning with the training data as the input of the BERT model and the adaptation degree between the sentence text data and each candidate pronunciation data set as the output of the BERT model, so as to construct the text matching model.
When the method is specifically implemented, the text matching model construction unit is further configured to:
inputting a plurality of training data into the constructed text matching model to obtain the output results of the text matching model corresponding to the plurality of training data;
and adjusting the constructed text matching model according to the correct pronunciation corresponding to the plurality of training data and the text matching model output result corresponding to the plurality of training data.
In specific implementation, the pronunciation determining module 805 is specifically configured to:
sorting the adaptation degrees, output by the text matching model, between the sentence text data to be detected and each candidate pronunciation of the polyphone;
and determining the candidate pronunciation with the top ranking as the pronunciation of the polyphone contained in the text data of the sentence to be detected.
In a specific embodiment, there is further provided a polyphonic disambiguation apparatus, as shown in fig. 9, on the basis of fig. 8, further comprising:
a four-level polyphone word list iterative updating module 901, configured to:
iteratively executing the following steps until the output efficiency of the text matching model and/or the accuracy of the output result meet preset requirements or the iteration number exceeds a preset value:
updating the four-level polyphone word list according to the output efficiency of the text matching model and/or the accuracy of the output result;
obtaining a candidate pronunciation data set corresponding to polyphones contained in the text data of the sentence to be detected again by using the updated four-level polyphone word list;
and re-determining the output result of the text matching model according to the newly obtained candidate pronunciation data set.
When the method is specifically implemented, the four-level polyphone word list is updated according to the output efficiency of the text matching model and/or the accuracy of the output result, and the method comprises the following steps:
and adjusting, adding or deleting paraphrase information corresponding to different pronunciations of each polyphone and common phrases corresponding to different pronunciations of each polyphone in the four-level polyphone word list according to the output efficiency of the text matching model and/or the accuracy of an output result.
An embodiment of the present invention further provides an electronic device for implementing all or part of the polyphone disambiguation method provided by the embodiment of the invention. The electronic device specifically comprises:
a processor (processor), a memory (memory), a communication Interface (Communications Interface), and a bus; the processor, the memory and the communication interface complete mutual communication through the bus; the communication interface is used for realizing information transmission between related devices; the electronic device may be a desktop computer, a tablet computer, a mobile terminal, and the like, but the embodiment is not limited thereto. In this embodiment, the electronic device may be implemented with reference to the embodiment for implementing the method for disambiguating a polyphonic character and the embodiment for implementing the apparatus for disambiguating a polyphonic character in the embodiments, and the contents thereof are incorporated herein, and repeated descriptions thereof are omitted.
Fig. 10 is a schematic block diagram of a system configuration of an electronic apparatus 1000 according to an embodiment of the present application. As shown in fig. 10, the electronic device 1000 may include a central processing unit 1001 and a memory 1002; the memory 1002 is coupled to the cpu 1001. Notably, this fig. 10 is exemplary; other types of structures may also be used in addition to or in place of the structure to implement telecommunications or other functions.
In one embodiment, the polyphone disambiguation function may be integrated into the central processing unit 1001. The central processing unit 1001 may be configured to perform the following control:
acquiring text data of a sentence to be detected containing polyphones;
according to the sentence text data to be detected, inquiring a four-level word list corresponding to polyphones contained in the sentence text data to be detected in a pre-constructed four-level polyphone word list;
obtaining a candidate pronunciation data set corresponding to polyphones contained in the sentence text data to be detected according to a four-level word list corresponding to the polyphones contained in the sentence text data to be detected;
combining the sentence text data to be detected and the candidate pronunciation data set, and inputting a text matching model;
and determining the pronunciation of polyphones contained in the text data of the sentence to be detected according to the output result of the text matching model.
As can be seen from the above description, the electronic device provided in the embodiment of the present application obtains text data of a sentence to be detected that contains polyphones; queries, according to the sentence text data to be detected, a pre-constructed four-level polyphone word list for the four-level word list corresponding to the polyphones contained in the sentence text data to be detected, wherein the four-level polyphone word list records the association relationship between a plurality of polyphones and the four-level word list corresponding to each polyphone; obtains a candidate pronunciation data set corresponding to the polyphones contained in the sentence text data to be detected according to that four-level word list; combines the sentence text data to be detected with the candidate pronunciation data set and inputs the result into a text matching model, wherein the text matching model is a pre-constructed BERT model used for determining the adaptation degree between the sentence text data to be detected and each candidate pronunciation of the polyphone; and determines the pronunciation of the polyphones contained in the sentence text data to be detected according to the output result of the text matching model. Because the text matching model is built on the BERT model, the sentence text data to be detected can be matched against the candidate pronunciations of the polyphones one by one, which, compared with a classification method, can improve recognition accuracy on unconventional pronunciations. Because a four-level polyphone word list is constructed in advance and all polyphones and their corresponding pronunciation data are imported, even rare pronunciations that the text matching model has not seen can be identified accurately, which improves the accuracy of polyphone pronunciation determination.
In another embodiment, the polyphonic disambiguation apparatus may be configured separately from the cpu 1001, for example, the polyphonic disambiguation apparatus may be configured as a chip connected to the cpu 1001, and the polyphonic disambiguation function is realized by the control of the cpu.
As shown in fig. 10, the electronic device 1000 may further include: a communication module 1003, an input unit 1004, an audio processor 1005, a display 1006, a power supply 1007. It is noted that the electronic device 1000 does not necessarily include all of the components shown in FIG. 10; furthermore, the electronic device 1000 may also comprise components not shown in fig. 10, which may be referred to in the prior art.
As shown in fig. 10, the central processing unit 1001, sometimes referred to as a controller or operation control, may include a microprocessor or other processor device and/or logic device, and the central processing unit 1001 receives input and controls the operation of the various components of the electronic device 1000.
The memory 1002 may be, for example, one or more of a buffer, a flash memory, a hard drive, a removable media, a volatile memory, a non-volatile memory, or other suitable device. The information relating to the failure may be stored, and a program for executing the information may be stored. And the cpu 1001 can execute the program stored in the memory 1002 to realize information storage or processing, or the like.
The input unit 1004 provides input to the cpu 1001. The input unit 1004 is, for example, a key or a touch input device. The power supply 1007 is used to provide power to the electronic device 1000. The display 1006 is used for displaying display objects such as images and characters. The display may be, for example, an LCD display, but is not limited thereto.
The memory 1002 may be a solid state memory such as Read Only Memory (ROM), Random Access Memory (RAM), a SIM card, or the like. There may also be a memory that holds information even when power is off, can be selectively erased, and is provided with more data, an example of which is sometimes called an EPROM or the like. The memory 1002 may also be some other type of device. Memory 1002 includes buffer memory 1021 (sometimes referred to as a buffer). The memory 1002 may include an application/function storage part 1022, the application/function storage part 1022 being used for storing application programs and function programs or a flow for executing the operation of the electronic device 1000 by the central processing unit 1001.
The memory 1002 may also include a data store 1023, the data store 1023 being used to store data such as contacts, digital data, pictures, sounds and/or any other data used by the electronic device. Driver storage 1024 of memory 1002 may include various drivers for the electronic device for communication functions and/or for performing other functions of the electronic device (e.g., messaging applications, directory applications, etc.).
The communication module 1003 is a transmitter/receiver 1003 that transmits and receives signals via an antenna 1008. A communication module (transmitter/receiver) 1003 is coupled to the central processor 1001 to provide an input signal and receive an output signal, which may be the same as the case of a conventional mobile communication terminal.
Based on different communication technologies, a plurality of communication modules 1003, such as a cellular network module, a bluetooth module, and/or a wireless local area network module, may be provided in the same electronic device. The communication module (transmitter/receiver) 1003 is also coupled to a speaker 1009 and a microphone 1010 via an audio processor 1005 to provide audio output via the speaker 1009 and receive audio input from the microphone 1010 to implement general telecommunications functions. The audio processor 1005 may include any suitable buffers, decoders, amplifiers and so forth. In addition, the audio processor 1005 is also coupled to the central processor 1001, so that sound can be recorded locally through the microphone 1010, and so that locally stored sound can be played through the speaker 1009.
An embodiment of the present invention further provides a computer-readable storage medium storing a computer program for executing the method for disambiguating polyphonic characters.
In summary, the method and the device for disambiguating polyphonic characters provided by the embodiments of the present invention have the following advantages:
obtaining text data of a sentence to be detected containing polyphones; querying, according to the sentence text data to be detected, a pre-constructed four-level polyphone word list for the four-level word list corresponding to the polyphones contained in the sentence text data to be detected, wherein the four-level polyphone word list records the association relationship between a plurality of polyphones and the four-level word list corresponding to each polyphone; obtaining a candidate pronunciation data set corresponding to the polyphones contained in the sentence text data to be detected according to that four-level word list; combining the sentence text data to be detected with the candidate pronunciation data set as the input of a text matching model, wherein the text matching model is a pre-constructed BERT model used for determining the adaptation degree between the sentence text data to be detected and each candidate pronunciation of the polyphone; and determining the pronunciation of the polyphones contained in the sentence text data to be detected according to the output result of the text matching model. Because the text matching model is built on the BERT model, the sentence text data to be detected can be matched against the candidate pronunciations of the polyphones one by one, which, compared with a classification method, can improve recognition accuracy on unconventional pronunciations. Because a four-level polyphone word list is constructed in advance and all polyphones and their corresponding pronunciation data are imported, even rare pronunciations that the text matching model has not seen can be identified accurately, which improves the accuracy of polyphone pronunciation determination.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, apparatus, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The embodiments in the present specification are described in a progressive manner, and the same and similar parts among the embodiments are referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, for the system embodiment, since it is substantially similar to the method embodiment, the description is simple, and for the relevant points, reference may be made to the partial description of the method embodiment. In this document, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. The terms "upper", "lower", and the like, indicate orientations or positional relationships based on the orientations or positional relationships shown in the drawings, and are only for convenience in describing the present invention and simplifying the description, but do not indicate or imply that the referred devices or elements must have a specific orientation, be constructed and operated in a specific orientation, and thus, should not be construed as limiting the present invention. Unless expressly stated or limited otherwise, the terms "mounted," "connected," and "connected" are intended to be inclusive and mean, for example, that they may be fixedly connected, detachably connected, or integrally connected; can be mechanically or electrically connected; they may be connected directly or indirectly through intervening media, or they may be interconnected between two elements. 
The specific meanings of the above terms in the present invention can be understood by those skilled in the art according to specific situations. It should be noted that the embodiments and features of the embodiments may be combined with each other without conflict. The present invention is not limited to any single aspect, nor is it limited to any single embodiment, nor is it limited to any combination and/or permutation of these aspects and/or embodiments. Moreover, each aspect and/or embodiment of the present invention may be utilized alone or in combination with one or more other aspects and/or embodiments thereof.
Finally, it should be noted that the above embodiments are intended only to illustrate the technical solution of the present invention, not to limit it. Although the invention has been described in detail with reference to the foregoing embodiments, those skilled in the art will understand that the technical solutions described in those embodiments may still be modified, and that some or all of their technical features may be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the present invention and should be construed as falling within the scope of the claims and the description.

Claims (20)

1. A method for disambiguating polyphones, comprising:
acquiring text data of a sentence to be detected that contains polyphones;
querying a pre-constructed four-level polyphone word list, according to the sentence text data to be detected, to obtain a four-level word list corresponding to the polyphones contained in the sentence text data to be detected; wherein the four-level polyphone word list records the association relationship between a plurality of polyphones and the four-level word list corresponding to each polyphone;
obtaining a candidate pronunciation data set corresponding to the polyphones contained in the sentence text data to be detected according to the four-level word list corresponding to those polyphones;
combining the sentence text data to be detected with the candidate pronunciation data set, and inputting the combination into a text matching model; wherein the text matching model is a pre-constructed BERT model used for determining the adaptation degree between the sentence text data to be detected and each candidate pronunciation of the polyphone;
and determining the pronunciation of the polyphones contained in the sentence text data to be detected according to the output result of the text matching model.
2. The method of claim 1, wherein the four-level word list corresponding to each polyphone comprises:
the text of the polyphone, the different pronunciations of the polyphone, paraphrase information corresponding to each of the different pronunciations, and common phrases corresponding to each of the different pronunciations.
3. The method of claim 2, further comprising:
iteratively executing the following steps until the output efficiency of the text matching model and/or the accuracy of the output result meet preset requirements or the iteration number exceeds a preset value:
updating the four-level polyphone word list according to the output efficiency of the text matching model and/or the accuracy of the output result;
obtaining a candidate pronunciation data set corresponding to polyphones contained in the text data of the sentence to be detected again by using the updated four-level polyphone word list;
and re-determining the output result of the text matching model according to the newly obtained candidate pronunciation data set.
4. The method of claim 3, wherein updating the four-level polyphone word list according to the output efficiency of the text matching model and/or the accuracy of the output result comprises:
adjusting, adding, or deleting, in the four-level polyphone word list, the paraphrase information corresponding to the different pronunciations of each polyphone and the common phrases corresponding to the different pronunciations of each polyphone, according to the output efficiency of the text matching model and/or the accuracy of the output result.
5. The method of claim 2, wherein obtaining a candidate pronunciation data set corresponding to polyphones contained in the sentence text data to be detected according to a four-level word list corresponding to polyphones contained in the sentence text data to be detected comprises:
determining a plurality of candidate pronunciations corresponding to polyphones contained in the sentence text data to be detected, paraphrase information corresponding to each candidate pronunciation and a common phrase according to a four-level word list corresponding to the polyphones contained in the sentence text data to be detected;
determining each candidate pronunciation data subset according to a plurality of candidate pronunciations corresponding to polyphones contained in the sentence text data to be detected and paraphrase information and common phrases corresponding to each candidate pronunciation;
and combining a plurality of candidate pronunciation data subsets to obtain the candidate pronunciation data set.
6. The method of claim 5, wherein combining the sentence text data to be detected with the candidate pronunciation data set and inputting the combination into the text matching model comprises:
splicing the sentence text data to be detected with each candidate pronunciation data subset one by one, and inputting each spliced result into the text matching model.
7. The method of claim 1, wherein the process of building the text matching model comprises:
acquiring a plurality of training data and correct pronunciations corresponding to the training data; the training data includes: sentence text data containing polyphones and a plurality of candidate pronunciation data sets corresponding to the contained polyphones;
determining the adaptation degree between the sentence text data and each candidate pronunciation data set according to the correct pronunciation corresponding to each training data;
and performing deep machine learning, with the training data as the input of the BERT model and the adaptation degree between the sentence text data and each candidate pronunciation data set as the output of the BERT model, to construct the text matching model.
8. The method of claim 7, further comprising:
inputting a plurality of training data into the constructed text matching model to obtain the output results of the text matching model corresponding to the plurality of training data;
and adjusting the constructed text matching model according to the correct pronunciation corresponding to the plurality of training data and the text matching model output result corresponding to the plurality of training data.
9. The method of claim 1, wherein determining the pronunciation of the polyphones contained in the sentence text data to be detected according to the output result of the text matching model comprises:
sorting the adaptation degrees, output by the text matching model, between the sentence text data to be detected and each candidate pronunciation of the polyphone;
and determining the top-ranked candidate pronunciation as the pronunciation of the polyphone contained in the sentence text data to be detected.
10. A polyphone disambiguation apparatus, comprising:
the data acquisition module is used for acquiring the text data of the sentence to be detected containing the polyphone;
the four-level word list determining module is used for inquiring a pre-constructed four-level polyphone word list according to the sentence text data to be detected to obtain a four-level word list corresponding to polyphones contained in the sentence text data to be detected; the four-level polyphone word list records the association relationship between a plurality of polyphones and the four-level word list corresponding to each polyphone;
the candidate pronunciation data set determining module is used for obtaining a candidate pronunciation data set corresponding to polyphones contained in the sentence text data to be detected according to a four-level word list corresponding to the polyphones contained in the sentence text data to be detected;
the text matching module is used for combining the sentence text data to be detected with the candidate pronunciation data set and inputting the combination into a text matching model; the text matching model is a pre-constructed BERT model used for determining the adaptation degree between the sentence text data to be detected and each candidate pronunciation of the polyphone;
and the pronunciation determining module is used for determining the pronunciation of the polyphone contained in the text data of the sentence to be detected according to the output result of the text matching model.
11. The apparatus of claim 10, wherein the four-level word list corresponding to each polyphone comprises:
the text of the polyphone, the different pronunciations of the polyphone, paraphrase information corresponding to each of the different pronunciations, and common phrases corresponding to each of the different pronunciations.
12. The apparatus of claim 11, further comprising a four-level polyphone word list iterative updating module used for:
iteratively executing the following steps until the output efficiency of the text matching model and/or the accuracy of the output result meet preset requirements or the iteration number exceeds a preset value:
updating the four-level polyphone word list according to the output efficiency of the text matching model and/or the accuracy of the output result;
obtaining a candidate pronunciation data set corresponding to polyphones contained in the text data of the sentence to be detected again by using the updated four-level polyphone word list;
and re-determining the output result of the text matching model according to the newly obtained candidate pronunciation data set.
13. The apparatus of claim 12, wherein updating the four-level polyphone word list according to the output efficiency of the text matching model and/or the accuracy of the output result comprises:
adjusting, adding, or deleting, in the four-level polyphone word list, the paraphrase information corresponding to the different pronunciations of each polyphone and the common phrases corresponding to the different pronunciations of each polyphone, according to the output efficiency of the text matching model and/or the accuracy of the output result.
14. The apparatus as claimed in claim 11, wherein the candidate pronunciation data set determination module is specifically configured to:
determining a plurality of candidate pronunciations corresponding to polyphones contained in the sentence text data to be detected, paraphrase information corresponding to each candidate pronunciation and a common phrase according to a four-level word list corresponding to the polyphones contained in the sentence text data to be detected;
determining each candidate pronunciation data subset according to a plurality of candidate pronunciations corresponding to polyphones contained in the sentence text data to be detected and paraphrase information and common phrases corresponding to each candidate pronunciation;
and combining a plurality of candidate pronunciation data subsets to obtain the candidate pronunciation data set.
15. The apparatus of claim 14, wherein the text matching module is specifically configured to:
splicing the sentence text data to be detected with each candidate pronunciation data subset one by one, and inputting each spliced result into the text matching model.
16. The apparatus of claim 10, wherein the text matching module comprises: a text matching model construction unit for:
acquiring a plurality of training data and correct pronunciations corresponding to the training data; the training data includes: sentence text data containing polyphones and a plurality of candidate pronunciation data sets corresponding to the contained polyphones;
determining the adaptation degree between the sentence text data and each candidate pronunciation data set according to the correct pronunciation corresponding to each training data;
and performing deep machine learning, with the training data as the input of the BERT model and the adaptation degree between the sentence text data and each candidate pronunciation data set as the output of the BERT model, to construct the text matching model.
17. The apparatus of claim 16, wherein the text matching model building unit is further configured to:
inputting a plurality of training data into the constructed text matching model to obtain the output results of the text matching model corresponding to the plurality of training data;
and adjusting the constructed text matching model according to the correct pronunciation corresponding to the plurality of training data and the text matching model output result corresponding to the plurality of training data.
18. The apparatus of claim 10, wherein the pronunciation determination module is specifically configured to:
sorting the adaptation degrees, output by the text matching model, between the sentence text data to be detected and each candidate pronunciation of the polyphone;
and determining the top-ranked candidate pronunciation as the pronunciation of the polyphone contained in the sentence text data to be detected.
19. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the method of any of claims 1 to 9 when executing the computer program.
20. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program for executing the method of any one of claims 1 to 9.
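
For illustration only, the flow recited in claims 1, 2, 5, 6 and 9 (word-list lookup, one candidate subset per pronunciation, one-by-one splicing with the sentence, scoring, and ranking) might be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the patented implementation: the names `Reading`, `LEXICON`, `candidate_subsets`, `toy_score` and `disambiguate` are hypothetical, the lexicon entries are invented sample data, and `toy_score` is a simple phrase-overlap stand-in for the BERT text matching model, which is not reproduced here.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Reading:
    """One pronunciation of a polyphone, with its paraphrase and common phrases."""
    pinyin: str
    gloss: str
    phrases: List[str]


# Hypothetical four-level word list: character -> pronunciations -> paraphrase -> phrases.
# The entries below are invented sample data, not taken from the patent.
LEXICON: Dict[str, List[Reading]] = {
    "行": [
        Reading("xing2", "to walk; acceptable", ["行走", "行人"]),
        Reading("hang2", "row; trade; firm", ["银行", "行业"]),
    ],
}


def candidate_subsets(char: str) -> List[Tuple[str, str]]:
    """Build one candidate pronunciation data subset (as text) per pronunciation."""
    return [
        (r.pinyin, " ".join([char, r.pinyin, r.gloss, *r.phrases]))
        for r in LEXICON[char]
    ]


def toy_score(sentence: str, candidate_text: str) -> float:
    """Stand-in for the text matching model (NOT BERT).

    Counts candidate tokens of two or more characters that occur in the sentence.
    """
    tokens = [t for t in candidate_text.split() if len(t) >= 2]
    return float(sum(t in sentence for t in tokens))


def disambiguate(
    sentence: str,
    char: str,
    scorer: Callable[[str, str], float] = toy_score,
) -> List[Tuple[str, float]]:
    """Pair the sentence with each candidate subset, score each pair, rank descending."""
    pairs = [(pinyin, (sentence, text)) for pinyin, text in candidate_subsets(char)]
    scored = [(pinyin, scorer(s, c)) for pinyin, (s, c) in pairs]
    return sorted(scored, key=lambda item: item[1], reverse=True)


ranking = disambiguate("我去银行取钱", "行")
print(ranking[0][0])  # hang2
```

The scorer is passed in as a parameter so that a trained sentence-pair model could replace `toy_score` without changing the lookup, splicing, or ranking steps; the top-ranked entry of the returned list plays the role of the pronunciation selected in claim 9.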
CN202011581165.4A 2020-12-28 2020-12-28 Method and device for disambiguating polyphone Active CN112580335B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011581165.4A CN112580335B (en) 2020-12-28 2020-12-28 Method and device for disambiguating polyphone

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011581165.4A CN112580335B (en) 2020-12-28 2020-12-28 Method and device for disambiguating polyphone

Publications (2)

Publication Number Publication Date
CN112580335A true CN112580335A (en) 2021-03-30
CN112580335B CN112580335B (en) 2023-03-24

Family

ID=75140285

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011581165.4A Active CN112580335B (en) 2020-12-28 2020-12-28 Method and device for disambiguating polyphone

Country Status (1)

Country Link
CN (1) CN112580335B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105336322A (en) * 2015-09-30 2016-02-17 百度在线网络技术(北京)有限公司 Polyphone model training method, and speech synthesis method and device
CN107515850A (en) * 2016-06-15 2017-12-26 阿里巴巴集团控股有限公司 Determine the methods, devices and systems of polyphone pronunciation
WO2019085640A1 (en) * 2017-10-31 2019-05-09 株式会社Ntt都科摩 Word meaning disambiguation method and device, word meaning expansion method, apparatus and device, and computer-readable storage medium
CN111599340A (en) * 2020-07-27 2020-08-28 南京硅基智能科技有限公司 Polyphone pronunciation prediction method and device and computer readable storage medium
CN111611810A (en) * 2020-05-29 2020-09-01 河北数云堂智能科技有限公司 Polyphone pronunciation disambiguation device and method

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
张力等: "中文TTS系统中多音字的一种解决方案", 《计算机应用与软件》 *
范明等: "汉语字音转换中的多层面多音字读音消歧", 《计算机工程与应用》 *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113380223A (en) * 2021-05-26 2021-09-10 标贝(北京)科技有限公司 Method, device, system and storage medium for disambiguating polyphone
CN113672144A (en) * 2021-09-06 2021-11-19 北京搜狗科技发展有限公司 Data processing method and device
CN114417832A (en) * 2021-12-08 2022-04-29 马上消费金融股份有限公司 Disambiguation method, and training method and device of disambiguation model
CN114417832B (en) * 2021-12-08 2023-05-05 马上消费金融股份有限公司 Disambiguation method, training method and device of disambiguation model
CN114742044A (en) * 2022-03-18 2022-07-12 联想(北京)有限公司 Information processing method and device and electronic equipment
CN117975937A (en) * 2024-01-18 2024-05-03 中移雄安信息通信科技有限公司 Multi-tone word voice processing method and device and readable storage medium

Also Published As

Publication number Publication date
CN112580335B (en) 2023-03-24

Similar Documents

Publication Publication Date Title
CN112580335B (en) Method and device for disambiguating polyphone
US7552045B2 (en) Method, apparatus and computer program product for providing flexible text based language identification
KR100769029B1 (en) Method and system for voice recognition of names in multiple languages
KR101586890B1 (en) Input processing method and apparatus
CN101542590A (en) Method, apparatus and computer program product for providing a language based interactive multimedia system
WO2014190732A1 (en) Method and apparatus for building a language model
EP2092514A2 (en) Content selection using speech recognition
CN103903619A (en) Method and system for improving accuracy of speech recognition
CN103677729A (en) Voice input method and system
JP6806662B2 (en) Speech synthesis system, statistical model generator, speech synthesizer, speech synthesis method
CN108417222B (en) Weighted finite state transducer decoding system and speech recognition system
CN111079423A (en) Method for generating dictation, reading and reporting audio, electronic equipment and storage medium
WO2014183411A1 (en) Method, apparatus and speech synthesis system for classifying unvoiced and voiced sound
Le et al. G2G: TTS-driven pronunciation learning for graphemic hybrid ASR
CN114783424A (en) Text corpus screening method, device, equipment and storage medium
CN101377726A (en) Input method combining speech recognition with stroke recognition and terminal thereof
CN112133285B (en) Speech recognition method, device, storage medium and electronic equipment
CN111357049A (en) Automatic speech recognition device and method
CN114596846A (en) Processing method and device for speech recognition text, electronic equipment and storage medium
CN113160804B (en) Hybrid voice recognition method and device, storage medium and electronic device
CN111489742B (en) Acoustic model training method, voice recognition device and electronic equipment
CN118379987B (en) Speech recognition method, device, related equipment and computer program product
CN113505612B (en) Multi-user dialogue voice real-time translation method, device, equipment and storage medium
US20140343934A1 (en) Method, Apparatus, and Speech Synthesis System for Classifying Unvoiced and Voiced Sound
WO2024124697A1 (en) Speech recognition method, apparatus and device, and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant