CN110211571B - Sentence fault detection method, sentence fault detection device and computer readable storage medium - Google Patents
- Publication number: CN110211571B
- Application number: CN201910343889.6A
- Authority: CN (China)
- Prior art keywords
- sentence
- preset
- target sentence
- words
- likelihood probability
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G10L15/063: Training (creation of reference templates; training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice)
- G10L15/08: Speech classification or search
- G10L15/26: Speech-to-text systems
- G10L25/51: Speech or voice analysis techniques specially adapted for comparison or discrimination
- G10L2015/0633: Creating reference templates; clustering using lexical or orthographic knowledge sources
- G10L2015/088: Word spotting
- Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management
Abstract
The invention relates to the technical field of speech semantics and discloses a method for detecting erroneous sentences, comprising the following steps: acquiring a target sentence; identifying the i words of which the target sentence is composed; sequentially inputting the i words into a pre-trained language model in their order in the target sentence, and calculating the perplexity and/or log-likelihood probability of the target sentence with the language model; and judging the target sentence to be an erroneous sentence when the perplexity of the target sentence is greater than a preset perplexity and/or the log-likelihood probability of the target sentence is smaller than a preset log-likelihood probability. The invention also provides an erroneous-sentence detection device and a computer-readable storage medium. The invention makes it possible to identify whether a sentence is erroneous.
Description
Technical Field
The present invention relates to the field of speech semantics, and in particular to an erroneous-sentence detection method and device, and a computer-readable storage medium.
Background
With the development of technology, automatic speech recognition (Automatic Speech Recognition, ASR), a technology that converts human speech into text, is increasingly widely used. In practice, substitution, insertion, and deletion errors are unavoidable in ASR recognition results, owing to background noise and to speaker characteristics such as dialect, accent, fast speech, and idiomatic habits. These recognition errors can produce recognized sentences with disordered words, mismatched collocations, unclear semantics, or illogical structure, that is, erroneous sentences. Such sentences are not only difficult to understand and analyze, but also create great difficulties for subsequent natural language processing (Natural Language Processing, NLP) applications. Besides sentences obtained through ASR, sentences entered into a computer by hand may also contain errors. Identifying whether a sentence is correct therefore has practical significance and necessity.
Disclosure of Invention
The invention provides an erroneous-sentence detection method, an erroneous-sentence detection device, and a computer-readable storage medium, whose main purpose is to identify whether a sentence is erroneous.
To achieve the above object, the present invention provides a method for detecting erroneous sentences, the method comprising:
acquiring a target sentence obtained through automatic speech recognition;
acquiring the i-th text segment contained in the target sentence, and judging whether a word matching the i-th text segment exists in a preset dictionary, wherein the initial value of i is 1 and i is a positive integer;
if no word matching the i-th text segment exists in the preset dictionary, adjusting the number of characters in the i-th text segment and again judging whether a matching word exists in the preset dictionary;
if a word matching the i-th text segment exists in the preset dictionary, determining the i-th text segment to be the i-th word of the target sentence, letting i = i + 1, acquiring the next text segment contained in the target sentence, and judging whether a matching word exists in the preset dictionary;
when the total number of characters in the i words equals the total number of characters in the target sentence, determining that the target sentence consists of the i words;
sequentially inputting the i words into a pre-trained language model in their order in the target sentence, and calculating the perplexity and/or log-likelihood probability of the target sentence with the language model; and
judging the target sentence to be an erroneous sentence when its perplexity is greater than a preset perplexity and/or its log-likelihood probability is smaller than a preset log-likelihood probability.
Optionally, sequentially inputting the i words into the pre-trained language model in their order in the target sentence comprises:
judging whether any preset keywords exist among the i words;
if preset keywords exist among the i words, sequentially inputting the i words other than the preset keywords into the pre-trained language model in their order in the target sentence.
Optionally, before judging the target sentence to be an erroneous sentence when its perplexity is greater than the preset perplexity and/or its log-likelihood probability is smaller than the preset log-likelihood probability, the method further comprises:
determining the preset perplexity and/or determining the preset log-likelihood probability.
Determining the preset perplexity and/or the preset log-likelihood probability specifically comprises:
obtaining training samples used to train the language model, the training samples comprising positive samples and negative samples;
obtaining the perplexity and log-likelihood probability of the positive samples;
obtaining the perplexity and log-likelihood probability of the negative samples;
obtaining a perplexity histogram from the perplexities of the positive and negative samples, and obtaining the preset perplexity from the perplexity histogram; and
obtaining a log-likelihood probability histogram from the log-likelihood probabilities of the positive and negative samples, and obtaining the preset log-likelihood probability from the log-likelihood probability histogram.
Optionally, the language model is a deep-learning language model or a statistics-based language model.
Optionally, the method further comprises:
if the target sentence is an erroneous sentence, sending an erroneous-sentence alert message.
In addition, to achieve the above object, the present invention also provides an erroneous-sentence detection device, the device comprising a memory and a processor, the memory storing a sentence error detection program runnable on the processor, the program, when executed by the processor, implementing the following steps:
acquiring a target sentence obtained through automatic speech recognition;
acquiring the i-th text segment contained in the target sentence, and judging whether a word matching the i-th text segment exists in a preset dictionary, wherein the initial value of i is 1 and i is a positive integer;
if no word matching the i-th text segment exists in the preset dictionary, adjusting the number of characters in the i-th text segment and again judging whether a matching word exists in the preset dictionary;
if a word matching the i-th text segment exists in the preset dictionary, determining the i-th text segment to be the i-th word of the target sentence, letting i = i + 1, acquiring the next text segment contained in the target sentence, and judging whether a matching word exists in the preset dictionary;
when the total number of characters in the i words equals the total number of characters in the target sentence, determining that the target sentence consists of the i words;
sequentially inputting the i words into a pre-trained language model in their order in the target sentence, and calculating the perplexity and/or log-likelihood probability of the target sentence with the language model; and
judging the target sentence to be an erroneous sentence when its perplexity is greater than a preset perplexity and/or its log-likelihood probability is smaller than a preset log-likelihood probability.
Optionally, sequentially inputting the i words into the pre-trained language model in their order in the target sentence comprises:
judging whether any preset keywords exist among the i words;
if preset keywords exist among the i words, sequentially inputting the i words other than the preset keywords into the pre-trained language model in their order in the target sentence.
Optionally, when executed by the processor, the sentence error detection program further implements the following steps:
before judging the target sentence to be an erroneous sentence when its perplexity is greater than the preset perplexity and/or its log-likelihood probability is smaller than the preset log-likelihood probability, determining the preset perplexity and/or the preset log-likelihood probability;
determining the preset perplexity and/or the preset log-likelihood probability specifically comprises:
obtaining training samples used to train the language model, the training samples comprising positive samples and negative samples;
obtaining the perplexity and log-likelihood probability of the positive samples;
obtaining the perplexity and log-likelihood probability of the negative samples;
obtaining a perplexity histogram from the perplexities of the positive and negative samples, and obtaining the preset perplexity from the perplexity histogram; and
obtaining a log-likelihood probability histogram from the log-likelihood probabilities of the positive and negative samples, and obtaining the preset log-likelihood probability from the log-likelihood probability histogram.
Optionally, the language model is a deep-learning language model or a statistics-based language model.
Optionally, when executed by the processor, the sentence error detection program further implements the following step:
if the target sentence is an erroneous sentence, sending an erroneous-sentence alert message.
In addition, to achieve the above object, the present invention also provides a computer-readable storage medium storing a sentence error detection program executable by one or more processors to implement the steps of the erroneous-sentence detection method described above.
The invention provides an erroneous-sentence detection method, device, and computer-readable storage medium. A target sentence obtained through automatic speech recognition is acquired and segmented against a preset dictionary: the i-th text segment of the sentence is matched against the dictionary (with i starting at 1), the segment's character count is adjusted whenever no matching word is found, each matched segment is taken as the i-th word, and segmentation finishes when the i words together account for every character of the target sentence. The i words are then input sequentially, in their order in the target sentence, into a pre-trained language model, which calculates the perplexity and/or log-likelihood probability of the sentence. When the perplexity is greater than a preset perplexity and/or the log-likelihood probability is smaller than a preset log-likelihood probability, the target sentence is judged to be an erroneous sentence. Whether a sentence is erroneous can thus be identified.
Drawings
FIG. 1 is a flowchart illustrating a method for detecting a sentence in accordance with an embodiment of the present invention;
FIG. 2 is a schematic diagram illustrating an internal structure of a sentence detecting apparatus according to an embodiment of the present invention;
FIG. 3 is a schematic block diagram of modules of the sentence error detection program in the sentence error detection device according to an embodiment of the present invention.
The achievement of the objects, functional features and advantages of the present invention will be further described with reference to the accompanying drawings, in conjunction with the embodiments.
Detailed Description
It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.
The invention provides a method for detecting erroneous sentences. Referring to FIG. 1, a flow chart of an erroneous-sentence detection method according to an embodiment of the invention is shown. The method may be performed by an apparatus, which may be implemented in software and/or hardware.
In this embodiment, the method for detecting a sentence error includes:
Step S10: obtain a target sentence produced by automatic speech recognition.
In this embodiment, the target sentence obtained through automatic speech recognition (ASR) may be one sentence or several sentences, and each sentence may be long or short. In other embodiments, the target sentence may also be a sentence entered through other channels.
Step S20: acquire the i-th text segment contained in the target sentence, and judge whether a word matching the i-th text segment exists in a preset dictionary, where the initial value of i is 1 and i is a positive integer.
When acquiring the i-th text segment contained in the target sentence, segments may be taken in left-to-right order (front to back) or in right-to-left order (back to front).
The number of characters in each acquired segment may be the same or different.
Preferably, the i-th text segment initially contains as many characters as the longest word in the preset dictionary.
Step S30: if no word matching the i-th text segment exists in the preset dictionary, adjust the number of characters in the i-th text segment and again judge whether a matching word exists in the preset dictionary.
Step S40: if a word matching the i-th text segment exists in the preset dictionary, determine the i-th text segment to be the i-th word of the target sentence, let i = i + 1, acquire the next text segment contained in the target sentence, and judge whether a matching word exists in the preset dictionary.
Step S50: when the total number of characters in the i words equals the total number of characters in the target sentence, determine that the target sentence consists of the i words.
For example, take the sentence 我喜欢秋天 ("I like autumn"). If the longest word in the dictionary has 3 characters, the first 3 characters, 我喜欢, are taken as the first segment and matched against the dictionary. Since the match fails, the segment is shortened by one character to 我喜 and matched again; since this match also fails, 我 ("I") is taken directly as a single-character word. Next, a segment of 喜欢秋 is matched against the dictionary; the match fails, so the segment is shortened by one character to 喜欢 ("like"), which matches a dictionary word, so 喜欢 is determined to be a word. Similarly, 秋天 ("autumn") matches a dictionary word and is determined to be a word. After this word segmentation, the sentence is found to consist of the three words 我, 喜欢, and 秋天.
Through the above steps, the words contained in the target sentence can be identified quickly, which facilitates fast detection of erroneous sentences.
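The matching-and-shrinking procedure of steps S20 through S50 is forward maximum matching, which can be sketched in Python as follows (an illustrative sketch; the dictionary contents and the maximum word length are assumptions taken from the example above):

```python
def forward_max_match(sentence, dictionary, max_word_len=3):
    """Greedy forward maximum-matching segmentation.

    Starts with a window of max_word_len characters, shrinks the window one
    character at a time until it matches a dictionary word, and falls back to
    a single character when no match is found. Segmentation ends when the
    collected words cover every character of the sentence.
    """
    words, pos = [], 0
    while pos < len(sentence):
        for length in range(min(max_word_len, len(sentence) - pos), 0, -1):
            candidate = sentence[pos:pos + length]
            if length == 1 or candidate in dictionary:
                words.append(candidate)
                pos += length
                break
    return words

# The example from the text: 我喜欢秋天 segments into 我 / 喜欢 / 秋天.
dictionary = {"喜欢", "秋天"}
print(forward_max_match("我喜欢秋天", dictionary))  # ['我', '喜欢', '秋天']
```

Backward maximum matching (right to left), also mentioned in the text, works the same way on the reversed sentence.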
Step S60: input the i words sequentially, in their order in the target sentence, into a pre-trained language model, and calculate the perplexity and/or log-likelihood probability of the target sentence with the language model.
The language model may be a deep-learning language model, e.g. a feedforward neural network or a recurrent neural network (RNN), or a statistics-based language model, e.g. an N-gram model.
In this embodiment, a training set may be formed from correct word sequences (i.e., correct sentences); these constitute the positive samples. The selected language model is then trained on the sentences in the training set to obtain its parameters, after which the trained language model can estimate the probability that a given word sequence occurs.
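As an illustration of this training step, a minimal statistics-based (bigram) model can be fitted to a few correct sentences. The sentences below are illustrative, the estimates are plain maximum likelihood, and a practical model would add smoothing for unseen word pairs:

```python
from collections import Counter

def train_bigram_lm(sentences):
    """Train a tiny bigram language model on correct sentences (positive
    samples). Returns maximum-likelihood estimates of P(w | prev) keyed by
    (prev, w) pairs; <s> marks the start of a sentence."""
    unigrams, bigrams = Counter(), Counter()
    for words in sentences:
        padded = ["<s>"] + words
        unigrams.update(padded)
        bigrams.update(zip(padded, padded[1:]))
    return {pair: count / unigrams[pair[0]] for pair, count in bigrams.items()}

lm = train_bigram_lm([["我", "喜欢", "秋天"], ["我", "喜欢", "音乐"]])
# P(喜欢 | 我) = 1.0, P(秋天 | 喜欢) = 0.5
```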
From a statistical point of view, a sentence s in natural language may consist of any word string (several words in some order), but most such strings occur with only a small probability P(s). For example, consider the following sentences s1 and s2:
s1: I just ate dinner
s2: Just me has eaten dinner
Obviously s1 is a correct sentence and s2 is an erroneous one, so the probability of s1 occurring is greater than that of s2, i.e. P(s1) > P(s2).
If a sentence s consists of m words W1, W2, …, Wm, the probability P(W1, W2, …, Wm) of the sentence is:
P(W1, W2, …, Wm) = P(W1) P(W2|W1) P(W3|W1, W2) … P(Wm|W1, W2, …, Wm−1)
Each factor P(W1), P(W2|W1), P(W3|W1, W2), … can be computed with the pre-trained language model, and multiplying them gives the probability P(W1, W2, …, Wm) of the sentence. Since each probability lies in [0, 1], the product of many probabilities is a very small number and prone to numerical error, so the logarithm is taken instead, giving the log-likelihood probability logprob:
logprob = log(P(W1, W2, …, Wm))
logprob characterizes how likely the sentence is to occur: the larger the logprob, the more likely the sentence; the smaller the logprob, the less likely.
Another quantity that characterizes how likely a sentence is to occur is the perplexity ppl, defined as the geometric mean of the inverses of the per-word probabilities:
ppl = 10^(−logprob / (m − OOVs + 1))
where m is the number of words in the sentence and OOVs is the number of out-of-vocabulary words (words outside the dictionary). The smaller the ppl, the more likely the sentence is to occur; the larger the ppl, the less likely.
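Putting the two scores together, both can be computed from the per-word conditional probabilities that a language model returns. This is a sketch: the m − OOVs + 1 normalization in the exponent follows the common SRILM convention and is an assumption here, as is the use of base-10 logarithms:

```python
import math

def sentence_scores(word_probs, oov_count=0):
    """Log-likelihood (logprob) and perplexity (ppl) of a sentence.

    word_probs holds the conditional probabilities P(W1), P(W2|W1), ... of
    the in-vocabulary words; oov_count is the number of out-of-vocabulary
    words, which contribute no probability factor.
    """
    logprob = sum(math.log10(p) for p in word_probs)  # log P(W1, ..., Wm)
    m = len(word_probs) + oov_count                   # total words in the sentence
    ppl = 10 ** (-logprob / (m - oov_count + 1))      # SRILM-style convention (assumption)
    return logprob, ppl

logprob, ppl = sentence_scores([0.1, 0.1])
# logprob = -2.0, ppl = 10 ** (2 / 3)
```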
Further, in another embodiment of the present invention, sequentially inputting the i words into the pre-trained language model in their order in the target sentence comprises:
judging whether any preset keywords exist among the i words;
if preset keywords exist among the i words, sequentially inputting the i words other than the preset keywords into the pre-trained language model in their order in the target sentence.
In this embodiment, a preset keyword may be a polite or filler word, for example a greeting such as "hello" or "thank you".
A preset keyword may also be an interjection, for example "ah", "oh", or "wow".
Preset keywords may likewise be other kinds of words that do not affect the semantics of a sentence, and may be set in advance according to actual requirements.
Removing the preset keywords does not affect the accuracy of the detection result, but it simplifies the target sentence and improves detection speed.
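This filtering step reduces to dropping the preset keywords from the word list before it reaches the language model. A sketch, with an illustrative keyword set drawn from the examples above:

```python
# Preset keywords that do not affect sentence semantics (illustrative set:
# greetings and interjections, per the examples in the text).
PRESET_KEYWORDS = {"hello", "thank you", "ah", "oh", "wow"}

def drop_preset_keywords(words, keywords=PRESET_KEYWORDS):
    """Remove preset keywords before the words are fed to the language model,
    preserving the original word order."""
    return [w for w in words if w not in keywords]

print(drop_preset_keywords(["hello", "I", "like", "autumn"]))  # ['I', 'like', 'autumn']
```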
Step S70: judge the target sentence to be an erroneous sentence when its perplexity is greater than the preset perplexity and/or its log-likelihood probability is smaller than the preset log-likelihood probability.
In this embodiment, an erroneous sentence is a faulty sentence: an ill-formed sentence or a word string that is not a sentence at all.
The preset log-likelihood probability is a preset value that separates high from low log-likelihood probabilities. When the log-likelihood probability of the target sentence is above it, the log-likelihood probability is considered high and the target sentence is judged to be correct; when below it, the log-likelihood probability is considered low and the target sentence is judged to be erroneous.
Similarly, the preset perplexity is a preset value that separates high from low perplexities. When the perplexity of the target sentence is above it, the target sentence is judged to be erroneous; when below it, the target sentence is judged to be correct.
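The two threshold tests combine into a simple decision rule. A sketch; the default threshold values are the ones derived in the worked example later in this description:

```python
def is_erroneous(ppl, logprob, preset_ppl=250.0, preset_logprob=-2.5):
    """A sentence is judged erroneous when its perplexity exceeds the preset
    perplexity and/or its log-likelihood probability falls below the preset
    log-likelihood probability."""
    return ppl > preset_ppl or logprob < preset_logprob

print(is_erroneous(ppl=320.0, logprob=-2.1))  # True  (perplexity too high)
print(is_erroneous(ppl=120.0, logprob=-1.8))  # False (both scores in range)
```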
In a possible embodiment, before judging the target sentence to be an erroneous sentence when its perplexity is greater than the preset perplexity and/or its log-likelihood probability is smaller than the preset log-likelihood probability, the method further comprises:
determining the preset perplexity and/or determining the preset log-likelihood probability.
Determining the preset perplexity and/or the preset log-likelihood probability specifically comprises:
obtaining training samples used to train the language model, the training samples comprising positive samples and negative samples;
obtaining the perplexity and log-likelihood probability of the positive samples;
obtaining the perplexity and log-likelihood probability of the negative samples;
obtaining a perplexity histogram from the perplexities of the positive and negative samples, and obtaining the preset perplexity from the perplexity histogram; and
obtaining a log-likelihood probability histogram from the log-likelihood probabilities of the positive and negative samples, and obtaining the preset log-likelihood probability from the log-likelihood probability histogram.
In this embodiment, the preset perplexity is determined from a perplexity histogram of the training samples, and the preset log-likelihood probability from a log-likelihood probability histogram of the training samples.
The perplexity histograms of the training samples comprise one histogram for correct sentences and one for erroneous sentences, both computed from the training samples (i.e., a training set made up of sentences recognized by ASR); they reflect the perplexity distributions of correct sentences and of erroneous sentences.
In the present embodiment, the preset perplexity may be determined from these two distributions. Whether the perplexity of the target sentence falls within the range typical of correct sentences or of erroneous sentences then decides whether the target sentence is judged correct or erroneous.
Specifically, the perplexities of the correct sentences and of the erroneous sentences in the training samples are computed, their distributions are determined, and from them a perplexity threshold that separates correct from erroneous sentences, i.e. the preset perplexity, is chosen.
For example, the perplexity (ppl) distribution of correct sentences is:

| ppl interval | Count | Percentage |
| --- | --- | --- |
| [2000, +∞] | 5 | 0.125% |
| [1500, 2000] | 1 | 0.025% |
| [1000, 1500] | 5 | 0.125% |
| [750, 1000] | 11 | 0.276% |
| [500, 750] | 28 | 0.702% |
| [250, 500] | 217 | 5.441% |
| [100, 250] | 1241 | 31.18% |
| [0, 100] | 2480 | 62.187% |
The perplexity (ppl) distribution of erroneous sentences is:

| ppl interval | Count | Percentage |
| --- | --- | --- |
| [2000, +∞] | 86 | 5.923% |
| [1500, 2000] | 43 | 2.961% |
| [1000, 1500] | 111 | 7.645% |
| [750, 1000] | 107 | 7.369% |
| [500, 750] | 204 | 14.05% |
| [250, 500] | 501 | 34.504% |
| [100, 250] | 345 | 23.76% |
| [0, 100] | 55 | 3.788% |
From the above histograms it can be seen that, among correct sentences, the proportion with a ppl of less than 250 is 93.367%.
Among erroneous sentences, the proportion with a ppl of greater than 250 is 72.425%.
A ppl of 250 therefore separates the two classes well, so the preset confusion degree (preset ppl) is determined to be 250. In this embodiment, when the ppl of the target sentence is greater than the preset ppl, the target sentence is judged to be an erroneous sentence.
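The threshold selection above can be sketched as follows. A minimal sketch: the bucket edges and counts are taken from the example histograms above, while the selection rule (maximize the fraction of correct sentences below the threshold plus the fraction of erroneous sentences above it) is an illustrative assumption rather than the procedure stated in this document.

```python
# Candidate thresholds and per-bucket counts, lowest ppl bucket first,
# copied from the example histograms above.
edges = [0, 100, 250, 500, 750, 1000, 1500, 2000]
correct = [2480, 1241, 217, 28, 11, 5, 1, 5]
wrong = [55, 345, 501, 204, 107, 111, 43, 86]

def separation(threshold_index):
    """Fraction of correct sentences below edges[threshold_index] plus
    fraction of erroneous sentences above it (higher = better split)."""
    below_correct = sum(correct[:threshold_index]) / sum(correct)
    above_wrong = sum(wrong[threshold_index:]) / sum(wrong)
    return below_correct + above_wrong

# Pick the bucket edge that best separates the two classes.
best = max(range(1, len(edges)), key=separation)
print(edges[best])  # 250 for these counts
```

For these counts the best split indeed falls at ppl = 250, matching the preset confusion degree chosen above.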
Likewise, the preset log likelihood probability (logprob) is obtained in a manner similar to the preset confusion degree. The logprob histograms of the training sample comprise a correct-sentence histogram and an erroneous-sentence histogram calculated from the training sample; they reflect the log likelihood probability distribution of correct sentences and that of erroneous sentences.
For example, the log likelihood probability (logprob) distribution of correct sentences is:

| logprob interval | Count | Percentage |
| --- | --- | --- |
| (-∞, -4.0) | 1 | 0.0251% |
| [-4.0, -3.5) | 0 | 0% |
| [-3.5, -3.0) | 14 | 0.351% |
| [-3.0, -2.5) | 122 | 3.0591% |
| [-2.5, -2.0) | 1371 | 34.378% |
| [-2.0, -1.5) | 1740 | 43.631% |
| [-1.5, -1.0) | 673 | 16.876% |
| [-1.0, 0] | 67 | 1.68% |
The log likelihood probability (logprob) distribution of erroneous sentences is:

| logprob interval | Count | Percentage |
| --- | --- | --- |
| (-∞, -4.0) | 8 | 0.551% |
| [-4.0, -3.5) | 31 | 2.135% |
| [-3.5, -3.0) | 200 | 13.774% |
| [-3.0, -2.5) | 656 | 45.179% |
| [-2.5, -2.0) | 502 | 34.573% |
| [-2.0, -1.5) | 52 | 3.581% |
| [-1.5, -1.0) | 3 | 0.207% |
| [-1.0, 0] | 0 | 0% |
From the above histograms it can be seen that, among correct sentences, the proportion with a logprob greater than -2.5 is 96.566%, while among erroneous sentences, the proportion with a logprob less than -2.5 is 61.639%.
A logprob of -2.5 therefore separates the two classes well, so the preset logprob is determined to be -2.5. In this embodiment, when the logprob of the target sentence is smaller than the preset logprob, the target sentence is judged to be an erroneous sentence.
According to the erroneous-sentence detection method provided by this embodiment: a target sentence obtained through automatic speech recognition is acquired; the i-th text segment contained in the target sentence is acquired, and it is judged whether a word matching the i-th text segment exists in a preset dictionary, where the initial value of i is 1 and i is a positive integer; if no matching word exists in the preset dictionary, the number of characters of the i-th text segment is adjusted and the judgment is repeated; if a matching word exists, the i-th text segment is determined to be the i-th word of the target sentence, i is set to i + 1, the next text segment is acquired, and the judgment is repeated; when the total number of characters of the i words equals the total number of characters of the target sentence, the target sentence is determined to consist of those i words; the i words are input in sentence order into a pre-trained language model, which calculates the confusion degree and/or log likelihood probability of the target sentence; and when the confusion degree of the target sentence is greater than a preset confusion degree and/or its log likelihood probability is smaller than a preset log likelihood probability, the target sentence is judged to be erroneous, thereby identifying whether a sentence is an erroneous sentence.
Further, in another embodiment of the method of the present invention, the method further comprises the steps of:
And if the target sentence is an erroneous sentence, sending an erroneous-sentence reminder message.
When the target sentence is judged to be erroneous, a prompt can be sent to the user. For example, when a speech-converted sentence is to undergo further natural language processing and the sentence is judged to be erroneous, the user can be reminded that the target sentence is erroneous, e.g. by a pop-up error message on the display device.
In this embodiment, by sending the reminder, the user can quickly learn whether erroneous sentences exist and which they are, and then perform the subsequent operations.
The invention also provides an erroneous-sentence detection apparatus. Referring to fig. 2, an internal structure diagram of the erroneous-sentence detection apparatus according to an embodiment of the invention is shown.
In this embodiment, the erroneous-sentence detection apparatus 1 may be a PC (Personal Computer), or a terminal device such as a smart phone, tablet computer, or portable computer. The apparatus 1 comprises at least a memory 11, a processor 12, a communication bus 13, and a network interface 14.
The memory 11 includes at least one type of readable storage medium including flash memory, a hard disk, a multimedia card, a card memory (e.g., SD or DX memory, etc.), a magnetic memory, a magnetic disk, an optical disk, etc. The memory 11 may in some embodiments be an internal storage unit of the sentence detection device 1, for example a hard disk of the sentence detection device 1. The memory 11 may also be an external storage device of the sentence detection apparatus 1 in other embodiments, for example, a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card) or the like, which are provided on the sentence detection apparatus 1. Further, the memory 11 may also include both an internal storage unit and an external storage device of the sentence detection apparatus 1. The memory 11 may be used not only for storing application software installed in the sentence detection apparatus 1 and various types of data, for example, codes of the sentence detection program 01, but also for temporarily storing data that has been output or is to be output.
The processor 12 may in some embodiments be a central processing unit (Central Processing Unit, CPU), controller, microcontroller, microprocessor or other data processing chip for executing program code or processing data stored in the memory 11, such as executing the sentence detection program 01, etc.
The communication bus 13 is used to enable connection communication between these components.
The network interface 14 may optionally comprise a standard wired interface, a wireless interface (e.g. WI-FI interface), typically used to establish a communication connection between the apparatus 1 and other electronic devices.
Optionally, the apparatus 1 may further comprise a user interface, which may comprise a display (Display), an input unit such as a keyboard (Keyboard), and a standard wired interface or wireless interface. Alternatively, in some embodiments, the display may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an OLED (Organic Light-Emitting Diode) touch display, or the like. The display may also be referred to as a display screen or display unit, used for displaying information processed in the apparatus 1 and for displaying a visual user interface.
Fig. 2 shows only the erroneous-sentence detection apparatus 1 with the components 11-14 and the detection program 01. It will be understood by those skilled in the art that the structure shown in fig. 2 does not limit the apparatus 1, which may include fewer or more components than shown, combine certain components, or arrange the components differently.
In the embodiment of the apparatus 1 shown in fig. 2, the memory 11 stores a sentence detection program 01; the processor 12 performs the following steps when executing the sentence detection program 01 stored in the memory 11:
and acquiring a target sentence obtained by an automatic voice recognition technology.
In this embodiment, the target sentence obtained by automatic speech recognition (ASR) may be one sentence or several sentences, and each sentence may be long or short. In other embodiments, the target sentence may also be a sentence entered through other channels.
And acquiring the i-th text segment contained in the target sentence, and judging whether a word matching the i-th text segment exists in a preset dictionary, where the initial value of i is 1 and i is a positive integer.
When acquiring the i-th text segment contained in the target sentence, the segments may be taken sequentially from left to right (i.e. front to back) or from right to left (i.e. back to front).
The number of characters in each acquired text segment may be the same or different.
Preferably, the number of characters of the i-th text segment equals the number of characters of the longest word in the preset dictionary.
If no word matching the i-th text segment exists in the preset dictionary, the number of characters of the i-th text segment is adjusted, and it is again judged whether a matching word exists in the preset dictionary.
If a word matching the i-th text segment exists in the preset dictionary, the i-th text segment is determined to be the i-th word of the target sentence, i is set to i + 1, the i-th text segment contained in the target sentence is acquired, and it is judged whether a matching word exists in the preset dictionary.
And when the total number of characters of the i words equals the total number of characters of the target sentence, it is determined that the target sentence consists of the i words.
For example, consider the sentence "I love autumn" (in the Chinese original, "我喜欢秋天", five characters). If the longest word in the dictionary has 3 characters, the first 3 characters "我喜欢" are taken as the first segment and matched against the dictionary. If the match fails, the segment is shortened by one character to "我喜" and matched again; if that also fails, the single character "我" ("I") is taken directly as a single-character word. Next, a 3-character segment "喜欢秋" is matched against the dictionary; if that fails, it is shortened by one character to "喜欢" ("love"), which matches, so "喜欢" is determined to be a word. Similarly, "秋天" ("autumn") is matched against the dictionary; the match succeeds, so "秋天" is determined to be a word. After this segmentation, the sentence is found to consist of the three words "我" ("I"), "喜欢" ("love"), and "秋天" ("autumn").
Through the above steps, the words contained in the target sentence can be rapidly identified, facilitating rapid erroneous-sentence detection.
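The walkthrough above is a greedy forward maximum-matching segmentation, and can be sketched as follows. This is a minimal sketch: the dictionary contents are illustrative assumptions, not the patent's preset dictionary.

```python
def segment(sentence, dictionary):
    """Greedy left-to-right (forward maximum-matching) segmentation
    against a preset dictionary of known words."""
    max_len = max(len(w) for w in dictionary)
    words, i = [], 0
    while i < len(sentence):
        # Start with the longest candidate segment and shrink it by one
        # character until a dictionary match (or a single character) remains.
        for size in range(min(max_len, len(sentence) - i), 0, -1):
            piece = sentence[i:i + size]
            if size == 1 or piece in dictionary:
                words.append(piece)  # unmatched single characters kept as-is
                i += size
                break
    return words

dictionary = {"喜欢", "秋天"}  # "love", "autumn"
print(segment("我喜欢秋天", dictionary))  # ['我', '喜欢', '秋天']
```

Applied to the example sentence, this reproduces the three words from the walkthrough above.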
And sequentially inputting the i words into a pre-trained language model according to the sequence in the target sentence, and calculating the confusion degree and/or the log likelihood probability of the target sentence through the language model.
The language model may be a deep learning language model, e.g. a feedforward neural network or a recurrent neural network (RNN), or a statistics-based language model, e.g. an N-gram model.
In this embodiment, a training set may be formed from correct word sequences (i.e., correct sentences); this training set constitutes the positive samples. The selected language model is then trained on the sentences in the training set to obtain its parameters, after which the trained language model can estimate the probability that a given word sequence occurs.
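Training a statistics-based model of the kind mentioned above can be sketched with a minimal bigram example. This is a sketch under stated assumptions: the toy sentences are made up, and smoothing for unseen word pairs (which any real system would need) is omitted for brevity.

```python
from collections import defaultdict

def train_bigram(sentences):
    """Estimate P(w2 | w1) by maximum likelihood from a training set of
    correct sentences, each given as a list of words."""
    counts = defaultdict(lambda: defaultdict(int))
    for words in sentences:
        # "<s>" is a sentence-start marker so the first word is also modeled.
        for w1, w2 in zip(["<s>"] + words, words):
            counts[w1][w2] += 1
    # Normalize counts into conditional probabilities.
    return {w1: {w2: c / sum(nxt.values()) for w2, c in nxt.items()}
            for w1, nxt in counts.items()}

model = train_bigram([["i", "love", "autumn"], ["i", "love", "summer"]])
print(model["love"]["autumn"])  # 0.5
```

A trained model of this shape supplies the per-word probabilities used in the sentence-probability formula below.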
From a statistical point of view, any word string (several words in some order) can form a candidate sentence s in natural language, but for most such strings the probability of occurrence P(s) is small. For example, consider the following sentences s1 and s2:
s1: i just eat dinner
s2: just me has eaten dinner
Obviously, for Chinese, s1 is a correct sentence, and s2 is an erroneous sentence, so for Chinese, the probability of occurrence of sentence s1 is greater than that of sentence s2, i.e. P (s 1) > P (s 2).
If a sentence s is composed of m words W1, W2, …, Wm, the probability P(W1, W2, …, Wm) of the sentence s is, by the chain rule:
P(W1, W2, …, Wm) = P(W1) · P(W2|W1) · P(W3|W1, W2) · … · P(Wm|W1, W2, …, Wm-1)
Each factor P(W1), P(W2|W1), P(W3|W1, W2), etc. in the above formula can be calculated with a pre-trained language model, and multiplying them gives the probability P(W1, W2, …, Wm) of the sentence s. Since each probability lies in [0, 1], the product of many such values is a very small number and prone to numerical error, so the logarithm is taken instead, giving the log likelihood probability logprob:
logprob = log10(P(W1, W2, …, Wm))
logprob characterizes how likely the sentence is to occur: the larger the logprob, the more likely the sentence is to appear; the smaller the logprob, the less likely.
Another quantity characterizing the likelihood of a sentence is the perplexity ppl (the "confusion degree" above), defined as the geometric mean of the inverse probability of the sentence:
ppl = 10^(-logprob / (N - OOVs + 1))
where N is the number of words in the sentence and OOVs is the number of out-of-vocabulary words (words outside the dictionary), which contribute no probability term; the +1 accounts for the end-of-sentence token. The smaller the ppl, the more likely the sentence is to occur; the larger the ppl, the less likely.
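The two quantities above can be computed together in a few lines. A minimal sketch: the per-word probabilities would come from a trained language model, and the values used here are made-up placeholders; the normalization follows the perplexity formula above.

```python
import math

def score(word_probs, oov_count=0):
    """word_probs: the factors P(Wk | W1..Wk-1) for each in-vocabulary word
    of the sentence. OOV words contribute no probability term."""
    logprob = sum(math.log10(p) for p in word_probs)
    n_words = len(word_probs) + oov_count
    # Normalize by (N - OOVs + 1); the +1 accounts for the end-of-sentence
    # token, as in the ppl formula above.
    ppl = 10 ** (-logprob / (n_words - oov_count + 1))
    return logprob, ppl

logprob, ppl = score([0.1, 0.2, 0.05])  # placeholder probabilities
```

For these placeholder values, logprob is -3.0 and ppl is 10^(3/4) ≈ 5.62: a low-probability sentence yields a small logprob and a large ppl, as described above.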
Further, in another embodiment of the present invention, sequentially inputting the i words into the pre-trained language model according to their order in the target sentence comprises:
judging whether preset keywords exist among the i words;
and if preset keywords exist among the i words, sequentially inputting the words other than the preset keywords into the pre-trained language model according to their order in the target sentence.
In this embodiment, the preset keywords may be stop words, for example: "hey" (喂), "hello" (你好), "thank you" (谢谢), "you" (您), and the like.
The preset keywords may also be exclamation words, for example: "ah" (啊), "ya" (呀), "wa" (哇), and the like.
The preset keywords may be other types of words that do not affect the semantics of the sentence, and may be set in advance according to actual requirements.
In this embodiment, removing the preset keywords does not affect the accuracy of the detection result, while it simplifies the acquired target sentence and improves the detection speed.
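The keyword-removal step above can be sketched as a simple filter. A minimal sketch: the keyword list here is an illustrative assumption standing in for the preset stop words and exclamations.

```python
# Illustrative stand-ins for the preset stop words and exclamation words.
PRESET_KEYWORDS = {"hello", "hey", "thanks", "ah", "ya", "wa"}

def filter_keywords(words, keywords=PRESET_KEYWORDS):
    """Drop preset keywords while keeping the remaining words in their
    original sentence order, as required before language-model scoring."""
    return [w for w in words if w not in keywords]

print(filter_keywords(["hello", "i", "love", "autumn", "ah"]))
# ['i', 'love', 'autumn']
```

The surviving words are then fed to the language model in sentence order, exactly as in the unfiltered case.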
And judging the target sentence as a wrong sentence when the confusion degree of the target sentence is larger than a preset confusion degree and/or the log likelihood probability of the target sentence is smaller than a preset log likelihood probability.
In this embodiment, an erroneous sentence is a sentence that is wrong, including ungrammatical sentences and word strings that do not form a sentence at all.
The preset log likelihood probability may be a value that is preset to distinguish whether the log likelihood probability of the target sentence is high or low, and when the log likelihood probability of the target sentence is higher than the preset log likelihood probability, the log likelihood probability of the target sentence is high, and the target sentence is judged to be a correct sentence; when the log-likelihood probability of the target sentence is lower than the preset log-likelihood probability, the log-likelihood probability of the target sentence is low, and the target sentence is judged to be an erroneous sentence.
Similarly, the preset confusion degree may be a preset value for distinguishing whether the confusion degree of the sentence to be detected is high or low, when the confusion degree of the target sentence is higher than the preset confusion degree, the confusion degree of the target sentence is high, the target sentence is judged to be an erroneous sentence, and when the confusion degree of the target sentence is lower than the preset confusion degree, the confusion degree of the sentence to be detected is low, and the target sentence is judged to be a correct sentence.
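The decision rule described above can be sketched as a small function. A minimal sketch: the threshold values are the example presets determined elsewhere in this document, and the "and/or" is realized by letting either metric alone, or both, be supplied.

```python
# Example presets from this document's histogram analysis.
PRESET_PPL = 250
PRESET_LOGPROB = -2.5

def is_erroneous(ppl=None, logprob=None,
                 preset_ppl=PRESET_PPL, preset_logprob=PRESET_LOGPROB):
    """Judge a target sentence erroneous when its confusion degree exceeds
    the preset confusion degree and/or its log likelihood probability falls
    below the preset log likelihood probability."""
    if ppl is not None and ppl > preset_ppl:
        return True
    if logprob is not None and logprob < preset_logprob:
        return True
    return False

print(is_erroneous(ppl=300))                # True: ppl above preset
print(is_erroneous(ppl=120, logprob=-1.8))  # False: both within range
```

Either metric can flag the sentence on its own, which matches the "and/or" phrasing of the judgment step.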
In a possible embodiment, when the confusion degree of the target sentence is greater than a preset confusion degree and/or the log likelihood probability of the target sentence is less than a preset log likelihood probability, before the target sentence is judged to be a wrong sentence, the following steps are further implemented:
determining the preset confusion degree and/or determining the preset log likelihood probability.
The determining the preset confusion degree and/or the determining the preset log likelihood probability specifically comprises:
obtaining a training sample for training the language model, wherein the training sample comprises a positive sample and a negative sample;
obtaining the confusion degree of the positive sample and the log likelihood probability of the positive sample; and
obtaining the confusion degree of the negative sample and the log likelihood probability of the negative sample;
obtaining a confusion degree histogram according to the confusion degree of the positive sample and the confusion degree of the negative sample, and obtaining the preset confusion degree through the confusion degree histogram; and
and acquiring a log-likelihood probability histogram according to the log-likelihood probability of the positive sample and the log-likelihood probability of the negative sample, and acquiring the preset log-likelihood probability through the log-likelihood probability histogram.
In this embodiment, the preset confusion degree is determined by a confusion degree histogram of a training sample, and the preset log likelihood probability is determined by a log likelihood probability histogram of the training sample.
The confusion degree histograms of the training sample comprise a correct-sentence histogram and an erroneous-sentence histogram, both calculated from the training sample (i.e., a training set composed of sentences recognized by ASR); they reflect the confusion degree distribution of correct sentences and that of erroneous sentences.
In this embodiment, the preset confusion degree may be determined from these two distributions. It is then determined whether the confusion degree of the target sentence falls within the range typical of correct sentences or of erroneous sentences, so as to judge whether the target sentence is correct or erroneous.
Specifically, the confusion degrees of the correct sentences and of the erroneous sentences in the training sample may be calculated, and their respective distributions determined, thereby fixing a confusion degree threshold that distinguishes correct sentences from erroneous ones, namely the preset confusion degree.
For example, the confusion degree (ppl) distribution of correct sentences is:

| ppl interval | Count | Percentage |
| --- | --- | --- |
| [2000, +∞) | 5 | 0.125% |
| [1500, 2000) | 1 | 0.025% |
| [1000, 1500) | 5 | 0.125% |
| [750, 1000) | 11 | 0.276% |
| [500, 750) | 28 | 0.702% |
| [250, 500) | 217 | 5.441% |
| [100, 250) | 1241 | 31.18% |
| [0, 100) | 2480 | 62.187% |
The confusion degree (ppl) distribution of erroneous sentences is:

| ppl interval | Count | Percentage |
| --- | --- | --- |
| [2000, +∞) | 86 | 5.923% |
| [1500, 2000) | 43 | 2.961% |
| [1000, 1500) | 111 | 7.645% |
| [750, 1000) | 107 | 7.369% |
| [500, 750) | 204 | 14.05% |
| [250, 500) | 501 | 34.504% |
| [100, 250) | 345 | 23.76% |
| [0, 100) | 55 | 3.788% |
As can be seen from the above histograms, among correct sentences, the proportion with a ppl of less than 250 is 93.367%.
Among erroneous sentences, the proportion with a ppl of greater than 250 is 72.425%.
A ppl of 250 therefore separates the two classes well, so the preset confusion degree (preset ppl) is determined to be 250. In this embodiment, when the ppl of the target sentence is greater than the preset ppl, the target sentence is judged to be an erroneous sentence.
Likewise, the preset log likelihood probability (logprob) is obtained in a manner similar to the preset confusion degree. The logprob histograms of the training sample comprise a correct-sentence histogram and an erroneous-sentence histogram calculated from the training sample; they reflect the log likelihood probability distribution of correct sentences and that of erroneous sentences.
For example, the log likelihood probability (logprob) distribution of correct sentences is:

| logprob interval | Count | Percentage |
| --- | --- | --- |
| (-∞, -4.0) | 1 | 0.0251% |
| [-4.0, -3.5) | 0 | 0% |
| [-3.5, -3.0) | 14 | 0.351% |
| [-3.0, -2.5) | 122 | 3.0591% |
| [-2.5, -2.0) | 1371 | 34.378% |
| [-2.0, -1.5) | 1740 | 43.631% |
| [-1.5, -1.0) | 673 | 16.876% |
| [-1.0, 0] | 67 | 1.68% |
The log likelihood probability (logprob) distribution of erroneous sentences is:

| logprob interval | Count | Percentage |
| --- | --- | --- |
| (-∞, -4.0) | 8 | 0.551% |
| [-4.0, -3.5) | 31 | 2.135% |
| [-3.5, -3.0) | 200 | 13.774% |
| [-3.0, -2.5) | 656 | 45.179% |
| [-2.5, -2.0) | 502 | 34.573% |
| [-2.0, -1.5) | 52 | 3.581% |
| [-1.5, -1.0) | 3 | 0.207% |
| [-1.0, 0] | 0 | 0% |
From the above histograms it can be seen that, among correct sentences, the proportion with a logprob greater than -2.5 is 96.566%, while among erroneous sentences, the proportion with a logprob less than -2.5 is 61.639%.
A logprob of -2.5 therefore separates the two classes well, so the preset logprob is determined to be -2.5. In this embodiment, when the logprob of the target sentence is smaller than the preset logprob, the target sentence is judged to be an erroneous sentence.
The erroneous-sentence detection apparatus provided by this embodiment: acquires a target sentence obtained through automatic speech recognition; acquires the i-th text segment contained in the target sentence, and judges whether a word matching the i-th text segment exists in a preset dictionary, where the initial value of i is 1 and i is a positive integer; if no matching word exists in the preset dictionary, adjusts the number of characters of the i-th text segment and repeats the judgment; if a matching word exists, determines the i-th text segment to be the i-th word of the target sentence, sets i to i + 1, acquires the next text segment, and repeats the judgment; when the total number of characters of the i words equals the total number of characters of the target sentence, determines that the target sentence consists of those i words; inputs the i words in sentence order into a pre-trained language model, which calculates the confusion degree and/or log likelihood probability of the target sentence; and when the confusion degree of the target sentence is greater than a preset confusion degree and/or its log likelihood probability is smaller than a preset log likelihood probability, judges the target sentence to be erroneous, thereby identifying whether a sentence is an erroneous sentence.
Further, in another embodiment of the apparatus of the present invention, the sentence detection program may be further invoked by the processor to implement the following steps:
And if the target sentence is an erroneous sentence, sending an erroneous-sentence reminder message.
When the target sentence is judged to be erroneous, a prompt can be sent to the user. For example, when a speech-converted sentence is to undergo further natural language processing and the sentence is judged to be erroneous, the user can be reminded that the target sentence is erroneous, e.g. by a pop-up error message on the display device.
In this embodiment, by sending the reminder, the user can quickly learn whether erroneous sentences exist and which they are, and then perform the subsequent operations.
Alternatively, in other embodiments, the erroneous-sentence detection program may be divided into one or more modules stored in the memory 11 and executed by one or more processors (the processor 12 in this embodiment) to implement the present invention. A module here refers to a series of computer program instruction segments capable of performing a specific function, used to describe the execution of the detection program in the detection apparatus.
For example, referring to fig. 3, a schematic program module of a sentence detection program in an embodiment of the sentence detection apparatus according to the present invention is shown, where the sentence detection program may be divided into a first obtaining module 10, a second obtaining module 20, an adjusting module 30, a first judging module 40, a determining module 50, a calculating module 60 and a second judging module 70, by way of example:
The first acquisition module 10 is configured to: acquire a target sentence obtained by automatic speech recognition;
the second acquisition module 20 is configured to: acquire the i-th text segment contained in the target sentence, and judge whether a word matching the i-th text segment exists in a preset dictionary, where the initial value of i is 1 and i is a positive integer;
the adjustment module 30 is configured to: if no word matching the i-th text segment exists in the preset dictionary, adjust the number of characters of the i-th text segment and judge again whether a matching word exists in the preset dictionary;
the first judging module 40 is configured to: if a word matching the i-th text segment exists in the preset dictionary, determine the i-th text segment to be the i-th word of the target sentence, set i = i + 1, acquire the i-th text segment contained in the target sentence, and judge whether a matching word exists in the preset dictionary;
the determining module 50 is configured to: when the total number of characters of the i words equals the total number of characters of the target sentence, determine that the target sentence consists of the i words.
The calculating module 60 is configured to: sequentially input the i words into a pre-trained language model according to their order in the target sentence, and calculate the confusion degree and/or log likelihood probability of the target sentence through the language model.
The second judging module 70 is configured to: judge the target sentence to be an erroneous sentence when the confusion degree of the target sentence is greater than a preset confusion degree and/or the log likelihood probability of the target sentence is smaller than a preset log likelihood probability.
The functions or operation steps implemented when the program modules, such as the first acquiring module 10, the second acquiring module 20, the adjusting module 30, the first judging module 40, the determining module 50, the calculating module 60, and the second judging module 70, are substantially the same as those of the foregoing embodiments, and are not repeated herein.
In addition, an embodiment of the present invention further proposes a computer-readable storage medium having stored thereon a sentence detection program executable by one or more processors to implement the following operations:
acquiring a target sentence obtained by automatic speech recognition;
acquiring the i-th text segment contained in the target sentence, and judging whether a word matching the i-th text segment exists in a preset dictionary, wherein the initial value of i is 1 and i is a positive integer;
if no word matching the i-th text segment exists in the preset dictionary, adjusting the number of characters of the i-th text segment and judging again whether a matching word exists in the preset dictionary;
if a word matching the i-th text segment exists in the preset dictionary, determining the i-th text segment to be the i-th word of the target sentence, setting i = i + 1, acquiring the i-th text segment contained in the target sentence, and judging whether a matching word exists in the preset dictionary;
when the total number of characters of the i words equals the total number of characters of the target sentence, determining that the target sentence consists of the i words;
sequentially inputting the i words into a pre-trained language model according to their order in the target sentence, and calculating the confusion degree and/or log likelihood probability of the target sentence through the language model;
and judging the target sentence to be an erroneous sentence when the confusion degree of the target sentence is greater than a preset confusion degree and/or the log likelihood probability of the target sentence is smaller than a preset log likelihood probability.
The computer-readable storage medium of the present invention is substantially the same as the embodiments of the apparatus and method for detecting a sentence in the above-described manner, and will not be described in detail herein.
It should be noted that, the foregoing reference numerals of the embodiments of the present invention are merely for describing the embodiments, and do not represent the advantages and disadvantages of the embodiments. And the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, apparatus, article, or method that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, apparatus, article, or method. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, apparatus, article or method that comprises the element.
From the above description of the embodiments, it will be clear to those skilled in the art that the above-described embodiment method may be implemented by means of software plus a necessary general hardware platform, but of course may also be implemented by means of hardware, but in many cases the former is a preferred embodiment. Based on such understanding, the technical solution of the present invention may be embodied essentially or in a part contributing to the prior art in the form of a software product stored in a storage medium (e.g. ROM/RAM, magnetic disk, optical disk) as described above, comprising instructions for causing a terminal device (which may be a mobile phone, a computer, a server, or a network device, etc.) to perform the method according to the embodiments of the present invention.
The foregoing description is only of the preferred embodiments of the present invention, and is not intended to limit the scope of the invention, but rather is intended to cover any equivalents of the structures or equivalent processes disclosed herein or in the alternative, which may be employed directly or indirectly in other related arts.
Claims (10)
1. An erroneous-sentence detection method, the method comprising:
acquiring a target sentence obtained through automatic speech recognition technology;
acquiring the i-th text segment contained in the target sentence, and determining whether a word matching the i-th text segment exists in a preset dictionary, wherein the initial value of i is 1 and i is a positive integer;
if no word matching the i-th text segment exists in the preset dictionary, adjusting the number of characters in the i-th text segment, and determining again whether a word matching the i-th text segment exists in the preset dictionary;
if a word matching the i-th text segment exists in the preset dictionary, determining the i-th text segment to be the i-th word of the target sentence, setting i = i + 1, acquiring the i-th text segment contained in the target sentence, and determining whether a word matching the i-th text segment exists in the preset dictionary;
when the total number of characters of the i words is the same as the total number of characters of the target sentence, determining that the target sentence consists of the i words;
sequentially inputting the i words into a pre-trained language model according to their order in the target sentence, and calculating the perplexity and/or log-likelihood probability of the target sentence through the language model; and
determining the target sentence to be an erroneous sentence when the perplexity of the target sentence is greater than a preset perplexity and/or the log-likelihood probability of the target sentence is less than a preset log-likelihood probability.
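The segmentation-and-scoring pipeline of claim 1 can be sketched as follows. This is a minimal illustration only: the tiny dictionary, the unigram log-probabilities standing in for the pre-trained language model, and both thresholds are hypothetical values chosen for the example, not ones disclosed in the patent.

```python
# Sketch of claim 1: greedy forward-maximum-matching segmentation against a
# preset dictionary, then perplexity / log-likelihood scoring of the words.
import math

# Hypothetical stand-ins for the preset dictionary and the trained language model.
PRESET_DICTIONARY = {"今天", "天气", "很", "好"}
WORD_LOG_PROB = {"今天": -1.0, "天气": -1.2, "很": -0.8, "好": -0.9}
MAX_WORD_LEN = 4
PRESET_PERPLEXITY = 10.0        # hypothetical preset perplexity threshold
PRESET_LOG_LIKELIHOOD = -8.0    # hypothetical preset log-likelihood threshold

def forward_max_match(sentence):
    """Claim 1's loop: take the longest candidate segment starting at the
    current position, shrink it one character at a time until it matches a
    dictionary word, then advance to the next segment."""
    words, pos = [], 0
    while pos < len(sentence):
        for end in range(min(len(sentence), pos + MAX_WORD_LEN), pos, -1):
            if sentence[pos:end] in PRESET_DICTIONARY:
                words.append(sentence[pos:end])
                pos = end
                break
        else:
            words.append(sentence[pos])  # no match: fall back to one character
            pos += 1
    return words

def score(words):
    """Return (perplexity, log-likelihood) under the toy unigram model."""
    log_lik = sum(WORD_LOG_PROB.get(w, -5.0) for w in words)  # -5.0: unseen-word penalty
    perplexity = math.exp(-log_lik / len(words))
    return perplexity, log_lik

def is_erroneous(sentence):
    """Claim 1's decision rule: erroneous if perplexity exceeds the preset
    perplexity or log-likelihood falls below the preset log-likelihood."""
    ppl, log_lik = score(forward_max_match(sentence))
    return ppl > PRESET_PERPLEXITY or log_lik < PRESET_LOG_LIKELIHOOD
```

A fluent sentence segments cleanly into dictionary words and scores well, while a scrambled one falls back to single characters and is flagged.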
2. The erroneous-sentence detection method of claim 1, wherein sequentially inputting the i words into the pre-trained language model according to their order in the target sentence comprises:
determining whether any preset keywords exist among the i words; and
if preset keywords exist among the i words, sequentially inputting the words among the i words other than the preset keywords into the pre-trained language model according to their order in the target sentence.
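The keyword filtering of claim 2 amounts to dropping the preset keywords before scoring while preserving word order; a one-line sketch follows, with a hypothetical keyword set (e.g., filler words):

```python
# Sketch of claim 2: remove preset keywords before feeding words to the
# language model. The keyword set is a hypothetical example.
PRESET_KEYWORDS = {"嗯", "呃", "啊"}

def filter_preset_keywords(words):
    """Return the words other than preset keywords, preserving their
    order in the target sentence."""
    return [w for w in words if w not in PRESET_KEYWORDS]
```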
3. The erroneous-sentence detection method of claim 1 or 2, further comprising, before determining the target sentence to be an erroneous sentence when the perplexity of the target sentence is greater than a preset perplexity and/or the log-likelihood probability of the target sentence is less than a preset log-likelihood probability:
determining the preset perplexity and/or the preset log-likelihood probability;
wherein determining the preset perplexity and/or the preset log-likelihood probability specifically comprises:
obtaining training samples used to train the language model, the training samples comprising positive samples and negative samples;
obtaining the perplexity and the log-likelihood probability of the positive samples;
obtaining the perplexity and the log-likelihood probability of the negative samples;
obtaining a perplexity histogram from the perplexities of the positive samples and the negative samples, and obtaining the preset perplexity from the perplexity histogram; and
obtaining a log-likelihood probability histogram from the log-likelihood probabilities of the positive samples and the negative samples, and obtaining the preset log-likelihood probability from the log-likelihood probability histogram.
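Claim 3's histogram-based threshold selection can be sketched as picking the histogram bin edge that best separates the positive-sample scores from the negative-sample scores. The bin count and the separation criterion below are illustrative assumptions; the patent only requires that the preset value be obtained from the histogram.

```python
# Sketch of claim 3: choose a preset threshold from the score histograms of
# positive (fluent) and negative (erroneous) training samples.

def choose_preset_threshold(pos_scores, neg_scores, num_bins=50):
    """Return the candidate bin edge that minimizes misclassifications,
    under the perplexity convention: positive samples should fall at or
    below the threshold, negative samples above it."""
    lo = min(pos_scores + neg_scores)
    hi = max(pos_scores + neg_scores)
    # Candidate thresholds: the interior bin edges of a uniform histogram.
    edges = [lo + (hi - lo) * k / num_bins for k in range(1, num_bins)]

    def misclassified(t):
        return sum(v > t for v in pos_scores) + sum(v <= t for v in neg_scores)

    return min(edges, key=misclassified)
```

For the preset log-likelihood the convention flips (positives should score above the threshold), but the same histogram-edge search applies.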
4. The erroneous-sentence detection method of claim 2, wherein the language model is a deep-learning language model or a statistics-based language model.
5. The erroneous-sentence detection method of claim 1 or 2, further comprising:
if the target sentence is an erroneous sentence, sending an erroneous-sentence reminder message.
6. An erroneous-sentence detection apparatus, comprising a memory and a processor, the memory storing an erroneous-sentence detection program operable on the processor, wherein the erroneous-sentence detection program, when executed by the processor, implements the following steps:
acquiring a target sentence obtained through automatic speech recognition technology;
acquiring the i-th text segment contained in the target sentence, and determining whether a word matching the i-th text segment exists in a preset dictionary, wherein the initial value of i is 1 and i is a positive integer;
if no word matching the i-th text segment exists in the preset dictionary, adjusting the number of characters in the i-th text segment, and determining again whether a word matching the i-th text segment exists in the preset dictionary;
if a word matching the i-th text segment exists in the preset dictionary, determining the i-th text segment to be the i-th word of the target sentence, setting i = i + 1, acquiring the i-th text segment contained in the target sentence, and determining whether a word matching the i-th text segment exists in the preset dictionary;
when the total number of characters of the i words is the same as the total number of characters of the target sentence, determining that the target sentence consists of the i words;
sequentially inputting the i words into a pre-trained language model according to their order in the target sentence, and calculating the perplexity and/or log-likelihood probability of the target sentence through the language model; and
determining the target sentence to be an erroneous sentence when the perplexity of the target sentence is greater than a preset perplexity and/or the log-likelihood probability of the target sentence is less than a preset log-likelihood probability.
7. The erroneous-sentence detection apparatus of claim 6, wherein sequentially inputting the i words into the pre-trained language model according to their order in the target sentence comprises:
determining whether any preset keywords exist among the i words; and
if preset keywords exist among the i words, sequentially inputting the words among the i words other than the preset keywords into the pre-trained language model according to their order in the target sentence.
8. The erroneous-sentence detection apparatus of claim 6 or 7, wherein the erroneous-sentence detection program, when executed by the processor, further implements the following steps:
before determining the target sentence to be an erroneous sentence when the perplexity of the target sentence is greater than a preset perplexity and/or the log-likelihood probability of the target sentence is less than a preset log-likelihood probability, determining the preset perplexity and/or the preset log-likelihood probability;
wherein determining the preset perplexity and/or the preset log-likelihood probability specifically comprises:
obtaining training samples used to train the language model, the training samples comprising positive samples and negative samples;
obtaining the perplexity and the log-likelihood probability of the positive samples;
obtaining the perplexity and the log-likelihood probability of the negative samples;
obtaining a perplexity histogram from the perplexities of the positive samples and the negative samples, and obtaining the preset perplexity from the perplexity histogram; and
obtaining a log-likelihood probability histogram from the log-likelihood probabilities of the positive samples and the negative samples, and obtaining the preset log-likelihood probability from the log-likelihood probability histogram.
9. The erroneous-sentence detection apparatus of claim 6 or 7, wherein the erroneous-sentence detection program, when executed by the processor, further implements the following step:
if the target sentence is an erroneous sentence, sending an erroneous-sentence reminder message.
10. A computer-readable storage medium, having stored thereon an erroneous-sentence detection program executable by one or more processors to implement the steps of the erroneous-sentence detection method of any one of claims 1 to 5.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910343889.6A CN110211571B (en) | 2019-04-26 | 2019-04-26 | Sentence fault detection method, sentence fault detection device and computer readable storage medium |
PCT/CN2019/102191 WO2020215550A1 (en) | 2019-04-26 | 2019-08-23 | Wrong sentence detection method and apparatus, and computer readable storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910343889.6A CN110211571B (en) | 2019-04-26 | 2019-04-26 | Sentence fault detection method, sentence fault detection device and computer readable storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110211571A CN110211571A (en) | 2019-09-06 |
CN110211571B true CN110211571B (en) | 2023-05-26 |
Family
ID=67786422
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910343889.6A Active CN110211571B (en) | 2019-04-26 | 2019-04-26 | Sentence fault detection method, sentence fault detection device and computer readable storage medium |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN110211571B (en) |
WO (1) | WO2020215550A1 (en) |
Families Citing this family (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110852087B (en) * | 2019-09-23 | 2022-02-22 | 腾讯科技(深圳)有限公司 | Chinese error correction method and device, storage medium and electronic device |
CN110765996B (en) * | 2019-10-21 | 2022-07-29 | 北京百度网讯科技有限公司 | Text information processing method and device |
WO2021138898A1 (en) * | 2020-01-10 | 2021-07-15 | 深圳市欢太科技有限公司 | Speech recognition result detection method and apparatus, and storage medium |
CN112380855B (en) * | 2020-11-20 | 2024-03-08 | 北京百度网讯科技有限公司 | Method for determining statement smoothness, method and device for determining probability prediction model |
CN112863499B (en) * | 2021-01-13 | 2023-01-24 | 北京小米松果电子有限公司 | Speech recognition method and device, storage medium |
CN113096667A (en) * | 2021-04-19 | 2021-07-09 | 上海云绅智能科技有限公司 | Wrongly-written character recognition detection method and system |
CN115062148B (en) * | 2022-06-23 | 2023-06-20 | 广东国义信息科技有限公司 | Risk control method based on database |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6064959A (en) * | 1997-03-28 | 2000-05-16 | Dragon Systems, Inc. | Error correction in speech recognition |
JP2014089247A (en) * | 2012-10-29 | 2014-05-15 | Nippon Telegr & Teleph Corp <Ntt> | Identification language model learning device, identification language model learning method, and program |
CN105244029A (en) * | 2015-08-28 | 2016-01-13 | 科大讯飞股份有限公司 | Voice recognition post-processing method and system |
CN107741928A (en) * | 2017-10-13 | 2018-02-27 | 四川长虹电器股份有限公司 | A kind of method to text error correction after speech recognition based on field identification |
CN108255857A (en) * | 2016-12-29 | 2018-07-06 | 北京国双科技有限公司 | A kind of sentence detection method and device |
CN108766437A (en) * | 2018-05-31 | 2018-11-06 | 平安科技(深圳)有限公司 | Audio recognition method, device, computer equipment and storage medium |
CN108959250A (en) * | 2018-06-27 | 2018-12-07 | 众安信息技术服务有限公司 | A kind of error correction method and its system based on language model and word feature |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2001069543A (en) * | 1999-08-26 | 2001-03-16 | Nec Corp | Radio paging system |
CN101295293B (en) * | 2007-04-29 | 2010-06-02 | 摩托罗拉公司 | Automatic error correction method for input character string of ideographic character |
US9653071B2 (en) * | 2014-02-08 | 2017-05-16 | Honda Motor Co., Ltd. | Method and system for the correction-centric detection of critical speech recognition errors in spoken short messages |
- 2019-04-26 CN CN201910343889.6A patent/CN110211571B/en active Active
- 2019-08-23 WO PCT/CN2019/102191 patent/WO2020215550A1/en active Application Filing
Also Published As
Publication number | Publication date |
---|---|
CN110211571A (en) | 2019-09-06 |
WO2020215550A1 (en) | 2020-10-29 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110211571B (en) | Sentence fault detection method, sentence fault detection device and computer readable storage medium | |
CN108292500B (en) | Apparatus and method for end-of-sentence detection using grammar consistency | |
WO2020224213A1 (en) | Sentence intent identification method, device, and computer readable storage medium | |
JP5901001B1 (en) | Method and device for acoustic language model training | |
US11043213B2 (en) | System and method for detection and correction of incorrectly pronounced words | |
CN109284503B (en) | Translation statement ending judgment method and system | |
CN114416943B (en) | Training method and device for dialogue model, electronic equipment and storage medium | |
CN110866095A (en) | Text similarity determination method and related equipment | |
CN113657098B (en) | Text error correction method, device, equipment and storage medium | |
CN112364658A (en) | Translation and voice recognition method, device and equipment | |
CN112434520A (en) | Named entity recognition method and device and readable storage medium | |
CN110826301A (en) | Punctuation mark adding method, system, mobile terminal and storage medium | |
CN114386399A (en) | Text error correction method and device | |
CN112257470A (en) | Model training method and device, computer equipment and readable storage medium | |
CN115858776B (en) | Variant text classification recognition method, system, storage medium and electronic equipment | |
CN111460811A (en) | Crowdsourcing task answer verification method and device, computer equipment and storage medium | |
CN107656627B (en) | Information input method and device | |
CN108304366B (en) | Hypernym detection method and device | |
CN113177406B (en) | Text processing method, text processing device, electronic equipment and computer readable medium | |
CN110413983B (en) | Method and device for identifying name | |
CN108021918B (en) | Character recognition method and device | |
CN110866390B (en) | Method and device for recognizing Chinese grammar error, computer equipment and storage medium | |
CN110929749A (en) | Text recognition method, text recognition device, text recognition medium and electronic equipment | |
CN113254658B (en) | Text information processing method, system, medium, and apparatus | |
CN112185346B (en) | Multilingual voice keyword detection and model generation method and electronic equipment |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||