CN110211571B - Sentence fault detection method, sentence fault detection device and computer readable storage medium - Google Patents


Info

Publication number
CN110211571B
CN110211571B (application CN201910343889.6A)
Authority
CN
China
Prior art keywords
sentence
preset
target sentence
words
likelihood probability
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910343889.6A
Other languages
Chinese (zh)
Other versions
CN110211571A (en)
Inventor
张勇
马骏
王少军
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd filed Critical Ping An Technology Shenzhen Co Ltd
Priority to CN201910343889.6A priority Critical patent/CN110211571B/en
Priority to PCT/CN2019/102191 priority patent/WO2020215550A1/en
Publication of CN110211571A publication Critical patent/CN110211571A/en
Application granted granted Critical
Publication of CN110211571B publication Critical patent/CN110211571B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/063: Training (creation of reference templates; training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice)
    • G10L15/08: Speech classification or search
    • G10L25/51: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00, specially adapted for comparison or discrimination
    • G10L15/26: Speech to text systems
    • G10L2015/0631: Creating reference templates; Clustering
    • G10L2015/0633: Creating reference templates; Clustering using lexical or orthographic knowledge sources
    • G10L2015/088: Word spotting
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Artificial Intelligence (AREA)
  • Signal Processing (AREA)
  • Machine Translation (AREA)

Abstract

The invention relates to the technical field of speech semantics and discloses a method for detecting erroneous sentences, which comprises the following steps: acquiring a target sentence; identifying the i words that compose the target sentence; sequentially inputting the i words, in their order within the target sentence, into a pre-trained language model, and calculating the confusion degree and/or log-likelihood probability of the target sentence through the language model; and judging the target sentence to be an erroneous sentence when its confusion degree is greater than a preset confusion degree and/or its log-likelihood probability is smaller than a preset log-likelihood probability. The invention also provides a sentence error detection device and a computer readable storage medium. The invention can identify whether a sentence is erroneous.

Description

Sentence fault detection method, sentence fault detection device and computer readable storage medium
Technical Field
The present invention relates to the field of speech and semantic technologies, and in particular to a sentence error detection method, a sentence error detection device, and a computer readable storage medium.
Background
With the development of technology, automatic speech recognition (Automatic Speech Recognition, ASR), a technology for converting human speech into text, is increasingly widely used. In practical applications of ASR, substitution, insertion, or deletion errors are unavoidable in the recognition result, owing to background noise or to the speaker's pronunciation, such as dialect, accent, fast speech, or idiomatic habits. These recognition errors can produce sentences with disordered words, mismatched collocations, unclear semantics, or flawed logic, i.e., erroneous sentences. Such sentences are not only difficult to understand and analyze, but also create great difficulties for subsequent natural language processing (Natural Language Processing, NLP) applications. Besides sentences obtained through ASR, sentences entered manually into a computer may also contain errors. Recognizing whether a sentence is correct therefore has practical value and necessity.
Disclosure of Invention
The invention provides a sentence error detection method, a sentence error detection device, and a computer readable storage medium, whose main purpose is to identify whether a sentence is erroneous.
To achieve the above object, the present invention provides a sentence error detection method, the method comprising:
acquiring a target sentence obtained through automatic speech recognition;
acquiring the i-th text segment contained in the target sentence, and judging whether a word matching the i-th text segment exists in a preset dictionary, wherein the initial value of i is 1 and i is a positive integer;
if no word matching the i-th text segment exists in the preset dictionary, adjusting the number of characters in the i-th text segment, and judging again whether a matching word exists in the preset dictionary;
if a word matching the i-th text segment exists in the preset dictionary, determining the i-th text segment to be the i-th word of the target sentence, letting i = i + 1, acquiring the next text segment contained in the target sentence, and judging whether a matching word exists in the preset dictionary;
when the total number of characters in the i words equals the total number of characters in the target sentence, determining that the target sentence is composed of the i words;
sequentially inputting the i words, in their order within the target sentence, into a pre-trained language model, and calculating the confusion degree and/or the log-likelihood probability of the target sentence through the language model;
and judging the target sentence to be an erroneous sentence when the confusion degree of the target sentence is greater than a preset confusion degree and/or the log-likelihood probability of the target sentence is smaller than a preset log-likelihood probability.
Optionally, sequentially inputting the i words into the pre-trained language model in their order within the target sentence comprises:
judging whether any preset keywords exist among the i words;
if preset keywords exist among the i words, sequentially inputting the words among the i words other than the preset keywords, in their order within the target sentence, into the pre-trained language model.
Optionally, before judging the target sentence to be an erroneous sentence when its confusion degree is greater than a preset confusion degree and/or its log-likelihood probability is smaller than a preset log-likelihood probability, the method further comprises:
determining the preset confusion degree and/or the preset log-likelihood probability;
wherein determining the preset confusion degree and/or the preset log-likelihood probability specifically comprises:
obtaining training samples used to train the language model, the training samples comprising positive samples and negative samples;
obtaining the confusion degree and the log-likelihood probability of the positive samples; and
obtaining the confusion degree and the log-likelihood probability of the negative samples;
obtaining a confusion degree histogram from the confusion degrees of the positive and negative samples, and obtaining the preset confusion degree from the confusion degree histogram; and
obtaining a log-likelihood probability histogram from the log-likelihood probabilities of the positive and negative samples, and obtaining the preset log-likelihood probability from the log-likelihood probability histogram.
Optionally, the language model is a deep learning language model or a statistics-based language model.
Optionally, the method further comprises:
if the target sentence is an erroneous sentence, sending an erroneous-sentence reminder message.
In addition, to achieve the above object, the present invention also provides a sentence error detection apparatus, the apparatus comprising a memory and a processor, wherein the memory stores a sentence error detection program runnable on the processor, and the program, when executed by the processor, implements the following steps:
acquiring a target sentence obtained through automatic speech recognition;
acquiring the i-th text segment contained in the target sentence, and judging whether a word matching the i-th text segment exists in a preset dictionary, wherein the initial value of i is 1 and i is a positive integer;
if no word matching the i-th text segment exists in the preset dictionary, adjusting the number of characters in the i-th text segment, and judging again whether a matching word exists in the preset dictionary;
if a word matching the i-th text segment exists in the preset dictionary, determining the i-th text segment to be the i-th word of the target sentence, letting i = i + 1, acquiring the next text segment contained in the target sentence, and judging whether a matching word exists in the preset dictionary;
when the total number of characters in the i words equals the total number of characters in the target sentence, determining that the target sentence is composed of the i words;
sequentially inputting the i words, in their order within the target sentence, into a pre-trained language model, and calculating the confusion degree and/or the log-likelihood probability of the target sentence through the language model;
and judging the target sentence to be an erroneous sentence when the confusion degree of the target sentence is greater than a preset confusion degree and/or the log-likelihood probability of the target sentence is smaller than a preset log-likelihood probability.
Optionally, sequentially inputting the i words into the pre-trained language model in their order within the target sentence comprises:
judging whether any preset keywords exist among the i words;
if preset keywords exist among the i words, sequentially inputting the words among the i words other than the preset keywords, in their order within the target sentence, into the pre-trained language model.
Optionally, when executed by the processor, the sentence error detection program further implements the following steps:
before judging the target sentence to be an erroneous sentence when its confusion degree is greater than a preset confusion degree and/or its log-likelihood probability is smaller than a preset log-likelihood probability, determining the preset confusion degree and/or the preset log-likelihood probability;
wherein determining the preset confusion degree and/or the preset log-likelihood probability specifically comprises:
obtaining training samples used to train the language model, the training samples comprising positive samples and negative samples;
obtaining the confusion degree and the log-likelihood probability of the positive samples; and
obtaining the confusion degree and the log-likelihood probability of the negative samples;
obtaining a confusion degree histogram from the confusion degrees of the positive and negative samples, and obtaining the preset confusion degree from the confusion degree histogram; and
obtaining a log-likelihood probability histogram from the log-likelihood probabilities of the positive and negative samples, and obtaining the preset log-likelihood probability from the log-likelihood probability histogram.
Optionally, the language model is a deep learning language model or a statistics-based language model.
Optionally, when executed by the processor, the sentence error detection program further implements the following step:
if the target sentence is an erroneous sentence, sending an erroneous-sentence reminder message.
In addition, to achieve the above object, the present invention further provides a computer-readable storage medium having a sentence error detection program stored thereon, the program being executable by one or more processors to implement the steps of the sentence error detection method described above.
The sentence error detection method, device, and computer readable storage medium provided by the invention acquire a target sentence obtained through automatic speech recognition; acquire the i-th text segment contained in the target sentence and judge whether a word matching the i-th text segment exists in a preset dictionary, wherein the initial value of i is 1 and i is a positive integer; if no matching word exists in the preset dictionary, adjust the number of characters in the i-th text segment and judge again whether a matching word exists; if a matching word exists, determine the i-th text segment to be the i-th word of the target sentence, let i = i + 1, acquire the next text segment, and repeat the judgment; when the total number of characters in the i words equals the total number of characters in the target sentence, determine that the target sentence is composed of the i words; sequentially input the i words, in their order within the target sentence, into a pre-trained language model, and calculate the confusion degree and/or log-likelihood probability of the target sentence through the language model; and judge the target sentence to be an erroneous sentence when its confusion degree is greater than the preset confusion degree and/or its log-likelihood probability is smaller than the preset log-likelihood probability, thereby identifying whether the sentence is erroneous.
Drawings
FIG. 1 is a flowchart illustrating a sentence error detection method according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of the internal structure of a sentence error detection apparatus according to an embodiment of the present invention;
FIG. 3 is a schematic block diagram of the sentence error detection program in the sentence error detection apparatus according to an embodiment of the present invention.
The objects, functional features, and advantages of the present invention will be further described with reference to the accompanying drawings and in conjunction with the embodiments.
Detailed Description
It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.
The invention provides a sentence error detection method. Referring to FIG. 1, a flow chart of the sentence error detection method according to an embodiment of the invention is shown. The method may be performed by an apparatus, which may be implemented by software and/or hardware.
In this embodiment, the method for detecting a sentence error includes:
step S10, obtaining a target sentence obtained through an automatic voice recognition technology.
In this embodiment, the target sentence obtained by the automatic speech recognition (automatic speech recognition, ASR) technology may be one sentence or a plurality of sentences, and each sentence may be a long sentence or a short sentence. In other embodiments, the target statement may also be a statement entered through other pathways.
Step S20: acquiring the i-th text segment contained in the target sentence, and judging whether a word matching the i-th text segment exists in a preset dictionary, wherein the initial value of i is 1 and i is a positive integer.
The i-th text segment contained in the target sentence may be acquired sequentially from left to right (i.e., front to back) or from right to left (i.e., back to front).
The number of characters in each acquired text segment may be the same or different.
Preferably, the number of characters in the i-th text segment equals the number of characters in the longest word in the preset dictionary.
Step S30: if no word matching the i-th text segment exists in the preset dictionary, adjusting the number of characters in the i-th text segment, and judging again whether a matching word exists in the preset dictionary.
Step S40: if a word matching the i-th text segment exists in the preset dictionary, determining the i-th text segment to be the i-th word of the target sentence, letting i = i + 1, acquiring the next text segment contained in the target sentence, and judging whether a matching word exists in the preset dictionary.
Step S50: when the total number of characters in the i words equals the total number of characters in the target sentence, determining that the target sentence is composed of the i words.
For example, consider the sentence 我喜欢秋天 ("I like autumn"). If the longest word in the dictionary has 3 characters, the first 3 characters 我喜欢 are taken as the first segment and matched against the words in the dictionary. If the match fails, the segment is shortened by one character to 我喜 and matched again; if that match also fails, 我 ("I") is directly determined to be a single-character word. Next, 喜欢秋 is matched against the dictionary; if the match fails, it is shortened by one character and 喜欢 is matched against the dictionary; if the match succeeds, 喜欢 ("like") is determined to be a word. Similarly, 秋天 is matched against the dictionary; if the match succeeds, 秋天 ("autumn") is determined to be a word. After this word segmentation, the sentence is found to consist of the three words 我, 喜欢, and 秋天.
Through the above steps, the words contained in the target sentence can be identified rapidly, facilitating fast sentence error detection.
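Steps S20 to S50 amount to a forward maximum-matching word segmenter. The following is a minimal sketch of that procedure, not the patent's implementation; the function name, the toy dictionary, and the 3-character window are illustrative assumptions.

```python
def max_match_segment(sentence, dictionary, max_word_len=3):
    """Forward maximum matching: at each position take the longest dictionary
    word, shrinking the window one character at a time on a miss (step S30);
    an unmatched single character is kept as a word of its own."""
    words = []
    pos = 0
    while pos < len(sentence):
        # Start from the length of the longest dictionary word (step S20).
        for length in range(min(max_word_len, len(sentence) - pos), 0, -1):
            segment = sentence[pos:pos + length]
            if length == 1 or segment in dictionary:
                words.append(segment)   # step S40: accept the i-th word
                pos += length
                break
    return words                        # step S50: all characters consumed

# The example from the description: 我喜欢秋天 ("I like autumn")
dictionary = {"喜欢", "秋天"}
print(max_match_segment("我喜欢秋天", dictionary))  # ['我', '喜欢', '秋天']
```

A real dictionary would set `max_word_len` to the length of its longest entry, matching the "preferably" note above.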
Step S60: sequentially inputting the i words, in their order within the target sentence, into a pre-trained language model, and calculating the confusion degree and/or the log-likelihood probability of the target sentence through the language model.
The language model may be a deep learning language model, e.g., a feedforward neural network or a recurrent neural network (RNN), or a statistics-based language model, e.g., an N-gram model.
In this embodiment, a training set may be formed from correct word sequences (i.e., correct sentences), so that the training set consists of positive samples. The selected language model is then trained on the sentences in the training set to obtain the parameters of the language model, and the trained language model can estimate the probability that a given word sequence occurs.
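For a statistics-based model of the kind mentioned above, training reduces to counting n-grams over the positive samples. The sketch below uses bigrams with add-one smoothing; the function names and the smoothing choice are illustrative assumptions, not details from the patent.

```python
from collections import Counter

def train_bigram(corpus):
    """Count unigram histories and bigrams over a training set of correct
    sentences, padding each sentence with start/end markers."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in corpus:
        tokens = ["<s>"] + sentence + ["</s>"]
        unigrams.update(tokens[:-1])              # histories only
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams

def bigram_prob(unigrams, bigrams, prev, word, vocab_size):
    # Add-one smoothing keeps a small non-zero probability for word pairs
    # never seen in training.
    return (bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size)

uni, bi = train_bigram([["我", "喜欢", "秋天"], ["我", "喜欢", "夏天"]])
print(bigram_prob(uni, bi, "我", "喜欢", vocab_size=6))  # 0.375
```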
From a statistical point of view, a sentence s in natural language may be composed of an arbitrary word string (several words in some order), but for most such strings the probability of occurrence P(s) is small. For example, assume the following sentences s1 and s2:
s1: I just ate dinner
s2: just me has eaten dinner
Obviously, s1 is a correct sentence and s2 is an erroneous one, so the probability of occurrence of sentence s1 is greater than that of sentence s2, i.e., P(s1) > P(s2).
If a sentence s is composed of m words, the probability P(W1, W2, …, Wm) of the sentence s is:
P(W1, W2, …, Wm) = P(W1)P(W2|W1)P(W3|W1, W2)…P(Wm|W1, W2, …, Wm-1)
The factors P(W1), P(W2|W1), P(W3|W1, W2), etc. in the above formula can be calculated with the pre-trained language model, after which the probability P(W1, W2, …, Wm) of the sentence s follows. Since each probability is a value in the range [0, 1], multiplying many probability values yields a very small number and introduces numerical error, so we take the logarithm instead, obtaining the log-likelihood probability logprob:
logprob = log(P(W1, W2, …, Wm))
logprob characterizes the likelihood that the sentence occurs: the larger the logprob, the more likely the sentence is to occur; the smaller the logprob, the less likely.
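This computation can be sketched as summing base-10 logs of the conditional probabilities, which avoids the underflow of multiplying many small values. The toy probability table below is invented purely for illustration and stands in for a trained language model.

```python
import math

def sentence_logprob(words, cond_prob):
    """logprob = log P(W1..Wm) = sum over k of log P(Wk | W1..Wk-1)."""
    return sum(math.log10(cond_prob(words[:k], w)) for k, w in enumerate(words))

# Toy conditional model: only the previous word is used as history, and a
# tiny floor probability is returned for unseen pairs.
table = {("", "I"): 0.2, ("I", "just"): 0.3,
         ("just", "ate"): 0.25, ("ate", "dinner"): 0.4}
cond = lambda history, w: table.get((history[-1] if history else "", w), 1e-6)

print(round(sentence_logprob(["I", "just", "ate", "dinner"], cond), 4))  # -2.2218
```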
Another quantity that characterizes the probability of occurrence of a sentence is the confusion degree ppl (perplexity), defined as the geometric mean of the inverse of the sentence's word probabilities:
ppl = 10^(-logprob / (m - OOVs + 1))
where m is the number of words in the sentence and OOVs is the number of out-of-vocabulary words (words outside the dictionary). The smaller the ppl, the more likely the sentence is to occur; the larger the ppl, the less likely.
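Under that definition, ppl follows mechanically from logprob. The sketch below interprets the denominator as the word count with +1 for the end-of-sentence token, the convention used by common language-modeling toolkits; this interpretation is an assumption about the patent's formula.

```python
def perplexity(logprob, num_words, oovs=0):
    """ppl = 10^(-logprob / (m - OOVs + 1)), with logprob in base 10,
    m the number of words, and OOVs the out-of-vocabulary count."""
    return 10 ** (-logprob / (num_words - oovs + 1))

# E.g., a 4-word sentence with a base-10 logprob of -2.2218:
print(round(perplexity(-2.2218, 4), 2))  # 2.78
```

A logprob of 0 (probability 1) gives a ppl of 1, the minimum possible value, which matches the rule that smaller ppl means a more likely sentence.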
Further, in another embodiment of the present invention, sequentially inputting the i words into the pre-trained language model in their order within the target sentence comprises:
judging whether any preset keywords exist among the i words;
if preset keywords exist among the i words, sequentially inputting the words among the i words other than the preset keywords, in their order within the target sentence, into the pre-trained language model.
In this embodiment, the preset keywords may be stop words, for example greetings and polite expressions such as "hello" or "thank you".
The preset keywords may also be interjections, for example words such as "ah", "oh", or "wow".
The preset keywords may likewise be other types of words that do not affect the semantics of the sentence, and may be configured in advance according to actual requirements.
In this embodiment, removing the preset keywords does not affect the accuracy of the detection result, but simplifies the target sentence and improves the detection speed.
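The filtering step above can be sketched as follows; the keyword set is a hypothetical configuration, since the patent leaves the actual list to the application.

```python
# Hypothetical preset keywords: stop words and interjections that carry
# no sentence-level meaning.
PRESET_KEYWORDS = {"hello", "thank you", "ah", "oh", "wow"}

def strip_preset_keywords(words, keywords=PRESET_KEYWORDS):
    """Remove preset keywords while preserving the order of the remaining
    words, since the language model consumes them in sentence order."""
    return [w for w in words if w not in keywords]

print(strip_preset_keywords(["hello", "I", "just", "ate", "dinner", "oh"]))
# ['I', 'just', 'ate', 'dinner']
```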
Step S70: judging the target sentence to be an erroneous sentence when the confusion degree of the target sentence is greater than the preset confusion degree and/or the log-likelihood probability of the target sentence is smaller than the preset log-likelihood probability.
In this embodiment, an erroneous sentence is an incorrect sentence, including ungrammatical sentences and word strings that do not form a sentence.
The preset log-likelihood probability may be a preset value used to judge whether the log-likelihood probability of the target sentence is high or low: when the log-likelihood probability of the target sentence is higher than the preset value, it is considered high and the target sentence is judged to be a correct sentence; when it is lower than the preset value, it is considered low and the target sentence is judged to be an erroneous sentence.
Similarly, the preset confusion degree may be a preset value used to judge whether the confusion degree of the target sentence is high or low: when the confusion degree of the target sentence is higher than the preset value, it is considered high and the target sentence is judged to be an erroneous sentence; when it is lower than the preset value, it is considered low and the target sentence is judged to be a correct sentence.
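The decision rule of step S70 can be sketched as below, using as defaults the example thresholds (preset ppl 250, preset logprob -2.5) that the histogram analysis later in this description arrives at.

```python
def is_erroneous(ppl, logprob, preset_ppl=250.0, preset_logprob=-2.5):
    """Judge the target sentence erroneous when its confusion degree exceeds
    the preset confusion degree and/or its log-likelihood probability falls
    below the preset log-likelihood probability (step S70)."""
    return ppl > preset_ppl or logprob < preset_logprob

print(is_erroneous(ppl=300.0, logprob=-2.0))  # True  (confusion degree too high)
print(is_erroneous(ppl=120.0, logprob=-1.8))  # False (judged a correct sentence)
```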
In a possible embodiment, before judging the target sentence to be an erroneous sentence when its confusion degree is greater than a preset confusion degree and/or its log-likelihood probability is smaller than a preset log-likelihood probability, the method further comprises:
determining the preset confusion degree and/or the preset log-likelihood probability.
Determining the preset confusion degree and/or the preset log-likelihood probability specifically comprises:
obtaining training samples used to train the language model, the training samples comprising positive samples and negative samples;
obtaining the confusion degree and the log-likelihood probability of the positive samples; and
obtaining the confusion degree and the log-likelihood probability of the negative samples;
obtaining a confusion degree histogram from the confusion degrees of the positive and negative samples, and obtaining the preset confusion degree from the confusion degree histogram; and
obtaining a log-likelihood probability histogram from the log-likelihood probabilities of the positive and negative samples, and obtaining the preset log-likelihood probability from the log-likelihood probability histogram.
In this embodiment, the preset confusion degree is determined from a confusion degree histogram of the training samples, and the preset log-likelihood probability from a log-likelihood probability histogram of the training samples.
The confusion degree histograms of the training samples comprise a correct-sentence histogram and an erroneous-sentence histogram calculated from the training samples (i.e., a training set composed of sentences recognized by ASR); they reflect the confusion degree distributions of correct and of erroneous sentences.
In this embodiment, the preset confusion degree may be determined from the confusion degree distribution of correct sentences and that of erroneous sentences. It is then determined whether the confusion degree of the target sentence falls within the range typical of correct sentences or of erroneous sentences, and the target sentence is judged correct or erroneous accordingly.
Specifically, the confusion degrees of the correct sentences and of the erroneous sentences in the training samples may be calculated, their distributions determined, and from these a confusion degree threshold distinguishing correct from erroneous sentences, i.e., the preset confusion degree, derived.
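One way to derive such a threshold from the two distributions is to pick the cut point that best separates them. The sketch below runs on toy samples invented for illustration; real thresholds come from full training-set histograms like the ones tabulated below.

```python
def choose_threshold(correct_ppls, wrong_ppls, candidates):
    """Pick the candidate threshold that best separates correct sentences
    (ppl below the threshold) from erroneous ones (ppl at or above it)."""
    def score(t):
        below = sum(p < t for p in correct_ppls) / len(correct_ppls)
        above = sum(p >= t for p in wrong_ppls) / len(wrong_ppls)
        return below + above  # fraction correctly placed on each side
    return max(candidates, key=score)

correct = [40, 80, 120, 150, 220]       # toy correct-sentence ppl values
wrong = [260, 400, 700, 1200, 2400]     # toy erroneous-sentence ppl values
print(choose_threshold(correct, wrong, candidates=[100, 250, 500]))  # 250
```

The same selection applies to logprob thresholds, with the inequality directions reversed.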
For example, the confusion degree (ppl) distribution of the correct sentences is:

ppl interval     Count   Percentage
[2000, +∞)           5       0.125%
[1500, 2000)         1       0.025%
[1000, 1500)         5       0.125%
[750, 1000)         11       0.276%
[500, 750)          28       0.702%
[250, 500)         217       5.441%
[100, 250)        1241       31.18%
[0, 100)          2480      62.187%
The confusion degree (ppl) distribution of the erroneous sentences is:

ppl interval     Count   Percentage
[2000, +∞)          86       5.923%
[1500, 2000)        43       2.961%
[1000, 1500)       111       7.645%
[750, 1000)        107       7.369%
[500, 750)         204       14.05%
[250, 500)         501      34.504%
[100, 250)         345       23.76%
[0, 100)            55       3.788%
From the above histograms it can be seen that, among the correct sentences, 93.367% have a ppl smaller than 250, while among the erroneous sentences, 72.452% have a ppl greater than 250.
A ppl of 250 therefore distinguishes sentences well, so the preset confusion degree (preset ppl) is determined to be 250. In this embodiment, when the ppl of the target sentence is greater than the preset ppl, the target sentence is judged to be an erroneous sentence.
Likewise, the preset log-likelihood probability (logprob) is obtained in a manner similar to the preset confusion degree. The logprob histograms of the training samples comprise a correct-sentence histogram and an erroneous-sentence histogram calculated from the training samples; they reflect the log-likelihood probability distributions of correct and of erroneous sentences.
For example, the log likelihood probability (logprob) distribution of correct sentences is:

    logprob interval    Count    Percentage
    (-∞, -4.0)              1      0.0251%
    [-4.0, -3.5)            0      0%
    [-3.5, -3.0)           14      0.351%
    [-3.0, -2.5)          122      3.0591%
    [-2.5, -2.0)         1371     34.378%
    [-2.0, -1.5)         1740     43.631%
    [-1.5, -1.0)          673     16.876%
    [-1.0, 0)              67      1.68%
The log likelihood probability (logprob) distribution of erroneous sentences is:

    logprob interval    Count    Percentage
    (-∞, -4.0)              8      0.551%
    [-4.0, -3.5)           31      2.135%
    [-3.5, -3.0)          200     13.774%
    [-3.0, -2.5)          656     45.179%
    [-2.5, -2.0)          502     34.573%
    [-2.0, -1.5)           52      3.581%
    [-1.5, -1.0)            3      0.207%
    [-1.0, 0)               0      0%
The histograms show that 96.566% of correct sentences have logprob greater than -2.5, while 61.639% of erroneous sentences have logprob less than -2.5.
A threshold of logprob = -2.5 therefore separates the two classes well, so the preset logprob is determined to be -2.5. In this embodiment, when the logprob of the target sentence is smaller than the preset logprob, the target sentence is judged to be an erroneous sentence.
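Putting the two preset thresholds together, the final decision rule can be sketched as follows. This is a minimal illustration: the function and constant names are assumptions, and only the threshold values 250 and -2.5 come from the distributions discussed above.

```python
PRESET_PPL = 250.0       # preset confusion degree from the ppl histograms
PRESET_LOGPROB = -2.5    # preset log likelihood probability from the logprob histograms

def is_erroneous(ppl: float, logprob: float) -> bool:
    """Judge a sentence erroneous when its ppl exceeds the preset ppl
    and/or its logprob falls below the preset logprob."""
    return ppl > PRESET_PPL or logprob < PRESET_LOGPROB

print(is_erroneous(ppl=320.0, logprob=-2.8))  # True
print(is_erroneous(ppl=80.0, logprob=-1.7))   # False
```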
The erroneous sentence detection method provided by this embodiment acquires a target sentence obtained through automatic speech recognition; acquires the ith text segment of the target sentence and judges whether the preset dictionary contains a word matching it, where the initial value of i is 1 and i is a positive integer; if no matching word exists, adjusts the number of characters in the ith segment and judges again; if a matching word exists, takes the segment as the ith word of the target sentence, sets i = i + 1, and repeats the matching for the next segment; when the total number of characters in the i words equals the total number of characters in the target sentence, determines that the target sentence consists of those i words; inputs the i words, in their order in the target sentence, into a pre-trained language model and calculates the confusion degree and/or log likelihood probability of the target sentence through the model; and judges the target sentence to be an erroneous sentence when its confusion degree is greater than the preset confusion degree and/or its log likelihood probability is smaller than the preset log likelihood probability, thereby identifying whether a sentence is erroneous.
Further, in another embodiment of the method of the present invention, the method further comprises the steps of:
If the target sentence is an erroneous sentence, an erroneous-sentence reminder message is sent.
When the target sentence is judged to be erroneous, a prompt can be sent to the user. For example, when a speech-converted sentence is to undergo further natural language processing and is judged erroneous, the user can be reminded through a pop-up error message on the display device that the target sentence is erroneous.
In this embodiment, by sending the reminder, the user can quickly learn whether erroneous sentences exist and which ones they are, and then carry out subsequent operations.
The invention also provides an erroneous sentence detection device. Referring to fig. 2, an internal structure diagram of a sentence detection device according to an embodiment of the invention is shown.
In this embodiment, the sentence detection device 1 may be a PC (Personal Computer), or a terminal device such as a smart phone, a tablet computer, or a portable computer. The sentence detection device 1 comprises at least a memory 11, a processor 12, a communication bus 13, and a network interface 14.
The memory 11 includes at least one type of readable storage medium including flash memory, a hard disk, a multimedia card, a card memory (e.g., SD or DX memory, etc.), a magnetic memory, a magnetic disk, an optical disk, etc. The memory 11 may in some embodiments be an internal storage unit of the sentence detection device 1, for example a hard disk of the sentence detection device 1. The memory 11 may also be an external storage device of the sentence detection apparatus 1 in other embodiments, for example, a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card) or the like, which are provided on the sentence detection apparatus 1. Further, the memory 11 may also include both an internal storage unit and an external storage device of the sentence detection apparatus 1. The memory 11 may be used not only for storing application software installed in the sentence detection apparatus 1 and various types of data, for example, codes of the sentence detection program 01, but also for temporarily storing data that has been output or is to be output.
The processor 12 may in some embodiments be a central processing unit (Central Processing Unit, CPU), controller, microcontroller, microprocessor or other data processing chip for executing program code or processing data stored in the memory 11, such as executing the sentence detection program 01, etc.
The communication bus 13 is used to enable connection communication between these components.
The network interface 14 may optionally comprise a standard wired interface, a wireless interface (e.g. WI-FI interface), typically used to establish a communication connection between the apparatus 1 and other electronic devices.
Optionally, the device 1 may further comprise a user interface, which may include a display (Display), an input unit such as a keyboard (Keyboard), and optionally a standard wired interface and a wireless interface. In some embodiments, the display may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an Organic Light-Emitting Diode (OLED) touch display, or the like. The display may also be referred to as a display screen or display unit, and is used for displaying information processed in the sentence detection device 1 and for displaying a visual user interface.
Fig. 2 shows only the sentence detection device 1 with the components 11-14 and the sentence detection program 01. Those skilled in the art will understand that the structure shown in fig. 2 does not limit the sentence detection device 1, which may include fewer or more components than shown, combine certain components, or arrange the components differently.
In the embodiment of the apparatus 1 shown in fig. 2, the memory 11 stores a sentence detection program 01; the processor 12 performs the following steps when executing the sentence detection program 01 stored in the memory 11:
and acquiring a target sentence obtained through automatic speech recognition technology.
In this embodiment, the target sentence obtained through automatic speech recognition (ASR) may be one sentence or several sentences, and each sentence may be long or short. In other embodiments, the target sentence may also be a sentence input through other channels.
And acquiring an ith text contained in the target sentence, and judging whether a word matched with the ith text exists in a preset dictionary, wherein the initial value of i is 1, and i is a positive integer.
When the ith text included in the target sentence is acquired, the ith text may be acquired sequentially in the left-to-right order (i.e. front-to-back), or may be acquired sequentially in the right-to-left order (i.e. back-to-front).
The number of words in each piece of text acquired may be the same or different.
Preferably, the number of characters in the ith text segment is the same as the number of characters in the longest word in the preset dictionary.
If the words matched with the ith section of characters do not exist in the preset dictionary, the word number of the ith section of characters is adjusted, and whether the words matched with the ith section of characters exist in the preset dictionary is judged.
If the words matched with the ith word segment exist in the preset dictionary, determining the ith word segment as the ith word segment of the target sentence, enabling i=i+1, acquiring the ith word segment contained in the target sentence, and judging whether the words matched with the ith word segment exist in the preset dictionary.
And when the total word number of the i words is the same as the total word number of the target sentence, determining that the target sentence consists of the i words.
For example, take the sentence "I love autumn". If the longest word in the dictionary has 3 characters, first take the first 3 characters of the sentence as the first segment and match it against the dictionary. If the match fails, drop one character from the end of the segment and match again; if that also fails, the first character "I" is taken as a single word. Matching then continues in "love autumn" in the same way: if the 3-character segment finds no match, it is shortened by one character, and when "love" matches a word in the dictionary it is taken as a word; likewise, when "autumn" matches a dictionary word it is taken as a word. After this word segmentation, the sentence is found to consist of the three words "I", "love" and "autumn".
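The segmentation loop described above is essentially forward maximum matching. A minimal sketch follows; the function name, dictionary, and the toy sentence (single letters standing in for the Chinese characters of the example) are illustrative assumptions, not the patent's actual dictionary.

```python
def segment(sentence: str, dictionary: set) -> list:
    """Forward maximum matching: try the longest candidate first and
    shrink it one character at a time until a dictionary word (or a
    lone character) is found."""
    max_len = max(len(w) for w in dictionary)  # length of the longest dictionary word
    words, i = [], 0
    while i < len(sentence):
        for size in range(min(max_len, len(sentence) - i), 0, -1):
            candidate = sentence[i:i + size]
            if size == 1 or candidate in dictionary:
                words.append(candidate)  # unmatched single characters stand alone
                i += size
                break
    return words

# Toy stand-in for "I love autumn": characters A..E with dictionary
# words "BC" and "DE".
print(segment("ABCDE", {"BC", "DE"}))  # ['A', 'BC', 'DE']
```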
Through the above steps, the words contained in the target sentence can be quickly identified, which facilitates fast erroneous sentence detection.
And sequentially inputting the i words into a pre-trained language model according to the sequence in the target sentence, and calculating the confusion degree and/or the log likelihood probability of the target sentence through the language model.
The language model may be a deep-learning language model, such as a feedforward neural network or a recurrent neural network (RNN), or a statistics-based language model, such as an N-gram model.
In this embodiment, a training set may be formed from correct word sequences (i.e., correct sentences); this training set constitutes the positive samples. The selected language model is then trained on the sentences in the training set to obtain its parameters, and the trained language model can estimate the probability that a given word sequence occurs.
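As a toy illustration of such training, a minimal bigram model with add-one smoothing can be built from a handful of correct sentences. The corpus, the smoothing choice, and the function names below are illustrative assumptions, standing in for whatever pre-trained language model is actually used.

```python
from collections import Counter

def train_bigram(corpus):
    """Count unigrams and bigrams over the training sentences and return
    a smoothed conditional-probability function P(word | prev)."""
    unigrams, bigrams = Counter(), Counter()
    for sent in corpus:
        words = ["<s>"] + sent.split()
        unigrams.update(words)
        bigrams.update(zip(words, words[1:]))
    vocab = len(unigrams)

    def prob(prev, word):
        # Add-one smoothing keeps unseen pairs at a small nonzero probability.
        return (bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab)

    return prob

prob = train_bigram(["I love autumn", "I love spring"])
# A bigram seen in training scores higher than an unseen one.
print(prob("I", "love") > prob("love", "I"))  # True
```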
From a statistical point of view, a sentence s in natural language may be any word string (several words arranged in some order), but for most such strings the probability of occurrence P(s) is small. For example, consider the following sentences s1 and s2:
s1: I have just eaten dinner
s2: Just dinner have I eaten
Clearly, s1 is a correct sentence and s2 an erroneous one, so the probability of s1 occurring is greater than that of s2, i.e., P(s1) > P(s2).
If a sentence s consists of m words, its probability P(W1, W2, …, Wm) is given by the chain rule:
P(W1,W2,…,Wm) = P(W1)P(W2|W1)P(W3|W1,W2)…P(Wm|W1,W2,…,Wm-1)
P(W1), P(W2|W1), P(W3|W1,W2), and so on can be computed with the pre-trained language model, and from these values the probability P(W1, W2, …, Wm) of sentence s follows. Since each probability lies in [0,1], the product of many of them is an extremely small number and prone to numerical error, so we take its logarithm to obtain the log likelihood probability logprob:
logprob = log(P(W1,W2,…,Wm))
logprob characterizes how likely a sentence is to occur: the larger the logprob, the more likely the sentence; the smaller the logprob, the less likely.
Another parameter characterizing how likely a sentence is to occur is ppl (the confusion degree, i.e., perplexity), defined as the geometric mean of the reciprocal per-word probability:
ppl = 10^(-logprob/(m-OOVs+1))
where m is the number of words in the sentence and OOVs is the number of out-of-vocabulary words (words outside the dictionary); the +1 can be understood as accounting for the end-of-sentence token. The smaller the ppl, the more likely the sentence is to occur; the larger the ppl, the less likely.
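Both quantities can be computed directly from the per-word conditional probabilities. In the sketch below the probabilities are invented for illustration (a real language model would supply them), logarithms are base 10 to match the 10^ in the ppl formula, and the denominator is taken as (number of words - OOVs + 1), an SRILM-style convention assumed here.

```python
import math

def score(word_probs, oovs=0):
    """Return (logprob, ppl) for a sentence given the conditional
    probabilities P(Wk | W1..Wk-1) of its in-vocabulary words."""
    logprob = sum(math.log10(p) for p in word_probs)  # log10 P(W1,...,Wm)
    m = len(word_probs) + oovs                        # total words in the sentence
    ppl = 10 ** (-logprob / (m - oovs + 1))           # assumed SRILM-style denominator
    return logprob, ppl

lp, ppl = score([0.1, 0.2, 0.05])
print(round(lp, 3), round(ppl, 2))  # -3.0 5.62
```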
Further, in another embodiment of the present invention, the sequentially inputting the i words into the pre-trained language model according to the order in the target sentence includes:
judging whether preset keywords exist in the i words or not;
if the i words have preset keywords, words, except the preset keywords, in the i words are sequentially input into a pre-trained language model according to the sequence in the target sentence.
In this embodiment, the preset keyword may be a stop word, for example greeting or filler words such as "hello", "thank you", and "you".
The preset keyword may also be an interjection, for example words such as "ah", "oh", and "wow".
The preset keywords can be other types of words which do not affect the semantics of the sentences, and can be preset according to actual requirements.
In this embodiment, removing the preset keywords simplifies the target sentence and increases detection speed without affecting the accuracy of the detection result.
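Removing preset keywords before scoring can be sketched as a simple filter; the keyword list below is an illustrative assumption, standing in for the stop words and interjections mentioned above.

```python
# Illustrative preset keywords; a real deployment would configure these
# according to actual requirements, as the text notes.
PRESET_KEYWORDS = {"hello", "thanks", "ah", "wow"}

def strip_keywords(words):
    """Drop preset keywords that do not affect sentence semantics before
    the remaining words are fed to the language model."""
    return [w for w in words if w not in PRESET_KEYWORDS]

print(strip_keywords(["ah", "I", "love", "autumn"]))  # ['I', 'love', 'autumn']
```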
And judging the target sentence as a wrong sentence when the confusion degree of the target sentence is larger than a preset confusion degree and/or the log likelihood probability of the target sentence is smaller than a preset log likelihood probability.
In this embodiment, an erroneous sentence is an ill-formed sentence, including a grammatically defective sentence or a word string that does not form a sentence at all.
The preset log likelihood probability may be a preset value for deciding whether the log likelihood probability of the target sentence is high or low: when the target sentence's log likelihood probability is above the preset value, it is high and the target sentence is judged to be a correct sentence; when it is below the preset value, it is low and the target sentence is judged to be an erroneous sentence.
Similarly, the preset confusion degree may be a preset value for deciding whether the confusion degree of the target sentence is high or low: when the target sentence's confusion degree is above the preset value, it is high and the target sentence is judged to be an erroneous sentence; when it is below the preset value, it is low and the target sentence is judged to be a correct sentence.
In a possible embodiment, when the confusion degree of the target sentence is greater than a preset confusion degree and/or the log likelihood probability of the target sentence is less than a preset log likelihood probability, before the target sentence is judged to be a wrong sentence, the following steps are further implemented:
determining the preset confusion degree and/or determining the preset log likelihood probability.
The determining the preset confusion degree and/or the determining the preset log likelihood probability specifically comprises:
obtaining a training sample for training the language model, wherein the training sample comprises a positive sample and a negative sample;
obtaining the confusion degree of the positive sample and the log likelihood probability of the positive sample; and
obtaining the confusion degree of the negative sample and the log likelihood probability of the negative sample;
obtaining a confusion degree histogram according to the confusion degree of the positive sample and the confusion degree of the negative sample, and obtaining the preset confusion degree through the confusion degree histogram; and
and acquiring a log-likelihood probability histogram according to the log-likelihood probability of the positive sample and the log-likelihood probability of the negative sample, and acquiring the preset log-likelihood probability through the log-likelihood probability histogram.
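The threshold-selection steps above can be sketched as follows: score the positive and negative samples, then pick the candidate edge that best separates them. The sample data and the separation criterion (sum of the two shares) are illustrative assumptions, not specified by the patent.

```python
def choose_threshold(pos_scores, neg_scores, edges):
    """Pick the edge t maximizing (share of positive samples below t) +
    (share of negative samples at or above t). For ppl, positives are
    correct sentences (low ppl) and negatives erroneous ones (high ppl)."""
    best_t, best_sep = None, -1.0
    for t in edges:
        pos_below = sum(s < t for s in pos_scores) / len(pos_scores)
        neg_above = sum(s >= t for s in neg_scores) / len(neg_scores)
        if pos_below + neg_above > best_sep:
            best_t, best_sep = t, pos_below + neg_above
    return best_t

pos = [80, 90, 120, 150, 200, 230]     # illustrative ppl of correct sentences
neg = [300, 400, 260, 900, 150, 2000]  # illustrative ppl of erroneous sentences
print(choose_threshold(pos, neg, edges=[100, 250, 500]))  # 250
```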
In this embodiment, the preset confusion degree is determined by a confusion degree histogram of a training sample, and the preset log likelihood probability is determined by a log likelihood probability histogram of the training sample.
The confusion degree histograms of the training sample comprise a correct-sentence histogram and an erroneous-sentence histogram calculated from the training sample (i.e., a training set composed of sentences recognized by ASR); they reflect the confusion degree distributions of correct sentences and of erroneous sentences.
In this embodiment, the preset confusion degree may be determined from the confusion degree distribution of correct sentences and that of erroneous sentences. Whether the confusion degree of the target sentence falls within the range typical of correct sentences or of erroneous sentences then decides whether the target sentence is judged correct or erroneous.
Specifically, the confusion degree of each correct sentence and of each erroneous sentence in the training sample may be calculated, the two distributions determined from them, and a confusion degree threshold that separates correct sentences from erroneous ones, namely the preset confusion degree, chosen accordingly.
For example, the confusion degree (ppl) distribution of correct sentences is:

    ppl interval     Count    Percentage
    [2000, +∞)           5      0.125%
    [1500, 2000)         1      0.025%
    [1000, 1500)         5      0.125%
    [750, 1000)         11      0.276%
    [500, 750)          28      0.702%
    [250, 500)         217      5.441%
    [100, 250)        1241     31.18%
    [0, 100)          2480     62.187%
The confusion degree (ppl) distribution of erroneous sentences is:

    ppl interval     Count    Percentage
    [2000, +∞)          86      5.923%
    [1500, 2000)        43      2.961%
    [1000, 1500)       111      7.645%
    [750, 1000)        107      7.369%
    [500, 750)         204     14.05%
    [250, 500)         501     34.504%
    [100, 250)         345     23.76%
    [0, 100)            55      3.788%
The histograms show that 93.367% of correct sentences have ppl less than 250, while 72.452% of erroneous sentences have ppl greater than 250.
A threshold of ppl = 250 therefore separates the two classes well, so the preset confusion degree (preset ppl) is determined to be 250. In this embodiment, when the ppl of the target sentence is greater than the preset ppl, the target sentence is judged to be an erroneous sentence.
Likewise, the preset log likelihood probability (logprob) is obtained in the same way as the preset confusion degree. The logprob histograms of the training sample comprise a correct-sentence histogram and an erroneous-sentence histogram calculated from the training sample; they reflect the log likelihood probability distributions of correct sentences and of erroneous sentences.
For example, the log likelihood probability (logprob) distribution of correct sentences is:

    logprob interval    Count    Percentage
    (-∞, -4.0)              1      0.0251%
    [-4.0, -3.5)            0      0%
    [-3.5, -3.0)           14      0.351%
    [-3.0, -2.5)          122      3.0591%
    [-2.5, -2.0)         1371     34.378%
    [-2.0, -1.5)         1740     43.631%
    [-1.5, -1.0)          673     16.876%
    [-1.0, 0)              67      1.68%
The log likelihood probability (logprob) distribution of erroneous sentences is:

    logprob interval    Count    Percentage
    (-∞, -4.0)              8      0.551%
    [-4.0, -3.5)           31      2.135%
    [-3.5, -3.0)          200     13.774%
    [-3.0, -2.5)          656     45.179%
    [-2.5, -2.0)          502     34.573%
    [-2.0, -1.5)           52      3.581%
    [-1.5, -1.0)            3      0.207%
    [-1.0, 0)               0      0%
The histograms show that 96.566% of correct sentences have logprob greater than -2.5, while 61.639% of erroneous sentences have logprob less than -2.5.
A threshold of logprob = -2.5 therefore separates the two classes well, so the preset logprob is determined to be -2.5. In this embodiment, when the logprob of the target sentence is smaller than the preset logprob, the target sentence is judged to be an erroneous sentence.
The erroneous sentence detection device provided by this embodiment acquires a target sentence obtained through automatic speech recognition; acquires the ith text segment of the target sentence and judges whether the preset dictionary contains a word matching it, where the initial value of i is 1 and i is a positive integer; if no matching word exists, adjusts the number of characters in the ith segment and judges again; if a matching word exists, takes the segment as the ith word of the target sentence, sets i = i + 1, and repeats the matching for the next segment; when the total number of characters in the i words equals the total number of characters in the target sentence, determines that the target sentence consists of those i words; inputs the i words, in their order in the target sentence, into a pre-trained language model and calculates the confusion degree and/or log likelihood probability of the target sentence through the model; and judges the target sentence to be an erroneous sentence when its confusion degree is greater than the preset confusion degree and/or its log likelihood probability is smaller than the preset log likelihood probability, thereby identifying whether a sentence is erroneous.
Further, in another embodiment of the apparatus of the present invention, the sentence detection program may be further invoked by the processor to implement the following steps:
If the target sentence is an erroneous sentence, an erroneous-sentence reminder message is sent.
When the target sentence is judged to be erroneous, a prompt can be sent to the user. For example, when a speech-converted sentence is to undergo further natural language processing and is judged erroneous, the user can be reminded through a pop-up error message on the display device that the target sentence is erroneous.
In this embodiment, by sending the reminder, the user can quickly learn whether erroneous sentences exist and which ones they are, and then carry out subsequent operations.
Alternatively, in other embodiments, the sentence detection program may be divided into one or more modules stored in the memory 11 and executed by one or more processors (the processor 12 in this embodiment) to implement the present invention. A module here is a series of computer program instruction segments capable of performing a specific function, used to describe the execution of the sentence detection program in the sentence detection device.
For example, referring to fig. 3, a schematic diagram of the program modules of the sentence detection program in an embodiment of the sentence detection device according to the present invention is shown. By way of example, the sentence detection program may be divided into a first obtaining module 10, a second obtaining module 20, an adjusting module 30, a first judging module 40, a determining module 50, a calculating module 60 and a second judging module 70:
The first acquisition module 10 is configured to: acquire a target sentence obtained through automatic speech recognition technology;
the second acquisition module 20 is configured to: acquiring an ith text contained in the target sentence, and judging whether a word matched with the ith text exists in a preset dictionary, wherein the initial value of i is 1, and i is a positive integer;
the adjustment module 30 is configured to: if the words matched with the ith section of characters do not exist in the preset dictionary, adjusting the word number of the ith section of characters, and judging whether the words matched with the ith section of characters exist in the preset dictionary;
the first judging module 40 is configured to: if the words matched with the ith word segment exist in the preset dictionary, determining the ith word segment as the ith word segment of the target sentence, enabling i=i+1, acquiring the ith word segment contained in the target sentence, and judging whether the words matched with the ith word segment exist in the preset dictionary;
the determining module 50 is configured to: and when the total word number of the i words is the same as the total word number of the target sentence, determining that the target sentence consists of the i words.
The calculation module 60 is configured to: and sequentially inputting the i words into a pre-trained language model according to the sequence in the target sentence, and calculating the confusion degree and/or the log likelihood probability of the target sentence through the language model.
The second judging module 70 is configured to: and judging the target sentence as a wrong sentence when the confusion degree of the target sentence is larger than a preset confusion degree and/or the log likelihood probability of the target sentence is smaller than a preset log likelihood probability.
The functions or operation steps implemented when the program modules, such as the first acquiring module 10, the second acquiring module 20, the adjusting module 30, the first judging module 40, the determining module 50, the calculating module 60, and the second judging module 70, are substantially the same as those of the foregoing embodiments, and are not repeated herein.
In addition, an embodiment of the present invention further proposes a computer-readable storage medium having stored thereon a sentence detection program executable by one or more processors to implement the following operations:
acquiring a target sentence obtained through automatic speech recognition technology;
acquiring an ith text contained in the target sentence, and judging whether a word matched with the ith text exists in a preset dictionary, wherein the initial value of i is 1, and i is a positive integer;
if the words matched with the ith section of characters do not exist in the preset dictionary, adjusting the word number of the ith section of characters, and judging whether the words matched with the ith section of characters exist in the preset dictionary;
If the words matched with the ith word segment exist in the preset dictionary, determining the ith word segment as the ith word segment of the target sentence, enabling i=i+1, acquiring the ith word segment contained in the target sentence, and judging whether the words matched with the ith word segment exist in the preset dictionary;
when the total word number of i words is the same as the total word number of the target sentence, determining that the target sentence consists of the i words;
sequentially inputting the i words into a pre-trained language model according to the sequence in the target sentence, and calculating the confusion degree and/or log likelihood probability of the target sentence through the language model;
and judging the target sentence as a wrong sentence when the confusion degree of the target sentence is larger than a preset confusion degree and/or the log likelihood probability of the target sentence is smaller than a preset log likelihood probability.
The specific implementation of the computer-readable storage medium of the present invention is substantially the same as that of the erroneous sentence detection device and method described above, and will not be repeated here.
It should be noted that the serial numbers of the above embodiments are for description only and do not indicate the relative merit of the embodiments. The terms "comprises", "comprising", or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, apparatus, article, or method that comprises a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such process, apparatus, article, or method. Without further limitation, an element preceded by "comprising a …" does not exclude the presence of other identical elements in the process, apparatus, article, or method that comprises it.
From the above description of the embodiments, it will be clear to those skilled in the art that the above-described embodiment method may be implemented by means of software plus a necessary general hardware platform, but of course may also be implemented by means of hardware, but in many cases the former is a preferred embodiment. Based on such understanding, the technical solution of the present invention may be embodied essentially or in a part contributing to the prior art in the form of a software product stored in a storage medium (e.g. ROM/RAM, magnetic disk, optical disk) as described above, comprising instructions for causing a terminal device (which may be a mobile phone, a computer, a server, or a network device, etc.) to perform the method according to the embodiments of the present invention.
The foregoing description is only of the preferred embodiments of the present invention, and is not intended to limit the scope of the invention, but rather is intended to cover any equivalents of the structures or equivalent processes disclosed herein or in the alternative, which may be employed directly or indirectly in other related arts.

Claims (10)

1. A method for detecting a sentence, the method comprising:
acquiring a target sentence obtained through automatic speech recognition technology;
Acquiring an ith text contained in the target sentence, and judging whether a word matched with the ith text exists in a preset dictionary, wherein the initial value of i is 1, and i is a positive integer;
if the words matched with the ith section of characters do not exist in the preset dictionary, adjusting the word number of the ith section of characters, and judging whether the words matched with the ith section of characters exist in the preset dictionary;
if the words matched with the ith word segment exist in the preset dictionary, determining the ith word segment as the ith word segment of the target sentence, enabling i=i+1, acquiring the ith word segment contained in the target sentence, and judging whether the words matched with the ith word segment exist in the preset dictionary;
when the total word number of i words is the same as the total word number of the target sentence, determining that the target sentence consists of the i words;
sequentially inputting the i words into a pre-trained language model according to the sequence in the target sentence, and calculating the confusion degree and/or log likelihood probability of the target sentence through the language model;
and judging the target sentence as a wrong sentence when the confusion degree of the target sentence is larger than a preset confusion degree and/or the log likelihood probability of the target sentence is smaller than a preset log likelihood probability.
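The segmentation loop in claim 1 is essentially forward maximum matching against the preset dictionary: try the longest candidate segment first, shrink until a dictionary word matches, then advance. A minimal sketch, assuming an illustrative dictionary, maximum word length, and sample sentences (none taken from the patent):

```python
# Forward-maximum-matching sketch of the segmentation steps in claim 1.
# PRESET_DICTIONARY and MAX_WORD_LEN are illustrative assumptions.

PRESET_DICTIONARY = {"machine", "learning", "model", "sentence", "error"}
MAX_WORD_LEN = 8  # longest candidate segment to try first


def segment(sentence, dictionary=PRESET_DICTIONARY):
    """Split `sentence` into dictionary words; return None if it cannot be done."""
    words = []
    pos = 0
    while pos < len(sentence):
        # Start from the longest candidate and shrink, mirroring the claim's
        # "adjusting the number of characters in the i-th text segment".
        for length in range(min(MAX_WORD_LEN, len(sentence) - pos), 0, -1):
            candidate = sentence[pos:pos + length]
            if candidate in dictionary:
                words.append(candidate)  # the i-th word of the target sentence
                pos += length
                break
        else:
            return None  # no dictionary word matches at this position
    # Here the character count of the i words equals the sentence length,
    # so the sentence consists of the i words (claim 1's stopping condition).
    return words
```

Note that greedy longest-first matching can fail on inputs where a shorter first match would have allowed a full segmentation; the claim's character-count adjustment covers only the shrink step, and this sketch shares that limitation.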
2. The method of claim 1, wherein sequentially inputting the i words into the pre-trained language model in their order in the target sentence comprises:
judging whether a preset keyword exists among the i words;
if a preset keyword exists among the i words, sequentially inputting the words among the i words other than the preset keyword into the pre-trained language model in their order in the target sentence.
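The keyword exclusion of claim 2 amounts to an order-preserving filter applied before language-model scoring. A small sketch; the keyword set below is an illustrative assumption (the patent does not enumerate the preset keywords):

```python
# Order-preserving removal of preset keywords before scoring (claim 2).
# PRESET_KEYWORDS is an illustrative assumption, e.g. filler tokens that
# should not influence the language-model score.

PRESET_KEYWORDS = {"uh", "um"}


def filter_keywords(words, keywords=PRESET_KEYWORDS):
    """Return the words, in their original sentence order, with preset keywords removed."""
    return [w for w in words if w not in keywords]
```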
3. The method for detecting an erroneous sentence according to claim 1 or 2, further comprising, before judging the target sentence to be an erroneous sentence when the perplexity of the target sentence is greater than a preset perplexity and/or the log-likelihood probability of the target sentence is less than a preset log-likelihood probability:
determining the preset perplexity and/or the preset log-likelihood probability;
wherein determining the preset perplexity and/or the preset log-likelihood probability specifically comprises:
obtaining training samples used to train the language model, the training samples comprising positive samples and negative samples;
obtaining the perplexity and the log-likelihood probability of the positive samples; and
obtaining the perplexity and the log-likelihood probability of the negative samples;
obtaining a perplexity histogram from the perplexities of the positive samples and the negative samples, and obtaining the preset perplexity from the perplexity histogram; and
obtaining a log-likelihood probability histogram from the log-likelihood probabilities of the positive samples and the negative samples, and obtaining the preset log-likelihood probability from the log-likelihood probability histogram.
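The histogram step in claim 3 amounts to choosing a cut point between the score distributions of positive (well-formed) and negative (erroneous) samples. A hedged sketch, under the assumption that the preset value is the candidate cut minimizing total misclassification — the patent reads the threshold off the histograms but does not fix a specific selection rule:

```python
# Threshold selection from positive/negative score populations (claim 3).
# The selection rule (minimum total misclassification over candidate cuts)
# and the sample values in the test are illustrative assumptions.

def choose_threshold(positive_scores, negative_scores):
    """Pick the cut point that best separates the two score populations.

    For perplexity, sentences scoring above the returned threshold are
    flagged, matching the claim's "greater than a preset perplexity" test.
    """
    candidates = sorted(set(positive_scores) | set(negative_scores))
    best_cut, best_errors = None, float("inf")
    for cut in candidates:
        # positives above the cut are false alarms; negatives at or below it are misses
        errors = (sum(p > cut for p in positive_scores)
                  + sum(n <= cut for n in negative_scores))
        if errors < best_errors:
            best_cut, best_errors = cut, errors
    return best_cut
```

For the log-likelihood probability, where erroneous sentences score lower rather than higher, the two comparison directions would be mirrored.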
4. The method of claim 2, wherein the language model is a deep-learning language model or a statistics-based language model.
5. The method for detecting an erroneous sentence according to claim 1 or 2, further comprising:
if the target sentence is an erroneous sentence, sending an erroneous-sentence reminder message.
6. An apparatus for detecting an erroneous sentence, the apparatus comprising a memory and a processor, the memory storing an erroneous-sentence detection program operable on the processor, wherein the program, when executed by the processor, implements the following steps:
acquiring a target sentence obtained by automatic speech recognition;
acquiring the i-th text segment contained in the target sentence, and judging whether a word matching the i-th text segment exists in a preset dictionary, wherein the initial value of i is 1 and i is a positive integer;
if no word matching the i-th text segment exists in the preset dictionary, adjusting the number of characters in the i-th text segment and again judging whether a word matching the i-th text segment exists in the preset dictionary;
if a word matching the i-th text segment exists in the preset dictionary, determining the i-th text segment as the i-th word of the target sentence, setting i = i + 1, acquiring the i-th text segment contained in the target sentence, and judging whether a word matching the i-th text segment exists in the preset dictionary;
when the total number of characters in the i words equals the total number of characters in the target sentence, determining that the target sentence consists of the i words;
sequentially inputting the i words, in their order in the target sentence, into a pre-trained language model, and calculating the perplexity and/or log-likelihood probability of the target sentence through the language model;
and judging the target sentence to be an erroneous sentence when the perplexity of the target sentence is greater than a preset perplexity and/or the log-likelihood probability of the target sentence is less than a preset log-likelihood probability.
7. The erroneous-sentence detection apparatus of claim 6, wherein sequentially inputting the i words into the pre-trained language model in their order in the target sentence comprises:
judging whether a preset keyword exists among the i words;
if a preset keyword exists among the i words, sequentially inputting the words among the i words other than the preset keyword into the pre-trained language model in their order in the target sentence.
8. The erroneous-sentence detection apparatus according to claim 6 or 7, wherein the erroneous-sentence detection program, when executed by the processor, further implements the following steps:
before judging the target sentence to be an erroneous sentence when the perplexity of the target sentence is greater than a preset perplexity and/or the log-likelihood probability of the target sentence is less than a preset log-likelihood probability, determining the preset perplexity and/or the preset log-likelihood probability;
wherein determining the preset perplexity and/or the preset log-likelihood probability specifically comprises:
obtaining training samples used to train the language model, the training samples comprising positive samples and negative samples;
obtaining the perplexity and the log-likelihood probability of the positive samples; and
obtaining the perplexity and the log-likelihood probability of the negative samples;
obtaining a perplexity histogram from the perplexities of the positive samples and the negative samples, and obtaining the preset perplexity from the perplexity histogram; and
obtaining a log-likelihood probability histogram from the log-likelihood probabilities of the positive samples and the negative samples, and obtaining the preset log-likelihood probability from the log-likelihood probability histogram.
9. The erroneous-sentence detection apparatus according to claim 6 or 7, wherein the erroneous-sentence detection program, when executed by the processor, further implements the following step:
if the target sentence is an erroneous sentence, sending an erroneous-sentence reminder message.
10. A computer-readable storage medium having stored thereon an erroneous-sentence detection program executable by one or more processors to implement the steps of the erroneous-sentence detection method of any one of claims 1 to 5.
CN201910343889.6A 2019-04-26 2019-04-26 Sentence fault detection method, sentence fault detection device and computer readable storage medium Active CN110211571B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201910343889.6A CN110211571B (en) 2019-04-26 2019-04-26 Sentence fault detection method, sentence fault detection device and computer readable storage medium
PCT/CN2019/102191 WO2020215550A1 (en) 2019-04-26 2019-08-23 Wrong sentence detection method and apparatus, and computer readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910343889.6A CN110211571B (en) 2019-04-26 2019-04-26 Sentence fault detection method, sentence fault detection device and computer readable storage medium

Publications (2)

Publication Number Publication Date
CN110211571A CN110211571A (en) 2019-09-06
CN110211571B true CN110211571B (en) 2023-05-26

Family

ID=67786422

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910343889.6A Active CN110211571B (en) 2019-04-26 2019-04-26 Sentence fault detection method, sentence fault detection device and computer readable storage medium

Country Status (2)

Country Link
CN (1) CN110211571B (en)
WO (1) WO2020215550A1 (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110852087B (en) * 2019-09-23 2022-02-22 腾讯科技(深圳)有限公司 Chinese error correction method and device, storage medium and electronic device
CN110765996B (en) * 2019-10-21 2022-07-29 北京百度网讯科技有限公司 Text information processing method and device
WO2021138898A1 (en) * 2020-01-10 2021-07-15 深圳市欢太科技有限公司 Speech recognition result detection method and apparatus, and storage medium
CN112380855B (en) * 2020-11-20 2024-03-08 北京百度网讯科技有限公司 Method for determining statement smoothness, method and device for determining probability prediction model
CN112863499B (en) * 2021-01-13 2023-01-24 北京小米松果电子有限公司 Speech recognition method and device, storage medium
CN113096667A (en) * 2021-04-19 2021-07-09 上海云绅智能科技有限公司 Wrongly-written character recognition detection method and system
CN115062148B (en) * 2022-06-23 2023-06-20 广东国义信息科技有限公司 Risk control method based on database

Citations (7)

Publication number Priority date Publication date Assignee Title
US6064959A (en) * 1997-03-28 2000-05-16 Dragon Systems, Inc. Error correction in speech recognition
JP2014089247A (en) * 2012-10-29 2014-05-15 Nippon Telegr & Teleph Corp <Ntt> Identification language model learning device, identification language model learning method, and program
CN105244029A (en) * 2015-08-28 2016-01-13 科大讯飞股份有限公司 Voice recognition post-processing method and system
CN107741928A (en) * 2017-10-13 2018-02-27 四川长虹电器股份有限公司 A kind of method to text error correction after speech recognition based on field identification
CN108255857A (en) * 2016-12-29 2018-07-06 北京国双科技有限公司 A kind of sentence detection method and device
CN108766437A (en) * 2018-05-31 2018-11-06 平安科技(深圳)有限公司 Audio recognition method, device, computer equipment and storage medium
CN108959250A (en) * 2018-06-27 2018-12-07 众安信息技术服务有限公司 A kind of error correction method and its system based on language model and word feature

Family Cites Families (3)

Publication number Priority date Publication date Assignee Title
JP2001069543A (en) * 1999-08-26 2001-03-16 Nec Corp Radio paging system
CN101295293B (en) * 2007-04-29 2010-06-02 摩托罗拉公司 Automatic error correction method for input character string of ideographic character
US9653071B2 (en) * 2014-02-08 2017-05-16 Honda Motor Co., Ltd. Method and system for the correction-centric detection of critical speech recognition errors in spoken short messages


Also Published As

Publication number Publication date
CN110211571A (en) 2019-09-06
WO2020215550A1 (en) 2020-10-29

Similar Documents

Publication Publication Date Title
CN110211571B (en) Sentence fault detection method, sentence fault detection device and computer readable storage medium
CN108292500B (en) Apparatus and method for end-of-sentence detection using grammar consistency
WO2020224213A1 (en) Sentence intent identification method, device, and computer readable storage medium
JP5901001B1 (en) Method and device for acoustic language model training
US11043213B2 (en) System and method for detection and correction of incorrectly pronounced words
CN109284503B (en) Translation statement ending judgment method and system
CN114416943B (en) Training method and device for dialogue model, electronic equipment and storage medium
CN110866095A (en) Text similarity determination method and related equipment
CN113657098B (en) Text error correction method, device, equipment and storage medium
CN112364658A (en) Translation and voice recognition method, device and equipment
CN112434520A (en) Named entity recognition method and device and readable storage medium
CN110826301A (en) Punctuation mark adding method, system, mobile terminal and storage medium
CN114386399A (en) Text error correction method and device
CN112257470A (en) Model training method and device, computer equipment and readable storage medium
CN115858776B (en) Variant text classification recognition method, system, storage medium and electronic equipment
CN111460811A (en) Crowdsourcing task answer verification method and device, computer equipment and storage medium
CN107656627B (en) Information input method and device
CN108304366B (en) Hypernym detection method and device
CN113177406B (en) Text processing method, text processing device, electronic equipment and computer readable medium
CN110413983B (en) Method and device for identifying name
CN108021918B (en) Character recognition method and device
CN110866390B (en) Method and device for recognizing Chinese grammar error, computer equipment and storage medium
CN110929749A (en) Text recognition method, text recognition device, text recognition medium and electronic equipment
CN113254658B (en) Text information processing method, system, medium, and apparatus
CN112185346B (en) Multilingual voice keyword detection and model generation method and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant