CN112528629A - Sentence smoothness judging method and system - Google Patents
- Publication number
- CN112528629A (application number CN201910820551.5A)
- Authority
- CN
- China
- Prior art keywords
- sentence
- text
- probability
- smoothness
- speech
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Abstract
The invention discloses a method and a system for judging sentence smoothness. The method comprises the following steps: preprocessing each collected text, wherein the preprocessing comprises word segmentation and part-of-speech tagging; using a language model training tool to count occurrence frequencies over the preprocessed texts with a 3-gram model, separately for the word sequence and for the part-of-speech sequence; calculating the text smoothness probability of each text under the word-based and the part-of-speech-based 3-gram models respectively; to account for the influence of sentence length on the result, dividing each text smoothness probability by the sentence length of the corresponding text to obtain the word-based and part-of-speech-based sentence smoothness probabilities; computing a weighted sum of the word-based and part-of-speech-based sentence smoothness probabilities to obtain the final sentence smoothness probability of the corresponding text; and calculating the average of the final sentence smoothness probabilities of the texts as a threshold for judging whether a sentence is smooth.
Description
Technical Field
The invention belongs to the technical field of language models in natural language processing, and particularly relates to a method and a system for judging sentence smoothness.
Background
In a human-machine interaction system, recognition errors or speech from unknown sources in the speech recognition stage can make the recognized text disordered and disfluent, for example: "I want to go ahead", "weather really does so", "I do". Such disfluent sentences may prevent the dialogue system from correctly resolving their meaning. For a dialogue system without voice wake-up, recognizing such sentences and returning results when the user is not actually addressing the system can severely degrade the user experience.
Disclosure of Invention
To address the problems and defects in the prior art, the invention provides a new method and system for judging sentence smoothness.
The invention solves the technical problems through the following technical scheme:
The invention provides a sentence smoothness judging method, characterized by comprising the following steps:
S1, collecting a plurality of texts as a training corpus, and preprocessing each text, wherein the preprocessing comprises word segmentation and part-of-speech tagging;
S2, using a language model training tool to count occurrence frequencies over the preprocessed texts with a 3-gram model, separately for the word sequence and for the part-of-speech sequence, and storing the statistical results;
S3, calculating the text smoothness probability of each text under the 3-gram model for the word sequence and for the part-of-speech sequence respectively; to account for the influence of sentence length on the result, dividing each of these text smoothness probabilities by the sentence length of the corresponding text to obtain the word-based and part-of-speech-based sentence smoothness probabilities of that text; and computing a weighted sum of the word-based and part-of-speech-based sentence smoothness probabilities to obtain the final sentence smoothness probability of the corresponding text;
S4, calculating the average of the final sentence smoothness probabilities of the texts as a threshold for judging whether a sentence is smooth.
Preferably, the method further comprises S5: calculating the final sentence smoothness probability of a sentence to be judged in the manner of step S3, and comparing it with the threshold; if the final sentence smoothness probability of the sentence to be judged is greater than the threshold, the sentence is judged to be smooth, and if it is less than the threshold, the sentence is judged to be not smooth.
Preferably, the language model training tool is SRILM.
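Steps S3 to S5 can be written compactly as follows. This is a sketch under stated assumptions, not a formula given in the source: the text smoothness probability is taken to be the 3-gram chain probability, the length normalization is applied to its logarithm (the source only says the probability is divided by the sentence length), and the weights \alpha and \beta are hypothetical symbols because the source does not state their values. Here w_i and p_i are the i-th word and part-of-speech tag of a text t of length n, and t_1, ..., t_N are the training texts.

```latex
\begin{align}
P_w(t) &= \prod_{i=1}^{n} P(w_i \mid w_{i-2}, w_{i-1}), &
P_p(t) &= \prod_{i=1}^{n} P(p_i \mid p_{i-2}, p_{i-1}) \\
S_w(t) &= \frac{\log P_w(t)}{n}, &
S_p(t) &= \frac{\log P_p(t)}{n} \\
S(t) &= \alpha\, S_w(t) + \beta\, S_p(t) \\
\theta &= \frac{1}{N} \sum_{k=1}^{N} S(t_k) \\
s \text{ is judged smooth} &\iff S(s) > \theta
\end{align}
```

Dividing by n keeps long sentences, whose chain probability shrinks with every additional factor, from being penalized purely for their length.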
The invention also provides a sentence smoothness judging system, characterized by comprising a preprocessing module, a statistical module, a first calculation module and a second calculation module;
the preprocessing module is used for collecting a plurality of texts as a training corpus and preprocessing each text, wherein the preprocessing comprises word segmentation and part-of-speech tagging;
the statistical module is used for counting occurrence frequencies over the preprocessed texts with a 3-gram model by means of a language model training tool, separately for the word sequence and for the part-of-speech sequence, and storing the statistical results;
the first calculation module is used for calculating the text smoothness probability of each text under the 3-gram model for the word sequence and for the part-of-speech sequence respectively; to account for the influence of sentence length on the result, dividing each of these text smoothness probabilities by the sentence length of the corresponding text to obtain the word-based and part-of-speech-based sentence smoothness probabilities of that text; and computing a weighted sum of the word-based and part-of-speech-based sentence smoothness probabilities to obtain the final sentence smoothness probability of the corresponding text;
and the second calculation module is used for calculating the average of the final sentence smoothness probabilities of the texts as a threshold for judging whether a sentence is smooth.
Preferably, the sentence smoothness judging system further comprises a comparison module;
the first calculation module is further used for calculating the final sentence smoothness probability of a sentence to be judged, and the comparison module is used for comparing that probability with the threshold: the sentence is judged to be smooth when its final sentence smoothness probability is greater than the threshold, and judged to be not smooth when it is less than the threshold.
Preferably, the language model training tool is SRILM.
On the basis of common knowledge in the field, the above preferred conditions can be combined arbitrarily to obtain the preferred embodiments of the invention.
The positive effects of the invention are as follows:
By using the model to judge whether a sentence from the user's voice input is smooth, the invention provides a strategy for handling disfluent sentences that arise during human-machine dialogue from recognition results of unknown-source speech and from limited recognition accuracy, and thereby improves the user experience of human-machine dialogue.
Drawings
FIG. 1 is a flowchart of a sentence smoothness judging method according to a preferred embodiment of the invention.
FIG. 2 is a block diagram of a sentence smoothness judging system according to a preferred embodiment of the invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be obtained by a person skilled in the art without any inventive step based on the embodiments of the present invention, are within the scope of the present invention.
As shown in fig. 1, the present embodiment provides a sentence smoothness judging method, which includes the following steps:
Step 101, collecting a plurality of texts as a training corpus, and preprocessing each text, wherein the preprocessing includes word segmentation and part-of-speech tagging.
Step 102, using the SRILM language model training tool to count occurrence frequencies over the preprocessed texts with a 3-gram model, separately for the word sequence and for the part-of-speech sequence, and storing the statistical results.
Step 103, calculating the text smoothness probability of each text under the 3-gram model for the word sequence and for the part-of-speech sequence respectively; to account for the influence of sentence length on the result, dividing each of these text smoothness probabilities by the sentence length of the corresponding text to obtain the word-based and part-of-speech-based sentence smoothness probabilities of that text; and computing a weighted sum of the word-based and part-of-speech-based sentence smoothness probabilities to obtain the final sentence smoothness probability of the corresponding text.
Step 104, calculating the average of the final sentence smoothness probabilities of the texts as a threshold for judging whether a sentence is smooth.
Step 105, calculating the final sentence smoothness probability of a sentence to be judged in the manner of step 103, and comparing it with the threshold; if the final sentence smoothness probability of the sentence to be judged is greater than the threshold, the sentence is judged to be smooth, and if it is less than the threshold, the sentence is judged to be not smooth.
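The steps of the method can be sketched in a few lines of Python. This is an illustrative sketch, not the patented implementation: it replaces SRILM with a tiny in-memory 3-gram counter, uses add-one smoothing (an assumption; the source does not name a smoothing method), normalizes the log-probability by the token count, and uses equal weights of 0.5 because the source does not state the weights. All function and parameter names are hypothetical, and the input is assumed to be already segmented and tagged as (word, part-of-speech) pairs.

```python
import math
from collections import Counter


def train_trigram(sequences):
    """Count 3-grams and their 2-gram contexts over padded token sequences."""
    tri, ctx, vocab = Counter(), Counter(), set()
    for seq in sequences:
        padded = ["<s>", "<s>"] + list(seq) + ["</s>"]
        vocab.update(padded)
        for i in range(2, len(padded)):
            tri[tuple(padded[i - 2:i + 1])] += 1
            ctx[tuple(padded[i - 2:i])] += 1
    return tri, ctx, len(vocab)


def avg_logprob(seq, model):
    """Length-normalized 3-gram log-probability with add-one smoothing."""
    tri, ctx, vocab_size = model
    padded = ["<s>", "<s>"] + list(seq) + ["</s>"]
    total = 0.0
    for i in range(2, len(padded)):
        num = tri[tuple(padded[i - 2:i + 1])] + 1
        den = ctx[tuple(padded[i - 2:i])] + vocab_size
        total += math.log(num / den)
    return total / max(len(seq), 1)  # divide by sentence length


def final_score(tagged, word_model, pos_model, w_word=0.5, w_pos=0.5):
    """Weighted sum of the word-based and part-of-speech-based scores."""
    words = [w for w, _ in tagged]
    tags = [t for _, t in tagged]
    return (w_word * avg_logprob(words, word_model)
            + w_pos * avg_logprob(tags, pos_model))


def fit(corpus):
    """Train both models and set the threshold to the mean training score."""
    word_model = train_trigram([[w for w, _ in s] for s in corpus])
    pos_model = train_trigram([[t for _, t in s] for s in corpus])
    scores = [final_score(s, word_model, pos_model) for s in corpus]
    return word_model, pos_model, sum(scores) / len(scores)


def is_smooth(tagged, word_model, pos_model, threshold):
    """A sentence is judged smooth when its score exceeds the threshold."""
    return final_score(tagged, word_model, pos_model) > threshold
```

A caller would run `fit` once on the tagged training corpus and then call `is_smooth` for each recognized sentence. Because the threshold is the mean training score, a sizeable share of the training sentences themselves fall below it, so in practice the threshold may need tuning on held-out data.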
As shown in fig. 2, the embodiment further provides a sentence smoothness judging system, which includes a preprocessing module 1, a statistical module 2, a first calculation module 3, a second calculation module 4, and a comparison module 5.
The preprocessing module 1 is used for collecting a plurality of texts as a training corpus and preprocessing each text, wherein the preprocessing includes word segmentation and part-of-speech tagging.
The statistical module 2 is used for counting occurrence frequencies over the preprocessed texts with a 3-gram model by means of the SRILM language model training tool, separately for the word sequence and for the part-of-speech sequence, and storing the statistical results.
The first calculation module 3 is used for calculating the text smoothness probability of each text under the 3-gram model for the word sequence and for the part-of-speech sequence respectively; to account for the influence of sentence length on the result, dividing each of these text smoothness probabilities by the sentence length of the corresponding text to obtain the word-based and part-of-speech-based sentence smoothness probabilities of that text; and computing a weighted sum of the word-based and part-of-speech-based sentence smoothness probabilities to obtain the final sentence smoothness probability of the corresponding text.
The second calculation module 4 is used for calculating the average of the final sentence smoothness probabilities of the texts as a threshold for judging whether a sentence is smooth.
The first calculation module 3 is further used for calculating the final sentence smoothness probability of a sentence to be judged, and the comparison module 5 is used for comparing that probability with the threshold: the sentence is judged to be smooth when its final sentence smoothness probability is greater than the threshold, and judged to be not smooth when it is less than the threshold.
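One way to picture how the five modules fit together is the following sketch. It is an assumption about structure, not the architecture disclosed in fig. 2: the class name `SmoothnessSystem` and its fields are hypothetical, and modules 1 to 3 are injected as plain callables so that any segmenter, tagger, or language model backend could be plugged in.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Any, Callable, List, Sequence, Tuple

Tagged = List[Tuple[str, str]]  # (word, part-of-speech) pairs


@dataclass
class SmoothnessSystem:
    preprocess: Callable[[str], Tagged]        # module 1: segment and tag
    train: Callable[[Sequence[Tagged]], Any]   # module 2: 3-gram statistics
    score: Callable[[Tagged, Any], float]      # module 3: final sentence score
    stats: Any = None
    threshold: float = float("nan")

    def fit(self, texts: Sequence[str]) -> None:
        corpus = [self.preprocess(t) for t in texts]
        self.stats = self.train(corpus)
        # module 4: the threshold is the mean score over the training texts
        self.threshold = mean(self.score(s, self.stats) for s in corpus)

    def is_smooth(self, sentence: str) -> bool:
        # module 5: compare the sentence score with the threshold
        return self.score(self.preprocess(sentence), self.stats) > self.threshold
```

Keeping the scoring function separate from the threshold and comparison logic mirrors the way the first calculation module is reused for both training texts and sentences to be judged.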
By using the model to judge whether a sentence from the user's voice input is smooth, the invention provides a strategy for handling disfluent sentences that arise during human-machine dialogue from recognition results of unknown-source speech and from limited recognition accuracy, and thereby improves the user experience of human-machine dialogue.
While specific embodiments of the invention have been described above, it will be appreciated by those skilled in the art that these are by way of example only, and that the scope of the invention is defined by the appended claims. Various changes and modifications to these embodiments may be made by those skilled in the art without departing from the spirit and scope of the invention, and these changes and modifications are within the scope of the invention.
Claims (6)
1. A sentence smoothness judging method, characterized by comprising the following steps:
S1, collecting a plurality of texts as a training corpus, and preprocessing each text, wherein the preprocessing comprises word segmentation and part-of-speech tagging;
S2, using a language model training tool to count occurrence frequencies over the preprocessed texts with a 3-gram model, separately for the word sequence and for the part-of-speech sequence, and storing the statistical results;
S3, calculating the text smoothness probability of each text under the 3-gram model for the word sequence and for the part-of-speech sequence respectively, dividing each of these text smoothness probabilities by the sentence length of the corresponding text to obtain the word-based and part-of-speech-based sentence smoothness probabilities of that text, and computing a weighted sum of the word-based and part-of-speech-based sentence smoothness probabilities to obtain the final sentence smoothness probability of the corresponding text;
S4, calculating the average of the final sentence smoothness probabilities of the texts as a threshold for judging whether a sentence is smooth.
2. The sentence smoothness judging method of claim 1, further comprising S5: calculating the final sentence smoothness probability of a sentence to be judged in the manner of step S3, and comparing it with the threshold; if the final sentence smoothness probability of the sentence to be judged is greater than the threshold, the sentence is judged to be smooth, and if it is less than the threshold, the sentence is judged to be not smooth.
3. The sentence smoothness judging method of claim 1, wherein the language model training tool is SRILM.
4. A sentence smoothness judging system, characterized by comprising a preprocessing module, a statistical module, a first calculation module and a second calculation module;
the preprocessing module is used for collecting a plurality of texts as a training corpus and preprocessing each text, wherein the preprocessing comprises word segmentation and part-of-speech tagging;
the statistical module is used for counting occurrence frequencies over the preprocessed texts with a 3-gram model by means of a language model training tool, separately for the word sequence and for the part-of-speech sequence, and storing the statistical results;
the first calculation module is used for calculating the text smoothness probability of each text under the 3-gram model for the word sequence and for the part-of-speech sequence respectively, dividing each of these text smoothness probabilities by the sentence length of the corresponding text to obtain the word-based and part-of-speech-based sentence smoothness probabilities of that text, and computing a weighted sum of the word-based and part-of-speech-based sentence smoothness probabilities to obtain the final sentence smoothness probability of the corresponding text;
and the second calculation module is used for calculating the average of the final sentence smoothness probabilities of the texts as a threshold for judging whether a sentence is smooth.
5. The sentence smoothness judging system of claim 4, further comprising a comparison module;
the first calculation module is further used for calculating the final sentence smoothness probability of a sentence to be judged, and the comparison module is used for comparing that probability with the threshold: the sentence is judged to be smooth when its final sentence smoothness probability is greater than the threshold, and judged to be not smooth when it is less than the threshold.
6. The sentence smoothness judging system of claim 4, wherein the language model training tool is SRILM.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910820551.5A CN112528629A (en) | 2019-08-29 | 2019-08-29 | Sentence smoothness judging method and system |
Publications (1)
Publication Number | Publication Date |
---|---|
CN112528629A | 2021-03-19 |
Family
ID=74974092
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910820551.5A Pending CN112528629A (en) | 2019-08-29 | 2019-08-29 | Sentence smoothness judging method and system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112528629A (en) |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104598439A (en) * | 2013-10-30 | 2015-05-06 | 阿里巴巴集团控股有限公司 | Title correction method and device of information object and method for pushing information object |
CN105005557A (en) * | 2015-08-06 | 2015-10-28 | 电子科技大学 | Chinese ambiguity word processing method based on dependency parsing |
CN107291693A (en) * | 2017-06-15 | 2017-10-24 | 广州赫炎大数据科技有限公司 | A kind of semantic computation method for improving term vector model |
CN108255857A (en) * | 2016-12-29 | 2018-07-06 | 北京国双科技有限公司 | A kind of sentence detection method and device |
CN109344830A (en) * | 2018-08-17 | 2019-02-15 | 平安科技(深圳)有限公司 | Sentence output, model training method, device, computer equipment and storage medium |
CN110134952A (en) * | 2019-04-29 | 2019-08-16 | 华南师范大学 | A kind of Error Text rejection method for identifying, device and storage medium |
Non-Patent Citations (1)
Title |
---|
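He Tianwen (何天文) et al.: "Perplexity evaluation of Chinese sentences based on semantic and syntactic analysis" (基于语义语法分析的中文语句困惑度评价), Application Research of Computers (《计算机应用研究》), vol. 34, no. 12, pages 3538 - 3542 * |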
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||