CN116341525A

CN116341525A - Text examination and correction system based on natural language processing

Info

Publication number: CN116341525A
Application number: CN202310294883.0A
Authority: CN
Inventors: 洪创波
Original assignee: Guangdong Chaoting Group Co ltd
Current assignee: Guangdong Chaoting Group Co ltd
Priority date: 2023-03-24
Filing date: 2023-03-24
Publication date: 2023-06-27

Abstract

The invention relates to the technical field of natural language processing, and in particular to a text examination and correction system based on natural language processing. The system inputs natural language information and classifies it into voice information, picture information and text information; converts the voice information and the picture information into text; performs morphological analysis on all of the text uniformly; then judges fluency and examines and corrects the text information with lower fluency. The advantages of the invention are as follows: a second fluency judgment is performed on the analyzed semantic data, so that erroneous text information is screened further and the accuracy of the information output by the system is improved. The operator is prompted in time, so that the operator can gauge the progress and difficulty of the natural language processing from the corrections made, and can see the details of each correction directly from the marked correction positions in the prompt. This helps the operator debug the system's error correction program and improves the accuracy and efficiency of text examination and error correction.

Description

Text examination and correction system based on natural language processing

Technical Field

The invention relates to the technical field of natural language processing, in particular to a text examination and correction system based on natural language processing.

Background

Natural language processing is an important direction in computer science and artificial intelligence. It studies the theories and methods that enable effective communication between people and computers in natural language, and it is a science that integrates linguistics, computer science and mathematics. Because research in this field concerns natural language, that is, the language people use in daily life, it is closely related to linguistics, but there are important differences. Natural language processing is not the general study of natural language; rather, it is the development of computer systems, and in particular software systems, that can effectively carry out natural language communication, and it is therefore part of computer science.

Chinese patent CN106030568B discloses a natural language processing system, a natural language processing method, and a natural language processing program capable of automatically correcting the segmentation model of morphological analysis within a certain time.

Existing natural language processing systems have the following defects. They analyze and recognize voice information, but during voice input and image input, problems such as accent and picture clarity may introduce errors when the natural language is input and converted. Existing systems are poor at recognizing erroneous natural language input. When they correct the information, confirmation is not fed back to the input source in time, so they have little basis for correction and their error correction capability is weak.

Disclosure of Invention

The invention aims to overcome the defects of the prior art, and provides a text examination and correction system based on natural language processing, which effectively solves the defects of the prior art.

The aim of the invention is achieved by the following technical solution: a text examination and correction system based on natural language processing, comprising the following steps:

1) Input natural language information and classify it into voice information, picture information and text information;

2) Convert the voice information into text information and judge the fluency of the converted text; text whose fluency reaches the standard is input to the text information module, and text whose fluency does not reach the standard is input to the text information module after error correction;

3) Recognize the picture information and convert it into text information, and perform intelligent typesetting on the text from the picture;

4) Judge the fluency of the text information converted from the picture information, perform intelligent error correction on the text with lower fluency, and input the text into the text information module;

5) Perform morphological analysis and translation on the input text information and generate semantic data;

6) Judge the semantic fluency of the semantic data;

7) Directly output the semantic data whose fluency reaches the standard;

8) For the semantic data with lower fluency, search the device database and perform error correction.
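The control flow of steps 1) to 8) can be sketched as follows. This is only an illustrative outline: the patent does not specify any programming interface, so `Item`, `run_pipeline`, the callables passed in, and the `FLUENCY_THRESHOLD` value are all hypothetical names and assumptions.

```python
from dataclasses import dataclass
from typing import Callable, Dict

# Assumed cut-off for "fluency reaching the standard"; the patent gives no value.
FLUENCY_THRESHOLD = 0.8


@dataclass
class Item:
    kind: str     # "voice", "picture" or "text" (the step 1 classification)
    payload: str  # stand-in for audio data, image data or raw text


def run_pipeline(
    item: Item,
    to_text: Dict[str, Callable[[str], str]],  # hypothetical converters per kind
    fluency: Callable[[str], float],           # hypothetical fluency scorer
    correct: Callable[[str], str],             # hypothetical error corrector
    analyze: Callable[[str], str],             # hypothetical morphological analyzer
) -> str:
    text = item.payload
    # Steps 2-4: convert voice or picture input to text, then gate on fluency.
    if item.kind in ("voice", "picture"):
        text = to_text[item.kind](item.payload)
        if fluency(text) < FLUENCY_THRESHOLD:
            text = correct(text)
    # Step 5: morphological analysis produces the semantic data.
    semantic = analyze(text)
    # Steps 6-8: second fluency judgment; only low-fluency output is corrected.
    if fluency(semantic) < FLUENCY_THRESHOLD:
        semantic = correct(semantic)
    return semantic
```

Passing the converters, scorer and corrector in as parameters keeps the sketch independent of any particular speech recognition, OCR or analysis component.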

Optionally, the text information with lower fluency in step 2) is extracted, and a voice back-query is put to the sound source for confirmation. After the sound source answers the confirmation question, the answer is converted into text, which is compared, analyzed and merged with the text information from the first conversion. Once the two pieces of text information have been analyzed and merged, error correction is complete and the result is input to the text information module.

With this technical solution: by judging the fluency of the voice information, the system can, after a user utters voice information, promptly detect natural language content that could not be recognized in the text produced by voice recognition, and can ask a back-query to supplement the basis for correcting the voice information. This improves the accuracy of voice information correction and avoids the situation in which problems such as dialect pronunciation during voice input lead to wrong confidence judgments on the input and, because the basis for correction is insufficient, to large correction deviations. Because the voice information is corrected directly once it has been converted into text information, the language processing difficulty caused by low accuracy of the input text information is reduced.
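One possible way to merge the first transcript with the transcript of the back-query answer is sketched below. The patent does not state how the two texts are compared and merged; the word-level alignment with Python's standard `difflib.SequenceMatcher`, the rule of preferring the confirmed wording where the two differ, and the function name `merge_transcripts` are all assumptions made for illustration.

```python
import difflib


def merge_transcripts(first: str, confirmed: str) -> str:
    """Merge the first transcript with the transcript of the confirmation answer.

    Assumed rule: where the two word sequences differ, the confirmed wording
    wins; words that appear only in the first transcript are kept, because
    the speaker may have repeated only part of the utterance.
    """
    a, b = first.split(), confirmed.split()
    merged = []
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag in ("equal", "delete"):
            merged.extend(a[i1:i2])  # identical, or present only in the first pass
        else:
            merged.extend(b[j1:j2])  # "replace" or "insert": trust the confirmation
    return " ".join(merged)


print(merge_transcripts("please book a fright to Paris", "a flight to Paris"))
# prints: please book a flight to Paris
```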

Optionally, when intelligent error correction is performed on the text information with lower fluency in step 4), an error correction prompt is sent to the operator and an error correction record is written to the device database.

With this technical solution: by performing intelligent error correction on the text recognized from the picture, the system can, when the text on the picture is smeared or the fonts are irregular, repair it to a certain extent using the logical relationship between the preceding and following text, and it sends an error correction prompt to the operator. The operator can then help the natural language processing device by reading the unclear picture information manually, which reduces the system's error rate on non-standard fonts; alternatively, after picture information has been processed incorrectly, the operator can use the error correction prompt to find the cause of the error. The error correction information is recorded, including the unclear or non-standard fonts, so that when a similar pattern is recognized again the error correction records in the device database can be retrieved to assist font recognition. In this way the system improves its font recognition accuracy after recognizing and storing font pictures many times, and achieves self-service error correction.
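The record-and-reuse behaviour described above can be illustrated with a small sketch. The in-memory dictionary standing in for the device database, the exact-match lookup (the patent speaks of "similar patterns", which a real system would need to match more loosely), and the names `CorrectionLog` and `ask_operator` are all hypothetical.

```python
from typing import Callable, Dict, List


class CorrectionLog:
    """Toy stand-in for the error correction records in the device database."""

    def __init__(self) -> None:
        self._records: Dict[str, str] = {}  # unclear OCR text -> confirmed text
        self.prompts: List[str] = []        # prompts sent to the operator

    def correct(self, recognized: str, ask_operator: Callable[[str], str]) -> str:
        # Reuse an earlier correction when the same unclear text is seen again.
        if recognized in self._records:
            return self._records[recognized]
        # Otherwise prompt the operator and record the answer for next time.
        self.prompts.append(f"Low-fluency picture text needs review: {recognized!r}")
        fixed = ask_operator(recognized)
        self._records[recognized] = fixed
        return fixed
```

With this arrangement the operator is asked only the first time a given unclear string appears; later occurrences are resolved from the stored record.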

Optionally, the proper names and analysis results obtained from the morphological analysis in step 5) are stored in the device database, and the device database is searched and compared against during morphological analysis.

With this technical solution: the device database stores the results of morphological analysis of the text information, so that subsequent morphological analysis can be compared against the analysis data in the database, which improves the efficiency of morphological analysis. Over many analyses the system accumulates more natural language processing data, which improves the accuracy of language processing and recognition. Some proper names in the text information may denote only one particular person or article. After such a proper name has been looked up and the result stored, the next time the name is recognized the data can be processed according to the stored lookup result; for example, when a vehicle name is recognized, the system knows that it refers to a vehicle. This improves the accuracy of morphological analysis.
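The storage and lookup of analysis results and proper names might look like the following sketch. The class `AnalysisStore`, its methods, the (token, tag) result format, and the made-up vehicle name in the usage example are assumptions; the patent does not describe the database schema or the analyzer.

```python
from typing import Callable, Dict, List, Tuple

Analysis = List[Tuple[str, str]]  # (token, tag) pairs; an assumed result format


class AnalysisStore:
    """Toy device database for morphological analysis results and proper names."""

    def __init__(self, analyzer: Callable[[str], Analysis]) -> None:
        self._analyzer = analyzer            # hypothetical underlying analyzer
        self._cache: Dict[str, Analysis] = {}
        self._names: Dict[str, str] = {}     # proper name -> stored category

    def register_name(self, name: str, category: str) -> None:
        # Store the lookup result for a proper name, e.g. that it is a vehicle.
        self._names[name] = category

    def analyze(self, text: str) -> Analysis:
        # Reuse a stored result when the same text has been analyzed before.
        if text in self._cache:
            return self._cache[text]
        result = [(tok, self._names.get(tok, tag)) for tok, tag in self._analyzer(text)]
        self._cache[text] = result
        return result


# Usage with a trivial whitespace analyzer and a made-up vehicle name.
store = AnalysisStore(lambda text: [(tok, "word") for tok in text.split()])
store.register_name("Falcon", "vehicle")
print(store.analyze("the Falcon stopped"))
# prints: [('the', 'word'), ('Falcon', 'vehicle'), ('stopped', 'word')]
```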

Optionally, after the device database has been searched and the semantic data in step 8) has been corrected, an error correction prompt pops up for the operator; the prompt marks the corrected text, and the prompt is stored in the device database.

With this technical solution: a second fluency judgment is performed on the analyzed semantic data, so that erroneous text information is screened further and the accuracy of the information output by the system is improved. The operator is prompted in time, so that the operator can gauge the progress and difficulty of the natural language processing from the corrections made, and can see the details of each correction directly from the marked correction positions in the prompt. This helps the operator debug the system's error correction program and improves the accuracy and efficiency of text examination and error correction.
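A prompt that marks the corrected positions could be produced along the following lines. The bracket notation, the word-level comparison with the standard library's `difflib.SequenceMatcher`, and the function name `mark_corrections` are illustrative assumptions, not the format used by the patented system.

```python
import difflib


def mark_corrections(original: str, corrected: str) -> str:
    """Return the corrected text with every changed position marked in brackets."""
    a, b = original.split(), corrected.split()
    out = []
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            out.extend(b[j1:j2])
        elif tag == "delete":
            out.append("[-" + " ".join(a[i1:i2]) + "-]")  # words removed
        elif tag == "insert":
            out.append("[+" + " ".join(b[j1:j2]) + "+]")  # words added
        else:
            # "replace": show the old and the new wording side by side
            out.append("[" + " ".join(a[i1:i2]) + " -> " + " ".join(b[j1:j2]) + "]")
    return " ".join(out)


print(mark_corrections("the cat sit on mat", "the cat sat on the mat"))
# prints: the cat [sit -> sat] on [+the+] mat
```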

The invention has the following advantages:

1. In the text examination and correction system based on natural language processing, the fluency of the voice information is judged, so that after a user utters voice information the system can promptly detect natural language content that could not be recognized in the text produced by voice recognition, and can ask a back-query to supplement the basis for correcting the voice information. This improves the accuracy of voice information correction and avoids the situation in which problems such as dialect pronunciation during voice input lead to wrong confidence judgments on the input and, because the basis for correction is insufficient, to large correction deviations. Because the voice information is corrected once it has been converted into text information, the language processing difficulty caused by low accuracy of the input text information is reduced.

2. In the text examination and correction system based on natural language processing, intelligent error correction is performed on the text recognized from the picture. When the text on the picture is smeared or the fonts are irregular, the system can repair it to a certain extent using the logical relationship between the preceding and following text, and it sends an error correction prompt to the operator. The operator can then help the natural language processing device by reading the unclear picture information manually, which reduces the system's error rate on non-standard fonts; alternatively, after picture information has been processed incorrectly, the operator can use the error correction prompt to find the cause of the error. The error correction information is recorded, including the unclear or non-standard fonts, so that when a similar pattern is recognized again the error correction records in the device database can be retrieved to assist font recognition. In this way the system improves its font recognition accuracy after recognizing and storing font pictures many times, and achieves self-service error correction.

3. In the text examination and correction system based on natural language processing, a device database stores the results of morphological analysis of the text information, so that subsequent morphological analysis can be compared against the analysis data in the database, which improves the efficiency of morphological analysis. Over many analyses the system accumulates more natural language processing data, which improves the accuracy of language processing and recognition. Some proper names in the text information may denote only one particular person or article. After such a proper name has been looked up and the result stored, the next time the name is recognized the data can be processed according to the stored lookup result; for example, a vehicle name can be recognized as referring to a vehicle. This improves the accuracy of morphological analysis.

4. In the text examination and correction system based on natural language processing, a second fluency judgment is performed on the analyzed semantic data, so that erroneous text information is screened further and the accuracy of the information output by the system is improved. The operator is prompted in time, so that the operator can gauge the progress and difficulty of the natural language processing from the corrections made, and can see the details of each correction directly from the marked correction positions in the prompt. This helps the operator debug the system's error correction program and improves the accuracy and efficiency of text examination and error correction.

Drawings

Fig. 1 is a block diagram of the system operation structure of the present invention.

Detailed Description

Embodiments of the present invention are described in detail below, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to like or similar elements or elements having like or similar functions throughout. The embodiments described below by referring to the drawings are illustrative and intended to explain the present invention and should not be construed as limiting the invention.

As shown in Fig. 1, a text examination and correction system based on natural language processing comprises the following steps:

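1) Input natural language information and classify it into voice information, picture information and text information;

2) Convert the voice information into text information and judge the fluency of the converted text; text whose fluency reaches the standard is input to the text information module, and text whose fluency does not reach the standard is input to the text information module after error correction;

3) Recognize the picture information and convert it into text information, and perform intelligent typesetting on the text from the picture;

4) Judge the fluency of the text information converted from the picture information, perform intelligent error correction on the text with lower fluency, and input the text into the text information module;

5) Perform morphological analysis and translation on the input text information and generate semantic data;

6) Judge the semantic fluency of the semantic data;

7) Directly output the semantic data whose fluency reaches the standard;

8) For the semantic data with lower fluency, search the device database and perform error correction.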

Example 1: The text information with lower fluency in step 2) is extracted, and a voice back-query is put to the sound source for confirmation. After the sound source answers the confirmation question, the answer is converted into text, which is compared, analyzed and merged with the text information from the first conversion; once the two pieces of text information have been analyzed and merged, error correction is complete and the result is input to the text information module. By judging the fluency of the voice information, the system can, after a user utters voice information, promptly detect natural language content that could not be recognized in the text produced by voice recognition, and can ask a back-query to supplement the basis for correcting the voice information. This improves the accuracy of voice information correction and avoids the situation in which problems such as dialect pronunciation during voice input lead to wrong confidence judgments on the input and, because the basis for correction is insufficient, to large correction deviations. Because the voice information is corrected directly once it has been converted into text information, the language processing difficulty caused by low accuracy of the input text information is reduced.

Example 2: When intelligent error correction is performed on the text information with lower fluency in step 4), an error correction prompt is sent to the operator and an error correction record is written to the device database. By performing intelligent error correction on the text recognized from the picture, the system can, when the text on the picture is smeared or the fonts are irregular, repair it to a certain extent using the logical relationship between the preceding and following text, and it sends an error correction prompt to the operator. The operator can then help the natural language processing device by reading the unclear picture information manually, which reduces the system's error rate on non-standard fonts; alternatively, after picture information has been processed incorrectly, the operator can use the error correction prompt to find the cause of the error. The unclear or non-standard fonts are recorded, so that when a similar pattern is recognized again the error correction records in the device database can be retrieved to assist font recognition. In this way the system improves its recognition accuracy after recognizing and storing the fonts many times, and achieves self-service error correction.

Example 3: The proper names and analysis results obtained from the morphological analysis in step 5) are stored in the device database, and the device database is searched and compared against during morphological analysis. The device database stores the results of morphological analysis of the text information, so that subsequent morphological analysis can be compared against the analysis data in the database, which improves the efficiency of morphological analysis. Over many analyses the system accumulates more natural language processing data, which improves the accuracy of language processing and recognition. Some proper names in the text information may denote only one particular person or article. After such a proper name has been looked up and the result stored, the next time the name is recognized the data can be processed according to the stored lookup result; for example, a vehicle name can be recognized as referring to a vehicle. This improves the accuracy of morphological analysis.

Example 4: After the device database has been searched and the semantic data in step 8) has been corrected, an error correction prompt pops up for the operator; the prompt marks the corrected text, and the prompt is stored in the device database. A second fluency judgment is performed on the analyzed semantic data, so that erroneous text information is screened further and the accuracy of the information output by the system is improved. The operator is prompted in time, so that the operator can gauge the progress and difficulty of the natural language processing from the corrections made, and can see the details of each correction directly from the marked correction positions in the prompt. This helps the operator debug the system's error correction program and improves the accuracy and efficiency of text examination and error correction.

The working principle of the invention is as follows:

S1. Convert the voice information and picture information into text information, judge its fluency, and perform preliminary screening;

S2. Perform morphological analysis on the pooled text information and correct errors in part of the semantic data.
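The patent does not say how fluency is judged in the preliminary screening of S1. As one simple stand-in, the fraction of adjacent word pairs that also occur in a reference corpus can serve as a fluency score; the functions `train_bigrams` and `fluency_score` and the toy corpus below are assumptions for illustration only.

```python
from typing import Iterable, Set, Tuple

Bigram = Tuple[str, str]


def train_bigrams(corpus: Iterable[str]) -> Set[Bigram]:
    """Collect every adjacent word pair seen in the reference sentences."""
    seen: Set[Bigram] = set()
    for sentence in corpus:
        words = sentence.lower().split()
        seen.update(zip(words, words[1:]))
    return seen


def fluency_score(text: str, bigrams: Set[Bigram]) -> float:
    """Fraction of adjacent word pairs in `text` that were seen in the corpus."""
    words = text.lower().split()
    pairs = list(zip(words, words[1:]))
    if not pairs:
        # Fewer than two words: nothing to judge, treated as fluent (arbitrary choice).
        return 1.0
    return sum(pair in bigrams for pair in pairs) / len(pairs)


bigrams = train_bigrams(["the cat sat on the mat"])
print(fluency_score("the cat sat", bigrams))  # prints: 1.0
print(fluency_score("the cat mat", bigrams))  # prints: 0.5
print(fluency_score("cat the sat", bigrams))  # prints: 0.0
```

Text scoring below a chosen threshold would then be routed to error correction, and text at or above it passed on unchanged.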

Compared with the prior art, the invention has the following beneficial effects:

Claims

1. A text examination and correction system based on natural language processing, characterized in that it comprises the following steps:

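1) Input natural language information and classify it into voice information, picture information and text information;

2) Convert the voice information into text information and judge the fluency of the converted text; text whose fluency reaches the standard is input to the text information module, and text whose fluency does not reach the standard is input to the text information module after error correction;

3) Recognize the picture information and convert it into text information, and perform intelligent typesetting on the text from the picture;

4) Judge the fluency of the text information converted from the picture information, perform intelligent error correction on the text with lower fluency, and input the text into the text information module;

5) Perform morphological analysis and translation on the input text information and generate semantic data;

6) Judge the semantic fluency of the semantic data;

7) Directly output the semantic data whose fluency reaches the standard;

8) For the semantic data with lower fluency, search the device database and perform error correction.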

2. The text examination and correction system based on natural language processing according to claim 1, characterized in that: the text information with lower fluency in step 2) is extracted, and a voice back-query is put to the sound source for confirmation; after the sound source answers the confirmation question, the answer is converted into text and is compared, analyzed and merged with the text information from the first conversion; and once the two pieces of text information have been analyzed and merged, error correction is complete and the result is input to the text information module.

3. The text examination and correction system based on natural language processing according to claim 2, characterized in that: when intelligent error correction is performed on the text information with lower fluency in step 4), an error correction prompt is sent to the operator and an error correction record is written to the device database.

4. The text examination and correction system based on natural language processing according to claim 3, characterized in that: the proper names and analysis results obtained from the morphological analysis in step 5) are stored in the device database, and the device database is searched and compared against during morphological analysis.

5. The text examination and correction system based on natural language processing according to claim 4, characterized in that: after the device database has been searched and the semantic data in step 8) has been corrected, an error correction prompt pops up for the operator, the corrected text is marked in the prompt, and the prompt is stored in the device database.