CN114638219A

CN114638219A - Intelligent wrong word recognition method based on machine learning algorithm

Info

Publication number: CN114638219A
Application number: CN202210137942.9A
Authority: CN
Inventors: 赖贵全; 唐宇
Original assignee: Chengdu Yida Shuan Technology Co ltd
Current assignee: Chengdu Yida Shuan Technology Co ltd
Priority date: 2022-02-15
Filing date: 2022-02-15
Publication date: 2022-06-17

Abstract

The invention discloses an intelligent wrong word recognition method based on a machine learning algorithm and belongs to the technical field of intelligent information recognition. It addresses the analysis errors that arise in wrong word recognition when the recognition technology cannot automatically update its wrong word library. The method comprises the following steps: (1) establishing a wrong word recognition management system on an application platform; (2) the wrong word recognition management system classifies the wrongly written characters in the information application; (3) the wrong word recognition management system connects to a server to establish a wrong word library, and learns and records the words in that library; (4) the wrong word recognition management system collects all published manuscripts to form a historical manuscript library, applies a neural network algorithm to recognize and learn from the historical manuscript library, and uses the results to update the wrong word library; (5) after artificial intelligence recognition, an alarm is raised and the flagged words are submitted for manual verification and correction. The invention is used for highly intelligent recognition of information manuscripts in various applications.

Description

Intelligent wrong word recognition method based on machine learning algorithm

Technical Field

The invention belongs to the technical field of intelligent information identification, and particularly relates to an intelligent wrong word identification method based on a machine learning algorithm.

Background

Character recognition is a technology for automatically recognizing characters by computer, and is an important application area of pattern recognition. People need to process large volumes of text, reports and documents in production and daily life. To reduce labor and improve processing efficiency, work on general character recognition methods began in the 1950s and led to the development of optical character recognizers. In the 1960s, practical machines using magnetic ink and special fonts were introduced. In the late 1960s, recognition machines for multiple typefaces and for handwritten characters appeared, with recognition accuracy and machine performance that basically met requirements; examples include handwritten numeral recognition machines for letter sorting and recognition machines for printed English letters and numerals. In the 1970s, work focused on the basic theory of character recognition and on the development of high-performance character recognition machines, with particular emphasis on character recognition research.

The character recognition generally includes several parts, such as collection of character information, analysis and processing of information, classification and discrimination of information, and the like.

The information collection is to convert the gray scale of the characters on the paper surface into electric signals and input the electric signals into a computer. The information collection is realized by a paper feeding mechanism and a photoelectric conversion device in the character recognition machine, and the photoelectric conversion device comprises a flying spot scanning device, a camera, a photosensitive element, a laser scanning device and the like.

The information analysis and processing is to eliminate various noises and interferences caused by printing quality, paper quality (uniformity, stain, etc.) or writing tools, etc., and to perform various normalization processes such as size, deflection, shading, thickness, etc., on the converted electric signals.

The information classification and judgment is to classify and judge the normalized text information after the noise is removed so as to output a recognition result.

At present, the wrongly written character recognition used in various apps and back-end management systems works by fuzzy-query matching of the words in an article. This approach cannot add entries to the wrong word library automatically, nor can it intelligently identify the sentiment of an article, so it cannot detect a sensitive word that is used in the wrong context.

Disclosure of Invention

The invention aims to:

to provide an intelligent wrong word recognition method based on a machine learning algorithm, which solves the prior-art problem that wrong word recognition in various apps produces analysis errors because the wrong word library cannot be updated automatically.

The technical scheme adopted by the invention is as follows:

an intelligent wrong word recognition method based on a machine learning algorithm comprises the following steps:

(1) establishing a wrong word recognition management system on the application platform, wherein the wrong word recognition management system is used for recognizing and managing wrong words of the news media information APP;

(2) the wrong word recognition management system classifies wrong words in the information application into the following categories: punctuation marks, personal names, positions, numeral usage, and commonly confused words of similar meaning, wherein numeral usage errors comprise case errors and numeral symbol errors;

(3) the wrong word recognition management system connects to a server to establish a wrong word library, and the words in the wrong word library are learned and recorded, wherein the wrong word library comprises a name word bank, a position word bank and a wrongly written word bank;

(4) the wrong word recognition management system collects all published manuscripts to form a historical manuscript library, applies a neural network algorithm to recognize and learn from the historical manuscript library, and updates the wrong word library with what it learns;

(5) after a manuscript to be checked has been examined by artificial intelligence recognition, the wrong word recognition management system raises an alarm for each wrong word and submits the flagged words for manual verification and correction.
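The library lookup and alarm flow of steps (3) and (5) can be illustrated with a minimal Python sketch. The patent does not specify any data structures, so everything here is an assumption made for illustration: the dictionary layout keyed by category, the sample entries, and the names `Alarm`, `check_manuscript` and `record_confirmed_error` are all hypothetical. A plain substring scan stands in for the recognition step, which in the described method would be performed by a learned model.

```python
from dataclasses import dataclass

# Hypothetical wrong word library: category -> {wrong form: suggested form}.
# The three categories mirror the name, position and wrongly written word banks.
WRONG_WORD_LIBRARY = {
    "name": {"Jon Smith": "John Smith"},
    "position": {"Cheif Editor": "Chief Editor"},
    "wrong_word": {"recieve": "receive", "seperate": "separate"},
}


@dataclass
class Alarm:
    category: str    # which word bank produced the hit
    found: str       # text as it appears in the manuscript
    suggestion: str  # library suggestion shown to the human reviewer
    offset: int      # character position of the hit


def check_manuscript(text, library=WRONG_WORD_LIBRARY):
    """Scan the text for every library entry; return alarms sorted by position."""
    alarms = []
    for category, entries in library.items():
        for wrong, right in entries.items():
            start = text.find(wrong)
            while start != -1:
                alarms.append(Alarm(category, wrong, right, start))
                start = text.find(wrong, start + 1)
    return sorted(alarms, key=lambda a: a.offset)


def record_confirmed_error(library, category, wrong, right):
    """After manual verification, add a confirmed error to the library."""
    library.setdefault(category, {})[wrong] = right
```

Each `Alarm` is only a suggestion; in the method as described, a person verifies and corrects the flagged words, and confirmed errors are then written back to the library.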

Further, in step (4), an artificial intelligence algorithm analyzes the contexts in which commonly used words in the historical manuscript library occur together. When a combination has been used often enough to qualify as a fixed phrase, it is learned and recorded. During wrong word recognition on a manuscript, if the wrong word recognition management system finds a phrase that partly overlaps a recorded fixed phrase but differs from it in some part, an error alarm is raised.
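The fixed-phrase idea can be sketched with simple frequency counting in Python. This is an illustrative stand-in, not the patent's algorithm: the patent names no specific model, and the three-word phrase length, the usage threshold `min_count`, whitespace tokenization, and the rule "differs in exactly one position" are all assumptions chosen to keep the example small. The function names are hypothetical.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return all consecutive n-token tuples in a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def learn_fixed_phrases(history, n=3, min_count=3):
    """Record every n-word phrase used at least min_count times in the history."""
    counts = Counter()
    for doc in history:
        counts.update(ngrams(doc.lower().split(), n))
    return {phrase for phrase, count in counts.items() if count >= min_count}


def find_suspect_phrases(text, fixed_phrases, n=3):
    """Flag phrases that overlap a fixed phrase but differ in exactly one word."""
    suspects = []
    for candidate in ngrams(text.lower().split(), n):
        if candidate in fixed_phrases:
            continue  # exact match with a learned phrase: nothing to report
        for phrase in fixed_phrases:
            differing = sum(a != b for a, b in zip(candidate, phrase))
            if differing == 1:
                suspects.append((" ".join(candidate), " ".join(phrase)))
                break
    return suspects
```

A rule this loose will also flag legitimate variations, which is consistent with the method routing every alarm to manual verification rather than correcting automatically.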

Further, the machine learning analysis of wrong words using a decision tree algorithm comprises the following steps:

a. Generation of the decision tree: a decision tree is generated from a wrong word sample set, which is a historical data set, assembled according to actual needs and consolidated to some degree, that is used for data analysis and processing;

b. Pruning of the decision tree: the decision tree generated in the previous stage is checked, corrected and revised. The preliminary rules produced during tree generation are tested against the data in a new wrong word sample data set, and the branches that reduce prediction accuracy are cut off.
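Steps a and b can be sketched in plain Python as follows. The patent does not name a tree-building or pruning procedure, so this example swaps in two well-known techniques as assumptions: ID3-style splitting on information gain for generation, and reduced-error pruning against a held-out sample set for pruning. The categorical feature dictionaries and all function names are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy of a list of class labels."""
    total = len(labels)
    return -sum(c / total * math.log2(c / total) for c in Counter(labels).values())


def majority(labels):
    """Most common label (first seen wins ties)."""
    return Counter(labels).most_common(1)[0][0]


def build_tree(rows, labels, features):
    """Step a: grow a tree from samples; rows are dicts of categorical features."""
    if len(set(labels)) == 1 or not features:
        return {"leaf": majority(labels)}

    def gain(feature):
        remainder = 0.0
        for value in set(r[feature] for r in rows):
            subset = [l for r, l in zip(rows, labels) if r[feature] == value]
            remainder += len(subset) / len(labels) * entropy(subset)
        return entropy(labels) - remainder

    best = max(features, key=gain)
    node = {"feature": best, "default": majority(labels), "children": {}}
    rest = [f for f in features if f != best]
    for value in set(r[best] for r in rows):
        idx = [i for i, r in enumerate(rows) if r[best] == value]
        node["children"][value] = build_tree(
            [rows[i] for i in idx], [labels[i] for i in idx], rest)
    return node


def predict(node, row):
    """Walk the tree; fall back to the node's majority label for unseen values."""
    while "leaf" not in node:
        child = node["children"].get(row[node["feature"]])
        if child is None:
            return node["default"]
        node = child
    return node["leaf"]


def accuracy(tree, rows, labels):
    return sum(predict(tree, r) == l for r, l in zip(rows, labels)) / len(labels)


def prune(node, rows, labels):
    """Step b: replace a subtree by a leaf when new samples show no accuracy loss."""
    if "leaf" in node or not rows:
        return node
    for value, child in list(node["children"].items()):
        idx = [i for i, r in enumerate(rows) if r[node["feature"]] == value]
        node["children"][value] = prune(
            child, [rows[i] for i in idx], [labels[i] for i in idx])
    leaf = {"leaf": node["default"]}
    if accuracy(leaf, rows, labels) >= accuracy(node, rows, labels):
        return leaf
    return node
```

Pruning works bottom-up, so a branch learned from a noisy training sample is removed only if the new sample set does not support it. Note that `prune` modifies the tree in place and also returns it.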

Further, after the wrong word recognition management system recognizes wrong words and raises an alarm, the recognized wrong words are classified and uploaded to the wrong word library on the server, the wrong word library is updated, and the neural network algorithm performs a new round of learning and updating.

In summary, due to the adoption of the technical scheme, the invention has the beneficial effects that:

1. Building on the prior art, the invention uses artificial intelligence recognition to continuously learn, update and recognize the wrong word library, so that library content is added automatically. It also intelligently identifies the sentiment of an article and analyzes whether sensitive words are used in the wrong context, which yields more precise wrong word alarms.

2. Unlike fuzzy word matching, the method analyzes the precise context in which a word is used, so that a word can be flagged when its context is wrong even if the word itself is written correctly. This greatly improves the intelligence of wrong word recognition and makes the work of writers and editors easier.

Drawings

FIG. 1 is a flow chart of the method of the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail with reference to the following embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.

Example 1

(1) establishing an error word recognition management system on the application platform, wherein the error word recognition management system is used for recognizing and managing error words of the news media information APP;

(3) the wrong word recognition management system is connected with the server to establish a wrong word bank, and words in the wrong word bank are learned and recorded, wherein the wrong word bank comprises a name word bank, a position word bank and a wrong word bank;

After the wrong word recognition management system recognizes wrong words and raises an alarm, the recognized wrong words are classified and uploaded to the wrong word library on the server, the wrong word library is updated, and the neural network algorithm performs a new round of learning and updating.

Compared with the prior art, the machine-learned neural network recognizes higher-level, more abstract features and can therefore learn high-level error features automatically, without a manually defined training set. Wrong words can thus be recognized intelligently, and the error correction rate is greatly improved.

In step (4), an artificial intelligence algorithm analyzes the contexts in which commonly used words in the historical manuscript library occur together. When a combination has been used often enough to qualify as a fixed phrase, it is learned and recorded. During wrong word recognition on a manuscript, if the wrong word recognition management system finds a phrase that partly overlaps a recorded fixed phrase but differs from it in some part, an error alarm is raised.

Unlike fuzzy word matching, the method analyzes the precise context in which a word is used, so that a word can be flagged when its context is wrong even if the word itself is written correctly. This greatly improves the intelligence of wrong word recognition.

Preferably, the method for performing the wrong word machine learning analysis by using the decision tree algorithm comprises the following steps:

Example 2

Preferably, the machine learning analysis of wrong words is performed with a naive Bayes algorithm, which is a classification algorithm. It is not a single algorithm but a family of algorithms that share a common principle: given the class, the value of each feature being classified is assumed to be independent of the value of every other feature.

The Bayesian method combines prior probability with posterior probability, which avoids both the subjective bias of relying on the prior alone and the overfitting that comes from relying on sample information alone. Bayesian classification shows fairly high accuracy on larger data sets, and the algorithm itself is relatively simple.

The naive Bayes method simplifies the Bayes algorithm by assuming that, given the target value, the attributes are conditionally independent of one another. In other words, no attribute variable carries a larger or a smaller weight in the decision result than any other. Although this simplification reduces the classification quality of the Bayesian classification algorithm to some extent, in practical applications it greatly reduces the complexity of the Bayesian method.
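The independence assumption can be made concrete with a small multinomial naive Bayes classifier in Python. This is a sketch under stated assumptions, not the patent's implementation: the patent gives no feature definition, so the example treats the words surrounding a usage as features, uses whitespace tokenization, and applies add-one (Laplace) smoothing. The class name `NaiveBayes`, the labels, and the training sentences are hypothetical.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Multinomial naive Bayes over whitespace tokens with add-one smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)      # documents per class (prior)
        self.word_counts = defaultdict(Counter)  # class -> word frequencies
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(doc.lower().split())
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def log_scores(self, doc):
        """Log prior plus the sum of per-word log likelihoods for each class."""
        total_docs = sum(self.class_counts.values())
        scores = {}
        for label, n_docs in self.class_counts.items():
            total_words = sum(self.word_counts[label].values())
            score = math.log(n_docs / total_docs)
            for word in doc.lower().split():
                if word not in self.vocab:
                    continue  # ignore words never seen in training
                # Independence assumption: each word contributes its own factor.
                score += math.log(
                    (self.word_counts[label][word] + 1)
                    / (total_words + len(self.vocab)))
            scores[label] = score
        return scores

    def predict(self, doc):
        scores = self.log_scores(doc)
        return max(scores, key=scores.get)
```

Because each word contributes one independent factor, training reduces to counting, which is the reduction in complexity the text describes.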

The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents and improvements made within the spirit and principle of the present invention are intended to be included within the scope of the present invention.

Claims

1. An intelligent wrong word identification method based on a machine learning algorithm is characterized by comprising the following steps:

2. The intelligent wrong word recognition method based on a machine learning algorithm as claimed in claim 1, characterized in that in step (4), an artificial intelligence algorithm analyzes the contexts in which commonly used words in the historical manuscript library occur together; when a combination has been used often enough to qualify as a fixed phrase, it is learned and recorded; and during wrong word recognition on a manuscript, when the wrong word recognition management system finds a phrase that partly overlaps a recorded fixed phrase but differs from it in some part, an error alarm is raised.

3. The intelligent recognition method for the error word based on the machine learning algorithm according to claim 1, characterized in that the decision tree algorithm is adopted to perform the machine learning analysis of the error word, and comprises the following steps:

4. The intelligent wrong word recognition method based on a machine learning algorithm as claimed in claim 1, characterized in that after the wrong word recognition management system recognizes wrong words and raises an alarm, it classifies the recognized wrong words and uploads them to the wrong word library on the server, the wrong word library is updated, and a neural network algorithm performs a new round of learning and updating.