CN110991166B

CN110991166B - Chinese wrongly-written character recognition method and system based on pattern matching

Info

Publication number: CN110991166B
Application number: CN201911219533.8A
Authority: CN
Inventors: 曹馨宇; 王海涛; 刘亮亮; 付雪; 赵静; 张帆; 赵超; 吴刚; 丁文兴; 周长青
Original assignee: China National Institute of Standardization
Current assignee: China National Institute of Standardization
Priority date: 2019-12-03
Filing date: 2019-12-03
Publication date: 2021-07-30
Anticipated expiration: 2039-12-03
Also published as: CN110991166A

Abstract

The invention discloses a wrongly written character recognition method based on pattern matching, comprising the following steps: S1, defining wrongly written character recognition patterns according to the structural characteristics of the language; S2, building an index of the recognition patterns on a graph storage structure; and S3, automatically checking and correcting errors in the text to be checked by means of that index. By defining recognition patterns, the method combines grammatical restrictions with collocations of conditional functions to recognize wrongly written characters; it effectively targets errors that violate local or long-distance grammatical constraints and achieves good accuracy. The system implements the definition of the recognition patterns and the construction of the index in a program, and uses the pattern index to check and correct errors in the text automatically. Because the index is built on a flexible graph storage structure, it supports both breadth-first and depth-first search of the data, so that a complete database (matching library) can be constructed and the accuracy of wrongly written character recognition improved.

Description

Chinese wrongly-written character recognition method and system based on pattern matching

Technical Field

The invention relates to the technical field of natural language processing in artificial intelligence, and in particular to a Chinese wrongly written character recognition method and system based on pattern matching.

Background

Automatic proofreading of Chinese text is one of the main applications of natural language processing and remains a difficult problem in natural language understanding. With the advent of the big-data era, errors in Chinese texts are increasing. Statistical and machine learning methods can effectively find and automatically correct some wrongly written characters, but others arise from violations of local or long-distance grammatical or semantic constraints. Such errors are difficult to find from context alone; detecting them requires grammatical rules and semantic collocations. For example, common words such as "that" and "where", or "ground", are often confused and frequently misused. General automatic proofreading methods either fail to find these errors or have a particularly high false-correction rate, and a single context or collocation check is not sufficient to decide whether an error has occurred.

In view of the above, the present invention is particularly proposed.

Disclosure of Invention

Aiming at the defects in the prior art, the invention provides a Chinese wrongly-written character recognition method and system based on pattern matching, and the recognition accuracy is improved.

In order to achieve the purpose, the technical scheme of the invention is as follows:

A wrongly written character recognition method based on pattern matching comprises the following steps:

S1. defining wrongly written character recognition patterns according to the structural characteristics of the language;

S2. building an index of the wrongly written character recognition patterns based on a graph storage structure;

S3. automatically checking and correcting errors in the text to be checked by means of the index of the wrongly written character recognition patterns.

Further, in step S1 of the method, a wrongly written character recognition pattern is established according to the grammatical structure and semantic restriction features of Chinese, including:

S11. setting recognition matching conditions and associating semantic operations with them as recognition rules, thereby forming the wrongly written character recognition pattern.

Further, in the method, the recognition matching condition in step S11 is formed by combining restriction functions; the restriction functions include:

NOTCONTAIN(<S>, <W | WORDCLASS1>), used for determining whether the sentence "S" to be checked contains the target word "W" or the word class "WORDCLASS1"; if not, it returns TRUE; otherwise, it returns FALSE;

NOTENDWITH(<S>, <W | WORDCLASS1>), used for determining whether the sentence "S" to be checked ends with the target word "W" or the word class "WORDCLASS1"; if not, it returns TRUE; otherwise, it returns FALSE;

MATCHED(<S>, <W | WORDCLASS1>), used for determining whether the sentence "S" to be checked matches the target word "W" or the word class "WORDCLASS1"; if the match succeeds, it returns TRUE; otherwise, it returns FALSE;

the restriction functions are combined by connectors.

Further, in the method, the semantic operations include:

OK(<target word>): if the sentence to be checked satisfies the recognition matching condition, the target word is correct;

MARK(<target word>): if the sentence to be checked satisfies the recognition matching condition, the target word is possibly wrong and is marked;

REWRITE(<target word>, <correct word>): if the sentence to be checked satisfies the recognition matching condition, the target word is wrong and contains a wrongly written character; <correct word> is the corresponding correct word, and the target word is automatically replaced with it.

Further, in step S2 of the method, building the index of the wrongly written character recognition patterns based on the graph storage structure includes:

S21. defining the graph structure in code;

S22. defining the parameters of the graph structure in code.

Further, in step S3 of the method, automatically checking and correcting errors in the text to be checked by means of the index of the wrongly written character recognition patterns includes:

S31. segmenting the sentence to be checked into words and marking the word at each position;

S32. scanning the words of the sentence to be checked in sequence; if the end of the sentence is reached, ending the check; otherwise, going to S33;

S33. matching the words of the sentence to be checked against the index of the wrongly written character recognition patterns and, if a match succeeds, placing the matching result in a temporary array;

S34. taking the intersection of the results in the temporary array, determining whether the number of successfully matched elements equals the length of the matching rule, and placing the index numbers of the rules for which the two are equal in a final array;

S35. traversing each rule in the final array in turn and determining whether the order of the successfully matched elements is consistent with the rule; if so, the match succeeds;

S36. after a successful match, executing the semantic operation specified by the consequent of the wrongly written character recognition pattern;

S37. outputting the checking result and finishing the check of the current sentence.

In another aspect, the present invention further relates to a wrongly written character recognition system based on pattern matching, which includes a processor and a memory, wherein the memory stores a program and, when the program is executed by the processor, the following steps are performed:

D1. defining wrongly written character recognition patterns according to the structural characteristics of the language;

D2. building an index of the wrongly written character recognition patterns based on a graph storage structure;

D3. automatically checking and correcting errors in the text to be checked by means of the established index structure.

Further, in step D1 of the above system, a wrongly written character recognition pattern is established according to the grammatical structure and semantic restriction features of Chinese, including

Further, in the above system, the recognition matching condition is formed by combining restriction functions; the restriction functions include

the restriction functions are combined by connectors.

Further, in the above system, in setting the recognition matching condition and associating the semantic operation, the semantic operations include:

Compared with the prior art, the invention has the beneficial effects that:

By defining wrongly written character recognition patterns, the method combines grammatical restrictions with collocations of conditional functions and applies them to wrongly written character recognition; it effectively targets errors that violate local or long-distance grammatical constraints, and has good accuracy and practical value. The system implements the method: it realizes the definition of the recognition patterns and the construction of the index in a program, and uses the pattern index to check and correct errors in the text automatically. Because the index is built on a flexible graph storage structure, it supports both breadth-first and depth-first search of the data, so that a complete database (matching library) can be constructed and the accuracy of wrongly written character recognition improved.

Drawings

In order to more clearly illustrate the detailed description of the invention or the technical solutions in the prior art, the drawings that are needed in the detailed description of the invention or the prior art will be briefly described below. Throughout the drawings, like elements or portions are generally identified by like reference numerals. In the drawings, elements or portions are not necessarily drawn to scale.

FIG. 1 is a flow chart of an embodiment of a method for identifying wrongly written words based on pattern matching according to the present invention;

Detailed Description

Embodiments of the present invention will be described in detail below with reference to the accompanying drawings. The following examples are only for illustrating the technical solutions of the present invention more clearly, and therefore are only examples, and the protection scope of the present invention is not limited thereby.

It is to be noted that, unless otherwise specified, technical or scientific terms used herein shall have the ordinary meaning as understood by those skilled in the art to which the invention pertains.

Example 1

As shown in FIG. 1, a method for identifying wrongly written characters based on pattern matching includes the steps of:

The method is particularly suitable for identifying wrongly written characters in Chinese text. It defines wrongly written character recognition patterns from features of Chinese such as syntactic structure and semantic restrictions, combines syntactic structures and conditional restrictions within each pattern, and matches the defined patterns against the sentence to be checked in order to detect and correct errors.

Specifically, in one embodiment provided by the present invention, in step S1 a wrongly written character recognition pattern is established according to the grammatical structure and semantic restriction features of Chinese, which specifically includes setting recognition matching conditions and associating semantic operations with them as recognition rules, thereby forming the wrongly written character recognition pattern.

In this embodiment, the wrongly written character recognition pattern serves as a recognition rule. Its structure comprises a recognition matching condition and a semantic operation associated with that condition, so that the semantic operation is carried out on sentences that satisfy the condition. The recognition matching condition is defined by applying conditional functions (restriction functions) to the grammatical structure and semantic restriction features. In the example given in the present invention, the structure of the wrongly written character recognition pattern is as follows:

Rule 1: NOTCONTAIN(S, <target word>) & NOTENDWITH(S, <!punctuation>) & MATCHED(S, <!certain class 1> <!certain class 2>) → OK(<target word>);

Rule 2: NOTCONTAIN(S, <!word class 1>) & NOTENDWITH(S, <target word> <!word class 2>) → MARK(<target word>);

Rule 3: NOTCONTAIN(S, <!word class 1>) & MATCHED(S, <target word> <!word class 2>) → REWRITE(<target word>, <correct word>).

The wildcards used in the above patterns have their conventional meanings; for example, "+" indicates that any number of characters may intervene, and "&" means "and". The symbol "→" indicates that the matching condition before it is associated with the semantic operation after it.
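The rule structure just described (a conjunction of conditions joined by "&", followed by one semantic operation after "→") could be represented roughly as follows. This is only a sketch under stated assumptions: `Sentence`, `Condition`, `Action`, `Rule`, and `fires` are hypothetical names that do not appear in the original, and each condition is modeled as an arbitrary predicate over a segmented sentence rather than as the patent's actual data layout.

```cpp
#include <functional>
#include <string>
#include <vector>

// Hypothetical types: a segmented sentence and a condition evaluated on it.
using Sentence = std::vector<std::string>;
using Condition = std::function<bool(const Sentence&)>;

// The three semantic operations named in the description.
enum class Action { Ok, Mark, Rewrite };

// One recognition pattern: conditions joined by "&", then a consequent.
struct Rule {
    std::vector<Condition> conditions;  // antecedent, all must hold
    Action action;                      // consequent after the arrow
    std::string target;                 // <target word>
    std::string correct;                // <correct word>, used by Rewrite only
};

// A rule fires only when every condition returns true for the sentence.
bool fires(const Rule& rule, const Sentence& s) {
    for (const auto& cond : rule.conditions) {
        if (!cond(s)) return false;  // short-circuit like a chain of "&"
    }
    return true;
}
```

Modeling conditions as predicates keeps the conjunction logic independent of how the individual restriction functions are implemented.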

The restriction functions in the above patterns are defined as follows:

NOTCONTAIN(<S>, <W | WORDCLASS1>) determines whether the sentence "S" to be checked contains the target word "W" or the word class "WORDCLASS1"; if not, it returns TRUE; otherwise, it returns FALSE;

NOTENDWITH(<S>, <W | WORDCLASS1>) determines whether the sentence "S" to be checked ends with the target word "W" or the word class "WORDCLASS1"; if not, it returns TRUE; otherwise, it returns FALSE;

MATCHED(<S>, <W | WORDCLASS1>) determines whether the sentence "S" to be checked matches the target word "W" or the word class "WORDCLASS1"; it returns TRUE if the match succeeds and FALSE otherwise.

It should be noted that, in these functions, "W" refers to the target word, "WORDCLASS" refers to a word class, and "S" refers to the sentence to be checked.

Chinese parts of speech include nouns, verbs, adjectives, numerals, quantifiers, pronouns, distinguishing words, adverbs, prepositions, conjunctions, auxiliary words, interjections, modal particles, and onomatopoeia. In this embodiment, a word class is defined as follows: <!WORDCLASS1> = <W₁|W₂|...|W_n>, where each W_i represents a specific word or phrase.
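As a minimal sketch of how the three restriction functions might behave, the following C++ fragment assumes the sentence has already been segmented into a vector of words and that a word class is simply a set of words. The names `Sentence`, `WordClass`, `notContain`, `notEndWith`, and `matched` are illustrative and do not come from the original. In particular, `matched` here only tests whether a target word is immediately followed by a member of a word class, which is one plausible reading of MATCHED rather than its confirmed semantics.

```cpp
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical types: a segmented sentence and a word class (a set of words).
using Sentence = std::vector<std::string>;
using WordClass = std::set<std::string>;

// NOTCONTAIN: true if no word of the sentence belongs to the class.
bool notContain(const Sentence& s, const WordClass& wc) {
    for (const auto& w : s) {
        if (wc.count(w) > 0) return false;
    }
    return true;
}

// NOTENDWITH: true if the last word of the sentence is not in the class.
bool notEndWith(const Sentence& s, const WordClass& wc) {
    return s.empty() || wc.count(s.back()) == 0;
}

// MATCHED (simplified assumption): true if `target` is immediately
// followed by some word from `next` anywhere in the sentence.
bool matched(const Sentence& s, const std::string& target,
             const WordClass& next) {
    for (std::size_t i = 0; i + 1 < s.size(); ++i) {
        if (s[i] == target && next.count(s[i + 1]) > 0) return true;
    }
    return false;
}
```

A single target word can be passed to `notContain` or `notEndWith` as a one-element word class, which mirrors the `<W | WORDCLASS1>` argument form.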

In this embodiment, the wrongly written character recognition pattern supports three semantic operations, defined as follows:

OK(<target word>): if the sentence satisfies the pattern, the "<target word>" is correct;

MARK(<target word>): if the sentence satisfies the pattern, the "<target word>" is possibly wrong and is marked;

REWRITE(<target word>, <correct word>): if the sentence satisfies the pattern, the "<target word>" is wrong and contains a wrongly written character; "<correct word>" is the corresponding correct word, and the target word is automatically replaced with it to complete the proofreading.
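The three semantic operations can be pictured as a small dispatch over an action type. The sketch below is illustrative only: `Action` and `applyConsequent` are hypothetical names, and the status codes (0 for untouched, 1 for marked, 2 for rewritten) follow the Status array convention that the description gives for step S36.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// The three semantic operations named in the description.
enum class Action { Ok, Mark, Rewrite };

// Apply the consequent of a matched rule to the word at position i.
// status[i]: 0 = untouched, 1 = marked as an error, 2 = rewritten.
void applyConsequent(Action action, std::size_t i, const std::string& correct,
                     std::vector<std::string>& words,
                     std::vector<int>& status) {
    switch (action) {
        case Action::Ok:
            break;  // the target word is accepted as correct
        case Action::Mark:
            status[i] = 1;  // flag only, leave the word unchanged
            break;
        case Action::Rewrite:
            status[i] = 2;  // flag and replace with the correct word
            words[i] = correct;
            break;
    }
}
```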

In the method, each wrongly written character recognition pattern is segmented, and an index of the patterns is stored in a graph, thereby establishing a graph-based index structure. A graph storage structure (hereinafter "graph structure") consists of a number of nodes that can be connected to one another to form a network; among computer data structures, the graph is one of the most flexible. The invention stores the index of the recognition patterns in a graph structure so that both breadth-first search and depth-first search can be performed.

The step S2 comprises the following steps:

S21. defining a graph structure in code, including the number of edges, the vertices, the in-degrees, the nodes, the labels, and so on; one specific embodiment of the graph structure definition provided by the invention is as follows:

static int nEdge;                  // number of edges
static vector<gtype> G[W];
static int nRu[W];                 // in-degree
static int nType[W];               // 1: word, 2: word class
static int nBelong[W];             // rule class the node belongs to; initially -1; if not -1, nType must be 4 (a rule node)
// index (global)
static int nSum;                   // number of elements in FindID, i.e. the total number of graph nodes
static map<string, int> FindID;    // label of each word in the graph
static map<int, string> FindName;  // mapping from labels back to words

S22. defining the structure of the rules in code (a rule is a wrongly written character recognition pattern):

static int nRuleClass;
static vector<RuleClassType> RuleClass;

The graph structure is thus defined in correspondence with the structure of the wrongly written character recognition patterns; that is, the index of the recognition patterns based on the graph storage structure is established.
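To make the idea of a graph-based pattern index more concrete, the following sketch interns each word or word class as a node and links it to every rule that mentions it. It is an assumption-laden illustration, not the patent's implementation: `PatternIndex`, `internNode`, `addRule`, and `rulesFor` are hypothetical names, while `findId` and `findName` merely mirror the roles of `FindID` and `FindName` in the declarations above.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical graph-style index: every distinct word (or word class)
// becomes a node, and an edge links it to each rule that mentions it.
struct PatternIndex {
    std::map<std::string, int> findId;    // item -> node label
    std::map<int, std::string> findName;  // node label -> item
    std::vector<std::vector<int>> edges;  // node label -> rule numbers

    // Return the node label for an item, creating the node on first use.
    int internNode(const std::string& item) {
        auto it = findId.find(item);
        if (it != findId.end()) return it->second;
        int id = static_cast<int>(edges.size());
        findId[item] = id;
        findName[id] = item;
        edges.emplace_back();
        return id;
    }

    // Record that rule `ruleNo` mentions each of the given items.
    void addRule(int ruleNo, const std::vector<std::string>& items) {
        for (const auto& item : items) {
            int id = internNode(item);
            edges[id].push_back(ruleNo);
        }
    }

    // Rules reachable from an item; empty if the item is not indexed.
    std::vector<int> rulesFor(const std::string& item) const {
        auto it = findId.find(item);
        if (it == findId.end()) return {};
        return edges[it->second];
    }
};
```

With such an index, looking up a scanned word yields the candidate rules directly, instead of testing every rule against every word.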

After the index is established, the text to be checked (a Chinese sentence) is pattern-matched against the index structure, and the semantic operation of each matched wrongly written character recognition pattern is carried out, thereby achieving automatic error checking and automatic error correction.

Step S3 comprises:

S31. Segment the sentence to be checked into words and mark the word at each position. In this step, the segmented sentence is W₁W₂…W_n, and a tag array Status[N] is used to mark the word W_i at each position:

in the initial state, Status[i] = 0 (1 ≤ i ≤ n);

S32. Scan the words W_i in the sentence S to be checked in sequence; if the end of S is reached, stop error checking and go to S37; otherwise, go to S33;

S33. Match the word W_i in the sentence S against the index of the wrongly written character recognition patterns and, if the match succeeds, put the matching result into the array vecTempResult (the temporary array);

S34. Take the intersection of the results in vecTempResult, determine whether the number of successfully matched elements equals the length of the matching rule, and put the index numbers (that is, the labels in the coded graph structure) of the rules for which the two are equal into the array vecResult (the final array). The length of a rule is determined by using "&" as the separator; for example, Rule 1 contains two "&" separators, so its length is 3;

S35. Traverse each wrongly written character recognition pattern in vecResult in turn and check whether the order of the successfully matched elements is consistent with the matching conditions in the pattern; if so, the matched rule is valid, that is, the match succeeds;

For example, take the sentence to be checked: "are these children this is to do that?";

among the wrongly written character recognition patterns there is the rule: NOTCONTAIN(S, <!all interrogative words>) & MATCHED(S, <that> <!non-interrogative particle>) → MARK(that);

the matching process is as follows:

NOTCONTAIN(S, <!all interrogative words>) = TRUE

MATCHED(S, <that> <!non-interrogative particle>) = TRUE

The match succeeds, so the consequent of the rule is executed and the "that" in the sentence is marked as possibly wrong;

S36. If the match succeeds and the consequent is MARK, Status[i] of the current target word is set to 1, indicating that the word has an error; if the consequent is REWRITE, Status[i] of the current target word is set to 2, indicating that the word has an error, and the word is replaced with the correct word given in the consequent of the wrongly written character recognition pattern.
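The rule-length test in step S34 can be sketched as follows. The fragment assumes that the temporary array holds one rule number per matched condition, and it counts hits per rule as a simple stand-in for the intersection step; `ruleLength` and `completeRules` are hypothetical helper names, not identifiers from the original.

```cpp
#include <algorithm>
#include <map>
#include <string>
#include <vector>

// Length of a rule's antecedent: the number of "&"-separated conditions.
int ruleLength(const std::string& antecedent) {
    return 1 + static_cast<int>(
                   std::count(antecedent.begin(), antecedent.end(), '&'));
}

// hits: one rule number per matched condition (the temporary array).
// lengths: rule number -> rule length.
// Returns the rule numbers whose hit count equals their length,
// in ascending order (the final array).
std::vector<int> completeRules(const std::vector<int>& hits,
                               const std::map<int, int>& lengths) {
    std::map<int, int> counts;
    for (int ruleNo : hits) ++counts[ruleNo];

    std::vector<int> result;
    for (const auto& entry : counts) {
        auto it = lengths.find(entry.first);
        if (it != lengths.end() && it->second == entry.second) {
            result.push_back(entry.first);
        }
    }
    return result;
}
```

A rule with two "&" separators therefore has length 3, and it reaches the final array only when all three of its conditions have been matched; the order check of step S35 would then run on the surviving rules.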

By defining wrongly written character recognition patterns, the method combines grammatical restrictions with collocations of conditional functions and applies them to wrongly written character recognition. It effectively targets errors that violate local or long-distance grammatical constraints, in particular common confusions such as "that" and "where", or "ground", which machine learning methods find hard to detect and correct automatically. In a practical experiment, recognition patterns for 1,000 commonly misused words were compiled manually; the test corpus contained 10,000 sentences, into which 300 homophone errors were manually inserted. The experiment achieved a recall of 95% and an accuracy of 90%. Applied to wrongly written character recognition, the method therefore has good accuracy and practical value.
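To relate the reported percentages to counts, the following worked arithmetic may help. It assumes that "recall" and "accuracy" carry their usual information-retrieval meanings (recall and precision) and that the 300 constructed errors are the only errors in the corpus; neither assumption is stated in the original, and the derived counts are rough illustrations rather than reported results.

```latex
\text{recall} = \frac{TP}{TP + FN}, \qquad
\text{precision} = \frac{TP}{TP + FP}

TP + FN = 300,\quad \text{recall} = 0.95
  \;\Rightarrow\; TP \approx 0.95 \times 300 = 285

\text{precision} = 0.90
  \;\Rightarrow\; TP + FP \approx \frac{285}{0.90} \approx 317,
  \quad FP \approx 32
```

Under these assumptions, roughly 285 of the 300 inserted errors would have been found, alongside about 32 false alarms.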

Example 2

The invention also provides a system for identifying wrongly-written words based on pattern matching, which is used for implementing the method of the invention, and the system comprises a processor and a memory, wherein the memory stores a program, and when the program is run by the processor, the following steps are executed:

The system of the invention is particularly suitable for identifying wrongly written characters of Chinese texts, defines the recognition mode of the wrongly written characters by utilizing the characteristics of Chinese grammatical structure, semantic restriction and the like, and carries out error checking and correction.

In one embodiment, the program of the present invention is run to perform step D1, establishing a wrongly written character recognition pattern according to the grammatical structure and semantic restriction features of Chinese, including setting recognition matching conditions and associating semantic operations with them as recognition rules, thereby forming the wrongly written character recognition pattern.

In this embodiment, the wrongly written word recognition mode is used as a wrongly written word recognition rule, and the structure of the wrongly written word recognition rule includes a recognition matching condition and a semantic operation associated with the recognition matching condition, so as to perform subsequent semantic operation on sentences meeting the recognition matching condition; the recognition matching condition is defined by a conditional function (restriction function) and a wildcard, etc., for example, the structure of the wrongly written character recognition pattern is as follows:

Rule 2: NOTCONTAIN(S, <!word class 1>) & NOTENDWITH(S, <target word> <!word class 2>) → MARK(<target word>)

Rule 3: NOTCONTAIN(S, <!word class 1>) & MATCHED(S, <target word> <!word class 2>) → REWRITE(<target word>, <correct word>)

The wildcards used in the above patterns have their conventional meanings; for example, "+" indicates that any number of characters may intervene, and "&" means "and". The symbol "→" indicates that the matching condition before it is associated with the semantic operation after it.

The restriction function in the above mode is defined as follows:

OK(<target word>): if the sentence satisfies the pattern, the "<target word>" is correct;

MARK(<target word>): if the sentence satisfies the pattern, the "<target word>" is possibly wrong and is marked;

REWRITE(<target word>, <correct word>): if the sentence satisfies the pattern, the "<target word>" is wrong and contains a wrongly written character; "<correct word>" is the corresponding correct word, and the target word is automatically replaced with it to complete the proofreading.

In the system of the present invention, when the program is run to perform step D2, the step includes:

D21. defining a graph structure, including the number of edges, the vertices, the in-degrees, the nodes, and so on; one specific implementation given by the invention is as follows:

static int nEdge;          // number of edges
static vector<gtype> G[W];
static int nRu[W];         // in-degree
static int nType[W];       // 1: word, 2: word class
static int nBelong[W];     // rule class the node belongs to
// index (global)
static int nSum;           // number of elements in FindID, i.e. the total number of graph nodes

D22. defining the structure of the rules (a rule is a wrongly written character recognition pattern):

static int nRuleClass;
static vector<RuleClassType> RuleClass;

When the program of the invention is run to perform step D3, the step comprises:

D31. segmenting the sentence to be checked into words and marking the word at each position;

in the initial state, Status[i] = 0 (1 ≤ i ≤ n);

D32. scanning the words W_i in the sentence S to be checked in sequence; if the end of S is reached, stopping error checking and going to D37; otherwise, going to D33;

D33. matching the word W_i in the sentence S against the index of the wrongly written character recognition patterns and, if the match succeeds, putting the matching result into the temporary array (vecTempResult);

D34. taking the intersection of the results in the temporary array (vecTempResult), determining whether the number of matches equals the length of the rule, and putting the index numbers of the rules for which the two are equal into the final array vecResult;

D35. traversing each rule in the array vecResult in turn to check whether the order of the matched elements is consistent with the rule; if so, the matched rule is valid;

the matching process is as follows:

NOTCONTAIN(S, <!all interrogative words>) = TRUE

MATCHED(S, <that> <!non-interrogative particle>) = TRUE

D36. if the match succeeds and the consequent (that is, the consequent part of the rule's data structure) is MARK, Status[i] of the current target word is set to 1, indicating that the word has an error; if the consequent is REWRITE, Status[i] of the current target word is set to 2, indicating that the word has an error, and the word is replaced with the correct word given in the consequent of the wrongly written character recognition pattern;

D37. outputting the checking result and finishing the check of the current sentence.

The system of the invention implements the method: it realizes the definition of the wrongly written character recognition patterns and the construction of the index in a program, and uses the pattern index to check and correct errors in the text automatically. Because the index is built on a flexible graph storage structure, it supports both breadth-first and depth-first search of the data, so that a complete database (matching library) can be constructed and the accuracy of wrongly written character recognition improved.

In particular, according to the embodiments of the present disclosure, the structure described in the drawings (logic block diagram) referred to may be implemented as a computer software program, for example, the above-disclosed embodiment 2 includes a computer program product as a computer program carried on a computer readable medium, the computer program containing codes for implementing the procedures shown in the structure of fig. 1.

The wrongly written character recognition system based on pattern matching is constructed by a program. The programming languages used to construct the system include object-oriented programming languages such as Java, Smalltalk, and C++, and conventional procedural programming languages such as the "C" programming language or similar languages. The program code of the system may execute entirely on a user computer or smart mobile terminal (for example, a mobile phone or tablet), partly on the user computer or smart mobile terminal, as a stand-alone software package, partly on the user computer or smart mobile terminal and partly on a remote computer, or entirely on a remote computer or server. Where a remote computer is involved, it may be connected to the user computer or smart mobile terminal through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).

Finally, it should be noted that: the above embodiments are only used to illustrate the technical solution of the present invention, and not to limit the same; while the invention has been described in detail and with reference to the foregoing embodiments, it will be understood by those skilled in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; such modifications and substitutions do not depart from the spirit and scope of the present invention, and they should be construed as being included in the following claims and description.

Claims

1. A wrongly written character recognition method based on pattern matching, characterized in that the method comprises the following steps:

S3. automatically checking and correcting errors in the text to be checked by means of the index of the wrongly written character recognition patterns;

in step S1, a wrongly written character recognition pattern is established according to the grammatical structure and semantic restriction features of Chinese, including setting recognition matching conditions and associating semantic operations with them as recognition rules, thereby forming the wrongly written character recognition pattern;

the recognition matching condition is formed by combining restriction functions;

the structure of the recognition pattern includes:

Rule 3: NOTCONTAIN(S, <!word class 1>) & MATCHED(S, <target word> <!word class 2>) → REWRITE(<target word>, <correct word>);

wherein "+" indicates that any number of characters may intervene; "&" means "and"; and "→" indicates that the preceding matching condition is associated with the subsequent semantic operation;

the restriction functions include:

NOTENDWITH(<S>, <W | WORDCLASS1>), used for determining whether the sentence "S" to be checked ends with the target word "W" or the word class "WORDCLASS1"; if not, it returns TRUE; otherwise, it returns FALSE;

the restriction functions are combined by connectors;

in step S1, in setting the recognition matching condition and associating the semantic operation, the semantic operations include:

2. The method for identifying wrongly written characters based on pattern matching as claimed in claim 1, wherein step S2, building the index of the wrongly written character recognition patterns based on the graph storage structure, comprises:

S21. defining the graph structure in code;

S22. defining the parameters of the graph structure in code.

3. The method for identifying wrongly written characters based on pattern matching as claimed in claim 1, wherein step S3, automatically checking and correcting errors in the text to be checked by means of the index of the wrongly written character recognition patterns, comprises:

S32. scanning the words of the sentence to be checked in sequence; if the end of the sentence is reached, ending the check; otherwise, going to S33;

S34. taking the intersection of the results in the temporary array, determining whether the number of successfully matched elements equals the length of the matching rule, and placing the index numbers of the rules for which the two are equal in a final array, the length of a rule being determined by using "&" in the rule as the separator;

S36. after a successful match, executing the semantic operation specified by the consequent of the wrongly written character recognition pattern;

4. A wrongly written character recognition system based on pattern matching, characterized in that the system comprises a processor and a memory, wherein the memory stores a program and, when the program is executed by the processor, the following steps are performed:

S1. defining wrongly written character recognition patterns according to the structural characteristics of the language;

S2. building an index of the wrongly written character recognition patterns based on the graph storage structure;

S3. automatically checking and correcting errors in the text to be checked by means of the established index structure;

the recognition matching condition is formed by combining restriction functions; the restriction functions include

the restriction functions are combined by connectors;

in setting the recognition matching condition and associating the semantic operation, the semantic operations include: