CN115310432A - Wrongly written character detection and correction method - Google Patents

Wrongly written character detection and correction method

Publication number
CN115310432A
Authority
CN
China
Prior art keywords
wrongly written
sentence
character
characters
model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210975544.4A
Other languages
Chinese (zh)
Inventor
郑海涛
马仕镕
李映辉
江勇
夏树涛
肖喜
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen International Graduate School of Tsinghua University
Original Assignee
Shenzhen International Graduate School of Tsinghua University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen International Graduate School of Tsinghua University
Priority to CN202210975544.4A
Publication of CN115310432A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 Handling natural language data
    • G06F 40/20 Natural language analysis
    • G06F 40/232 Orthographic correction, e.g. spell checking or vowelisation
    • G06F 40/237 Lexical tools
    • G06F 40/242 Dictionaries
    • G06F 40/30 Semantic analysis
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Evolutionary Computation (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Biophysics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Machine Translation (AREA)

Abstract

The invention discloses a method for detecting and correcting wrongly written characters, comprising the following steps. Obtaining a contrastive learning model comprising a main module, which is a pre-trained language model, and auxiliary modules: a character-pronunciation encoding module, a character-shape encoding module, and a dictionary encoding module. Model training: the main module is trained on the wrongly-written-character correction task, and contrastive learning tasks are added; positive and negative examples are constructed separately for character pronunciation, character shape, and dictionary knowledge, and the auxiliary modules encode the pronunciation, shape, and dictionary-definition information to guide the main module to learn the pronunciation, shape, word definitions, and common-sense knowledge of Chinese characters, so that the main module contains the knowledge required by the detection and correction task. Model inference: only the main module is retained, to preserve the model's inference efficiency. The invention improves the detection and correction of wrongly written characters, so that errors that are difficult to find with existing methods can be found and then effectively corrected.

Description

Wrongly written character detection and correction method
Technical Field
The invention relates to the field of computer application, in particular to a wrongly written character detection and correction method.
Background
The detection and correction of wrongly written characters refers to technology for automatically detecting and correcting errors that appear when Chinese characters are typed. In recent years, mainstream techniques have achieved good results on this task by using language models pre-trained on large corpora; in particular, Bidirectional Encoder Representations from Transformers (BERT) has been widely used. Some recent works also introduce the pronunciation and glyph (character-shape) information of Chinese characters to help the language model better complete the task.
The prior implementation most similar to the present invention is based on a BERT pre-trained language model. After a sentence containing wrongly written characters is input, the language model extracts the semantic features of each Chinese character in the sentence, while other deep neural networks extract the pronunciation and glyph features of each character. The three kinds of features are fused by a multimodal gated fusion unit built on the Transformer, and the sentence with the wrongly written characters corrected is output. This method outperformed previous mainstream methods on the detection and correction task.
However, the prior art still has the following shortcoming: the ability of the pre-trained language model to detect and correct wrongly written characters remains insufficient, and a considerable portion of such errors are still difficult to detect or correct.
Disclosure of Invention
The invention aims to improve the ability of a pre-trained language model to detect and correct wrongly written characters, and provides a method for detecting and correcting wrongly written characters.
In order to achieve the purpose, the invention adopts the following technical scheme:
the invention discloses a method for detecting and correcting wrongly written characters, which comprises the following steps:
s1, obtaining a comparison learning model, wherein the comparison learning model comprises the following modules: the main module is a pre-training language model, and the auxiliary module comprises: the character-pronunciation encoding module, the character-shape encoding module and the dictionary encoding module;
s2, model training: directly training a main module by using a wrongly written or mispronounced character correcting task, adding a contrast learning task, constructing positive examples and negative examples required for contrast learning respectively aiming at word sound, character patterns and dictionary knowledge, and coding information of the word sound, the character patterns and the dictionary definitions of the Chinese characters by using an auxiliary module respectively, thereby guiding the main module to learn the word sound, the character patterns, the word definitions and the common knowledge of the Chinese characters, so that the main module already contains the knowledge required by the wrongly written or mispronounced character detecting and correcting task after the training stage is finished;
s3, model reasoning: only the main module is reserved for reasoning so as to ensure the reasoning efficiency of the model.
In some embodiments, the contrastive learning tasks in step S2 include: a character-pronunciation contrastive learning task, which pulls characters with similar pronunciations closer in the model's representation space and pushes characters with different pronunciations apart; a character-shape contrastive learning task, which trains the model to distinguish Chinese characters with similar shapes from those with dissimilar shapes in the representation space; and a dictionary contrastive learning task, which enhances the model's ability to understand word definitions and common-sense knowledge and guides the model to associate relevant definitions and common sense when detecting and correcting spelling errors.
In some embodiments, the training process of the dictionary contrast learning task comprises the following steps:
a1, obtaining a sentence X with wrongly written characters and a correct sentence which is corresponding to the sentence and does not contain the wrongly written characters, and determining a phrase corresponding to the position of the wrongly written characters;
a2, obtaining the phrase inParaphrase sentences in dictionary as positive examples of dictionary comparison learning task
Figure BDA0003798183980000021
Randomly selecting N paraphrase sentences corresponding to other words from the dictionary as negative examples of the task
Figure BDA0003798183980000022
A3, the sentence X with the wrongly written characters obtains the representation D of each character in the corresponding sentence through the encoder of the main module o The paraphrase sentences of positive examples and negative examples respectively obtain the representation D of each character in the sentences through the dictionary coding module in the auxiliary module p And
Figure BDA0003798183980000023
a4: and calculating the similarity between the sentence X with the wrongly written characters and the positive example and the negative example, namely acquiring all indexes { s, s + 1., s + w } of the phrase at the position of the index s where the wrongly written characters are positioned, obtaining sentence-level representations corresponding to the sentence X with the wrongly written characters, the positive example paraphrase sentence and the negative example paraphrase in an average pooling mode, and calculating the cosine similarity as the similarity between the sentence X with the wrongly written characters and the corresponding positive example paraphrase sentence and the negative example sentence.
In some embodiments, the similarity in step A4 is given by:

$$\mathrm{sim}(X, X^{p}) = \cos\big(\mathrm{avg}(D_o^{\,s:s+w}),\ \mathrm{avg}(D_p)\big)$$

$$\mathrm{sim}(X, X^{n_i}) = \cos\big(\mathrm{avg}(D_o^{\,s:s+w}),\ \mathrm{avg}(D_{n_i})\big)$$

where w is the number of characters in the phrase at the position of the wrongly written character, p denotes the positive example in contrastive learning, and $n_i$ denotes the i-th negative example.
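The average-pooling-and-cosine computation in step A4 can be sketched in plain Python. This is an illustrative reconstruction, not the patent's implementation; the function names and the toy list-of-lists representations are assumptions, and the phrase is taken to span indices s through s+w inclusive as stated above:

```python
import math

def mean_pool(vectors):
    """Average a list of equal-length vectors into one sentence-level vector."""
    dim = len(vectors[0])
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def dictionary_similarity(D_o, s, w, D_x):
    """Similarity for the dictionary task: average-pool the character vectors
    of the phrase at indices {s, ..., s+w} of the erroneous sentence,
    average-pool the paraphrase sentence D_x, then take cosine similarity."""
    return cosine(mean_pool(D_o[s:s + w + 1]), mean_pool(D_x))
```

Computing this once against the positive paraphrase and once against each negative paraphrase yields the similarities fed into the contrastive objective.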
In some embodiments, the pronunciation comparison learning task, the glyph comparison learning task and the dictionary comparison learning task all use InfoNCE as an objective function, and the pre-training language model is a BERT pre-training language model.
In some embodiments, the training process of the pronunciation comparison learning task comprises the following steps:
b1, obtaining a sentence X containing wrongly written characters;
b2, replacing the wrongly written characters with characters similar to the pinyin to obtain a new sentence, and taking the sentence as a positive example of a word-pronunciation comparison learning task
Figure BDA0003798183980000033
Replacing wrongly written characters with other random Chinese characters to obtain N negative cases
Figure BDA0003798183980000034
B3, the sentence X with the wrongly written characters passes through an encoder of the main module to obtain the representation P of each character in the corresponding sentence o And constructing positive and negative examples of the task, wherein the paraphrase sentences of the positive and negative examples respectively obtain the representation P of each character in the sentence through the character-pronunciation coding module in the auxiliary module p And
Figure BDA0003798183980000035
and B4, calculating the similarity of the sentence X with the wrongly written characters and the positive examples and all the negative examples expressed at the aspect of the character pitch.
In some embodiments, the similarity in step B4 is represented by the following formula:
$$\mathrm{sim}(X, X^{p}) = (P_o^{s})^{\top} P_p^{s}$$

$$\mathrm{sim}(X, X^{n_i}) = (P_o^{s})^{\top} P_{n_i}^{s}$$

where s is the position of the replaced Chinese character.
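A minimal sketch of this position-restricted similarity in plain Python, with toy list-of-lists representations (the function and argument names are illustrative, not from the patent):

```python
def position_similarity(rep_orig, rep_other, s):
    """Dot product of the two representation vectors at position s, the only
    position considered by the pronunciation- and shape-level similarities."""
    return sum(a * b for a, b in zip(rep_orig[s], rep_other[s]))
```

The same function serves for both the character-pronunciation and character-shape tasks, since only the encoder producing the representations differs.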
In some embodiments, the training process of the glyph contrast learning task comprises the following steps:
c1, obtaining a sentence X containing wrongly written characters;
c2, replacing the wrongly written characters in the sentence X with the wrongly written characters with similar characters, and taking the new sentence as a positive example of a character pattern comparison learning task
Figure BDA0003798183980000041
In addition, N negative examples are obtained by replacing wrongly-written characters with other random Chinese characters
Figure BDA0003798183980000042
C3, respectively obtaining the representation V of each character in the corresponding sentence by the sentence X with the wrongly written characters through the main module encoder, the positive example and the negative example through the font encoding module in the auxiliary module o ,V p And
Figure BDA0003798183980000043
and C4, respectively calculating the similarity of the sentence X with the wrongly written characters and the positive example and the negative example expressed at the character tone level.
In some embodiments, the similarity in step C4 is represented by the following formula:
$$\mathrm{sim}(X, X^{p}) = (V_o^{s})^{\top} V_p^{s}$$

$$\mathrm{sim}(X, X^{n_i}) = (V_o^{s})^{\top} V_{n_i}^{s}$$

where s is the position of the replaced Chinese character.
The invention also discloses a computer readable storage medium, on which a computer program is stored, which, when executed by a processor, is capable of implementing the above-mentioned method for detecting and correcting a wrongly written word.
The invention has the following beneficial effects:
the invention leads the model to jointly learn the character pronunciation, character pattern and dictionary knowledge of the Chinese character by introducing the definition and common knowledge of the Chinese character in the dictionary and adding the contrast learning task in the model training stage, and can enhance the capability of the pre-training language model to detect and correct the wrongly written character, thereby improving the detection and correction effect of the wrongly written character, finding the wrongly written character which is difficult to be found by the existing method, and further effectively correcting the wrongly written character.
Drawings
FIG. 1 is a flow chart of a method for detecting and correcting a wrongly written word according to an embodiment of the present invention.
Detailed Description
The embodiments of the present invention will be described in detail below. It should be emphasized that the following description is merely exemplary in nature and is not intended to limit the scope of the invention or its application.
Furthermore, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include one or more of that feature. In the description of the embodiments of the present invention, "a plurality" means two or more unless specifically limited otherwise.
Correcting and detecting wrongly written characters requires considering not only the pronunciation and visual information of Chinese characters, but often also the semantics and common-sense knowledge they carry. The masked pre-training strategy adopted by existing language models means that the semantic knowledge they learn lies more in collocations among Chinese characters than in the definitions and common sense those characters denote. The prior art has not tried to introduce word definitions and common-sense knowledge to improve the detection and correction task, so some wrongly written characters remain difficult to find or correct.
The embodiments of the invention mainly aim to provide a detection and correction method that combines the multimodal information of Chinese character pronunciation and shape with dictionary knowledge. Two key points are involved: first, the definitions and common-sense knowledge of Chinese words in a dictionary are introduced to enhance the ability of the pre-trained language model to detect and correct wrongly written characters; second, a unified contrastive learning framework is used for the task, with contrastive learning tasks added in the model training stage to guide the model to jointly learn the pronunciation, shape, and dictionary knowledge of Chinese characters. The dictionary contains the definitions and common-sense knowledge of words and can be used to enhance detection and correction performance.
The method provided by the embodiment of the invention is shown in Fig. 1. The method is a unified contrastive learning framework comprising a main module and three auxiliary modules: the main module is a pre-trained language model, specifically a BERT pre-trained language model; the auxiliary modules are a character-pronunciation encoding module, a character-shape encoding module, and a dictionary encoding module. In the model training stage, in addition to training the main module directly on the wrongly-written-character correction task, three contrastive learning tasks are added. Positive and negative examples required for contrastive learning are constructed separately for pronunciation, shape, and dictionary knowledge, and the three auxiliary modules respectively encode the pronunciation, shape, and dictionary-paraphrase information of Chinese characters, guiding the main module to learn the pronunciation, shape, word definitions, common-sense knowledge, and other knowledge of Chinese characters. After the training stage, the main module already contains the various kinds of knowledge required by the detection and correction task, so only the main module is retained in the inference stage, ensuring the model's inference efficiency.
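As a rough structural sketch of the framework just described, in plain Python. All class names, the embedding scheme, and the dimensions below are illustrative stand-ins, not the patent's implementation; a real system would use a BERT encoder and trained auxiliary networks:

```python
class ToyEncoder:
    """Stand-in for a neural encoder: maps each character of a sentence to a
    fixed-size vector. Deterministic pseudo-embeddings for illustration only."""
    def __init__(self, dim=4, seed=1):
        self.dim, self.seed = dim, seed

    def encode(self, sentence):
        # One vector per character, derived from the character code point.
        return [[((ord(ch) * self.seed * (d + 1)) % 97) / 97
                 for d in range(self.dim)]
                for ch in sentence]

class ContrastiveFramework:
    """Main module (pre-trained language model) plus three auxiliary encoders.
    The auxiliary encoders are used only during training; per step S3, only
    the main module is kept for inference."""
    def __init__(self):
        self.main = ToyEncoder(seed=1)           # pre-trained language model
        self.pronunciation = ToyEncoder(seed=2)  # character-pronunciation encoder
        self.glyph = ToyEncoder(seed=3)          # character-shape encoder
        self.dictionary = ToyEncoder(seed=5)     # dictionary encoder

    def inference_model(self):
        # Model-inference stage: discard the auxiliary modules.
        return self.main
```

The design point the sketch captures is that the auxiliary modules add cost only at training time; inference uses a single encoder.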
Example:
s1, obtaining a comparison learning model:
the comparative learning model comprises the following modules: the main module is a pre-training language model; the auxiliary module comprises: the character-pronunciation encoding module, the character-shape encoding module and the dictionary encoding module.
The method for detecting and correcting wrongly written characters provided by the embodiment of the invention is divided into two parts, namely model training and model reasoning.
S2, model training:
the model training part consists of four tasks: the method comprises the following steps of correcting wrongly written characters, comparing and learning words and pronunciations, comparing and learning words and dictionaries, and specifically describing four tasks of a model training part as follows:
wrongly written character correcting training task
Given a sentence $X = \{x_1, x_2, \dots, x_n\}$ of length n containing wrongly written characters, where $x_i$ is the i-th character of X, the main module (the BERT pre-trained language model) encodes the sentence and predicts the corrected sentence $Y = \{y_1, y_2, \dots, y_n\}$, where $y_i$ is the i-th character of Y. In addition, the corresponding correct sentence without wrongly written characters, $L = \{l_1, l_2, \dots, l_n\}$, is given, where $l_i$ is the i-th character of L. The objective function of the correction training task maximizes the probability that the predicted sentence matches the correct sentence:

$$\mathcal{L}_{c} = -\sum_{i=1}^{n} \log p(y_i = l_i \mid X)$$
where p represents the conditional probability.
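The correction objective above reads as a per-position negative log-likelihood of the correct sentence. A hedged sketch in plain Python; the dict-of-probabilities interface below is an illustrative simplification of a softmax output, not the patent's code:

```python
import math

def correction_loss(char_probs, correct_sentence):
    """Negative log-likelihood of the correct sentence L under the model's
    per-position output distributions; minimising this maximises the
    probability that the predicted sentence matches L."""
    return -sum(math.log(char_probs[i][ch])
                for i, ch in enumerate(correct_sentence))
```

For example, if the model assigns probability 0.5 and 0.25 to the two correct characters, the loss is ln 8.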
Word and pronunciation comparison learning task
The pronunciation of Chinese characters is expressed in pinyin. To enable the model to better detect and correct wrongly written characters with similar pronunciations, a character-pronunciation contrastive learning task is proposed. Its aim is to shorten the distance between characters with similar pronunciations in the model's representation space and to push apart characters with different pronunciations, so that when handling a spelling error the model preferentially associates it with similarly pronounced characters.
Specifically, for a training sentence X containing a wrongly written character, the wrongly written character is replaced with a character of similar pinyin to obtain a new sentence, which serves as the positive example $X^{p}$ of the character-pronunciation contrastive learning task. In addition, N negative examples $\{X^{n_1}, \dots, X^{n_N}\}$ are obtained by replacing the wrongly written character with other random Chinese characters. The original sentence X is passed through the encoder of the main module to obtain the representation $P_o$ of each character, and the constructed positive and negative examples are passed through the character-pronunciation encoding module of the auxiliary module to obtain the representations $P_p$ and $\{P_{n_1}, \dots, P_{n_N}\}$ respectively.
The similarities between the pronunciation-level representations of the original sentence and the positive example and all negative examples are computed as:

$$\mathrm{sim}(X, X^{p}) = (P_o^{s})^{\top} P_p^{s}$$

$$\mathrm{sim}(X, X^{n_i}) = (P_o^{s})^{\top} P_{n_i}^{s}$$

where s is the position of the replaced Chinese character and $\top$ denotes the transpose of a representation vector. Only the representation vector at the position of the replaced character is considered when computing the similarity.
The InfoNCE function is used as the optimization objective of the character-pronunciation contrastive learning task. Its purpose is to pull the original sentence and the positive example together, and push the original sentence and the negative examples apart, in the representation space of the main module:

$$\mathcal{L}_{p} = -\log \frac{\exp\big(\mathrm{sim}(X, X^{p})\big)}{\exp\big(\mathrm{sim}(X, X^{p})\big) + \sum_{i=1}^{N} \exp\big(\mathrm{sim}(X, X^{n_i})\big)}$$
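The InfoNCE objective can be sketched in plain Python as follows. This is a generic formulation; with the default temperature of 1 it matches the formula as reconstructed above, and any other temperature value is an assumption, not specified in the patent:

```python
import math

def info_nce(sim_pos, sim_negs, temperature=1.0):
    """InfoNCE loss for one sentence: the positive similarity competes in a
    softmax against the N negative similarities, so minimising the loss pulls
    the positive example close and pushes the negatives away."""
    logits = [sim_pos / temperature] + [s / temperature for s in sim_negs]
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(l - peak) for l in logits]
    return -math.log(exps[0] / sum(exps))
```

The same function serves as the objective for all three contrastive tasks; only the similarity inputs differ.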
font comparison learning task
Similar to the character-pronunciation task, the character-shape contrastive learning task is proposed to train the model to distinguish, in the representation space, characters with similar shapes from those with dissimilar shapes, thereby improving the model's ability to detect and correct shape-confusion errors.
Specifically, and largely parallel to the previous task, for a training sentence X containing a wrongly written character, the wrongly written character is replaced with a character of similar shape, and the new sentence serves as the positive example $X^{p}$ of the character-shape contrastive learning task. In addition, N negative examples $\{X^{n_1}, \dots, X^{n_N}\}$ are obtained by replacing the wrongly written character with other random Chinese characters. The original sentence X is passed through the encoder of the main module, and the positive and negative examples are passed through the character-shape encoding module of the auxiliary module, to obtain the representations $V_o$, $V_p$, and $\{V_{n_1}, \dots, V_{n_N}\}$ respectively.
The similarities between the shape-level representations of the original sentence and the positive example and all negative examples are computed as:

$$\mathrm{sim}(X, X^{p}) = (V_o^{s})^{\top} V_p^{s}$$

$$\mathrm{sim}(X, X^{n_i}) = (V_o^{s})^{\top} V_{n_i}^{s}$$

where s is the position of the replaced Chinese character and $\top$ denotes the transpose of a representation vector.
InfoNCE is also used as the optimization objective of the character-shape contrastive learning task:

$$\mathcal{L}_{v} = -\log \frac{\exp\big(\mathrm{sim}(X, X^{p})\big)}{\exp\big(\mathrm{sim}(X, X^{p})\big) + \sum_{i=1}^{N} \exp\big(\mathrm{sim}(X, X^{n_i})\big)}$$
dictionary comparison learning task
When wrongly written characters cannot be corrected from pronunciation and shape information alone, the word definitions and common-sense knowledge contained in a dictionary are very useful for correcting spelling errors. The dictionary contrastive learning task is proposed to enhance the model's ability to understand word meanings and related common sense, and to guide the model to associate relevant word meanings and make appropriate corrections when a spelling error is found.
Specifically, given a training sentence X containing a wrongly written character, the corresponding correct sentence without errors is found, and the phrase at the position of the wrongly written character is taken from the correct sentence. The paraphrase of this phrase is looked up in the dictionary as the positive example $X^{p}$ of the dictionary contrastive learning task. In addition, N paraphrase sentences of other words are randomly selected from the dictionary as negative examples $\{X^{n_1}, \dots, X^{n_N}\}$. The original sentence X is passed through the encoder of the main module to obtain the representation $D_o$ of each character, and the positive and negative paraphrase sentences are passed through the dictionary encoding module of the auxiliary module to obtain the representations $D_p$ and $\{D_{n_1}, \dots, D_{n_N}\}$ respectively.
When computing the similarity between the original sentence and the positive and negative examples, all indices $\{s, s+1, \dots, s+w\}$ of the phrase starting at the index s of the wrongly written character are obtained, where w is the number of characters in the phrase, p denotes the positive example, and $n_i$ denotes the i-th negative example. Sentence-level representations of the original sentence, the positive paraphrase sentence, and the negative paraphrase sentences are obtained by average pooling, and cosine similarity is computed between the original sentence and each paraphrase sentence:

$$\mathrm{sim}(X, X^{p}) = \cos\big(\mathrm{avg}(D_o^{\,s:s+w}),\ \mathrm{avg}(D_p)\big)$$

$$\mathrm{sim}(X, X^{n_i}) = \cos\big(\mathrm{avg}(D_o^{\,s:s+w}),\ \mathrm{avg}(D_{n_i})\big)$$
InfoNCE is again used as the objective of the dictionary contrastive learning task:

$$\mathcal{L}_{d} = -\log \frac{\exp\big(\mathrm{sim}(X, X^{p})\big)}{\exp\big(\mathrm{sim}(X, X^{p})\big) + \sum_{i=1}^{N} \exp\big(\mathrm{sim}(X, X^{n_i})\big)}$$
the four training tasks can be synchronously applied to the training process of the model, are used for guiding the model to learn the pronunciation, the font and the dictionary knowledge of the Chinese characters, and can complete the detection and correction tasks of wrongly-written characters. The objective functions to be optimized of all tasks are weighted and summed to be used as the total objective function of model training, and the expression is as follows:
Figure BDA0003798183980000092
in the above formula, λ 1234 And respectively representing the weights of the target functions corresponding to the four adjustable training tasks in the total target function.
S3, model reasoning:
in the model reasoning phase, only the main modules which already contain various required knowledge are reserved for detecting and correcting wrongly written characters. Similar to the task of correcting wrongly written words, a sentence X = { X ] containing wrongly written words is input to the model 1 ,x 2 ,...,x n The model predicts and outputs a sentence Y = { Y) after the correction of the wrongly written characters is finished 1 ,y 2 ,...,y n And (5) regarding Chinese characters different from the Chinese characters in the original sentence as detected wrongly-written characters, so as to complete the detection and correction of wrongly-written characters.
A comparison of the experimental results of the proposed method and existing methods on the wrongly-written-character detection and correction task is shown in Table 1:
TABLE 1
[Table 1 appears as an image in the original publication; it reports detection and correction precision, recall, and F1 on SIGHAN13, SIGHAN14, and SIGHAN15.]
Here SIGHAN13, SIGHAN14, and SIGHAN15 are three widely used evaluation datasets for wrongly-written-character detection and correction, and LEAD is the method provided by the present invention. The table reports precision, recall, and the F1 score for both detection and correction; the F1 score, the harmonic mean of precision and recall, reflects overall performance most comprehensively. The results in Table 1 show that on all three widely used evaluation datasets, the F1 scores of the proposed method exceed those of all existing methods. On SIGHAN13 and SIGHAN15, the correction precision, recall, and F1 of the proposed method all exceed the most advanced prior methods, indicating that the embodiment of the invention achieves results surpassing all existing techniques.
The method for detecting and correcting the wrongly-written characters provided by the embodiment of the invention enhances the capability of detecting and correcting the wrongly-written characters of the pre-training language model, and the effect of the method for detecting and correcting the wrongly-written characters on the task of detecting and correcting the wrongly-written characters is better than that of the existing various methods on the premise of not reducing the reasoning efficiency.
Experimental example
Take the following sentence containing a wrongly written character as an example: "meet each and every difficulty and overcome it." In this sentence, the miswritten word (glossed as "fixed-difficult") should be replaced with the word for "difficulty". However, the erroneous character ("fixed") and the correct character ("difficult") differ greatly in both shape and pronunciation, so without the assistance of other knowledge the miswritten word would instead be replaced with a word whose pronunciation is closer (glossed as "hard-difficult"). Because the method provided by the embodiment of the invention introduces the paraphrase information of dictionary words in the training stage, a model trained with the method can, while correcting the sentence, associate the dictionary paraphrase of "difficulty": "(noun) a problem or obstacle difficult to solve in work or life; to overcome ...", link it to the word "overcome" appearing in the sentence, and thereby make the correct judgement.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The foregoing is a further detailed description of the invention in connection with specific/preferred embodiments, and it is not intended to limit the invention to the specific embodiments described. It will be apparent to those skilled in the art that numerous alterations and modifications can be made to the described embodiments without departing from the inventive concepts herein, and such alterations and modifications are to be considered as within the scope of the invention. In the description herein, references to the terms "one embodiment," "some embodiments," "preferred embodiment," "an example," "a specific example," "some examples" and the like mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, schematic representations of these terms do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. Those skilled in the art may combine the features of the different embodiments or examples described in this specification, provided they do not contradict one another. Although the embodiments of the present invention and their advantages have been described in detail, it should be understood that various changes, substitutions and alterations can be made herein without departing from the scope of the application.

Claims (10)

1. A method for detecting and correcting wrongly written characters is characterized by comprising the following steps:
s1, obtaining a contrastive learning model, wherein the contrastive learning model comprises the following modules: a main module, which is a pre-trained language model, and an auxiliary module, which comprises a character-pronunciation encoding module, a character-shape encoding module and a dictionary encoding module;
s2, model training: training the main module directly on the wrongly written character correction task while adding contrastive learning tasks, constructing the positive and negative examples required for contrastive learning separately for character pronunciation, character shape and dictionary knowledge, and using the auxiliary module to encode the pronunciation, shape and dictionary-definition information of Chinese characters respectively, thereby guiding the main module to learn the pronunciation and shape of Chinese characters as well as word definitions and common knowledge, so that after the training stage the main module already contains the knowledge required by the wrongly written character detection and correction task;
s3, model inference: retaining only the main module for inference, so as to ensure the inference efficiency of the model.
2. The method for detecting and correcting wrongly written characters as claimed in claim 1, wherein the contrastive learning in step S2 comprises: a character-pronunciation contrastive learning task, which pulls characters with similar pronunciations closer together in the model's representation space and pushes characters with different pronunciations apart; a character-shape contrastive learning task, which trains the model to distinguish, in the representation space, Chinese characters with similar shapes from those with dissimilar shapes; and a dictionary contrastive learning task, which enhances the model's ability to understand word definitions and common knowledge and guides the model to link to the relevant word definitions and common knowledge when detecting and correcting spelling errors.
3. The method for detecting and correcting wrongly written characters as claimed in claim 2, wherein the training process of the dictionary contrastive learning task comprises the following steps:
a1, obtaining a sentence X containing a wrongly written character and the corresponding correct sentence without the wrongly written character, and determining the phrase corresponding to the position of the wrongly written character;
a2, obtaining the dictionary paraphrase of the phrase as the positive example p of the dictionary contrastive learning task, and randomly selecting the paraphrase sentences corresponding to N other words in the dictionary as the negative examples {n_1, n_2, ..., n_N} of the task;
a3, passing the sentence X containing the wrongly written character through the encoder of the main module to obtain the representation D_o of each character in the sentence, and passing the positive-example and negative-example paraphrase sentences through the dictionary encoding module in the auxiliary module to obtain the representations D_p and {D_{n_1}, ..., D_{n_N}} of each character in those sentences;
a4, calculating the similarity between the sentence X containing the wrongly written character and the positive and negative examples: obtaining all indexes {s, s+1, ..., s+w} of the phrase located at the index s of the wrongly written character, obtaining sentence-level representations of the sentence X, the positive-example paraphrase sentence and the negative-example paraphrase sentences by average pooling, and taking the cosine similarity as the similarity between the sentence X and the corresponding positive-example and negative-example paraphrase sentences.
4. The method for detecting and correcting wrongly written characters as claimed in claim 3, wherein the similarity in step A4 is computed as follows:

$$\mathrm{sim}(X, p) = \cos\Big(\underset{j \in \{s, \ldots, s+w\}}{\mathrm{avg}} D_o^{j},\ \mathrm{avg}(D_p)\Big)$$

$$\mathrm{sim}(X, n_i) = \cos\Big(\underset{j \in \{s, \ldots, s+w\}}{\mathrm{avg}} D_o^{j},\ \mathrm{avg}(D_{n_i})\Big)$$

wherein w represents the number of characters contained in the phrase at the position of the wrongly written character, p represents the positive example in contrastive learning, and n_i represents the i-th negative example.
5. The method as claimed in claim 2, wherein the character-pronunciation contrastive learning task, the character-shape contrastive learning task and the dictionary contrastive learning task all use InfoNCE as the objective function, and the pre-trained language model is a BERT pre-trained language model.
6. The method for detecting and correcting wrongly written characters as claimed in claim 2, wherein the training process of the character-pronunciation contrastive learning task comprises the steps of:
b1, obtaining a sentence X containing wrongly written characters;
b2, replacing the wrongly written character with a character of similar pinyin to obtain a new sentence, taking this sentence as the positive example p of the character-pronunciation contrastive learning task, and replacing the wrongly written character with other random Chinese characters to obtain N negative examples {n_1, ..., n_N};
b3, passing the sentence X containing the wrongly written character through the encoder of the main module to obtain the representation P_o of each character in the sentence, and passing the constructed positive and negative example sentences through the character-pronunciation encoding module in the auxiliary module to obtain the representations P_p and {P_{n_1}, ..., P_{n_N}} of each character in those sentences;
b4, calculating the similarity, at the character-pronunciation level, between the sentence X containing the wrongly written character and the positive example and all negative examples.
7. The method for detecting and correcting wrongly written characters as claimed in claim 6, wherein the similarity in step B4 is computed as follows:

$$\mathrm{sim}(X, p) = \cos\big(P_o^{s},\ P_p^{s}\big)$$

$$\mathrm{sim}(X, n_i) = \cos\big(P_o^{s},\ P_{n_i}^{s}\big)$$

wherein s represents the position of the replaced Chinese character.
8. The method for detecting and correcting wrongly written characters as claimed in claim 2, wherein the training process of the character-shape contrastive learning task comprises the steps of:
c1, obtaining a sentence X containing wrongly written characters;
c2, replacing the wrongly written character in the sentence X with a character of similar shape and taking the new sentence as the positive example p of the character-shape contrastive learning task, and in addition replacing the wrongly written character with other random Chinese characters to obtain N negative examples {n_1, ..., n_N};
c3, passing the sentence X containing the wrongly written character through the encoder of the main module, and the positive and negative examples through the character-shape encoding module in the auxiliary module, to obtain the representations V_o, V_p and {V_{n_1}, ..., V_{n_N}} of each character in the corresponding sentences;
c4, calculating the similarity, at the character-shape level, between the sentence X containing the wrongly written character and the positive and negative examples respectively.
9. The method for detecting and correcting wrongly written characters as claimed in claim 8, wherein the similarity in step C4 is computed as follows:

$$\mathrm{sim}(X, p) = \cos\big(V_o^{s},\ V_p^{s}\big)$$

$$\mathrm{sim}(X, n_i) = \cos\big(V_o^{s},\ V_{n_i}^{s}\big)$$

wherein s represents the position of the replaced Chinese character.
10. A computer-readable storage medium, on which a computer program is stored, which, when executed by a processor, implements the method for detecting and correcting wrongly written characters as claimed in any one of claims 1 to 9.
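The InfoNCE objective named in claim 5, applied to the similarity scores defined in claims 4, 7 and 9, can be sketched as follows. This is an illustrative sketch only: the temperature value is an assumption (the claims do not specify one), and the similarity inputs are hand-picked numbers rather than real model outputs.

```python
import math

def info_nce(sim_pos, sim_negs, temperature=0.07):
    """InfoNCE loss: -log( exp(s+/t) / (exp(s+/t) + sum_i exp(s_i-/t)) ).
    Minimizing it pushes the positive similarity up relative to the negatives.
    The temperature t is a hyperparameter (value assumed here)."""
    logits = [sim_pos / temperature] + [s / temperature for s in sim_negs]
    m = max(logits)  # log-sum-exp stabilization to avoid overflow
    log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_denom - sim_pos / temperature

# When the positive example already outscores the negatives, the loss is small;
# when a negative outscores the positive, the loss is large.
well_separated = info_nce(0.9, [0.1, 0.0, -0.2])
confused = info_nce(0.1, [0.9, 0.8, 0.7])
print("well separated:", well_separated)
print("confused:", confused)
```

The same function serves all three contrastive tasks; only the similarity definitions (phrase-level with average pooling for the dictionary task, position s only for the pronunciation and shape tasks) differ.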
CN202210975544.4A 2022-08-15 2022-08-15 Wrongly written character detection and correction method Pending CN115310432A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210975544.4A CN115310432A (en) 2022-08-15 2022-08-15 Wrongly written character detection and correction method


Publications (1)

Publication Number Publication Date
CN115310432A true CN115310432A (en) 2022-11-08

Family

ID=83863224


Country Status (1)

Country Link
CN (1) CN115310432A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116127953A (en) * 2023-04-18 2023-05-16 之江实验室 Chinese spelling error correction method, device and medium based on contrast learning



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination