CN112836496A - Text error correction method based on BERT and feedforward neural network

Text error correction method based on BERT and feedforward neural network

Info

Publication number
CN112836496A
Authority
CN
China
Prior art keywords
text
error
word
bert
correct
Prior art date
Legal status
Granted
Application number
CN202110098015.6A
Other languages
Chinese (zh)
Other versions
CN112836496B (en)
Inventor
潘法昱
曹斌
於其之
Current Assignee
Zhejiang University of Technology ZJUT
Zhejiang Lab
Original Assignee
Zhejiang University of Technology ZJUT
Zhejiang Lab
Priority date
Filing date
Publication date
Application filed by Zhejiang University of Technology ZJUT, Zhejiang Lab filed Critical Zhejiang University of Technology ZJUT
Priority to CN202110098015.6A
Publication of CN112836496A
Application granted
Publication of CN112836496B
Active legal status
Anticipated expiration

Classifications

    • G06F40/232 Orthographic correction, e.g. spell checking or vowelisation
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/30 Semantic analysis
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management


Abstract

The invention discloses a text error correction method based on BERT and a feedforward neural network that can quickly and accurately identify and correct errors in large-scale corpora. The method preprocesses the text, semantically encodes it with BERT, uses the overall semantic information of the text to judge whether the text is correct, locates the specific error positions in texts judged incorrect with a sequence labeling method, and finally generates the corresponding correct text with a feedforward neural network using the context information of each error. The text error correction method constructed by the invention is characterized by fast inference and good interpretability.

Description

Text error correction method based on BERT and feedforward neural network
Technical Field
The invention belongs to the field of artificial intelligence and natural language processing, and particularly relates to a text error correction method based on BERT and a feedforward neural network.
Background Art
Text error correction is a natural language processing technique for correcting erroneous content in text; depending on the scenario, its objects include spelling correction, grammar correction, and semantic/pragmatic correction. Spelling correction does not change the length of the text and only corrects wrongly written characters one by one; grammar correction and semantic/pragmatic correction must handle errors such as extra words, missing words, wrong words, and wrong word order, and may change the length of the text.
In recent years, large-scale deep pre-trained language models such as BERT have driven rapid progress in natural language processing: they provide a better initial semantic representation of text for a specific downstream task and reduce the time and cost required for model convergence.
Traditional text error correction mainly uses rule-based methods or translation models. Rule-based methods depend on manually defined substitution dictionaries and can only correct specific errors. Error correction with translation models is currently the mainstream approach, with neural translation models having replaced statistical ones; it treats error correction as a translation problem from a wrong sentence to a correct sentence. Although the results are effective and fluent, this approach requires a large amount of training data and is slow at inference time. In addition, when only spelling errors need correcting, current methods mainly use sequence labeling, which corrects wrongly written characters quickly but is unsuitable for other error types.
Disclosure of Invention
The invention aims to provide a text error correction method based on BERT and a feedforward neural network that addresses the shortcomings of the prior art, using a simple model to identify and correct various errors in text.
The purpose of the invention is realized by the following technical scheme. A text error correction method based on BERT and a feedforward neural network comprises the following steps:
1) Preprocess the text error correction corpus data.
2) BERT-encode the input text preprocessed in step 1) to obtain the feature representation and the semantic representation.
3) Judge whether the text is correct based on the semantic representation of the input text obtained in step 2).
4) Detect the positions of errors in the text based on the feature representation from step 2) and the judgment result from step 3).
5) Generate the correct text corresponding to the erroneous text based on the error positions found in step 4).
Further, in step 1), the data preprocessing method is:
1.1) Perform preprocessing operations on the acquired text data.
1.2) Segment the text: Chinese is segmented character by character; English is segmented into word pieces.
1.3) Add the special character "[CLS]" at the beginning of the text and the special character "[SEP]" at the end.
1.4) If the text is training data, compute the text-level error label, the per-character error-type labels, and the correct text corresponding to each error by comparing the segmented source and target strings.
Further, in step 2), the text representation is encoded with BERT:
2.1) Use the word vectors and position vectors pre-trained by BERT to perform word and position embedding on the input text, obtaining the preliminary vector representation of the text:

H^0 = E_word(X) + E_pos

where E_word is the word embedding matrix and E_pos is the position embedding matrix; the word embedding matrix has size [V, E], V being the vocabulary size defined by BERT and E the embedding dimension, and the position embedding matrix has size [512, E].
2.2) Use the L Transformer layers in BERT to obtain the semantic feature representation of each character, H^L = (h^L_1, ..., h^L_n), computed layer by layer as:

H^l = Transformer(H^{l-1})
2.3) Using "[ CLS]Character corresponding feature
Figure BDA0002915106000000024
Obtaining the text overall semantic representation:
Figure BDA0002915106000000025
Further, in step 3), it is judged whether the text is correct:
3.1) Select the overall semantic representation c output by BERT in step 2.3) as the feature for judging text errors.
3.2) Use a feedforward neural network to map c to a single value, then use the sigmoid function to compute the probability that the text is incorrect:

P_rw = sigmoid(W_rw c + b_rw)

where W_rw and b_rw are weight parameters learned by the deep learning model.
3.3) Compare P_rw with a manually set threshold P̂_rw to judge whether the text is wrong; if P_rw is below the threshold, the text is considered correct.
3.4) For input text judged correct, output it directly as the error correction result without performing the subsequent correction operations.
3.5) During model training, compute the loss of the text error judgment with the binary cross-entropy loss function:

Loss_rw = BCELoss(P_rw, y_rw)

where y_rw, the true text-error value, is obtained by comparing whether the source and target strings are equal.
Further, in step 4), the positions of errors in the text are detected:
4.1) Select the per-character feature representation H^L output by BERT in step 2.2) as the feature for error-type detection.
4.2) Define the type of each character as correct, redundant, correct but missing content after it, or wrongly used; each character corresponds to one of these four types, whose operation labels are keep, delete, add-after, and replace, respectively.
4.3) Use a feedforward neural network combined with the softmax function to sequence-label the input text and detect the operation required for each character:

P^{i1}_sl = softmax(W_sl h^L_{i1} + b_sl)

where the components of P^{i1}_sl are the probabilities that the character should be kept, deleted, added-after, and replaced; the operation with the largest probability is taken as the detection result.
4.4) After obtaining the predicted tag sequence of the input text, derive each error position pos = (s', e') by rule: for a run of consecutive delete tags or consecutive replace tags occupying the interval [s, e], the derived error start s' and end e' are the positions immediately before and after the interval, i.e. s' = s - 1, e' = e + 1; for each add-after tag, the derived start s' is the position s of the tag itself and the end e' is the following position, i.e. e' = s' + 1.
4.5) For input text whose predicted tag sequence consists entirely of keep tags, no subsequent correction is performed; the input text is directly output as the error correction result.
4.6) During model training, compute the sequence labeling loss with the cross-entropy loss function:

Loss_sl = -Σ_{i1=1}^{n} log P^{i1}_sl(y^{i1}_sl)

where y^{i1}_sl is the true operation tag of the i1-th character.
Further, in step 5), the correct text corresponding to the erroneous text is generated:
5.1) According to the error positions obtained in step 4.4), slice the character feature vectors H^L obtained in step 2.2) into the input feature vector for correct-text generation:

h_info = [h^L_s, h_mid, h^L_e]

where s and e are the start and end positions of an error, h^L_s is the context information before the error, h^L_e is the context information after the error, and h_mid is the information of the error itself: when the error start and end positions are adjacent, h_mid is a special model-learned vector h_emp; otherwise h_mid is the mean of the vectors between the error start and end positions, h_mid = mean(h^L_{s+1}, ..., h^L_{e-1}). The number of h_info vectors obtained equals the number of errors detected in step 4.4).
5.2) Define a position embedding matrix E'_pos learned by the deep learning model to control the characters generated at different positions of the output text; it consists of MAX_LEN vectors of dimension POS_DIM, MAX_LEN being the maximum length of the generated text. When correcting each error, the j-th POS_DIM-dimensional vector E'_pos(j) of the matrix is used as the position information for generating the j-th word.
5.3) Use a multilayer feedforward neural network to extract the correct-text features from the error information combined with the position embedding vector:

h_{i3,j} = MLP([h_info, E'_pos(j)])

where h_{i3,j} is the feature for the j-th correct word generated for the i3-th error in the text.
5.4) Combined with the softmax function, map the error feature representation into the dictionary-size dimension defined by BERT, and take the word with the highest probability in the dictionary as the generated j-th word:

P^{i3,j}_cor = softmax(E^T_word h_{i3,j})

The weight parameter of the last layer of the multilayer feedforward network is the transpose of the BERT word embedding matrix E_word.
5.5) During model training and use, MAX_LEN words of corrected text are generated for each error, but only the text before the first special character "[EOP]" is kept as the result; if no "[EOP]" is generated, all generated text is taken as the result.
5.6) Using the error positions detected in step 4.4) and the generated correct text, replace the erroneous content in the input text, then delete the added special characters "[CLS]" and "[SEP]" to obtain the final error correction output.
5.7) During model training, compute the text generation loss with the cross-entropy loss function:

Loss_cor = -Σ_{i3=1}^{m} Σ_{j=1}^{k_{i3}} log P^{i3,j}_cor(t̂_{i3,j})

where m is the number of errors in a text, k_{i3} is the length of the correct text for the i3-th error including the end character "[EOP]", and t̂_{i3,j} is the j-th word of the correct text corresponding to the i3-th error.
The invention has the following beneficial effects: by treating text error correction as sequence labeling, various types of errors, not only spelling errors, can be corrected quickly and accurately; correction is based on BERT, so erroneous texts in large-scale corpora can be corrected and the correct texts generated; and the long inference time of traditional translation-model-based correction is improved, since the serial word-by-word generation of the corrected sentence is replaced by a feedforward neural network that corrects only the erroneous content in parallel.
Drawings
FIG. 1 is a flow chart of the method proposed by the present invention;
FIG. 2 is a diagram of a text error correction model architecture designed by the present invention;
FIG. 3 is a diagram of the internal structure of the BERT model employed in the present invention.
Detailed Description
The method of the present invention will be described in further detail with reference to the accompanying drawings and specific examples.
The text error correction method based on BERT and a feedforward neural network uses deep learning combined with the pre-trained language model BERT, which can effectively extract semantic information such as part of speech and syntactic structure from the text and thus obtain a contextual feature representation of each word. In addition, three differently purposed feedforward neural networks designed by the invention use the extracted feature information to perform text error judgment, error position detection, and correct text generation, respectively; organically combining all the modules achieves the purpose of text error correction. As shown in fig. 1, the method comprises the following steps:
1. Data preprocessing
For the text error correction corpus data, first segment each text: Chinese text is segmented character by character; English text, besides being split on spaces, has each word further segmented into word pieces using statistics from a large-scale English corpus. After segmentation, a stop-word dictionary can be configured according to actual requirements to filter stop words from the text. In addition, the special character "[CLS]" marking the start of the text must be added at the beginning of each text, and the special character "[SEP]" marking the end must be added at the end.
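For illustration, this preprocessing can be reproduced with a standard BERT tokenizer (an assumption: the patent does not name a tokenizer implementation; the HuggingFace BertTokenizer already segments Chinese character by character, splits English into word pieces, and defines the special characters used here):

from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")

def preprocess(text):
    # Segmentation (step 1.2) plus the two special characters (step 1.3).
    return ["[CLS]"] + tokenizer.tokenize(text) + ["[SEP]"]

print(preprocess("保护知识产权"))
# ['[CLS]', '保', '护', '知', '识', '产', '权', '[SEP]']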
For the training data set, three target values for model training must be obtained by comparing the input text X = (x_1, x_2, ..., x_n), i1 = 1~n, with the target text T = (t_1, t_2, ..., t_{n'}), i2 = 1~n': the label y_rw indicating whether the text is correct, the tag sequence Y_sl = (y^1_sl, ..., y^n_sl) marking the error types, and the corrected target texts T̂_{i3}, i3 ∈ [1, m]. Wherein:
The label y_rw ∈ {0, 1} indicating whether the text is correct is computed by judging whether X and T are equal; if they are equal the text is correct and the value is 0; otherwise the sentence is wrong and the value is 1.
Each element y^{i1}_sl ∈ {0, 1, 2, 3} of the tag sequence marking the error types indicates that the i1-th character of the input text X should be kept (0), deleted (1), have characters added after it (2), or be replaced (3). The tags are obtained by comparing X and T: the SequenceMatcher in Python's built-in difflib is used to analyze the operation sequence that converts text X into T. Each operation consists of 5 parts: the operation type, the start position s_X and end position e_X in X, and the start position s_T and end position e_T in T, indicating that x_{s_X}~x_{e_X} is transformed into t_{s_T}~t_{e_T} by the operation. The function involves four operation types, "equal", "delete", "insert", and "replace", corresponding to the four tags. If the operation type is "equal", the words x_{s_X}~x_{e_X} in X and t_{s_T}~t_{e_T} in T are the same, so y^{s_X}_sl ~ y^{e_X}_sl are all set to 0, indicating that these words of X are kept. If the operation type is "delete", the words x_{s_X}~x_{e_X} in X must be deleted, so y^{s_X}_sl ~ y^{e_X}_sl are all set to 1. If the operation type is "insert", the s_T-th ~ e_T-th words of T must be inserted before the s_X-th word of X; in this case s_X and e_X are equal and the operation points between the (s_X - 1)-th and s_X-th words, so y^{s_X - 1}_sl is set to 2, indicating that words must be added after the (s_X - 1)-th word. If the operation type is "replace", the s_X-th ~ e_X-th words of X must be replaced by the s_T-th ~ e_T-th words of T, so y^{s_X}_sl ~ y^{e_X}_sl are all set to 3.
The corrected target texts T̂_{i3} are also obtained from this operation sequence: if the sequence contains m operations whose type is not "equal", there are m target correction texts T̂_1 ~ T̂_m. If the operation type is "insert" or "replace", T̂_{i3} consists of the s_T-th ~ (e_T - 1)-th words of T plus the special end character "[EOP]"; if the operation type is "delete", T̂_{i3} consists of the special characters "[NONE]" and "[EOP]". Y_sl and the T̂_{i3} are in correspondence, reflected between the delete, add, and replace tags and their correction contents for every operation other than "equal". For each target text T̂_{i3}, the last character t̂_{i3,k_{i3}} is always the special end character "[EOP]".
2. BERT coding
The operation of BERT-encoding the input text X is mainly divided into two steps.
The first step is word embedding, which converts each word of X into a vector representation using the two embedding matrices defined by BERT: the word embedding matrix E_word and the position embedding matrix E_pos. The word embedding matrix has size [V, E], where V is the vocabulary size defined by BERT and E is the embedding dimension; the position embedding matrix has size [512, E]. Word embedding in BERT is computed as:

H^0 = E_word(X) + E_pos
The second step is the Transformer-based self-attention encoding module, composed of L Transformer layers; every layer performs the same computation, taking the previous layer's output as its input:

H^l = Transformer(H^{l-1})

where l = 1~L, and the input of the first layer is the word-embedded text H^0 = (h^0_1, ..., h^0_n), i1 = 1~n. Finally, the output H^L = (h^L_1, ..., h^L_n) of BERT's L-th Transformer layer is taken as the feature representation of the input text X.
In addition, BERT pre-trains the output h^L_0 corresponding to the first input character "[CLS]" as the feature vector for next-sentence prediction, so it can serve as the semantic representation c of the whole input text X, computed as:

c = tanh(W_c h^L_0 + b_c)

where tanh is the activation function, W_c is a parameter matrix learned by the model, and b_c is a bias vector learned by the model.
3. Text error judgment
Judge whether the input text X is correct; if it is judged correct, the subsequent error position detection and correction operations are not performed. The judgment uses the semantic representation c of the input text obtained in step 2 together with a feedforward neural network as a binary classification task:

P_rw = sigmoid(W_rw c + b_rw)

where the output of the binary classification task is the error probability P_rw ∈ [0, 1] of the input text; if it is greater than the set threshold P̂_rw the input text is considered wrong, otherwise it is judged correct. W_rw and b_rw are weight parameters learned by the deep learning model; sigmoid is the activation function.
During model training, the binary cross-entropy loss function BCELoss is used to compute the loss of the text error judgment:
Lossrw=BCELoss(Prw,yrw)
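A sketch of this judgment head under the definitions above (PyTorch is an assumption; W_rw and b_rw live in a single linear layer, and c is the pooled representation from the encoding step):

import torch
import torch.nn as nn

class ErrorJudge(nn.Module):
    def __init__(self, hidden=768):
        super().__init__()
        self.ff = nn.Linear(hidden, 1)  # holds W_rw and b_rw

    def forward(self, c):
        # Map c to one value, then sigmoid: the error probability P_rw.
        return torch.sigmoid(self.ff(c)).squeeze(-1)

judge = ErrorJudge()
P_rw = judge(c)                                    # c from the encoding step
loss_rw = nn.BCELoss()(P_rw, torch.tensor([1.0]))  # training: y_rw = 1 (wrong)
is_wrong = P_rw.item() > 0.5                       # manually set threshold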
4. Error location detection
Detect which positions in the text are wrong and label those positions according to their error types. A sequence labeling approach is used: a feedforward neural network processes the feature representation H^L of the input text obtained in step 2:

P^{i1}_sl = softmax(W_sl h^L_{i1} + b_sl)

where i1 = 1~n; W_sl is a parameter matrix learned by the model and b_sl a bias vector learned by the model. P^{i1}_sl is a vector of 4 elements giving the probabilities of keeping, deleting, adding after, and replacing the i1-th word of the input text; the operation with the largest probability is taken as the sequence labeling result; softmax is the activation function.
During model training, the cross-entropy loss function CrossEntropyLoss is used to compute the sequence labeling loss:

Loss_sl = -Σ_{i1=1}^{n} log P^{i1}_sl(y^{i1}_sl)

After sequence labeling with the feedforward neural network has detected the operation required for each word, the start and end positions of the errors in the text can be computed. Suppose that among the n operations obtained for an input text X of n characters, the operations corresponding to x_s, ..., x_e (e ≥ s) are identical and are not the keep operation. If the operation is delete, the error position is defined as pos = (s - 1, e + 1), where s - 1 and e + 1 are the start and end positions of the error; if the operation is replace, the error position is likewise defined as pos = (s - 1, e + 1); if the operation is add-after, e - s + 1 error positions are defined, pos_{i4} = (s + i4 - 1, s + i4), i4 ∈ [1, e - s + 1], where similarly s + i4 - 1 and s + i4 are the start and end positions. After the positions are computed as above, each is written pos = (s', e') for convenience.
In the invention, the defined error start position is moved one unit before the actual position and the defined end position one unit after it; this makes it convenient to obtain the input data for correct text generation and unifies the processing flow across error types. For extra-word errors and wrong-word errors, which require delete and replace operations, the error position could be defined directly on the erroneous words in the text; but for missing-word errors, the missing words must be inserted between two words that are themselves not wrong, so the error position range must be widened by one word: the start position is the word before the error and the end position the word after it.
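The labeling head and the position-derivation rule can be sketched as follows (illustrative names; H_L is the BERT output from step 2, and the spans follow the widened convention just described):

import torch
import torch.nn as nn

label_head = nn.Linear(768, 4)                 # holds W_sl and b_sl
P_sl = torch.softmax(label_head(H_L), dim=-1)  # [1, n, 4] operation probabilities
ops = P_sl.argmax(dim=-1)[0].tolist()          # predicted operation per character

def error_positions(ops):
    # Convert the tag sequence to (s', e') spans per the rules above:
    # a run of deletes (1) or replaces (3) on [s, e] yields (s - 1, e + 1);
    # each add-after tag (2) at position s yields (s, s + 1).
    spans, i = [], 0
    while i < len(ops):
        if ops[i] in (1, 3):
            s = i
            while i + 1 < len(ops) and ops[i + 1] == ops[s]:
                i += 1
            spans.append((s - 1, i + 1))
        elif ops[i] == 2:
            spans.append((i, i + 1))
        i += 1
    return spans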
5. Correct text generation
After the error positions are known, the correct texts corresponding to the errors are generated in parallel by a feedforward neural network based on the contextual semantic information of each error. This proceeds in three steps.
First, slice the text representation H^L obtained in step 2 according to the error positions obtained in step 4. For an error position pos = (s', e') in the text, first take out the states h^L_{s'} and h^L_{e'} at the error start position s' and end position e', then take out the intermediate state vectors and average them to obtain the feature vector of the error:

h_mid = mean(h^L_{s'+1}, ..., h^L_{e'-1})

where mean denotes averaging. If a missing-word error is being processed in this step, then s' + 1 = e', i.e. there is no intermediate state vector; for this case a variable h_emp, initialized randomly and optimized during model training, is used in place of h_mid. Finally, the three vectors are concatenated to serve as the context information of the erroneous content:

h_info = [h^L_{s'}, h_mid, h^L_{e'}]

Second, extract the correct-text features with a multilayer feedforward neural network from the error information combined with a position embedding vector. Because a correct text composed of several words or special characters must be generated quickly from a single piece of error context information h_info, a new position embedding matrix E'_pos with embedding dimension POS_DIM is set up to distinguish the content of each generated word. The feature h_{i3,j} is extracted with a two-layer feedforward network, each layer also applying an activation function and a normalization operation:

h_{i3,j} = MLP([h_info, E'_pos(j)])

where i3 indexes the i3-th error in the text and j the j-th word generated for that error. When the model is actually used for correction, the number of words composing the correct text for an error is unknown in advance, so generation is uniformly limited to MAX_LEN words during both training and use, and the words before the first special character "[EOP]" in the generated text are kept as the correction result. If no "[EOP]" is generated, all generated text is used as the result.
Third, use a feedforward neural network to map the feature h_{i3,j} to the target correct word:

P^{i3,j}_cor = softmax(E^T_word h_{i3,j})

where j = 1~k_{i3}. The formula maps h_{i3,j} to a vector of size V and normalizes it, i.e. it yields a probability distribution over the dictionary defined by BERT for the j-th word generated when correcting the i3-th error; the word corresponding to the highest probability in the dictionary is taken as the j-th generated word.
Through the above operations, the correct text corresponding to each error detected in step 4 is obtained; combined with the position of the error in the input text, the original text is modified accordingly to obtain the corrected text.
During model training, the cross-entropy loss function CrossEntropyLoss is used to compute the text generation loss:

Loss_cor = -Σ_{i3=1}^{m} Σ_{j=1}^{k_{i3}} log P^{i3,j}_cor(t̂_{i3,j})

where m is the number of errors in a sample and k_{i3} is the length of the correct text for the i3-th error (including the special end character "[EOP]"). MAX_LEN words are output for each error during training, but only the first k_{i3} outputs contribute to the loss.
To demonstrate the universality of the invention across text error types, the proposed method is illustrated with four example texts. Suppose there is a correct input text X1 = "protection intellectual property"; an input text X2 = "protected knowledge" with a missing-word error; an input text X3 = "protection of intellectual property right" with a wrong-word error; and an input text X4 = "protected intellectual property" with an extra-word error. How these four texts are corrected is described below.
As shown in fig. 2 to 3, the method of the present embodiment comprises five steps:
step 1: preprocessing input text, performing word segmentation on Chinese text by taking a word as a unit, and adding a special character ([ CLS ] ") at the beginning of the text and a special character ([ SEP ]") at the end of the text.
The processing results are X1 = ["[CLS]", "protection", "know", "identify", "produce", "weight", "[SEP]"], X2 = ["[CLS]", "protection", "know", "identify", "[SEP]"], X3 = ["[CLS]", "protection", "know", "only", "produce", "weight", "[SEP]"], and X4 = ["[CLS]", "protection", "protection", "know", "identify", "produce", "weight", "[SEP]"].
It is assumed that the error correction model proposed by the present invention is trained and is in the stage of performing error correction by using the model, so that it is not necessary to calculate the target value required by the model training.
Step 2: encode the input text with BERT. Assume the base version of the Chinese pre-trained BERT model is used, whose word embedding dimension is 768, whose vocabulary size is 21128, and which stacks 12 Transformer layers.
The processing flow embeds the words and positions of the text, feeds the embedded text into the Transformer modules, and takes the output of the last Transformer layer, obtaining the overall feature representation c of each text and the semantic representation h^L_{i1}, i1 ∈ [1, n], of each word in the text, where n is the text length after the preprocessing of step 1; for the text X1, for example, the length is 8.
Step 3: judge text errors. Input the overall text representation c obtained in step 2 into a feedforward neural network followed by the sigmoid function to obtain the error probability P_rw ∈ (0, 1) of the text, and compare it with the manually set threshold P̂_rw; if it is greater than the threshold the text is considered wrong, otherwise correct.
For example, if the prediction results of the four texts above are 0.1, 0.8, 0.9, and 0.8 and the threshold is set to 0.5, the first text X1 is considered correct and the other three texts wrong.
For a text predicted correct, the original input text is directly taken as the corrected output of the model, and no subsequent correction is performed.
Step 4: sequence-label the error positions. For the texts detected in step 3 as containing errors, this step detects the specific positions of the errors: a feedforward neural network classifies the semantic representation h^L_{i1} of each word obtained in step 2 and, combined with the softmax function, yields the probabilities P^{i1}_sl of the operation labels for each word; the operation with the largest probability is taken.
The operation types are four: keep, delete, add-after, and replace, numerically represented as 0, 1, 2, and 3. For the text X2, suppose the probability vector corresponding to the word "identify" has its maximum on type 2; the operation to perform on it is then 2, i.e. add words after "identify". Similarly, the operation sequence of text X2 is [0,0,0,0,2,0], that of text X3 is [0,0,0,0,3,0,0], and that of text X4 is [0,0,1,0,0,0,0].
After the operation sequences are obtained, the start and end positions of the errors in the input texts can be computed: the error position of text X2 is (5,6), that of X3 is (4,6), and that of X4 is (2,4).
If no error position is detected for a text in this step, i.e. its operation sequence consists entirely of 0s, the text is considered correct and is directly output as the model's error correction result.
Step 5: generate the correct text. This step requires the semantic representations h^L_{i1} of the words from step 2 and the error positions from step 4. First, the semantic representation vectors between the corresponding positions are taken out according to each error position; the vectors at the two ends are kept as the context information of the error start and end, and the mean of the intermediate vectors is taken as the feature information of the error itself; if there is no intermediate vector, a vector h_emp representing the empty state is used instead. Finally, the three vectors are concatenated into the error representation h_info. For example, the error representation corresponding to text X2 is h_info = [h^L_5, h_emp, h^L_6].
Then h_info is concatenated with the position vector from E'_pos (embedding dimension POS_DIM = 200) and passed through the two-layer feedforward network to obtain the intermediate representation of the j-th word to be generated. With the maximum generation length MAX_LEN set to 3, a corrected text of 3 words is generated for each error; for the error in text X2, the inputs of step 5 are [h_info, E'_pos(1)], [h_info, E'_pos(2)], and [h_info, E'_pos(3)]. Finally, BERT's own word embedding matrix E_word and the softmax function map each intermediate vector to a word in the dictionary, completing the generation of the correct text. The three words finally generated are "produce", "weight", and "[EOP]"; appending the two characters "property right" to the end of the original input text X2 yields the corrected text "protection intellectual property right". For text X3, three words such as "identify", "[EOP]", and "is" would similarly be generated, but only the words before "[EOP]" are taken as the result and the following words are discarded.
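Putting the five modules together at inference time might look like the following sketch; it reuses the tokenizer, bert, judge, label_head, error_positions, and corrector objects from the earlier snippets and assumes the vocabulary has been extended with the special characters "[EOP]" and "[NONE]":

import torch

def decode_until_eop(probs):
    # Argmax word per position; keep only the words before "[EOP]".
    words = tokenizer.convert_ids_to_tokens(probs.argmax(dim=-1).tolist())
    return words[:words.index("[EOP]")] if "[EOP]" in words else words

def correct(text):
    inputs = tokenizer(text, return_tensors="pt")            # step 1
    tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
    out = bert(**inputs)                                     # step 2
    H_L, c = out.last_hidden_state, out.pooler_output
    if judge(c).item() < 0.5:                                # step 3: correct
        return text
    ops = torch.softmax(label_head(H_L), dim=-1).argmax(-1)[0].tolist()
    # Step 4 -> step 5: splice right-to-left so earlier indices stay valid.
    for s, e in reversed(error_positions(ops)):
        words = decode_until_eop(corrector(H_L, s, e))
        tokens[s + 1:e] = [] if words == ["[NONE]"] else words
    return "".join(t for t in tokens if t not in ("[CLS]", "[SEP]"))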
It will be appreciated by those of ordinary skill in the art that the embodiments described herein are intended to help the reader understand the principles of the invention, and that the scope of the invention is not limited to the specific embodiments and examples recited. Those skilled in the art can make various other specific changes and combinations based on the teachings of the present invention without departing from its spirit, and these changes and combinations remain within the scope of the invention.

Claims (6)

1. A text error correction method based on BERT and a feedforward neural network, characterized by comprising the following steps:
1) Preprocess the text error correction corpus data.
2) BERT-encode the input text preprocessed in step 1) to obtain the feature representation and the semantic representation.
3) Judge whether the text is correct based on the semantic representation of the input text obtained in step 2).
4) Detect the positions of errors in the text based on the feature representation from step 2) and the judgment result from step 3).
5) Generate the correct text corresponding to the erroneous text based on the error positions found in step 4).
2. The text error correction method based on BERT and a feedforward neural network as claimed in claim 1, characterized in that in step 1) the data preprocessing method is:
1.1) Perform preprocessing operations on the acquired text data.
1.2) Segment the text: Chinese is segmented character by character; English is segmented into word pieces.
1.3) Add the special character "[CLS]" at the beginning of the text and the special character "[SEP]" at the end.
1.4) If the text is training data, compute the text-level error label, the per-character error-type labels, and the correct text corresponding to each error by comparing the segmented source and target strings.
3. The text error correction method based on BERT and a feedforward neural network as claimed in claim 2, characterized in that in step 2) the text representation is encoded with BERT:
2.1) Use the word vectors and position vectors pre-trained by BERT to perform word and position embedding on the input text, obtaining the preliminary vector representation of the text:

H^0 = E_word(X) + E_pos

where E_word is the word embedding matrix and E_pos is the position embedding matrix; the word embedding matrix has size [V, E], V being the vocabulary size defined by BERT and E the embedding dimension, and the position embedding matrix has size [512, E].
2.2) Use the L Transformer layers in BERT to obtain the semantic feature representation of each character, H^L = (h^L_1, ..., h^L_n), computed as:

H^l = Transformer(H^{l-1})

2.3) Use the feature h^L_0 corresponding to the "[CLS]" character to obtain the overall semantic representation of the text:

c = tanh(W_c h^L_0 + b_c)
4. The text error correction method based on BERT and a feedforward neural network as claimed in claim 3, characterized in that in step 3) it is judged whether the text is correct:
3.1) Select the overall semantic representation c output by BERT in step 2.3) as the feature for judging text errors.
3.2) Use a feedforward neural network to map c to a single value, then use the sigmoid function to compute the probability that the text is incorrect:

P_rw = sigmoid(W_rw c + b_rw)

where W_rw and b_rw are weight parameters learned by the deep learning model.
3.3) Compare P_rw with a manually set threshold P̂_rw to judge whether the text is wrong; if P_rw is below the threshold, the text is considered correct.
3.4) For input text judged correct, output it directly as the error correction result without performing the subsequent correction operations.
3.5) During model training, compute the loss of the text error judgment with the binary cross-entropy loss function:

Loss_rw = BCELoss(P_rw, y_rw)

where y_rw, the true text-error value, is obtained by comparing whether the source and target strings are equal.
5. The text error correction method based on BERT and a feedforward neural network as claimed in claim 4, characterized in that in step 4) the positions of errors in the text are detected:
4.1) Select the per-character feature representation H^L output by BERT in step 2.2) as the feature for error-type detection.
4.2) Define the type of each character as correct, redundant, correct but missing content after it, or wrongly used; each character corresponds to one of these four types, whose operation labels are keep, delete, add-after, and replace, respectively.
4.3) Use a feedforward neural network combined with the softmax function to sequence-label the input text and detect the operation required for each character:

P^{i1}_sl = softmax(W_sl h^L_{i1} + b_sl)

where the components of P^{i1}_sl are the probabilities that the character should be kept, deleted, added-after, and replaced; the operation with the largest probability is taken as the detection result.
4.4) After obtaining the predicted tag sequence of the input text, derive each error position pos = (s', e') by rule: for a run of consecutive delete tags or consecutive replace tags occupying the interval [s, e], the derived error start s' and end e' are the positions immediately before and after the interval, i.e. s' = s - 1, e' = e + 1; for each add-after tag, the derived start s' is the position s of the tag itself and the end e' is the following position, i.e. e' = s' + 1.
4.5) For input text whose predicted tag sequence consists entirely of keep tags, no subsequent correction is performed; the input text is directly output as the error correction result.
4.6) During model training, compute the sequence labeling loss with the cross-entropy loss function:

Loss_sl = -Σ_{i1=1}^{n} log P^{i1}_sl(y^{i1}_sl)

where y^{i1}_sl is the true operation tag of the i1-th character.
6. The text error correction method based on BERT and a feedforward neural network as claimed in claim 5, characterized in that in step 5) the correct text corresponding to the erroneous text is generated:
5.1) According to the error positions obtained in step 4.4), slice the character feature vectors H^L obtained in step 2.2) into the input feature vector for correct-text generation:

h_info = [h^L_s, h_mid, h^L_e]

where s and e are the start and end positions of an error, h^L_s is the context information before the error, h^L_e is the context information after the error, and h_mid is the information of the error itself: when the error start and end positions are adjacent, h_mid is a special model-learned vector h_emp; otherwise h_mid is the mean of the vectors between the error start and end positions, h_mid = mean(h^L_{s+1}, ..., h^L_{e-1}). The number of h_info vectors obtained equals the number of errors detected in step 4.4).
5.2) Define a position embedding matrix E'_pos learned by the deep learning model to control the characters generated at different positions of the output text; it consists of MAX_LEN vectors of dimension POS_DIM, MAX_LEN being the maximum length of the generated text. When correcting each error, the j-th POS_DIM-dimensional vector E'_pos(j) of the matrix is used as the position information for generating the j-th word.
5.3) Use a multilayer feedforward neural network to extract the correct-text features from the error information combined with the position embedding vector:

h_{i3,j} = MLP([h_info, E'_pos(j)])

where h_{i3,j} is the feature for the j-th correct word generated for the i3-th error in the text.
5.4) Combined with the softmax function, map the error feature representation into the dictionary-size dimension defined by BERT, and take the word with the highest probability in the dictionary as the generated j-th word:

P^{i3,j}_cor = softmax(E^T_word h_{i3,j})

The weight parameter of the last layer of the multilayer feedforward network is the transpose of the BERT word embedding matrix E_word.
5.5) During model training and use, MAX_LEN words of corrected text are generated for each error, but only the text before the first special character "[EOP]" is kept as the result; if no "[EOP]" is generated, all generated text is taken as the result.
5.6) Using the error positions detected in step 4.4) and the generated correct text, replace the erroneous content in the input text, then delete the added special characters "[CLS]" and "[SEP]" to obtain the final error correction output.
5.7) During model training, compute the text generation loss with the cross-entropy loss function:

Loss_cor = -Σ_{i3=1}^{m} Σ_{j=1}^{k_{i3}} log P^{i3,j}_cor(t̂_{i3,j})

where m is the number of errors in a text, k_{i3} is the length of the correct text for the i3-th error including the end character "[EOP]", and t̂_{i3,j} is the j-th word of the correct text corresponding to the i3-th error.
CN202110098015.6A 2021-01-25 2021-01-25 Text error correction method based on BERT and feedforward neural network Active CN112836496B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110098015.6A CN112836496B (en) 2021-01-25 2021-01-25 Text error correction method based on BERT and feedforward neural network

Publications (2)

Publication Number Publication Date
CN112836496A 2021-05-25
CN112836496B 2024-02-13

Family

ID=75931336

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110098015.6A Active CN112836496B (en) 2021-01-25 2021-01-25 Text error correction method based on BERT and feedforward neural network

Country Status (1)

Country Link
CN (1) CN112836496B (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113743101A (en) * 2021-08-17 2021-12-03 北京百度网讯科技有限公司 Text error correction method and device, electronic equipment and computer storage medium
CN114065738A (en) * 2022-01-11 2022-02-18 湖南达德曼宁信息技术有限公司 Chinese spelling error correction method based on multitask learning
CN115169330A (en) * 2022-07-13 2022-10-11 平安科技(深圳)有限公司 Method, device, equipment and storage medium for correcting and verifying Chinese text
CN115547313A (en) * 2022-09-20 2022-12-30 海南大学 Method for controlling sudden stop of running vehicle based on voice of driver
CN116127953A (en) * 2023-04-18 2023-05-16 之江实验室 Chinese spelling error correction method, device and medium based on contrast learning
CN116136957A (en) * 2023-04-18 2023-05-19 之江实验室 Text error correction method, device and medium based on intention consistency
CN116306589A (en) * 2023-05-10 2023-06-23 之江实验室 Method and device for medical text error correction and intelligent extraction of emergency scene

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111767718A (en) * 2020-07-03 2020-10-13 北京邮电大学 Chinese grammar error correction method based on weakened grammar error feature representation
CN112131352A (en) * 2020-10-10 2020-12-25 南京工业大学 Method and system for detecting bad information of webpage text type
WO2021000362A1 (en) * 2019-07-04 2021-01-07 浙江大学 Deep neural network model-based address information feature extraction method

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2021000362A1 (en) * 2019-07-04 2021-01-07 浙江大学 Deep neural network model-based address information feature extraction method
CN111767718A (en) * 2020-07-03 2020-10-13 北京邮电大学 Chinese grammar error correction method based on weakened grammar error feature representation
CN112131352A (en) * 2020-10-10 2020-12-25 南京工业大学 Method and system for detecting bad information of webpage text type

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
王辰成; 杨麟儿; 王莹莹; 杜永萍; 杨尔弘: "Chinese Grammatical Error Correction Method Based on Transformer-Enhanced Architecture" (基于Transformer增强架构的中文语法纠错方法), Journal of Chinese Information Processing (中文信息学报), no. 06

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113743101A (en) * 2021-08-17 2021-12-03 北京百度网讯科技有限公司 Text error correction method and device, electronic equipment and computer storage medium
CN114065738A (en) * 2022-01-11 2022-02-18 湖南达德曼宁信息技术有限公司 Chinese spelling error correction method based on multitask learning
CN115169330A (en) * 2022-07-13 2022-10-11 平安科技(深圳)有限公司 Method, device, equipment and storage medium for correcting and verifying Chinese text
CN115169330B (en) * 2022-07-13 2023-05-02 平安科技(深圳)有限公司 Chinese text error correction and verification method, device, equipment and storage medium
CN115547313A (en) * 2022-09-20 2022-12-30 海南大学 Method for controlling sudden stop of running vehicle based on voice of driver
CN116127953A (en) * 2023-04-18 2023-05-16 之江实验室 Chinese spelling error correction method, device and medium based on contrast learning
CN116136957A (en) * 2023-04-18 2023-05-19 之江实验室 Text error correction method, device and medium based on intention consistency
CN116136957B (en) * 2023-04-18 2023-07-07 之江实验室 Text error correction method, device and medium based on intention consistency
CN116306589A (en) * 2023-05-10 2023-06-23 之江实验室 Method and device for medical text error correction and intelligent extraction of emergency scene
CN116306589B (en) * 2023-05-10 2024-02-09 之江实验室 Method and device for medical text error correction and intelligent extraction of emergency scene

Also Published As

Publication number Publication date
CN112836496B (en) 2024-02-13

Similar Documents

Publication Publication Date Title
CN112836496B (en) Text error correction method based on BERT and feedforward neural network
CN108959252B (en) Semi-supervised Chinese named entity recognition method based on deep learning
CN110489760B (en) Text automatic correction method and device based on deep neural network
CN111626063B (en) Text intention identification method and system based on projection gradient descent and label smoothing
CN110334354B (en) Chinese relation extraction method
CN110569508A (en) Method and system for classifying emotional tendencies by fusing part-of-speech and self-attention mechanism
CN108717574B (en) Natural language reasoning method based on word connection marking and reinforcement learning
CN110008472B (en) Entity extraction method, device, equipment and computer readable storage medium
CN110263325B (en) Chinese word segmentation system
CN111767718B (en) Chinese grammar error correction method based on weakened grammar error feature representation
CN110008469A (en) A kind of multi-level name entity recognition method
CN114282527A (en) Multi-language text detection and correction method, system, electronic device and storage medium
CN113221571B (en) Entity relation joint extraction method based on entity correlation attention mechanism
CN112463924B (en) Text intention matching method for intelligent question answering based on internal correlation coding
CN114863429A (en) Text error correction method and training method based on RPA and AI and related equipment thereof
CN112417823B (en) Chinese text word order adjustment and word completion method and system
CN112183083A (en) Abstract automatic generation method and device, electronic equipment and storage medium
CN113221542A (en) Chinese text automatic proofreading method based on multi-granularity fusion and Bert screening
CN114692568A (en) Sequence labeling method based on deep learning and application
CN114548099A (en) Method for jointly extracting and detecting aspect words and aspect categories based on multitask framework
CN111898337B (en) Automatic generation method of single sentence abstract defect report title based on deep learning
CN111858894A (en) Semantic missing recognition method and device, electronic equipment and storage medium
CN116681061A (en) English grammar correction technology based on multitask learning and attention mechanism
CN115510230A (en) Mongolian emotion analysis method based on multi-dimensional feature fusion and comparative reinforcement learning mechanism
CN112131879A (en) Relationship extraction system, method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant