CN114064856A - XLNET-BiGRU-based text error correction method - Google Patents

XLNET-BiGRU-based text error correction method

Info

Publication number
CN114064856A
CN114064856A (application CN202111394371.9A)
Authority
CN
China
Prior art keywords
xlnet
model
embedding
text
error correction
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111394371.9A
Other languages
Chinese (zh)
Inventor
王伦
张发雨
王宁
党章
吴兴龙
孟奥
冯立二
杨正云
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jiangsu Future Networks Innovation Institute
Original Assignee
Jiangsu Future Networks Innovation Institute
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jiangsu Future Networks Innovation Institute filed Critical Jiangsu Future Networks Innovation Institute
Priority to CN202111394371.9A priority Critical patent/CN114064856A/en
Publication of CN114064856A publication Critical patent/CN114064856A/en
Pending legal-status Critical Current

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 - Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 - Querying
    • G06F16/3331 - Query processing
    • G06F16/334 - Query execution
    • G06F16/3344 - Query execution using natural language analysis
    • G06F16/3346 - Query execution using probabilistic model
    • G06F16/35 - Clustering; Classification
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/24 - Classification techniques
    • G06F18/241 - Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415 - Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G06F40/00 - Handling natural language data
    • G06F40/20 - Natural language analysis
    • G06F40/205 - Parsing
    • G06F40/211 - Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
    • G06F40/216 - Parsing using statistical methods
    • G06F40/30 - Semantic analysis
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/044 - Recurrent networks, e.g. Hopfield networks
    • G06N3/047 - Probabilistic or stochastic networks
    • G06N3/08 - Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Probability & Statistics with Applications (AREA)
  • Biophysics (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Biomedical Technology (AREA)
  • Databases & Information Systems (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Machine Translation (AREA)

Abstract

The invention provides a text error correction method based on XLNet-BiGRU, comprising the following steps: S1, training an XLNet (Generalized Autoregressive Pretraining for Language Understanding) Chinese model on a large-scale unlabeled corpus, the XLNet model mainly comprising three core components: a Permutation Language Model, a Two-Stream Self-Attention mechanism and Transformer-XL; S2, preprocessing and labeling the text error correction corpus data; S3, constructing an XLNet-BiGRU neural network model on top of the XLNet pre-trained Chinese model from S1, the model mainly comprising a detection network and an error correction network, and training it with the labeled data from S2. The invention alleviates the long running time of traditional translation-model-based error correction: the serial, word-by-word generation of a correct sentence is optimized into a parallel process in which the XLNet neural network corrects only the erroneous content.

Description

XLNET-BiGRU-based text error correction method
Technical Field
The invention relates to the field of artificial intelligence and natural language processing, in particular to an XLNet-BiGRU text error correction method.
Background
Text error correction is a natural language processing technique for correcting erroneous content in text; it covers spelling correction, grammar correction, and semantic/pragmatic correction in specific scenarios. Spelling correction does not change the length of the text and only corrects wrongly written characters one by one, whereas grammar correction and semantic/pragmatic correction must handle errors such as extra words, missing words, wrong words and wrong word order, and may change the length of the text.
In recent years, large-scale deep pre-trained language models such as BERT and XLNet have driven rapid progress in natural language processing: they provide better initial semantic representations of text for specific downstream tasks and reduce the time and cost required for model convergence.
Traditional text error correction mainly relies on rule-based methods or translation models. Rule-based methods depend on manually defined replacement dictionaries and can only correct specific errors. Translation-model-based correction is currently the mainstream approach; neural translation models have replaced statistical ones and treat error correction as translation from an erroneous sentence to a correct one. Although this works well and produces fluent sentences, it requires large amounts of training data and is slow at inference time. For spelling correction alone, sequence labeling is typically used; it corrects wrongly written characters quickly but is not suited to other error types.
Disclosure of Invention
Chinese text correction is a challenging task because a model must have near human-level language understanding to achieve satisfactory results, and conventional rule-based or translation-model-based methods struggle to do so. The invention aims to provide a text error correction method based on XLNet-BiGRU that addresses the problems described in the background.
In order to achieve the purpose, the invention adopts the following technical scheme:
an XLNET-BiGRU text error correction method is characterized by comprising the following steps:
s1, training an XLNT (generalized automated training for Language understanding) Chinese Model based on large-scale unlabeled corpus, wherein the XLNT Model mainly comprises a ranking Language Model, a double-Stream Attention machine (Two-Stream Self-Attention) and a Transformer-XL core component;
s2, preprocessing and labeling the text error correction corpus data;
s3, constructing an XLNet-BiGRU neural network model on the basis of the XLNet pre-training Chinese model trained in S1, wherein the model mainly comprises a detection network and an error correction network, and is trained by using the marked data in S2.
The step S1 specifically includes: the permutation language model contained in the XLNet model randomly permutes the Chinese characters of the sentences in the text, so that for a character x_i the characters {x_{i+1}, …, x_n} that originally appear after it may also appear before it; let the text sequence of length T be [1, 2, …, T], let A_T be the set of all its permutations, let a_t be the t-th element of a permutation a, and let a_{<t} denote the elements before position t in that permutation; then for a ∈ A_T the modeling process can be expressed as:

max_θ E_{a ~ A_T} [ Σ_{t=1}^{T} log p_θ( x_{a_t} | x_{a_{<t}} ) ]

where θ denotes the trainable model parameters;
further, XLNet adopts a Two-Stream Self-Attention mechanism, in which the Content Stream Attention is a self-attention stream containing both position information and content information, and the Query Stream Attention is an input stream containing only position information;
when the Query Stream Attention is used to predict the target position, no content information of that position is revealed; the two streams complement each other and better extract features related to the context information; the specific two-stream attention mechanism is:

g_{a_t}^{(m)} = Attention(Q = g_{a_t}^{(m-1)}, KV = h_{a_{<t}}^{(m-1)}; θ)   (Query Stream)
h_{a_t}^{(m)} = Attention(Q = h_{a_t}^{(m-1)}, KV = h_{a_{≤t}}^{(m-1)}; θ)   (Content Stream)

where g_{a_t} carries only the position information of the input text and serves as the Q matrix in self-attention, and h_{a_t} carries the content information of the input text and serves as the K and V matrices in self-attention;
furthermore, the XLNet language model takes the Transformer-XL framework as its core and introduces a recurrence mechanism and relative position encoding, so that context semantic information can be used more effectively and latent relations hidden in the text vectors can be mined; the relative position encoding mechanism is introduced as:

A_{i,j}^{rel} = E_{x_i}^T W_q^T W_{k,E} E_{x_j} + E_{x_i}^T W_q^T W_{k,R} R_{i-j} + u^T W_{k,E} E_{x_j} + v^T W_{k,R} R_{i-j}

where E_{x_i} and E_{x_j} are the text vectors of words i and j respectively, R_{i-j} is the relative position vector between words i and j, W denotes the weight matrices, and u, v are learned global bias vectors.
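For illustration only (not part of the patent), the following Python sketch shows one way a sampled permutation order of the kind used by the permutation language model can be realized in practice: the word order of the input is never changed, and each position is only allowed to attend to the positions that precede it in the sampled factorization order, which is what lets the model estimate p_θ(x_{a_t} | x_{a_{<t}}). All names are assumptions.

```python
import numpy as np

def permutation_masks(seq_len, rng=np.random.default_rng(0)):
    """Build attention masks for one sampled factorization order.

    mask[i, j] = True means "position i may attend to position j".
    The token order of the input is never changed; only visibility is permuted.
    """
    order = rng.permutation(seq_len)        # sampled factorization order a
    rank = np.empty(seq_len, dtype=int)
    rank[order] = np.arange(seq_len)        # rank[i] = position of token i within the order

    # Query stream at position i sees tokens that come strictly earlier in the order.
    query_mask = rank[None, :] < rank[:, None]
    # Content stream additionally sees the token itself.
    content_mask = query_mask | np.eye(seq_len, dtype=bool)
    return order, query_mask, content_mask

order, q_mask, c_mask = permutation_masks(5)
print(order)                 # one permutation of the 5 positions
print(q_mask.astype(int))    # which positions each query position may see
```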
The step S3 specifically includes:
building an input Embedding sequence E = (e_1, e_2, …, e_n) from the XLNet pre-trained word vectors;
where e_i is the Embedding vector of character x_i, i.e., the sum of the word embedding (word Embedding), position embedding (position Embedding) and segment embedding (segment Embedding) of that character;
the input sequence E is then fed into the detection network, a BiGRU (bidirectional gated recurrent unit) neural network model;
the GRU is in effect a simplification of the LSTM that controls the passing and blocking of information through gates; the specific state update formulas are:

z_t = σ(w_z · [h_{t-1}, x_t])
r_t = σ(w_r · [h_{t-1}, x_t])
h̃_t = tanh(w_h̃ · [r_t * h_{t-1}, x_t])
h_t = (1 - z_t) * h_{t-1} + z_t * h̃_t

where σ is the sigmoid function; z_t and r_t are the update gate and reset gate, determined by the current input x_t and the previous hidden state h_{t-1}, and control which information is kept or discarded; the update gate controls how much of the previous state information is carried into the current state, a larger value bringing in more previous state information; the reset gate controls how much of the previous state is written into the current candidate state h̃_t; h_t serves both as the output at the current time step and as the hidden-vector input at the next time step; w_z, w_r and w_h̃ are the weight parameters of the update gate, the reset gate and the candidate state, respectively;
the output vector G = (g_1, g_2, …, g_n) of the BiGRU is a probability label between 0 and 1 for each character, a larger value indicating a higher probability that the corresponding character is erroneous;
further, the error probability of each character in the text sequence is calculated by the following formula:
p_i = P_d(g_i = 1 | X) = σ(W_d · h_i^d + b_d)

where P_d(g_i = 1 | X) is the error probability computed by the detection network, σ is the sigmoid function, W_d is the weight matrix of the fully connected layer, b_d is the bias term, and h_i^d is the hidden state of the last BiGRU layer;
further, the input embedding e_i and the probability p_i computed by the detection network are combined by a weighted sum to construct the soft-masked Embedding:
e'_i = p_i · e_mask + (1 - p_i) · e_i
where e_i is the input Embedding vector and e_mask is the mask Embedding vector; if the error probability is high, e'_i approaches the mask Embedding vector e_mask, otherwise e'_i stays close to the input Embedding vector e_i;
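In code this weighting is a single broadcasted expression; a sketch reusing the tensors from the detection-network example above (mask_embedding is an assumed name for the Embedding vector of the mask token):

```python
# embeddings:     (batch, seq_len, dim)  input Embedding sequence E
# p:              (batch, seq_len)       error probabilities from the detection network
# mask_embedding: (dim,)                 Embedding vector of the mask token
p_ = p.unsqueeze(-1)                                           # (batch, seq_len, 1) for broadcasting
soft_masked = p_ * mask_embedding + (1.0 - p_) * embeddings    # e'_i = p_i·e_mask + (1-p_i)·e_i
```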
Further, the error correction network is an XLNet-based sequence multi-class classification model; its input is the soft-masked Embedding sequence E' = (e'_1, e'_2, …, e'_n) and its output is the sequence Y = (y_1, y_2, …, y_n);
Further, all hidden states h of the last layer Encoder in the XLNet model are takeni cAnd the input Embedding sequence vector eiCorresponding addition is carried out with Residual Connection (Residual Connection) to obtain Residual Connection value h'i
h'_i = h_i^c + e_i

where h_i^c denotes the hidden states of the last encoder layer of the XLNet model; the residual connection value h'_i is then fed into a fully connected layer, which maps it to a vector with the same dimension as the candidate vocabulary; the probability that character i is corrected to candidate character j is then output by the softmax function:

P_c(y_i | X) = softmax(W h'_i + b)[j].
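A sketch of this correction head, under the assumption that the XLNet encoder's last-layer hidden states are available as a tensor; the class name, dimensions and default vocabulary size are illustrative only.

```python
import torch
import torch.nn as nn

class CorrectionHead(nn.Module):
    """Maps residual-connected XLNet hidden states to a distribution over candidate characters."""

    def __init__(self, hidden_dim=768, vocab_size=32000):
        super().__init__()
        self.fc = nn.Linear(hidden_dim, vocab_size)

    def forward(self, xlnet_hidden, input_embeddings):
        h_res = xlnet_hidden + input_embeddings          # residual connection: h'_i = h_i^c + e_i
        return torch.softmax(self.fc(h_res), dim=-1)     # P_c(y_i | X) over the candidate vocabulary
```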
compared with the prior art, the invention has the beneficial effects that:
the XLNET model used by the invention is based on the fact that unsupervised training is carried out on large-scale label-free data, pre-training can be carried out by combining context semantic information, and the characteristics of word level, syntactic structure and context semantic information are learned, so that the defect that static word embedding cannot represent word ambiguity is solved; the method uses the sequence label for the processing object of text error correction, so that various types of errors can be quickly and accurately corrected by using a sequence label method, and the method is not limited to spelling error correction; the method carries out text error correction based on XLNET, can carry out error correction on error texts in large-scale linguistic data and generate correct texts, simultaneously improves the problem of long time consumption of the traditional error correction method based on a translation model, and optimizes the serial process of generating correct sentences one by one for text error correction into the parallel process of carrying out error correction only on error contents by using the XLNET neural network.
Drawings
FIG. 1 is a flow chart of a text error correction method based on XLNET-BiGRU in the invention.
Detailed Description
The technical solutions of the present invention will be described clearly and completely with reference to the accompanying drawings, and it is to be understood that the described embodiments are only a part of the embodiments of the present invention, rather than all embodiments, and all other embodiments obtained by those skilled in the art without any inventive work based on the embodiments of the present invention belong to the protection scope of the present invention.
Example 1
As shown in fig. 1, the text error correction method based on XLNet-BiGRU of the present invention includes the following steps:
s1, training XLNT (generalized automated forecasting for Language understanding) Chinese model based on large-scale unmarked corpus.
The XLNET Model mainly comprises a ranking Language Model (Permutation Language Model), a Two-Stream Attention machine Model (Two-Stream Self-Attention) and a transform-XL core component.
Further, the contained permutation language model in the XLNET model aims to randomly shuffle the Chinese characters of the sentence in the text, for Chinese character xiHan { x } originally appearing behind iti+1,…,xnIt can also appear in front of it, assuming that the text sequence of length T is [1,2, …, T]All combinations of (A)T,atFor the t-th element in the sequence, a < t represents a permutation combination case, i.e. a is equal to ATThe modeling process can be expressed as:
Figure BDA0003369424790000061
where θ is the model parameter with training.
Further, XLNet employs a dual Stream Attention mechanism, in which a Content Stream Attention indicates a Self-Attention mechanism that includes both location information and Content information, and a Query Stream Attention indicates a location-only input Stream. Therefore, when the Query Stream attribute is used for predicting the required predicted position, no content information of the current position is leaked, the Query Stream attribute and the content information supplement each other, and the characteristics related to the context information are better extracted, wherein a specific double-flow attention mechanism is as follows:
Figure BDA0003369424790000062
Figure BDA0003369424790000063
wherein the content of the first and second substances,
Figure BDA0003369424790000064
only the position information of the input text, as the Q matrix in Self-orientation,
Figure BDA0003369424790000065
the content information containing the input text is used as K and V matrixes in the Self-orientation.
Furthermore, the XLNET language model takes a transform framework as a core, introduces a circulation mechanism and relative position coding, and can better utilize context semantic information to dig out potential hidden relations in text vectors. Introducing a relative position coding mechanism formula:
Figure BDA0003369424790000066
wherein
Figure BDA0003369424790000067
Text vectors, R, representing words i, j, respectivelyi-jRepresenting the relative position vector of the words i, j and W representing the weight matrix.
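As a rough illustration (not the patent's implementation), a single-head, single-layer Python sketch of the two-stream update, using visibility masks of the kind built in the earlier permutation-mask sketch; the relative-position terms and multi-head details of the real Transformer-XL layer are omitted, and all names are assumptions.

```python
import torch
import torch.nn.functional as F

def two_stream_attention(h, g, content_mask, query_mask, w_q, w_k, w_v):
    """One simplified two-stream self-attention step (single head, no relative positions).

    h: content stream (B, T, d) -- position + content information
    g: query stream   (B, T, d) -- position information only
    mask[i, j] = True means position i may attend to position j (bool tensors of shape (T, T)).
    """
    d = h.size(-1)
    k, v = h @ w_k, h @ w_v              # keys and values always come from the content stream

    def attend(q, mask):
        scores = (q @ k.transpose(-2, -1)) / d ** 0.5
        scores = scores.masked_fill(~mask, float("-inf"))
        return F.softmax(scores, dim=-1) @ v

    # Content stream sees x_{a_<=t} (itself included); query stream sees only x_{a_<t}.
    # Assumes every query-stream row has at least one visible position (in XLNet the cached
    # memory from the previous segment plays this role).
    h_new = attend(h @ w_q, content_mask)
    g_new = attend(g @ w_q, query_mask)
    return h_new, g_new
```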
S2, preprocessing and labeling the text error correction corpus data, where the training data are tuple pairs each consisting of an original sequence and a corrected sequence: (X_1, Y_1), (X_2, Y_2), …, (X_N, Y_N).
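For illustration only, a hypothetical training pair in this format together with the per-character 0/1 error labels that the detection network can be trained on; the sentences are invented examples, not taken from the patent's corpus.

```python
# Hypothetical sample: the last character is a homophone error (裹 -> 果).
sample = ("我今天吃了苹裹", "我今天吃了苹果")   # (original sequence X, corrected sequence Y)

def detection_labels(original, corrected):
    """Derive per-character labels for the detection network (1 = erroneous character)."""
    assert len(original) == len(corrected)
    return [0 if o == c else 1 for o, c in zip(original, corrected)]

print(detection_labels(*sample))   # [0, 0, 0, 0, 0, 0, 1]
```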
S3, constructing an XLNet-BiGRU neural network model on the basis of the XLNet pre-trained Chinese model from S1; the model mainly comprises a detection network and an error correction network and is trained with the labeled data from S2.
S3-1, an input Embedding sequence E = (e_1, e_2, …, e_n) is built from the XLNet pre-trained word vectors, where e_i is the Embedding vector of character x_i, i.e., the sum of the word embedding (word Embedding), position embedding (position Embedding) and segment embedding (segment Embedding) of that character.
S3-2, the input sequence E is fed into the detection network, a BiGRU (bidirectional gated recurrent unit) neural network model. The GRU is a simplification of the LSTM that controls the passing and blocking of information through gates.
The specific state update formulas are as follows:

z_t = σ(w_z · [h_{t-1}, x_t])
r_t = σ(w_r · [h_{t-1}, x_t])
h̃_t = tanh(w_h̃ · [r_t * h_{t-1}, x_t])
h_t = (1 - z_t) * h_{t-1} + z_t * h̃_t
where σ is the sigmoid function; z_t and r_t are the update gate and reset gate, determined by the current input x_t and the previous hidden state h_{t-1}, and control which information is kept or discarded. h_t serves both as the output at the current time step and as the hidden-vector input at the next time step.
The output vector G = (g_1, g_2, …, g_n) of the BiGRU is a probability label between 0 and 1 for each character, a larger value indicating a greater likelihood that the corresponding character is erroneous.
Further, the error probability of each character in the text sequence is calculated by the following formula:
p_i = P_d(g_i = 1 | X) = σ(W_d · h_i^d + b_d)

where P_d(g_i = 1 | X) is the error probability computed by the detection network, σ is the sigmoid function, W_d is the weight matrix of the fully connected layer, b_d is the bias term, and h_i^d is the hidden state of the last BiGRU layer.
S3-3, the input embedding e_i and the probability p_i computed by the detection network are combined by a weighted sum to construct the soft-masked Embedding:

e'_i = p_i · e_mask + (1 - p_i) · e_i

where e_i is the input Embedding vector and e_mask is the mask Embedding vector. If the error probability is high, e'_i approaches the mask Embedding vector e_mask; otherwise e'_i stays close to the input Embedding vector e_i.
S3-4, the error correction network is an XLNet-based sequence multi-class classification model. Its input is the soft-masked Embedding sequence E' = (e'_1, e'_2, …, e'_n) and its output is the sequence Y = (y_1, y_2, …, y_n).
Further, all hidden states h_i^c of the last-layer encoder of the XLNet model are added element-wise to the corresponding input Embedding vectors e_i through a residual connection (Residual Connection), giving the residual connection value h'_i:
h'_i = h_i^c + e_i

The residual connection value h'_i is then fed into a fully connected layer, which maps it to a vector with the same dimension as the candidate vocabulary. The probability that character i is corrected to candidate character j is then output by the softmax function:

P_c(y_i | X) = softmax(W h'_i + b)[j]
further, the character with the highest probability is taken to replace the text to be corrected.
According to the method, example preprocessed corpus samples for the XLNet-BiGRU text error correction model are constructed as follows:

[Table of example original/corrected sentence pairs; reproduced only as images in the original filing]
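Putting the steps of this embodiment together, a hedged end-to-end inference sketch; the component objects are the hypothetical ones defined in the earlier sketches, and embed and id_to_char are assumed helpers. It detects errors, soft-masks the embeddings, encodes them with XLNet, then takes the argmax candidate at every position.

```python
def correct_text(text, embed, detector, xlnet_encoder, correction_head, mask_embedding, id_to_char):
    """Illustrative inference loop; all components are assumed to be already trained."""
    e = embed(text)                                   # (1, T, d) input Embedding sequence
    p = detector(e)                                   # (1, T) per-character error probabilities
    p_ = p.unsqueeze(-1)
    e_soft = p_ * mask_embedding + (1 - p_) * e       # soft-masked Embedding e'_i
    hidden = xlnet_encoder(e_soft)                    # (1, T, d) last-layer hidden states h_i^c
    probs = correction_head(hidden, e)                # (1, T, vocab) P_c(y_i | X)
    best = probs.argmax(dim=-1)[0]                    # most probable candidate per position
    return "".join(id_to_char[int(i)] for i in best)
```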
the foregoing shows and describes the general principles, essential features, and advantages of the invention. It will be understood by those skilled in the art that the present invention is not limited to the embodiments described above, and that the preferred embodiments of the invention and described in the specification are only preferred and not intended to limit the invention, and that various changes and modifications may be made without departing from the novel spirit and scope of the invention, which fall within the scope of the invention as claimed. The scope of the invention is defined by the appended claims and equivalents thereof.

Claims (3)

1. An XLNET-BiGRU text error correction method is characterized by comprising the following steps:
s1, training an XLNT (generalized automated training for Language understanding) Chinese Model based on large-scale unlabeled corpus, wherein the XLNT Model mainly comprises a ranking Language Model, a double-Stream Attention machine (Two-Stream Self-Attention) and a Transformer-XL core component;
s2, preprocessing and labeling the text error correction corpus data;
s3, constructing an XLNet-BiGRU neural network model on the basis of the XLNet pre-training Chinese model trained in S1, wherein the model mainly comprises a detection network and an error correction network, and is trained by using the marked data in S2.
2. The XLNet-BiGRU-based text correction method of claim 1, wherein: the step S1 specifically includes: randomly permuting the Chinese characters of the sentences in the text, so that for a character x_i the characters {x_{i+1}, …, x_n} that originally appear after it may also appear before it; let the text sequence of length T be [1, 2, …, T], let A_T be the set of all its permutations, let a_t be the t-th element of a permutation a, and let a_{<t} denote the elements before position t in that permutation; then for a ∈ A_T the modeling process can be expressed as:

max_θ E_{a ~ A_T} [ Σ_{t=1}^{T} log p_θ( x_{a_t} | x_{a_{<t}} ) ]

where θ denotes the trainable model parameters;
further, XLNet adopts a Two-Stream Self-Attention mechanism, in which the Content Stream Attention is a self-attention stream containing both position information and content information, and the Query Stream Attention is an input stream containing only position information;
when the Query Stream Attention is used to predict the target position, no content information of that position is revealed; the two streams complement each other and better extract features related to the context information; the specific two-stream attention mechanism is:

g_{a_t}^{(m)} = Attention(Q = g_{a_t}^{(m-1)}, KV = h_{a_{<t}}^{(m-1)}; θ)   (Query Stream)
h_{a_t}^{(m)} = Attention(Q = h_{a_t}^{(m-1)}, KV = h_{a_{≤t}}^{(m-1)}; θ)   (Content Stream)

where g_{a_t} carries only the position information of the input text and serves as the Q matrix in self-attention, and h_{a_t} carries the content information of the input text and serves as the K and V matrices in self-attention;
furthermore, the XLNet language model takes the Transformer-XL framework as its core and introduces a recurrence mechanism and relative position encoding, so that context semantic information can be used more effectively and latent relations hidden in the text vectors can be mined; the relative position encoding mechanism is introduced as:

A_{i,j}^{rel} = E_{x_i}^T W_q^T W_{k,E} E_{x_j} + E_{x_i}^T W_q^T W_{k,R} R_{i-j} + u^T W_{k,E} E_{x_j} + v^T W_{k,R} R_{i-j}

where E_{x_i} and E_{x_j} are the text vectors of words i and j respectively, R_{i-j} is the relative position vector between words i and j, W denotes the weight matrices, and u, v are learned global bias vectors.
3. The XLNet-BiGRU-based text correction method of claim 1, wherein: the step S3 specifically includes:
building an input Embedding sequence E = (e_1, e_2, …, e_n) from the XLNet pre-trained word vectors;
where e_i is the Embedding vector of character x_i, i.e., the sum of the word embedding (word Embedding), position embedding (position Embedding) and segment embedding (segment Embedding) of that character;
the input sequence E is then fed into the detection network, a BiGRU (bidirectional gated recurrent unit) neural network model;
the GRU is in effect a simplification of the LSTM that controls the passing and blocking of information through gates; the specific state update formulas are:

z_t = σ(w_z · [h_{t-1}, x_t])
r_t = σ(w_r · [h_{t-1}, x_t])
h̃_t = tanh(w_h̃ · [r_t * h_{t-1}, x_t])
h_t = (1 - z_t) * h_{t-1} + z_t * h̃_t

where σ is the sigmoid function; z_t and r_t are the update gate and reset gate, determined by the current input x_t and the previous hidden state h_{t-1}, and control which information is kept or discarded; the update gate controls how much of the previous state information is carried into the current state, a larger value bringing in more previous state information; the reset gate controls how much of the previous state is written into the current candidate state h̃_t; h_t serves both as the output at the current time step and as the hidden-vector input at the next time step; w_z, w_r and w_h̃ are the weight parameters of the update gate, the reset gate and the candidate state, respectively;
the output vector G = (g_1, g_2, …, g_n) of the BiGRU is a probability label between 0 and 1 for each character, a larger value indicating a higher probability that the corresponding character is erroneous;
further, the error probability of each character in the text sequence is calculated by the following formula:

p_i = P_d(g_i = 1 | X) = σ(W_d · h_i^d + b_d)

where P_d(g_i = 1 | X) is the error probability computed by the detection network, σ is the sigmoid function, W_d is the weight matrix of the fully connected layer, b_d is the bias term, and h_i^d is the hidden state of the last BiGRU layer;
further, the input embedding e_i and the probability p_i computed by the detection network are combined by a weighted sum to construct the soft-masked Embedding:

e'_i = p_i · e_mask + (1 - p_i) · e_i

where e_i is the input Embedding vector and e_mask is the mask Embedding vector; if the error probability is high, e'_i approaches the mask Embedding vector e_mask, otherwise e'_i stays close to the input Embedding vector e_i;
further, the error correction network is an XLNet-based sequence multi-class classification model; its input is the soft-masked Embedding sequence E' = (e'_1, e'_2, …, e'_n) and its output is the sequence Y = (y_1, y_2, …, y_n);
further, all hidden states h_i^c of the last-layer encoder of the XLNet model are added element-wise to the corresponding input Embedding vectors e_i through a residual connection (Residual Connection), giving the residual connection value h'_i:

h'_i = h_i^c + e_i

where h_i^c denotes the hidden states of the last encoder layer of the XLNet model;
the residual connection value h'_i is then fed into a fully connected layer, which maps it to a vector with the same dimension as the candidate vocabulary; the probability that character i is corrected to candidate character j is then output by the softmax function:

P_c(y_i | X) = softmax(W h'_i + b)[j].
CN202111394371.9A 2021-11-23 2021-11-23 XLNET-BiGRU-based text error correction method Pending CN114064856A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111394371.9A CN114064856A (en) 2021-11-23 2021-11-23 XLNET-BiGRU-based text error correction method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111394371.9A CN114064856A (en) 2021-11-23 2021-11-23 XLNET-BiGRU-based text error correction method

Publications (1)

Publication Number Publication Date
CN114064856A true CN114064856A (en) 2022-02-18

Family

ID=80279483

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111394371.9A Pending CN114064856A (en) 2021-11-23 2021-11-23 XLNET-BiGRU-based text error correction method

Country Status (1)

Country Link
CN (1) CN114064856A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115017891A (en) * 2022-08-04 2022-09-06 海拓仪器(江苏)有限公司 Long text error correction method
CN115204143A (en) * 2022-09-19 2022-10-18 江苏移动信息系统集成有限公司 Method and system for calculating text similarity based on prompt
CN115204143B (en) * 2022-09-19 2022-12-20 江苏移动信息系统集成有限公司 Method and system for calculating text similarity based on prompt

Similar Documents

Publication Publication Date Title
CN112733541A (en) Named entity identification method of BERT-BiGRU-IDCNN-CRF based on attention mechanism
CN107967262A (en) A kind of neutral net covers Chinese machine translation method
CN111767718B (en) Chinese grammar error correction method based on weakened grammar error feature representation
CN111666758B (en) Chinese word segmentation method, training device and computer readable storage medium
CN110008469A (en) A kind of multi-level name entity recognition method
CN114757182A (en) BERT short text sentiment analysis method for improving training mode
CN114064856A (en) XLNET-BiGRU-based text error correction method
CN111368542A (en) Text language association extraction method and system based on recurrent neural network
Ren et al. Detecting the scope of negation and speculation in biomedical texts by using recursive neural network
CN111428518B (en) Low-frequency word translation method and device
CN113190656A (en) Chinese named entity extraction method based on multi-label framework and fusion features
Wu et al. An effective approach of named entity recognition for cyber threat intelligence
CN110134950A (en) A kind of text auto-collation that words combines
CN114925170B (en) Text proofreading model training method and device and computing equipment
CN115759042A (en) Sentence-level problem generation method based on syntax perception prompt learning
CN111898337B (en) Automatic generation method of single sentence abstract defect report title based on deep learning
CN111274826B (en) Semantic information fusion-based low-frequency word translation method
CN115730232A (en) Topic-correlation-based heterogeneous graph neural network cross-language text classification method
CN113590745B (en) Interpretable text inference method
Xu Research on neural network machine translation model based on entity tagging improvement
CN114169345A (en) Method and system for day-to-day machine translation using homologous words
CN114417872A (en) Contract text named entity recognition method and system
CN113408267A (en) Word alignment performance improving method based on pre-training model
Chong Design and implementation of English grammar error correction system based on deep learning
Zhang et al. A multi-granularity neural network for answer sentence selection

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination