CN115062611A - Training method, device, equipment and storage medium of grammar error correction model - Google Patents

Training method, device, equipment and storage medium of grammar error correction model

Info

Publication number
CN115062611A
CN115062611A
Authority
CN
China
Prior art keywords
model
training
original
error correction
sentence
Prior art date
Legal status
Granted
Application number
CN202210560454.9A
Other languages
Chinese (zh)
Other versions
CN115062611B (en)
Inventor
蒋盛益
林楠铠
林晓钿
武洪艳
Current Assignee
Guangdong University of Foreign Studies
Original Assignee
Guangdong University of Foreign Studies
Priority date
Filing date
Publication date
Application filed by Guangdong University of Foreign Studies filed Critical Guangdong University of Foreign Studies
Priority to CN202210560454.9A priority Critical patent/CN115062611B/en
Publication of CN115062611A publication Critical patent/CN115062611A/en
Application granted granted Critical
Publication of CN115062611B publication Critical patent/CN115062611B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/253Grammatical analysis; Style critique
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/205Parsing
    • G06F40/216Parsing using statistical methods
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y04INFORMATION OR COMMUNICATION TECHNOLOGIES HAVING AN IMPACT ON OTHER TECHNOLOGY AREAS
    • Y04SSYSTEMS INTEGRATING TECHNOLOGIES RELATED TO POWER NETWORK OPERATION, COMMUNICATION OR INFORMATION TECHNOLOGIES FOR IMPROVING THE ELECTRICAL POWER GENERATION, TRANSMISSION, DISTRIBUTION, MANAGEMENT OR USAGE, i.e. SMART GRIDS
    • Y04S10/00Systems supporting electrical power generation, transmission or distribution
    • Y04S10/50Systems or methods supporting the power network operation or management, involving a certain degree of interaction with the load-side end user applications

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Machine Translation (AREA)

Abstract

The invention discloses a method, a device, equipment and a storage medium for training a grammar error correction model, wherein an original model is constructed based on a Transformer; in each round of training, a training set acquired in advance is input into the original model, and the parameters in the original model are adjusted in combination with a moving average strategy; the training set comprises a plurality of training samples, each training sample comprises an original sentence consisting of a plurality of words and a target sentence corresponding to the original sentence and consisting of a plurality of labels, and the target sentence is obtained by performing grammatical error correction on the original sentence in advance; when the number of training rounds reaches a preset threshold, the algorithm ends and the original model obtained in the last training round is taken as the optimal grammar error correction model. According to the embodiment of the invention, the original model is constructed by using the Transformer and model training is performed with the pre-obtained training set in combination with the moving average strategy to obtain the optimal grammar error correction model, so that overfitting is avoided and the generalization capability of the model is improved.

Description

Training method, device, equipment and storage medium of grammar error correction model
Technical Field
The invention relates to the technical field of natural language processing, in particular to a method, a device, equipment and a storage medium for training a grammar error correction model.
Background
Natural Language Processing (NLP) is an important direction in the fields of computer science and artificial intelligence. It studies theories and methods that enable effective communication between humans and computers in natural language, and grammar error correction is one of its research directions. In the prior art, deep learning is used to correct grammar errors, but because the constructed deep models have millions or even billions of parameters, overfitting occurs easily; in particular, end-to-end models have low stability and large performance fluctuations during training. A model training method is therefore needed that avoids overfitting and improves the generalization capability of deep models.
Disclosure of Invention
The embodiment of the invention aims to provide a method, a device, equipment and a storage medium for training a grammar error correction model.
In order to achieve the above object, an embodiment of the present invention provides a method for training a syntax error correction model, including:
constructing an original model based on a Transformer;
in each round of training, inputting a training set acquired in advance into the original model, and adjusting parameters in the original model by combining a moving average strategy; the training set comprises a plurality of training samples, the training samples comprise an original sentence consisting of a plurality of words and phrases and a target sentence corresponding to the original sentence and consisting of a plurality of labels, and the target sentence is obtained by carrying out grammatical error correction on the original sentence in advance;
and when the number of training rounds reaches a preset threshold, ending the algorithm and taking the original model obtained in the last training round as the optimal grammar error correction model.
As an improvement of the above scheme, in each round of training, inputting a training set obtained in advance into the original model, and adjusting parameters in the original model by combining a moving average strategy specifically includes:
in each round of training, randomly discarding a hidden unit when each word in an original sentence of each training sample acquired in advance passes through the original model, and sequentially transmitting forward twice to output a first label probability distribution and a second label probability distribution;
based on a sliding average strategy, calculating an average parameter according to the parameter of the original model obtained after the first forward transmission and the parameter of the original model obtained after the second forward transmission to be used as the parameter of the parameter average model;
calculating a first cross entropy of the original model obtained after the first forward transmission and a second cross entropy of the original model obtained after the second forward transmission;
calculating a model loss value corresponding to the word according to the first label probability distribution, the second label probability distribution, a third label probability distribution obtained by forward transmission when the word passes through the parameter average model, the first cross entropy and the second cross entropy;
calculating a loss function according to model loss values corresponding to all words in the training set;
and adjusting parameters of the original model according to the loss function.
As an improvement of the above solution, the average parameter is calculated by the following formula:
v_t′ = (δ·v_{t-1} + (1 − δ)·v_t) / (1 − δ^t)
wherein v_t′ denotes the average parameter, δ denotes a variable that takes historical parameter values into account, v_{t-1} denotes the parameters of the original model obtained after the first forward pass, and v_t denotes the parameters of the original model obtained after the second forward pass.
As an improvement of the above scheme, the model loss value is calculated by:
calculating KL divergence between the first label probability distribution, the second label probability distribution and the third label probability distribution respectively;
calculating cross entropy loss;
and calculating a model loss value according to the KL divergence and the cross entropy loss based on a preset weight rule.
As an improvement of the above solution, the model loss value is calculated by the following formula:
L_i = α′·(L_i^KLM + L_i^KLT1 + L_i^KLT2) + β′·(L_i^CE1 + L_i^CE2)
where L_i denotes the model loss value of the i-th word, α′ denotes the KL divergence weight, β′ denotes the cross entropy loss weight, α′ + 2β′ = 1, L_i^KLM denotes the KL divergence of the first and second label probability distributions of the i-th word, L_i^KLT1 denotes the KL divergence of the second and third label probability distributions of the i-th word, L_i^KLT2 denotes the KL divergence of the first and third label probability distributions of the i-th word, L_i^CE1 denotes the cross entropy loss of the first label probability distribution of the i-th word and the answer label, and L_i^CE2 denotes the cross entropy loss of the second label probability distribution of the i-th word and the answer label.
As an improvement of the above solution, the loss function is calculated by the following formula:
L = (1/N)·Σ_{j=1}^{N} [ (1/M)·Σ_{i=1}^{M} L_i ]
where L represents a loss function, N represents the number of sentences of the training set, and M represents the number of words in a sentence.
As an improvement of the above, the original model comprises an encoder and a decoder;
the encoder consists of Z independent coding layers and (K-Z) redundant coding layers, and the independent coding layers and the redundant coding layers adopt a parameter sharing strategy for model compression;
the decoder consists of Z independent decoding layers and (K-Z) redundant decoding layers, and the independent decoding layers and the redundant decoding layers adopt a parameter sharing strategy for model compression.
In order to achieve the above object, an embodiment of the present invention provides a training apparatus for a syntax error correction model, including:
the model building module is used for building an original model based on a Transformer;
the model training module is used for inputting a training set acquired in advance into the original model in each round of training and adjusting parameters in the original model by combining a moving average strategy; the training set comprises a plurality of training samples, the training samples comprise an original sentence consisting of a plurality of words and phrases and a target sentence corresponding to the original sentence and consisting of a plurality of labels, and the target sentence is obtained by carrying out grammatical error correction on the original sentence in advance;
and the model determining module is used for ending the algorithm and taking the original model obtained in the last round of training as the optimal grammar error correction model when the number of training rounds reaches a preset threshold.
To achieve the above object, an embodiment of the present invention provides a training apparatus for a syntax error correction model, including a processor, a memory, and a computer program stored in the memory and configured to be executed by the processor, where the processor implements a training method for the syntax error correction model according to any one of the above embodiments when executing the computer program.
To achieve the above object, an embodiment of the present invention provides a computer-readable storage medium, which includes a processor, a memory, and a computer program stored in the memory and configured to be executed by the processor, and when the processor executes the computer program, the processor implements the training method of the syntax error correction model according to any one of the above embodiments.
Compared with the prior art, the training method, device, equipment and computer-readable storage medium of the grammar error correction model disclosed by the embodiments of the invention construct an original model based on a Transformer; in each round of training, a training set acquired in advance is input into the original model and the parameters in the original model are adjusted in combination with a moving average strategy; the training set comprises a plurality of training samples, each training sample comprises an original sentence consisting of a plurality of words and a target sentence corresponding to the original sentence and consisting of a plurality of labels, and the target sentence is obtained by performing grammatical error correction on the original sentence in advance; when the number of training rounds reaches a preset threshold, the algorithm ends and the original model obtained in the last training round is taken as the optimal grammar error correction model. According to the embodiment of the invention, the original model is constructed by using the Transformer and model training is performed with the pre-obtained training set in combination with the moving average strategy to obtain the optimal grammar error correction model, so that overfitting is avoided and the generalization capability of the model is improved.
Drawings
FIG. 1 is a flowchart of a method for training a grammar error correction model according to an embodiment of the present invention;
FIG. 2 is a diagram of a model framework provided in accordance with an embodiment of the present invention;
fig. 3 is a schematic diagram of a CYCLE policy according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Referring to fig. 1, which is a flowchart of a training method of a syntax error correction model according to an embodiment of the present invention, the training method of the syntax error correction model includes steps S1 to S3:
s1, constructing an original model based on a Transformer;
s2, inputting a training set acquired in advance into the original model in each training round, and adjusting parameters in the original model by combining a moving average strategy; the training set comprises a plurality of training samples, the training samples comprise an original sentence consisting of a plurality of words and phrases and a target sentence corresponding to the original sentence and consisting of a plurality of labels, and the target sentence is obtained by carrying out grammatical error correction on the original sentence in advance;
and S3, when the training round reaches a preset number threshold, ending the algorithm and taking the original model obtained by the last training round as the optimal grammar error correction model.
Referring to the model framework diagram shown in fig. 2, in the embodiment of the present invention, the sequence learning model in fig. 2 is an end-to-end model composed of self-attention mechanisms and capable of modeling sequence data. The model framework consists of an encoder and a decoder: the encoder is responsible for encoding and learning the input data, and the decoder is responsible for integrating the output of the encoder with the output of the previous time step and then decoding. In addition to the basic sequence learning model, the model framework comprises a grammar generalization module, which is used for adjusting the parameters of the basic model. Specifically, an original model is constructed based on a Transformer; a training set is acquired, the training set consisting of a plurality of training samples, each training sample comprising an original sentence consisting of a plurality of words and a target sentence corresponding to the original sentence and consisting of a plurality of labels, the target sentence being obtained by grammatical error correction of the original sentence; the original model is trained for a plurality of rounds using the training set, the total number of training rounds is set by the user according to actual requirements, and in each round of training the parameters of the original model are adjusted using a moving average strategy; when the number of training rounds reaches a preset threshold, the model training ends and the original model obtained in the last training round is taken as the optimal grammar error correction model for grammar error correction. On the basis of the Transformer model, a grammar generalization strategy based on the moving average strategy is provided, thereby avoiding overfitting and improving the generalization capability of the model.
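For illustration only, the overall flow of steps S1 to S3 can be sketched in PyTorch-style code as follows. This is not the patented implementation: the encoder-only labelling model, the names OriginalModel and train, the hyperparameters and the placeholder cross-entropy objective are assumptions, and the per-round procedure of the patent (two dropout forward passes, the moving average strategy and the KL terms) is sketched separately further below.

import torch
import torch.nn as nn
import torch.nn.functional as F

class OriginalModel(nn.Module):
    """Simplified stand-in for the Transformer-based original model of step S1.
    The patent's model is an encoder-decoder; an encoder-only labeller is used here
    purely to keep the sketch short, since each word receives one label."""
    def __init__(self, vocab_size: int, num_labels: int, d_model: int = 512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=6)
        self.classifier = nn.Linear(d_model, num_labels)

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        hidden = self.encoder(self.embed(token_ids))
        return self.classifier(hidden)          # label logits for every word position

def train(train_loader, vocab_size, num_labels, num_rounds=30):
    model = OriginalModel(vocab_size, num_labels)                 # S1: build the original model
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
    for _ in range(num_rounds):                                   # S2: one pass over the training set per round
        for token_ids, label_ids in train_loader:                 # original sentence / target label sequence
            logits = model(token_ids)
            # Placeholder objective; the patented per-round procedure (two dropout
            # forward passes, moving-average model, KL terms) is sketched further below.
            loss = F.cross_entropy(logits.transpose(1, 2), label_ids)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return model                                                  # S3: keep the last round's model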
In an embodiment, in each round of training, inputting a training set obtained in advance into the original model, and adjusting parameters in the original model in combination with a moving average strategy specifically includes:
in each round of training, randomly discarding a hidden unit when each word in an original sentence of each training sample acquired in advance passes through the original model, and sequentially transmitting forward twice to output a first label probability distribution and a second label probability distribution;
based on a sliding average strategy, calculating an average parameter according to the parameter of the original model obtained after the first forward transfer and the parameter of the original model obtained after the second forward transfer to be used as the parameter of the parameter average model;
calculating a first cross entropy of the original model obtained after the first forward transmission and a second cross entropy of the original model obtained after the second forward transmission;
calculating a model loss value corresponding to the word according to the first label probability distribution, the second label probability distribution, a third label probability distribution obtained by forward transmission when the word passes through the parameter average model, the first cross entropy and the second cross entropy;
calculating a loss function according to model loss values corresponding to all words in the training set;
and adjusting parameters of the original model according to the loss function.
Specifically, in combination with the model framework diagram shown in fig. 2, this embodiment proposes a grammar generalization strategy: the regularized dropout method is improved, and a multidimensional regularized dropout method with training consistency and model consistency is proposed and applied to the grammar error correction task.
A moving average is used to obtain a local mean of the model variables during training: assuming that the variable of the original model at time t is v_t, a moving average is performed with the variable v_{t-1} of the original model at the previous moment to obtain the average parameter at time t, and the model corresponding to this average parameter is the parameter average model at time t.
Formally, in the multidimensional regularized dropout, given an input sentence S_i consisting of a series of words w_1, w_2, w_3, …, w_m and a series of labels y_1, y_2, y_3, …, y_m, each word w_i randomly discards some hidden units when passing through the original model and is forwarded twice to obtain two different label probability distributions P_1(y_i|w_i) (the first label probability distribution) and P_2(y_i|w_i) (the second label probability distribution), and a label probability distribution P_3(y_i|w_i′) (the third label probability distribution) is obtained by forward propagation through the parameter average model. In order to force the two distributions P_1(y_i|w_i) and P_2(y_i|w_i) to be consistent with each other so as to achieve model consistency, the two-way Kullback-Leibler (KL) divergence between the two distributions is minimized while the cross entropy is taken as a loss function, where P_1(y_i|w_i) and P_2(y_i|w_i) are the probability distributions of the output values of the word w_i obtained by two forward propagations through the Transformer model (the original model). The model loss value of the word w_i is calculated according to P_1(y_i|w_i), P_2(y_i|w_i), P_3(y_i|w_i′) and the cross entropies, the loss function is then calculated from the model loss values of all the words, and finally the parameters of the original model obtained by this round of training are adjusted according to the loss function. This embodiment adopts a grammar generalization module to improve the generalization and stability of the model; in the grammar generalization module, a multidimensional regularized dropout method with training consistency and model consistency is proposed, and on this basis a grammar generalization strategy is proposed to improve the grammar generalization capability of the model, so that errors based on grammar rules can be corrected well without depending on a large amount of data, overfitting is avoided, and the generalization capability of the model is improved.
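A minimal sketch of the two stochastic forward passes and the two-way KL divergence described above is given below. The helper names forward_twice and bidirectional_kl and the use of a separate avg_model copy standing in for the parameter average model are assumptions; PyTorch's F.kl_div is used only as one possible way to compute the KL terms.

import torch
import torch.nn.functional as F

def bidirectional_kl(log_p: torch.Tensor, log_q: torch.Tensor) -> torch.Tensor:
    """Two-way KL divergence between two per-word label distributions given as
    log-probabilities. Returns one value per word position."""
    kl_pq = F.kl_div(log_q, log_p, log_target=True, reduction="none").sum(-1)   # KL(P || Q)
    kl_qp = F.kl_div(log_p, log_q, log_target=True, reduction="none").sum(-1)   # KL(Q || P)
    return 0.5 * (kl_pq + kl_qp)

def forward_twice(model, avg_model, token_ids):
    """Pass the same words through the original model twice with dropout active, so that
    some hidden units are randomly discarded and the two label distributions differ,
    then pass them once through the parameter average model."""
    model.train()                                                  # keep dropout enabled
    log_p1 = F.log_softmax(model(token_ids), dim=-1)               # first label probability distribution P1
    log_p2 = F.log_softmax(model(token_ids), dim=-1)               # second label probability distribution P2
    avg_model.eval()
    with torch.no_grad():
        log_p3 = F.log_softmax(avg_model(token_ids), dim=-1)       # third distribution P3 (parameter average model)
    return log_p1, log_p2, log_p3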
In one embodiment, the average parameter is calculated by the following formula:
v_t′ = (δ·v_{t-1} + (1 − δ)·v_t) / (1 − δ^t)
wherein v_t′ denotes the average parameter, δ denotes a variable that takes historical parameter values into account, v_{t-1} denotes the parameters of the original model obtained after the first forward pass, and v_t denotes the parameters of the original model obtained after the second forward pass.
Specifically, a moving average is used to obtain a local mean of the model variables during training. Assuming that the variable of the original model at time t is v_t, after the moving average is applied the variable of the model is updated as:
v_t1 = δ·v_{t-1} + (1 − δ)·v_t
where δ is a variable that takes historical parameter values into account; the larger δ is, the more the moving-average value v_t1 reflects the historical values. In particular, the averaged parameters still fluctuate considerably in the first few rounds after the moving average starts, so v_t1 is corrected by dividing it by (1 − δ^t), which corrects the estimate of the mean:
v_t′ = v_t1 / (1 − δ^t)
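The moving-average update and the bias correction given by the two formulas above can be sketched as follows; the dictionary-based bookkeeping and the function name update_parameter_average are assumptions, while the arithmetic follows the formulas.

import torch

def update_parameter_average(running, current_params, delta: float, step: int):
    """Apply v_t1 = delta * v_{t-1} + (1 - delta) * v_t to every model parameter, then
    return the bias-corrected averages v_t' = v_t1 / (1 - delta ** step) used by the
    parameter average model. `running` holds the uncorrected values v_t1 and is
    updated in place; step starts at 1."""
    corrected = {}
    for name, v_t in current_params.items():
        v_prev = running.get(name, torch.zeros_like(v_t))
        running[name] = delta * v_prev + (1.0 - delta) * v_t.detach()
        corrected[name] = running[name] / (1.0 - delta ** step)     # correction for early-round fluctuation
    return corrected

# Illustrative usage: refresh the parameter average model after each update step t.
# running_avg = {}
# corrected = update_parameter_average(running_avg, dict(model.named_parameters()), delta=0.99, step=t)
# avg_model.load_state_dict(corrected, strict=False)   # buffers are left untouched, hence strict=False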
in one embodiment, the model loss value is calculated by:
respectively calculating KL divergence degrees between every two of the first label probability distribution, the second label probability distribution and the third label probability distribution;
calculating cross entropy loss;
and calculating a model loss value according to the KL divergence and the cross entropy loss based on a preset weight rule.
Specifically, in order to achieve training consistency, the model also forces P_1(y_i|w_i) and P_3(y_i|w_i′), as well as P_2(y_i|w_i) and P_3(y_i|w_i′), to be consistent with each other, where L_i^KLM is the KL divergence of P_1(y_i|w_i) and P_2(y_i|w_i), L_i^KLT1 is the KL divergence of P_2(y_i|w_i) and P_3(y_i|w_i′), and L_i^KLT2 is the KL divergence of P_1(y_i|w_i) and P_3(y_i|w_i′).
After the KL divergences are obtained, they are further weighted together with the cross entropy losses L_i^CE1 and L_i^CE2 to obtain the final loss L_i of the word w_i in the sentence S_i.
In one embodiment, the model loss value is calculated by the following formula:
L_i = α′·(L_i^KLM + L_i^KLT1 + L_i^KLT2) + β′·(L_i^CE1 + L_i^CE2)
where L_i denotes the model loss value of the i-th word, α′ denotes the KL divergence weight, β′ denotes the cross entropy loss weight, α′ + 2β′ = 1, L_i^KLM denotes the KL divergence of the first and second label probability distributions of the i-th word, L_i^KLT1 denotes the KL divergence of the second and third label probability distributions of the i-th word, L_i^KLT2 denotes the KL divergence of the first and third label probability distributions of the i-th word, and L_i^CE1 and L_i^CE2 denote the cross entropy losses of the i-th word, namely the cross entropy losses between the answer label and the two probability distributions generated by the two forward propagations: L_i^CE1 is the cross entropy loss of the first label probability distribution and the answer label, and L_i^CE2 is the cross entropy loss of the second label probability distribution and the answer label.
Specifically, the final loss L_i (the model loss value) of the word w_i in the sentence S_i is obtained as:
L_i = α·(L_i^KLM + L_i^KLT1 + L_i^KLT2) + β·(L_i^CE1 + L_i^CE2)
where α and β are the loss weights and β is set to 0.5 by default. After the loss weights are normalized, the model loss value is:
L_i = α′·(L_i^KLM + L_i^KLT1 + L_i^KLT2) + β′·(L_i^CE1 + L_i^CE2)
with α′ + 2·β′ = 1.
in one embodiment, the loss function is calculated by the following equation:
L = (1/N)·Σ_{j=1}^{N} [ (1/M)·Σ_{i=1}^{M} L_i ]
where L represents a loss function, N represents the number of sentences of the training set, and M represents the number of words in a sentence.
Specifically, the loss function for the entire training set is calculated as follows:
L = (1/N)·Σ_{j=1}^{N} [ (1/M)·Σ_{i=1}^{M} L_i ]
wherein L represents a loss function, N represents the number of sentences of the training set, and M represents the number of words in the sentences. It can be understood that the model loss values of all words in each sentence are averaged to obtain the loss value of each sentence, and then the loss values of all sentences are averaged to obtain the loss function of the whole training set.
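Putting the pieces together, the per-word loss and the training-set loss can be sketched as follows. The grouping of the terms (α′ on the summed KL terms and β′ on each cross entropy, so that α′ + 2β′ = 1) is an assumed reading of the description rather than a reproduction of the formula images, and the sketch reuses the bidirectional_kl helper defined in the earlier example.

import torch
import torch.nn.functional as F

def word_losses(log_p1, log_p2, log_p3, labels, alpha: float = 1.0, beta: float = 0.5):
    """Per-word loss L_i: KL terms between the three label distributions plus the two
    cross entropies against the answer labels, with weights normalized so that
    alpha' + 2 * beta' = 1. The grouping of terms is an assumption; bidirectional_kl
    is the helper defined in the earlier sketch."""
    alpha_n = alpha / (alpha + 2.0 * beta)          # alpha'
    beta_n = beta / (alpha + 2.0 * beta)            # beta'

    kl_m = bidirectional_kl(log_p1, log_p2)         # first vs. second distribution
    kl_t1 = bidirectional_kl(log_p2, log_p3)        # second vs. third distribution
    kl_t2 = bidirectional_kl(log_p1, log_p3)        # first vs. third distribution

    ce1 = F.nll_loss(log_p1.transpose(1, 2), labels, reduction="none")   # first pass vs. answer label
    ce2 = F.nll_loss(log_p2.transpose(1, 2), labels, reduction="none")   # second pass vs. answer label

    return alpha_n * (kl_m + kl_t1 + kl_t2) + beta_n * (ce1 + ce2)       # shape: (sentences, words)

def training_set_loss(per_word_loss: torch.Tensor, word_mask: torch.Tensor) -> torch.Tensor:
    """Average the word losses within each sentence, then average over the sentences."""
    per_sentence = (per_word_loss * word_mask).sum(-1) / word_mask.sum(-1).clamp(min=1)
    return per_sentence.mean()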
In one embodiment, the original model includes an encoder and a decoder;
the encoder consists of Z independent coding layers and (K-Z) redundant coding layers, and the independent coding layers and the redundant coding layers adopt a parameter sharing strategy for model compression;
the decoder consists of Z independent decoding layers and (K-Z) redundant decoding layers, and the independent decoding layers and the redundant decoding layers adopt a parameter sharing strategy for model compression.
Specifically, in conjunction with fig. 2, the model framework further includes a parameter sharing module that uses the parameters of Z layers when constructing a K-layer Transformer-based encoder-decoder. Two strategies are provided for parameter allocation in the syntax error correction task: CYCLE and CYCLE-REV. Illustratively, when Z is set to 3 and K to 6, the two parameter allocation strategies are applied to the encoder module, and the parameters are allocated to the decoder module in the same way as for the encoder. The parameter sharing strategy enables the model to share the same parameters across different coding layers or decoding layers, so that the number of parameters of the model is reduced by one third compared with a Transformer, and the difficulty of the training process is reduced while the performance of the model is ensured.
In the CYCLE strategy, Z layers whose parameters are independent of each other (the independent coding layers) are stacked first. Then Z layers (the redundant coding layers) are stacked repeatedly in the same order as the preceding Z layers until the total number of layers reaches K. When Z = 3 and K = 6, the parameter sharing strategy stacks the initial 3 layers twice. Since higher layers require more degrees of freedom for expression than lower layers (in other words, lower layers may have redundant parameters compared to higher layers), this embodiment also proposes the CYCLE-REV strategy to reuse the parameters of lower layers in higher layers; see the CYCLE strategy diagram shown in fig. 3. Similar to the CYCLE strategy, the Z layers are first stacked repeatedly in the same manner; for the remaining layers, the Z layers are then stacked in reverse order, see the pseudo code for the parameter sharing strategy shown in the table below.
[Table: pseudo code of the CYCLE and CYCLE-REV parameter sharing strategies]
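Because the pseudo code of the table is only reproduced as images in the published text, the following sketch shows one plausible reading of how CYCLE and CYCLE-REV map each of the K layers to one of the Z independent parameter sets; the assignments, in particular the CYCLE-REV example, are assumptions rather than the patented pseudo code.

from typing import List

def cycle_assignment(Z: int, K: int) -> List[int]:
    """CYCLE: the first Z layers have independent parameters; the remaining layers
    reuse them in the same order. Example: Z=3, K=6 -> [0, 1, 2, 0, 1, 2]."""
    return [i % Z for i in range(K)]

def cycle_rev_assignment(Z: int, K: int) -> List[int]:
    """CYCLE-REV (assumed reading): same as CYCLE except that the last Z layers reuse
    the parameter sets in reverse order, so lower-layer parameters reappear in the
    highest layers. Example: Z=3, K=6 -> [0, 1, 2, 2, 1, 0]. Assumes Z <= K."""
    head = [i % Z for i in range(K - Z)]
    tail = list(range(Z - 1, -1, -1))
    return head + tail

# The shared stacks can then be materialized from one ModuleList of Z layers, e.g.
# shared = torch.nn.ModuleList(torch.nn.TransformerEncoderLayer(512, 8) for _ in range(Z))
# encoder_layers = [shared[j] for j in cycle_assignment(Z, K)]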
Compared with the prior art, the training method of the grammar error correction model disclosed by the embodiment of the invention can be used for constructing the original model by using the Transformer, combining the sliding average strategy and carrying out model training by using the pre-obtained training set so as to obtain the optimal grammar error correction model, thereby avoiding overfitting and improving the generalization capability of the model; in addition, the model is compressed by adding a parameter sharing strategy, so that the training difficulty of the model is reduced.
The embodiment of the present invention further provides a training device for a grammar error correction model, including:
the model building module is used for building an original model based on a Transformer;
the model training module is used for inputting a training set acquired in advance into the original model in each round of training and adjusting parameters in the original model by combining a moving average strategy; the training set comprises a plurality of training samples, the training samples comprise an original sentence consisting of a plurality of words and phrases and a target sentence corresponding to the original sentence and consisting of a plurality of labels, and the target sentence is obtained by carrying out grammatical error correction on the original sentence in advance;
and the model determining module is used for ending the algorithm and taking the original model obtained in the last round of training as the optimal grammar error correction model when the number of training rounds reaches a preset threshold.
It should be noted that, for a specific working process of the training apparatus for the syntax error correction model, reference may be made to the working process of the training method for the syntax error correction model in the foregoing embodiment, and details are not repeated here.
The training device for the grammar error correction model provided by the embodiment of the invention can be used for constructing the original model by using the Transformer, combining the moving average strategy and carrying out model training by using the pre-obtained training set so as to obtain the optimal grammar error correction model, thereby avoiding overfitting and improving the generalization capability of the model.
Embodiments of the present invention further provide a training apparatus for a syntax error correction model, including a processor, a memory, and a computer program stored in the memory and configured to be executed by the processor, where the processor executes the computer program to implement the steps in the above embodiments of the training method for the syntax error correction model, such as steps S1 to S3 shown in fig. 1; alternatively, the processor, when executing the computer program, implements the functions of the modules in the above-described device embodiments, such as a model building module.
Illustratively, the computer program may be partitioned into one or more modules that are stored in the memory and executed by the processor to implement the invention. The one or more modules may be a series of computer program instruction segments capable of performing specific functions, which are used for describing the execution process of the computer program in the training device of the grammar error correction model. For example, the computer program may be divided into a plurality of modules, each module having the following specific functions:
the model building module is used for building an original model based on a Transformer;
the model training module is used for inputting a training set acquired in advance into the original model in each round of training and adjusting parameters in the original model by combining a moving average strategy; the training set comprises a plurality of training samples, the training samples comprise an original sentence consisting of a plurality of words and phrases and a target sentence corresponding to the original sentence and consisting of a plurality of labels, and the target sentence is obtained by carrying out grammatical error correction on the original sentence in advance;
and the model determining module is used for ending the algorithm and taking the original model obtained in the last round of training as the optimal grammar error correction model when the number of training rounds reaches a preset threshold.
The specific working process of each module may refer to the working process of the training apparatus for a syntax error correction model described in the above embodiment, and is not described herein again.
The training device of the grammar error correction model can be a desktop computer, a notebook computer, a palm computer, a cloud server and other computing devices. The training device of the grammar error correction model may include, but is not limited to, a processor, a memory. It will be understood by those skilled in the art that the schematic diagram is merely an example of a training device for the syntax error correction model, and does not constitute a limitation of the training device for the syntax error correction model, and may include more or less components than those shown, or combine some components, or different components, for example, the training device for the syntax error correction model may further include an input-output device, a network access device, a bus, etc.
The Processor may be a Central Processing Unit (CPU), other general purpose Processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), an off-the-shelf Programmable Gate Array (FPGA) or other Programmable logic device, discrete Gate or transistor logic device, discrete hardware component, etc. The general purpose processor may be a microprocessor or the processor may be any conventional processor or the like, the processor being the control center of the training apparatus of the syntax error correction model, the various parts of the training apparatus of the entire syntax error correction model being connected by various interfaces and lines.
The memory may be used to store the computer program and/or module, and the processor may implement various functions of the training apparatus of the syntax error correction model by executing or executing the computer program and/or module stored in the memory and calling data stored in the memory. The memory may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function, and the like; the storage data area may store data created according to the use of the mobile phone, and the like. In addition, the memory may include high speed random access memory, and may also include non-volatile memory, such as a hard disk, a memory, a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), at least one magnetic disk storage device, a Flash memory device, or other volatile solid state storage device.
Wherein, the module integrated by the training device of the grammar error correction model can be stored in a computer readable storage medium if the module is realized in the form of a software functional unit and sold or used as an independent product. Based on such understanding, all or part of the flow of the method according to the embodiments of the present invention may also be implemented by a computer program, which may be stored in a computer-readable storage medium, and when the computer program is executed by a processor, the steps of the method embodiments described above may be implemented. Wherein the computer program comprises computer program code, which may be in the form of source code, object code, an executable file or some intermediate form, etc. The computer-readable medium may include: any entity or device capable of carrying the computer program code, recording medium, usb disk, removable hard disk, magnetic disk, optical disk, computer Memory, Read-Only Memory (ROM), Random Access Memory (RAM), electrical carrier wave signals, telecommunications signals, software distribution medium, and the like.
While the foregoing is directed to the preferred embodiment of the present invention, it will be understood by those skilled in the art that various changes and modifications may be made without departing from the spirit and scope of the invention.

Claims (10)

1. A method for training a grammar error correction model, comprising:
constructing an original model based on a Transformer;
in each round of training, inputting a training set acquired in advance into the original model, and adjusting parameters in the original model by combining a moving average strategy; the training set comprises a plurality of training samples, the training samples comprise an original sentence consisting of a plurality of words and phrases and a target sentence corresponding to the original sentence and consisting of a plurality of labels, and the target sentence is obtained by carrying out grammatical error correction on the original sentence in advance;
and when the number of training rounds reaches a preset threshold, ending the algorithm and taking the original model obtained in the last training round as the optimal grammar error correction model.
2. The method for training the grammar error correction model according to claim 1, wherein in each training round, a training set obtained in advance is input into the original model, and parameters in the original model are adjusted by combining a moving average strategy, which specifically includes:
in each round of training, randomly discarding a hidden unit when each word in an original sentence of each training sample acquired in advance passes through the original model, and sequentially transmitting forward twice to output a first label probability distribution and a second label probability distribution;
based on a sliding average strategy, calculating an average parameter according to the parameter of the original model obtained after the first forward transfer and the parameter of the original model obtained after the second forward transfer to be used as the parameter of the parameter average model;
calculating a first cross entropy of the original model obtained after the first forward transmission and a second cross entropy of the original model obtained after the second forward transmission;
calculating a model loss value corresponding to the word according to the first label probability distribution, the second label probability distribution, a third label probability distribution obtained by forward transmission when the word passes through the parameter average model, the first cross entropy and the second cross entropy;
calculating a loss function according to model loss values corresponding to all words in the training set;
and adjusting parameters of the original model according to the loss function.
3. The method for training the syntax error correction model according to claim 2, wherein the average parameter is calculated by the following formula:
v_t′ = (δ·v_{t-1} + (1 − δ)·v_t) / (1 − δ^t)
wherein v_t′ denotes the average parameter, δ denotes a variable that takes historical parameter values into account, v_{t-1} denotes the parameters of the original model obtained after the first forward pass, and v_t denotes the parameters of the original model obtained after the second forward pass.
4. The method for training the syntax error correction model according to claim 2, wherein the model loss value is calculated by:
calculating KL divergence between the first label probability distribution, the second label probability distribution and the third label probability distribution respectively;
calculating cross entropy loss;
and calculating a model loss value according to the KL divergence and the cross entropy loss based on a preset weight rule.
5. The method for training the syntax error correction model according to claim 4, wherein the model loss value is calculated by the following formula:
L_i = α′·(L_i^KLM + L_i^KLT1 + L_i^KLT2) + β′·(L_i^CE1 + L_i^CE2)
where L_i denotes the model loss value of the i-th word, α′ denotes the KL divergence weight, β′ denotes the cross entropy loss weight, α′ + 2β′ = 1, L_i^KLM denotes the KL divergence of the first and second label probability distributions of the i-th word, L_i^KLT1 denotes the KL divergence of the second and third label probability distributions of the i-th word, L_i^KLT2 denotes the KL divergence of the first and third label probability distributions of the i-th word, L_i^CE1 denotes the cross entropy loss of the first label probability distribution of the i-th word and the answer label, and L_i^CE2 denotes the cross entropy loss of the second label probability distribution of the i-th word and the answer label.
6. The method for training the syntactic error correction model of claim 5, wherein the loss function is calculated by the following formula:
L = (1/N)·Σ_{j=1}^{N} [ (1/M)·Σ_{i=1}^{M} L_i ]
where L represents a loss function, N represents the number of sentences of the training set, and M represents the number of words in a sentence.
7. The method for training the syntax error correction model of claim 1, wherein the original model comprises an encoder and a decoder;
the encoder consists of a Z layer independent encoding layer and a (K-Z) layer redundant encoding layer, and the independent encoding layer and the redundant encoding layer adopt a parameter sharing strategy to carry out model compression;
the decoder consists of a Z-layer independent decoding layer and a (K-Z) -layer redundant decoding layer, wherein the independent decoding layer and the redundant decoding layer adopt a parameter sharing strategy to perform model compression.
8. An apparatus for training a grammar error correction model, comprising:
the model building module is used for building an original model based on a Transformer;
the model training module is used for inputting a training set acquired in advance into the original model in each round of training and adjusting parameters in the original model by combining a moving average strategy; the training set comprises a plurality of training samples, the training samples comprise an original sentence consisting of a plurality of words and phrases and a target sentence corresponding to the original sentence and consisting of a plurality of labels, and the target sentence is obtained by carrying out grammatical error correction on the original sentence in advance;
and the model determining module is used for ending the algorithm and taking the original model obtained in the last round of training as the optimal grammar error correction model when the number of training rounds reaches a preset threshold.
9. Training device of a syntactic error correction model, comprising a processor, a memory and a computer program stored in the memory and configured to be executed by the processor, the processor implementing the training method of a syntactic error correction model according to any one of claims 1 to 7 when executing the computer program.
10. A computer-readable storage medium comprising a processor, a memory, and a computer program stored in the memory and configured to be executed by the processor, the processor implementing the training method of the syntax error correction model according to any one of claims 1 to 7 when executing the computer program.
CN202210560454.9A 2022-05-23 2022-05-23 Training method, device, equipment and storage medium of grammar error correction model Active CN115062611B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210560454.9A CN115062611B (en) 2022-05-23 2022-05-23 Training method, device, equipment and storage medium of grammar error correction model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210560454.9A CN115062611B (en) 2022-05-23 2022-05-23 Training method, device, equipment and storage medium of grammar error correction model

Publications (2)

Publication Number Publication Date
CN115062611A true CN115062611A (en) 2022-09-16
CN115062611B CN115062611B (en) 2023-05-05

Family

ID=83198468

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210560454.9A Active CN115062611B (en) 2022-05-23 2022-05-23 Training method, device, equipment and storage medium of grammar error correction model

Country Status (1)

Country Link
CN (1) CN115062611B (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111767731A (en) * 2020-07-09 2020-10-13 北京猿力未来科技有限公司 Training method and device of grammar error correction model and grammar error correction method and device
CN111767717A (en) * 2020-05-13 2020-10-13 广东外语外贸大学 Indonesia grammar error correction method, device, equipment and storage medium
CN112597753A (en) * 2020-12-22 2021-04-02 北京百度网讯科技有限公司 Text error correction processing method and device, electronic equipment and storage medium
CN113204645A (en) * 2021-04-01 2021-08-03 武汉大学 Knowledge-guided aspect-level emotion analysis model training method
CN113345418A (en) * 2021-06-09 2021-09-03 中国科学技术大学 Multilingual model training method based on cross-language self-training
CN114154518A (en) * 2021-12-02 2022-03-08 泰康保险集团股份有限公司 Data enhancement model training method and device, electronic equipment and storage medium
WO2022077891A1 (en) * 2020-10-13 2022-04-21 苏州大学 Multi-labeled data-based dependency and syntactic parsing model training method and apparatus
US20220147715A1 (en) * 2019-05-16 2022-05-12 Huawei Technologies Co., Ltd. Text processing method, model training method, and apparatus

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20220147715A1 (en) * 2019-05-16 2022-05-12 Huawei Technologies Co., Ltd. Text processing method, model training method, and apparatus
CN111767717A (en) * 2020-05-13 2020-10-13 广东外语外贸大学 Indonesia grammar error correction method, device, equipment and storage medium
CN111767731A (en) * 2020-07-09 2020-10-13 北京猿力未来科技有限公司 Training method and device of grammar error correction model and grammar error correction method and device
WO2022077891A1 (en) * 2020-10-13 2022-04-21 苏州大学 Multi-labeled data-based dependency and syntactic parsing model training method and apparatus
CN112597753A (en) * 2020-12-22 2021-04-02 北京百度网讯科技有限公司 Text error correction processing method and device, electronic equipment and storage medium
CN113204645A (en) * 2021-04-01 2021-08-03 武汉大学 Knowledge-guided aspect-level emotion analysis model training method
CN113345418A (en) * 2021-06-09 2021-09-03 中国科学技术大学 Multilingual model training method based on cross-language self-training
CN114154518A (en) * 2021-12-02 2022-03-08 泰康保险集团股份有限公司 Data enhancement model training method and device, electronic equipment and storage medium

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
NANKAI LIN et al.: "Unsupervised Character Embedding Correction and Candidate Word Denoising", IEEE/ACM Transactions on Audio, Speech, and Language Processing
NEO WU et al.: "Deep Transformer Models for Time Series Forecasting: The Influenza Prevalence Case", arXiv:2001.08317v1
佚名 (Anonymous): "Detailed Explanation of the Principle of the Moving Average Algorithm in Deep Learning", HTTPS://BLOG.CSDN.NET/SINAT_36618660/ARTICLE/DETAILS/99896539
王辰成 et al.: "Chinese Grammatical Error Correction Method Based on Transformer Enhanced Architecture", Journal of Chinese Information Processing (《中文信息学报》)
蒋盛益 et al.: "A Survey of Natural Language Processing Research on Indonesian and Malay", Pattern Recognition and Artificial Intelligence (《模式识别与人工智能》)

Also Published As

Publication number Publication date
CN115062611B (en) 2023-05-05

Similar Documents

Publication Publication Date Title
CN107516129B (en) Dimension self-adaptive Tucker decomposition-based deep network compression method
Strouse et al. The deterministic information bottleneck
CN108052512B (en) Image description generation method based on depth attention mechanism
EP3748545A1 (en) Sparsity constraints and knowledge distillation based learning of sparser and compressed neural networks
Bordes et al. SGD-QN: Careful quasi-Newton stochastic gradient descent
Setiono Feedforward neural network construction using cross validation
Seo et al. Semantics-native communication with contextual reasoning
Prakash et al. IoT device friendly and communication-efficient federated learning via joint model pruning and quantization
Kang et al. Learning multi-granular quantized embeddings for large-vocab categorical features in recommender systems
US11610124B2 (en) Learning compressible features
CN111177348B (en) Training method and device for problem generation model, electronic equipment and storage medium
CN111767697B (en) Text processing method and device, computer equipment and storage medium
CN109992785B (en) Content calculation method, device and equipment based on machine learning
CN112560456A (en) Generation type abstract generation method and system based on improved neural network
Ollivier Auto-encoders: reconstruction versus compression
CN111737406B (en) Text retrieval method, device and equipment and training method of text retrieval model
Su et al. Nonlinear statistical learning with truncated gaussian graphical models
CN117574429A (en) Federal deep learning method for privacy enhancement in edge computing network
Nagy et al. Privacy-preserving Federated Learning and its application to natural language processing
Parada-Mayorga et al. Convolutional filters and neural networks with non commutative algebras
Dekel From online to batch learning with cutoff-averaging
Deng et al. Adaptive federated learning with negative inner product aggregation
CN115062611A (en) Training method, device, equipment and storage medium of grammar error correction model
Huang et al. Flow of renyi information in deep neural networks
CN112330361B (en) Intelligent big data analysis design method oriented to online shopping user consumption habit

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant