WO2022121178A1 - Text error correction model training method, recognition method, apparatus, and computer device - Google Patents

Text error correction model training method, recognition method, apparatus, and computer device

Info

Publication number
WO2022121178A1
WO2022121178A1 (PCT/CN2021/084171)
Authority
WO
WIPO (PCT)
Prior art keywords
words
word
soft mask
vector information
model
Prior art date
Application number
PCT/CN2021/084171
Other languages
English (en)
French (fr)
Inventor
邓悦
郑立颖
徐亮
Original Assignee
平安科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司
Publication of WO2022121178A1 publication Critical patent/WO2022121178A1/zh

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/232Orthographic correction, e.g. spell checking or vowelisation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/10Text processing
    • G06F40/12Use of codes for handling textual entities
    • G06F40/151Transformation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Definitions

  • the present application relates to the technical field of artificial intelligence, and in particular, to a soft mask-based text error correction model training method, recognition method, apparatus, computer equipment, and computer-readable storage medium.
  • Text error correction has long been a scenario of interest in natural language processing, for example, correcting errors in meeting minutes or government documents.
  • Text error correction grammar models currently used on the market fall into two categories: machine learning models, which recognize errors, generate candidates, and select the best candidate for replacement; and deep learning models, which correct grammar in a sequence-to-sequence manner.
  • The inventor realized that machine learning models cannot fit the data, resulting in a low accuracy rate, while deep learning models require a large amount of corpus, which creates a huge demand for training corpus and a long training time.
  • The main purpose of this application is to provide a soft mask-based text error correction model training method, recognition method, device, computer equipment, and computer-readable storage medium, aiming to solve the technical problems that existing machine learning models cannot fit the data, resulting in a low accuracy rate, while deep learning models require a large amount of corpus, creating a huge demand for training corpus and a long training time.
  • a first aspect of the embodiments of the present application provides a soft mask-based text error correction model training method, which may include:
  • a second aspect of the embodiments of the present application provides a soft mask-based text error correction model recognition method, which may include:
  • performing word error correction on the text to be corrected based on a text error correction model, and obtaining the text after word error correction is performed on the text to be corrected, wherein the text error correction model is obtained by the above-mentioned soft mask-based text error correction model training method.
  • a third aspect of the embodiments of the present application provides a soft mask-based text error correction model training device, including:
  • an acquisition and conversion module used for acquiring the text to be modified, and converting the text to be modified into word vector information of each word
  • an acquisition module configured to train a preset soft mask language model according to the word vector information of each of the words, and obtain a corresponding loss function
  • an update and determination module configured to update the model parameters of the preset soft mask language model based on the loss function, and determine whether the preset soft mask language model is in a convergent state
  • the generating module is configured to generate a corresponding text error correction model if it is determined that the preset soft mask language model is in a convergent state.
  • the present application also provides a soft mask-based text error correction model recognition device, including:
  • the first acquisition module is used to acquire the text to be corrected
  • the second obtaining module is configured to perform word error correction on the text to be corrected based on a text error correction model, and obtain the text after word error correction is performed on the text to be corrected, wherein the text error correction model is obtained by the above-mentioned soft mask-based text error correction model training method.
  • the present application further provides a computer-readable storage medium, where a computer program is stored on the computer-readable storage medium, wherein when the computer program is executed by a processor, the following steps are implemented:
  • the present application further provides a computer-readable storage medium, where a computer program is stored on the computer-readable storage medium, wherein the computer program implements the following steps when executed by a processor:
  • performing word error correction on the text to be corrected based on a text error correction model, and obtaining the text after word error correction is performed on the text to be corrected, wherein the text error correction model is obtained by the above-mentioned soft mask-based text error correction model training method.
  • the present application also provides a computer device, the computer device comprising a processor, a memory, and a computer program stored on the memory and executable by the processor, wherein the computer program, when executed by the processor, implements the following steps:
  • the present application further provides a computer device, the computer device comprising a processor, a memory, and a computer program stored on the memory and executable by the processor, wherein the computer program, when executed by the processor, implements the following steps:
  • performing word error correction on the text to be corrected based on a text error correction model, and obtaining the text after word error correction is performed on the text to be corrected, wherein the text error correction model is obtained by the above-mentioned soft mask-based text error correction model training method.
  • The present application provides a soft mask-based text error correction model training method, recognition method, device, computer equipment, and computer-readable storage medium. The text to be modified is acquired and converted into word vector information of each word; a preset soft mask language model is trained according to the word vector information of each of the words, and a corresponding loss function is obtained; the model parameters of the preset soft mask language model are updated based on the loss function, and it is determined whether the preset soft mask language model is in a convergent state; if it is determined that the preset soft mask language model is in a convergent state, a corresponding text error correction model is generated. By processing words through the soft mask, without requiring a large amount of training corpus, not only is the training time of the model shortened, but the data is also fitted and the accuracy of the model is improved.
  • FIG. 1 is a schematic flowchart of a soft mask-based text error correction model training method provided by an embodiment of the present application
  • Fig. 2 is the sub-step flow chart of the text error correction model training method based on soft mask in Fig. 1;
  • Fig. 3 is the sub-step flow chart of the text error correction model training method based on soft mask in Fig. 1;
  • FIG. 4 is a schematic flowchart of a method for identifying a text error correction model based on a soft mask provided by an embodiment of the present application
  • FIG. 5 is a schematic block diagram of a soft mask-based text error correction model training apparatus provided by an embodiment of the present application.
  • FIG. 6 is a schematic block diagram of a soft mask-based text error correction model recognition device provided by an embodiment of the present application.
  • FIG. 7 is a schematic structural block diagram of a computer device according to an embodiment of the present application.
  • the term "if" may be contextually interpreted as "when", "once", "in response to determining", or "in response to detecting".
  • similarly, the phrases "if it is determined" or "if the [described condition or event] is detected" may be interpreted, depending on the context, to mean "once it is determined", "in response to the determination", "once the [described condition or event] is detected", or "in response to detection of the [described condition or event]".
  • Embodiments of the present application provide a soft mask-based text error correction model training method, recognition method, apparatus, computer equipment, and computer-readable storage medium.
  • the soft mask-based text error correction model training method and the soft mask-based text error correction model recognition method can be applied to computer equipment, and the computer equipment may be electronic equipment such as notebook computers and desktop computers.
  • FIG. 1 is a schematic diagram of the implementation flow of a soft mask-based text error correction model training method provided by an embodiment of the present application. As shown in the figure, the method may include the following steps:
  • Step S101 Acquire the text to be modified, and convert the text to be modified into word vector information of each word.
  • the text to be modified is obtained, and the text to be modified may be a short sentence, a short text, or the like.
  • the manner of obtaining the text to be modified includes obtaining from a preset storage path, for example, obtaining from a preset blockchain.
  • the text to be modified is converted into vector information of each word, wherein the conversion method includes a preset model or a preset soft mask language model.
  • the text to be modified is converted based on a preset model or a preset soft mask language model to obtain word vectors of each word of the text to be modified.
  • For example, the text to be modified includes 10 words, the preset model is a pre-trained model, and the length of the word vector is 768; the text to be modified is converted by the preset model, and 768-dimensional word vector information is obtained for each word, for example, word vector information of shape (10, 768).
  • Step S102 Train a preset soft mask language model according to the word vector information of each of the words, and obtain a corresponding loss function.
  • a preset soft mask language model is trained by using the word vector information of each word to obtain a corresponding loss function.
  • the preset soft mask language model is trained by the word vector of each word, and the modification rate of each word and the replacement rate of each target word for each word are obtained.
  • a corresponding loss function is obtained through the modification rate and replacement rate of each word, wherein the replacement rate is the replacement rate of each word corresponding to each replaced word.
  • step S102 includes: sub-step S1021 to sub-step S1027 .
  • Sub-step S1021 according to the detection network model and the word vector information of each of the words, obtain the soft mask component information of each of the words.
  • word vector information of each word of the text to be modified is respectively input into a preset soft mask language model, where the preset soft mask model includes a detection network model and a modifier network model.
  • the word vector information of each input word is processed through the hidden layer of the detection network model to obtain the soft mask component information of each word, wherein the soft mask component information is the component information of each word obtained through the mask in the hidden layer.
  • the detection network model includes a two-way gate recurrent neural network
  • the two-way gate recurrent neural network includes a forward-gate recurrent neural network and a backward-gate recurrent neural network
  • obtaining the soft mask component information of each of the words according to the detection network model and the word vector information of each of the words includes: obtaining the first final hidden layer vector information corresponding to the word vector information of each of the words based on the forward gate recurrent neural network and the word vector information of each of the words; obtaining the second final hidden layer vector information corresponding to the word vector information of each of the words based on the backward gate recurrent neural network and the word vector information of each of the words; and combining the first final hidden layer vector information of each of the words and the second final hidden layer vector information of each of the words to obtain the soft mask component information of each of the words.
  • the detection network model includes a two-way gate recurrent neural network
  • the two-way gate recurrent neural network includes a forward gate recurrent neural network and a backward gate recurrent neural network, wherein the function of the two-way gate recurrent neural network is to obtain the context information of each word, so as to obtain the soft mask component information of each word.
  • the first final hidden layer vector information corresponding to the word vector information of each word is obtained through the forward gate recurrent neural network and the word vector information of each word.
  • the forward gate recurrent neural network model includes an update gate, a reset gate, a candidate hidden state and a hidden state.
  • the update gate controls the update ratio of the output of the entire gate unit in the current layer.
  • the first value output by the update gate is obtained.
  • The first value lies between 0 and 1, where 1 means that the output is determined by the current hidden layer information, that is, it is completely updated, and 0 means that the current output is forgotten and the output is determined by the previous hidden layer information, so it does not need to be updated or does not need to be completely updated.
  • For example, the first value corresponding to the update gate is obtained through the preset update gate formula z_t = σ(W_z·[h_{t-1}, x_t] + b_z), where z_t is the first value corresponding to the update gate, σ is the preset sigmoid activation function, h_{t-1} is the hidden output of the previous layer, x_t is the word vector information of the word, and W_z and b_z are preset constants.
  • The reset gate controls the proportion of the preceding context information from the previous layer that is utilized: 1 represents complete utilization, that is, no reset at all, and 0 represents that it is not fully utilized, that is, it needs to be reset. Through the reset gate and the word vector information of each word, the second value corresponding to the reset gate is obtained.
  • For example, the second value corresponding to the reset gate is obtained through the preset reset gate formula r_t = σ(W_r·[h_{t-1}, x_t] + b_r), where r_t is the second value corresponding to the reset gate, σ is the preset sigmoid activation function, h_{t-1} is the hidden output of the previous layer, x_t is the word vector information of the word, and W_r and b_r are preset constants.
  • The candidate hidden state assists the subsequent hidden state calculation, and the information of the current candidate hidden state is obtained through the candidate hidden state, the second value, and the word vector information of each word. For example, if an element value in the reset gate is close to 0, the corresponding hidden state element is 0, that is, the hidden state of the previous time step is discarded; if the element value is close to 1, the hidden state of the previous time step is retained. Then, the result of the element-wise multiplication is concatenated with the input of the current time step, and the candidate hidden state is calculated through a fully connected layer with the tanh activation function, so that all of its elements lie in the range [-1, 1].
  • For example, the candidate hidden state formula is h̃_t = tanh(W_h·[r_t Θ h_{t-1}, x_t] + b_h), where h̃_t is the information of the current candidate hidden state, r_t is the second value corresponding to the reset gate, h_{t-1} is the hidden output of the previous layer, x_t is the word vector information of the word, W_h and b_h are preset constants, and Θ is the symbol for element-wise multiplication.
  • The hidden state is the candidate information output by the hidden layer of this layer, and it is obtained through the first value corresponding to the update gate and the information of the candidate hidden state. For example, through the preset hidden state formula h_t = z_t Θ h̃_t + (1 - z_t) Θ h_{t-1}, where h_t is the candidate information output by the hidden layer of this layer, z_t is the first value corresponding to the update gate, h_{t-1} is the hidden output of the previous layer, h̃_t is the information of the current candidate hidden state, and Θ is the element-wise multiplication symbol, the candidate information output by the hidden layer of this layer is obtained and taken as the first final hidden layer vector information.
  • the backward gate recurrent neural network model includes an update gate, a reset gate, a candidate hidden state and a hidden state.
  • the update gate controls the update ratio of the output of the entire gate unit in the current layer.
  • the first value output by the update gate is obtained.
  • The first value lies between 0 and 1, where 1 means that the output is determined by the current hidden layer information, that is, it is completely updated, and 0 means that the current output is forgotten and the output is determined by the previous hidden layer information, so it does not need to be updated or does not need to be completely updated.
  • For example, the first value corresponding to the update gate is obtained through the preset update gate formula z_t = σ(W_z·[h_{t-1}, x_t] + b_z), where z_t is the first value corresponding to the update gate, σ is the preset sigmoid activation function, h_{t-1} is the hidden output of the previous layer, x_t is the word vector information of the word, and W_z and b_z are preset constants.
  • The reset gate controls the proportion of the following context information from the previous layer that is utilized: 1 represents complete utilization, that is, no reset at all, and 0 represents that it is not fully utilized, that is, it needs to be reset. Through the reset gate and the word vector information of each word, the second value corresponding to the reset gate is obtained.
  • For example, the second value corresponding to the reset gate is obtained through the preset reset gate formula r_t = σ(W_r·[h_{t-1}, x_t] + b_r), where r_t is the second value corresponding to the reset gate, σ is the preset sigmoid activation function, h_{t-1} is the hidden output of the previous layer, x_t is the word vector information of the word, and W_r and b_r are preset constants.
  • The candidate hidden state assists the subsequent hidden state calculation, and the information of the current candidate hidden state is obtained through the candidate hidden state, the second value, and the word vector information of each word. For example, if an element value in the reset gate is close to 0, the corresponding hidden state element is 0, that is, the hidden state of the previous time step is discarded; if the element value is close to 1, the hidden state of the previous time step is retained. Then, the result of the element-wise multiplication is concatenated with the input of the current time step, and the candidate hidden state is calculated through a fully connected layer with the tanh activation function, so that all of its elements lie in the range [-1, 1].
  • For example, the candidate hidden state formula is h̃_t = tanh(W_h·[r_t Θ h_{t-1}, x_t] + b_h), where h̃_t is the information of the current candidate hidden state, r_t is the second value corresponding to the reset gate, h_{t-1} is the hidden output of the previous layer, x_t is the word vector information of the word, W_h and b_h are preset constants, and Θ is the symbol for element-wise multiplication.
  • The hidden state is the candidate information output by the hidden layer of this layer, and it is obtained through the first value corresponding to the update gate and the information of the candidate hidden state. For example, through the preset hidden state formula h_t = z_t Θ h̃_t + (1 - z_t) Θ h_{t-1}, where h_t is the candidate information output by the hidden layer of this layer, z_t is the first value corresponding to the update gate, h_{t-1} is the hidden output of the previous layer, h̃_t is the information of the current candidate hidden state, and Θ is the element-wise multiplication symbol, the candidate information output by the hidden layer of this layer is obtained and taken as the second final hidden layer vector information.
  • After the first final hidden layer vector information of each word from the forward gate recurrent neural network and the second final hidden layer vector information of each word from the backward gate recurrent neural network are obtained, the two are combined to obtain the soft mask component information of each word. For example, through the combining formula h_i = [h_i^f ; h_i^b], where h_i^f is the first final hidden layer vector information of each word, h_i^b is the second final hidden layer vector information of each word, and h_i is the soft mask component information of each word, the two final hidden layer vectors are concatenated.
  • Sub-step S1022 based on the soft mask component information of each of the words and the first preset activation function, obtain the soft mask modification probability of each of the words.
  • The first preset activation function is located in the first preset activation layer and is a Softmax activation function; the Softmax activation function is used for more than one output neuron, ensures that the sum of the output neurons is 1.0, and generally outputs a probability value less than 1.
  • the soft mask component information of each word is respectively input into the first preset activation layer, and the soft mask modification probability of each word is obtained through the Softmax activation function in the first preset activation layer.
  • Sub-step S1023 Obtain soft mask coverage vector information of each of the words according to the soft mask modification probability of each of the words and the word vector information of each of the words.
  • Sub-step S1024 Obtain a first loss function of the detection network model based on the soft mask probability of each of the words.
  • The first loss function corresponding to the detection network model is obtained through the first preset loss function and the soft mask probability of each word. For example, through the first preset loss function L_d = -Σ_{i=1}^{n} log p_d(g_i|X), where X is the preset given sequence, n is the preset length of the preset given sequence X, and p_d(g_i|X) is the soft mask probability corresponding to the i-th word output by the detection network model, the first loss function corresponding to the detection network is obtained.
  • Sub-step S1025 Obtain the replacement probability of each of the words corresponding to the target word according to the modifier network model and the soft mask coverage vector information of each of the words.
  • the obtained soft mask coverage vector information of each word is input into the modifier network model, the input soft mask coverage vector information of each word is processed by the modifier network model, and the replacement probability of each word corresponding to the target word is output, where the target word is the replacement word of the word; the modifier network model includes an attention mechanism, and the attention mechanism can be a dot-product attention mechanism or a multi-head attention mechanism.
  • the soft mask coverage vector information of each input word is processed through the dot product attention mechanism and/or the multi-head attention mechanism, so as to obtain the attention vector information corresponding to each word.
  • The modifier network model also includes a second preset activation function located in a preset linear layer; the second preset activation function is a Softmax activation function, which is used for more than one output neuron, ensures that the sum of the output neurons is 1.0, and generally outputs a probability value less than 1.
  • The attention vector information corresponding to each word is input into the preset linear layer, the attention vector information corresponding to each word is calculated through the Softmax activation function in the preset linear layer, and the replacement probability of each word corresponding to each target word is obtained.
  • In an embodiment, the modifier network model includes an attention mechanism, and the attention mechanism includes a dot-product attention mechanism and a multi-head attention mechanism; obtaining the replacement probability of each of the words corresponding to the target word according to the modifier network model and the soft mask coverage vector information of each of the words includes: obtaining the dot-product attention vector information of each of the words according to the dot-product attention mechanism and the soft mask coverage vector information of each of the words; obtaining the multi-head attention vector information of each word according to the multi-head attention mechanism and the dot-product attention vector information of each of the words; and obtaining the replacement probability of each of the words corresponding to the target word based on the multi-head attention vector information of each word and the preset linear layer.
  • The modifier network model includes an attention mechanism, and the attention mechanism includes a dot-product attention mechanism and a multi-head attention mechanism; the dot-product attention mechanism and the multi-head attention mechanism are stacked in 12 layers, and each layer includes at least one dot-product attention mechanism and at least one multi-head attention mechanism.
  • The dot-product attention mechanism presets three vectors for the soft mask coverage vector of each word: the search vector Q (Query), the importance vector k (Key), and the scoring vector v (Value).
  • The search vector Q and the importance vector k are multiplied to calculate the score of each word, and the score of each word is divided by the square root of the preset word dimension of each current word to obtain the importance score of each word, which indicates how much importance needs to be placed on the word.
  • For example, the importance score of each word is obtained through the preset importance formula Score = Q·k^T / √d_k, where Score is the importance score of each word, Q is the search vector of each word, k is the importance vector of each word, T denotes the preset transpose operation, and d_k is the preset word dimension of each current word.
  • The importance scores are then normalized so that they sum to 1 and multiplied by the scoring vector V to obtain the dot-product attention vector information, for example through the preset dot-product attention vector formula softmax(Q·k^T / √d_k)·V.
  • the dot product attention mechanism calculates the importance of different words in a sentence through three different vectors.
  • the multi-head attention mechanism is introduced.
  • the multi-head attention mechanism is a number of different dot-product attention mechanisms.
  • the parameters in the multi-head attention mechanism are not shared with each other.
  • different attention mechanisms are used to focus on different information.
  • the attention mechanism will eventually analyze sentences from different perspectives.
  • Each dot-product attention mechanism obtains the output of one word vector; for example, if there are 10 different dot-product attention mechanisms and the dimension of the word vector is 768, the final obtained vector size is (768, 10), and the (768, 10) vectors are compressed into one, giving a resulting vector of (768, 1).
  • When the first output vector of the multi-head attention mechanism is obtained, it is connected to a feed-forward network. Several layers of different attention mechanisms are stacked in the modifier network model; for example, when the modifier network model uses 12 layers, each layer feeds into the next layer through a multi-head attention mechanism plus a feed-forward network, that is, a linear fully connected layer first feeds a ReLU activation function layer, and another linear fully connected layer then feeds the next layer, where the ReLU activation function layer includes the ReLU activation function.
  • For example, the ReLU feed-forward formula is FN(X) = max(0, X·W_1 + b_1)·W_2 + b_2, which gives the second output vector of the multi-head attention mechanism, where W_1, b_1, W_2, and b_2 are preset parameters, X is the first vector output by the multi-head attention mechanism, and FN(X) is the second vector output by the multi-head attention mechanism.
  • The second vector of each word output by the multi-head attention mechanism is added to the word vector information of each word to obtain the residual vector of each word, and the residual vector is used as the multi-head attention vector information of each word.
  • The obtained multi-head attention vector information of each word is input into the preset linear layer, and the multi-head attention vector information of each word is calculated through the preset linear layer to obtain the replacement probability of each word corresponding to each target word.
  • Sub-step S1026 Based on the replacement probability of each of the words corresponding to the target word, obtain a second loss function of the modifier network model.
  • The second loss function corresponding to the modifier network model is obtained through the second preset loss function and the replacement probability of each word corresponding to the target word. For example, through the second preset loss function L_c = -Σ_{i=1}^{n} log p_c(y_i|X), where X is the preset given sequence, n is the preset length of the preset given sequence X, and p_c(y_i|X) is the replacement probability of the i-th word output by the modifier network model corresponding to each target word, the second loss function corresponding to the modifier network is obtained.
  • Sub-step S1027 Obtain a third loss function of the preset soft mask language model according to the first loss function and the second loss function.
  • a third loss function of the preset soft mask language model is obtained through a third preset loss function formula.
  • For example, the third preset loss function formula is L = λ*L_d + (1-λ)*L_C, where λ is a preset parameter, L_d is the gradient value of the current detection network model, L_C is the gradient value of the current modifier network model, and the preset parameter λ is 0.8.
  • Step S103 Update the model parameters of the preset soft mask language model based on the loss function, and determine whether the preset soft mask language model is in a convergent state.
  • the corresponding gradient value is calculated based on the loss function, and the model parameters in the preset soft mask language model are optimized by the gradient value.
  • the gradient value optimizes the variable parameters in the preset soft mask language model.
  • After the model parameters of the preset soft mask language model are updated through the loss function, it is determined whether the preset soft mask language model is in a convergent state. For example, when the corresponding loss function is obtained, the gradient value corresponding to the loss function is obtained and compared with a preset gradient value; if the gradient value is less than or equal to the preset gradient value, it is determined that the preset soft mask language model is in a convergent state. Alternatively, the variable parameters of the updated preset soft mask language model are obtained and compared with preset variable parameters; if the variable parameters are smaller than the preset variable parameters, it is determined that the preset soft mask language model is in a convergent state.
  • step S103 includes: sub-step S1031 to sub-step S1033.
  • Sub-step S1031 update the model parameters of the detection network model based on the first loss function.
  • For example, X is the preset given sequence, n is the preset length of the preset given sequence X, and p_d(g_i|X) is the soft mask modification probability corresponding to the i-th word output by the detection network model; X, n, and p_d(g_i|X) are substituted into the first loss function to obtain the gradient value L_d of the current detection network model, and the model parameters of the detection network model are optimized based on the gradient value L_d.
  • Sub-step S1032 Update model parameters of the modifier network model based on the second loss function.
  • For example, X is the preset given sequence, n is the preset length of the preset given sequence X, and p_c(y_i|X) is the replacement probability of the i-th word output by the modifier network model corresponding to each target word; X, n, and p_c(y_i|X) are substituted into the second loss function, and the model parameters of the modifier network model are optimized.
  • Sub-step S1033 Update model parameters of the preset soft mask language model based on the third loss function.
  • Step S104 if it is determined that the preset soft mask language model is in a convergent state, generate a corresponding text error correction model.
  • For example, if it is determined that the preset soft mask language model is in a convergent state, the preset soft mask language model generates a corresponding text error correction model; the text error correction model can identify typos in the text and predict the replacement word corresponding to each typo, so as to complete text error correction.
  • A preset soft mask language model is trained through the text to be modified to obtain a corresponding loss function, the model parameters of the preset soft mask language model are optimized through the loss function, and a corresponding text error correction model is generated. Processing words through the soft mask not only shortens the training time of the model, but also fits the data and improves the accuracy of the model without requiring a large amount of training corpus.
  • FIG. 4 is a schematic flowchart of a method for identifying a text error correction model based on a soft mask according to an embodiment of the present application.
  • the method for identifying a text error correction model based on a soft mask includes steps S201 to S202.
  • Step S201 acquiring the text to be corrected.
  • the to-be-corrected text is obtained; the to-be-corrected text may be text containing word errors, and includes short sentence text and the like.
  • Step S202 Perform word error correction on the text to be corrected based on a text error correction model, and obtain the text after word error correction is performed on the text to be corrected, wherein the text error correction model is obtained by the above-mentioned soft mask-based text error correction model training method.
  • the text to be corrected is acquired, the text to be corrected is input into a text error correction model, and the text to be corrected is converted into vector information corresponding to each word.
  • the text error correction model includes a detection network model and a modifier network model, and the vector information of each word is detected by the detection network model to determine whether each word needs to be modified.
  • the vector information of each word is processed by the two-way gate recurrent neural network model in the detection network model to obtain the soft mask modification probability of each word, and whether each word needs to be modified is determined by the soft mask modification probability of each word.
  • the soft mask coverage rate vector information of the modified words is obtained through the soft mask modification probability of the modified words and the vector information of the modified words.
  • the modified word is processed through the modifier network, and the replacement rate of the modified word to the corresponding replacement word is obtained.
  • the soft mask coverage vector information of the modified word is processed through the dot-product attention mechanism and the multi-head attention mechanism in the modifier network model to obtain the replacement rate of the modified word for the corresponding replacement word.
  • the replacement rate of the modified word to the corresponding replacement word determines the replacement word that replaces the modified word.
  • the replacement word is replaced with the modified word in the text to be corrected to obtain the error-corrected text.
  • word error correction is performed on the text to be corrected by the text error correction model to obtain the text after word error correction, and the detection network model and the modifier network model in the text error correction model are used to quickly and accurately obtain the word-corrected text.
  • FIG. 5 is a schematic block diagram of a soft mask-based text error correction model training apparatus provided by an embodiment of the present application.
  • the soft mask-based text error correction model training device 400 includes: an acquisition and conversion module 401 , an acquisition module 402 , an update and determination module 403 , and a generation module 404 .
  • the acquisition and conversion module 401 is used to acquire the text to be modified, and convert the text to be modified into word vector information of each word;
  • An obtaining module 402 configured to train a preset soft mask language model according to the word vector information of each of the words, and obtain a corresponding loss function
  • an update and determination module 403 configured to update the model parameters of the preset soft mask language model based on the loss function, and determine whether the preset soft mask language model is in a convergent state;
  • the generating module 404 is configured to generate a corresponding text error correction model if it is determined that the preset soft mask language model is in a convergent state.
  • the obtaining module 402 is also specifically used for:
  • the detection network model and the word vector information of each of the words obtain the soft mask component information of each of the words
  • the replacement probability of each of the words corresponding to the target word is obtained;
  • the obtaining module 402 is also specifically used for:
  • the soft mask component information of each of the words is obtained.
  • the obtaining module 402 is also specifically used for:
  • the replacement probability of each of the words corresponding to the target word is obtained, including:
  • the dot product attention vector information of each of the words is obtained;
  • the multi-head attention vector information of each word is obtained;
  • the replacement probability of each of the words corresponding to the target word is obtained.
  • update and determination module 403 is also specifically used for:
  • the model parameters of the preset soft mask language model are updated based on the third loss function.
  • FIG. 6 is a schematic block diagram of an apparatus for identifying a text error correction model based on a soft mask according to an embodiment of the present application.
  • the apparatus 500 for identifying a text error correction model based on a soft mask includes: a first obtaining module 501 and a second obtaining module 502 .
  • the first obtaining module 501 is used to obtain the text to be corrected
  • the second obtaining module 502 is configured to perform word error correction on the text to be corrected based on a text error correction model, and obtain the text after word error correction is performed on the text to be corrected, wherein the text correction
  • the error model is obtained by the above-mentioned soft mask-based text error correction model training method.
  • the apparatuses provided by the above embodiments may be implemented in the form of a computer program, and the computer program may be executed on the computer device as shown in FIG. 7 .
  • FIG. 7 is a schematic block diagram of the structure of a computer device according to an embodiment of the present application.
  • the computer device may be a terminal.
  • the computer device includes a processor, a memory, and a network interface connected by a system bus, wherein the memory may include a non-volatile storage medium and an internal memory.
  • the nonvolatile storage medium can store operating systems and computer programs.
  • the computer program includes program instructions, which, when executed, can cause the processor to execute any one of the soft mask-based text error correction model training methods and the soft mask-based text error correction model recognition methods.
  • the processor is used to provide computing and control capabilities to support the operation of the entire computer equipment.
  • the internal memory provides an environment for running a computer program in a non-volatile storage medium.
  • the processor can execute any of the soft mask-based text error correction model training methods and soft mask-based text error correction model recognition methods.
  • the network interface is used for network communication, such as sending assigned tasks.
  • FIG. 7 is only a block diagram of a partial structure related to the solution of the present application, and does not constitute a limitation on the computer device to which the solution of the present application is applied; a specific computer device may include more or fewer components than shown in the figure, or combine certain components, or have a different arrangement of components.
  • the processor may be a central processing unit (Central Processing Unit, CPU), and the processor may also be other general-purpose processors, digital signal processors (Digital Signal Processors, DSP), application specific integrated circuits (Application Specific Integrated circuits) Circuit, ASIC), Field-Programmable Gate Array (FPGA) or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, etc.
  • the general-purpose processor can be a microprocessor or the processor can also be any conventional processor or the like.
  • the processor is configured to run a computer program stored in the memory to implement the following steps:
  • the preset soft mask language model includes a detection network model and a modifier network model, and the loss function includes a first loss function, a second loss function, and a third loss function;
  • the detection network model and the word vector information of each of the words obtain the soft mask component information of each of the words
  • the replacement probability of each of the words corresponding to the target word is obtained;
  • the detection network model includes a two-way gate recurrent neural network, and the two-way gate recurrent neural network includes a forward gate recurrent neural network and a backward gate recurrent neural network; when the soft mask component information of each of the words is obtained according to the detection network model and the word vector information of each of the words, the processor is configured to implement:
  • the soft mask component information of each of the words is obtained.
  • the modifier network model includes an attention mechanism, and the attention mechanism includes a dot-product attention mechanism and a multi-head attention mechanism; when the replacement probability of each of the words corresponding to the target word is obtained according to the modifier network model and the soft mask coverage vector information of each of the words, the processor is configured to implement:
  • the dot product attention vector information of each of the words is obtained;
  • the multi-head attention vector information of each word is obtained;
  • the replacement probability of each of the words corresponding to the target word is obtained.
  • the processor when the model parameters of the preset soft mask language model are updated based on the loss function, the processor is configured to:
  • the model parameters of the preset soft mask language model are updated based on the third loss function.
  • the processor is configured to run a computer program stored in the memory to implement the following steps:
  • performing word error correction on the text to be corrected based on a text error correction model, and obtaining the text after word error correction is performed on the text to be corrected, wherein the text error correction model is obtained by the above-mentioned soft mask-based text error correction model training method.
  • Embodiments of the present application further provide a computer-readable storage medium, where a computer program is stored on the computer-readable storage medium, the computer program includes program instructions, and for the method implemented when the program instructions are executed, reference may be made to the various embodiments of the soft mask-based text error correction model training method and the soft mask-based text error correction model recognition method of the present application.
  • the computer-readable storage medium may be non-volatile or volatile.
  • the computer-readable storage medium may be an internal storage unit of the computer device described in the foregoing embodiments, such as a hard disk or a memory of the computer device.
  • the computer-readable storage medium may also be an external storage device of the computer device, such as a plug-in hard disk equipped on the computer device, a smart memory card (Smart Media Card, SMC), a secure digital (Secure Digital, SD) ) card, Flash Card, etc.
  • The computer-readable storage medium may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required by at least one function, and the like; the storage data area may store data created according to the use of the blockchain node, and the like.
  • The blockchain referred to in this application is a new application mode of computer technologies such as storage (for example, storage of the preset soft mask language model), point-to-point transmission, consensus mechanisms, and encryption algorithms.
  • A blockchain is essentially a decentralized database and is a series of data blocks associated with each other using cryptographic methods; each data block contains a batch of network transaction information, which is used to verify the validity of its information (anti-counterfeiting) and to generate the next block.
  • the blockchain can include the underlying platform of the blockchain, the platform product service layer, and the application service layer.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • General Engineering & Computer Science (AREA)
  • Biomedical Technology (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Machine Translation (AREA)

Abstract

This application is applicable to the technical field of artificial intelligence, and provides a soft mask-based text error correction model training method, recognition method, apparatus, computer device, and computer-readable storage medium. The method includes: acquiring text to be modified, and converting the text to be modified into word vector information of each word; training a preset soft mask language model according to the word vector information of each of the words, and obtaining a corresponding loss function; updating model parameters of the preset soft mask language model based on the loss function, and determining whether the preset soft mask language model is in a convergent state; and if it is determined that the preset soft mask language model is in a convergent state, generating a corresponding text error correction model. Words are processed through the soft mask, so that without requiring a large amount of training corpus, the training time of the model is shortened, the data is fitted, and the accuracy of the model is improved.

Description

Text error correction model training method, recognition method, apparatus, and computer device
This application claims priority to the Chinese patent application filed with the Chinese Patent Office on December 11, 2020, with application number 202011453441.9 and entitled "文本纠错模型训练、识别方法、装置、设备及存储介质" (Text error correction model training and recognition method, apparatus, device, and storage medium), the entire contents of which are incorporated herein by reference.
Technical Field
The present application relates to the technical field of artificial intelligence, and in particular, to a soft mask-based text error correction model training method, recognition method, apparatus, computer device, and computer-readable storage medium.
Background
Text error correction has long been a scenario of interest in natural language processing, for example, correcting errors in meeting minutes or government documents. Text error correction grammar models currently used on the market fall into two categories: machine learning models, which recognize errors, generate candidates, and select the best candidate for replacement; and deep learning models, which correct grammar in a sequence-to-sequence manner. The inventor realized that machine learning models cannot fit the data, resulting in a low accuracy rate, while deep learning models require a large amount of corpus, which creates a huge demand for training corpus and a long training time.
Technical Problem
The main purpose of the present application is to provide a soft mask-based text error correction model training method, recognition method, apparatus, computer device, and computer-readable storage medium, aiming to solve the technical problems that existing machine learning models cannot fit the data, resulting in a low accuracy rate, while deep learning models require a large amount of corpus, creating a huge demand for training corpus and a long training time.
Technical Solution
In order to solve the above technical problems, the technical solutions adopted in the embodiments of the present application are as follows:
In a first aspect, an embodiment of the present application provides a soft mask-based text error correction model training method, which may include:
acquiring text to be modified, and converting the text to be modified into word vector information of each word;
training a preset soft mask language model according to the word vector information of each of the words, and obtaining a corresponding loss function;
updating model parameters of the preset soft mask language model based on the loss function, and determining whether the preset soft mask language model is in a convergent state;
if it is determined that the preset soft mask language model is in a convergent state, generating a corresponding text error correction model.
In a second aspect, an embodiment of the present application provides a soft mask-based text error correction model recognition method, which may include:
acquiring text to be corrected;
performing word error correction on the text to be corrected based on a text error correction model, and obtaining the text after word error correction is performed on the text to be corrected, wherein the text error correction model is obtained by the above-mentioned soft mask-based text error correction model training method.
In a third aspect, an embodiment of the present application provides a soft mask-based text error correction model training apparatus, including:
an acquisition and conversion module, configured to acquire text to be modified, and convert the text to be modified into word vector information of each word;
an obtaining module, configured to train a preset soft mask language model according to the word vector information of each of the words, and obtain a corresponding loss function;
an update and determination module, configured to update the model parameters of the preset soft mask language model based on the loss function, and determine whether the preset soft mask language model is in a convergent state;
a generating module, configured to generate a corresponding text error correction model if it is determined that the preset soft mask language model is in a convergent state.
In a fourth aspect, the present application further provides a soft mask-based text error correction model recognition apparatus, including:
a first acquisition module, configured to acquire text to be corrected;
a second acquisition module, configured to perform word error correction on the text to be corrected based on a text error correction model, and obtain the text after word error correction is performed on the text to be corrected, wherein the text error correction model is obtained by the above-mentioned soft mask-based text error correction model training method.
In a fifth aspect, the present application further provides a computer-readable storage medium, where a computer program is stored on the computer-readable storage medium, and when the computer program is executed by a processor, the following steps are implemented:
acquiring text to be modified, and converting the text to be modified into word vector information of each word;
training a preset soft mask language model according to the word vector information of each of the words, and obtaining a corresponding loss function;
updating model parameters of the preset soft mask language model based on the loss function, and determining whether the preset soft mask language model is in a convergent state;
if it is determined that the preset soft mask language model is in a convergent state, generating a corresponding text error correction model.
In a sixth aspect, the present application further provides a computer-readable storage medium, where a computer program is stored on the computer-readable storage medium, and when the computer program is executed by a processor, the following steps are implemented:
acquiring text to be corrected;
performing word error correction on the text to be corrected based on a text error correction model, and obtaining the text after word error correction is performed on the text to be corrected, wherein the text error correction model is obtained by the above-mentioned soft mask-based text error correction model training method.
In a seventh aspect, the present application further provides a computer device, the computer device including a processor, a memory, and a computer program stored on the memory and executable by the processor, wherein the computer program, when executed by the processor, implements the following steps:
acquiring text to be modified, and converting the text to be modified into word vector information of each word;
training a preset soft mask language model according to the word vector information of each of the words, and obtaining a corresponding loss function;
updating model parameters of the preset soft mask language model based on the loss function, and determining whether the preset soft mask language model is in a convergent state;
if it is determined that the preset soft mask language model is in a convergent state, generating a corresponding text error correction model.
In an eighth aspect, the present application further provides a computer device, the computer device including a processor, a memory, and a computer program stored on the memory and executable by the processor, wherein the computer program, when executed by the processor, implements the following steps:
acquiring text to be corrected;
performing word error correction on the text to be corrected based on a text error correction model, and obtaining the text after word error correction is performed on the text to be corrected, wherein the text error correction model is obtained by the above-mentioned soft mask-based text error correction model training method.
Beneficial Effects
The present application provides a soft mask-based text error correction model training method, recognition method, apparatus, computer device, and computer-readable storage medium. The text to be modified is acquired and converted into word vector information of each word; a preset soft mask language model is trained according to the word vector information of each of the words, and a corresponding loss function is obtained; the model parameters of the preset soft mask language model are updated based on the loss function, and it is determined whether the preset soft mask language model is in a convergent state; if it is determined that the preset soft mask language model is in a convergent state, a corresponding text error correction model is generated. By processing words through the soft mask, without requiring a large amount of training corpus, not only is the training time of the model shortened, but the data is also fitted and the accuracy of the model is improved.
Brief Description of the Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the following briefly introduces the drawings required for describing the embodiments or the prior art. Obviously, the drawings in the following description are only some embodiments of the present application, and for those of ordinary skill in the art, other drawings can also be obtained from these drawings without creative effort.
FIG. 1 is a schematic flowchart of a soft mask-based text error correction model training method provided by an embodiment of the present application;
FIG. 2 is a schematic flowchart of sub-steps of the soft mask-based text error correction model training method in FIG. 1;
FIG. 3 is a schematic flowchart of sub-steps of the soft mask-based text error correction model training method in FIG. 1;
FIG. 4 is a schematic flowchart of a soft mask-based text error correction model recognition method provided by an embodiment of the present application;
FIG. 5 is a schematic block diagram of a soft mask-based text error correction model training apparatus provided by an embodiment of the present application;
FIG. 6 is a schematic block diagram of a soft mask-based text error correction model recognition apparatus provided by an embodiment of the present application;
FIG. 7 is a schematic structural block diagram of a computer device according to an embodiment of the present application.
Embodiments of the Present Application
In the following description, for purposes of illustration rather than limitation, specific details such as particular system structures and techniques are set forth in order to provide a thorough understanding of the embodiments of the present application. However, it should be clear to those skilled in the art that the present application may also be implemented in other embodiments without these specific details. In other cases, detailed descriptions of well-known systems, apparatuses, circuits, and methods are omitted so that unnecessary details do not obscure the description of the present application.
It should be understood that, when used in this specification and the appended claims, the term "comprising" indicates the presence of the described features, integers, steps, operations, elements, and/or components, but does not exclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or combinations thereof.
It should also be understood that the terms used in the specification of the present application are only for the purpose of describing particular embodiments and are not intended to limit the present application. As used in the specification of the present application and the appended claims, the singular forms "a", "an", and "the" are intended to include the plural forms, unless the context clearly indicates otherwise.
It should be further understood that the term "and/or" used in the specification of the present application and the appended claims refers to any and all possible combinations of one or more of the associated listed items, and includes these combinations.
As used in this specification and the appended claims, the term "if" may be contextually interpreted as "when", "once", "in response to determining", or "in response to detecting". Similarly, the phrases "if it is determined" or "if the [described condition or event] is detected" may be interpreted, depending on the context, to mean "once it is determined", "in response to the determination", "once the [described condition or event] is detected", or "in response to detection of the [described condition or event]".
The embodiments of the present application provide a soft mask-based text error correction model training method, recognition method, apparatus, computer device, and computer-readable storage medium. The soft mask-based text error correction model training method and the soft mask-based text error correction model recognition method can be applied to a computer device, and the computer device may be an electronic device such as a notebook computer or a desktop computer.
In order to illustrate the technical solutions described in the present application, specific embodiments are described below.
FIG. 1 is a schematic diagram of the implementation flow of a soft mask-based text error correction model training method provided by an embodiment of the present application. As shown in the figure, the method may include the following steps.
Step S101: Acquire the text to be modified, and convert the text to be modified into word vector information of each word.
Exemplarily, the text to be modified is acquired, and the text to be modified may be a short sentence, a short text, or the like. The manner of acquiring the text to be modified includes acquiring it from a preset storage path, for example, from a preset blockchain. When the text to be modified is acquired, the text to be modified is converted into vector information of each word, where the conversion manner includes a preset model or a preset soft mask language model. For example, the text to be modified is converted based on the preset model or the preset soft mask language model to obtain the word vectors of each word of the text to be modified. For example, if the text to be modified includes 10 words, the preset model is a pre-trained model, and the length of a word vector is 768, converting the text to be modified through the preset model yields 768-dimensional word vector information for each word, for example, word vector information of shape (10, 768).
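As a minimal illustration of this conversion step (the vocabulary, the example text, and the randomly initialized embedding table are assumptions standing in for the pre-trained model mentioned above), the following Python/NumPy sketch shows how a 10-word text maps to word vector information of shape (10, 768):

```python
import numpy as np

# Minimal sketch: a hypothetical character vocabulary and a randomly initialized
# embedding table stand in for the pre-trained model described in the text.
rng = np.random.default_rng(0)
text = "会议纪要内容需要纠错"                               # 10 example characters ("words")
vocab = {ch: i for i, ch in enumerate(sorted(set(text)))}
embedding_table = rng.normal(size=(len(vocab), 768))        # pre-trained in practice

# Convert the text to be modified into word vector information of each word.
word_vectors = np.stack([embedding_table[vocab[ch]] for ch in text])
print(word_vectors.shape)                                   # (10, 768)
```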
Step S102: Train a preset soft mask language model according to the word vector information of each of the words, and obtain a corresponding loss function.
Exemplarily, when the word vectors of each word of the text to be modified are obtained, the preset soft mask language model is trained through the word vector information of each word, and the corresponding loss function is obtained. For example, the preset soft mask language model is trained through the word vectors of each word, and the modification rate of each word and the replacement rate with which each target word replaces each word are obtained. The corresponding loss function is obtained through the modification rate and the replacement rate of each word, where the replacement rate is the replacement rate of each word corresponding to each replacement word.
In an embodiment, specifically, referring to FIG. 2, step S102 includes sub-steps S1021 to S1027.
Sub-step S1021: Obtain the soft mask component information of each of the words according to the detection network model and the word vector information of each of the words.
Exemplarily, the word vector information of each word of the text to be modified is respectively input into the preset soft mask language model, where the preset soft mask model includes a detection network model and a modifier network model. The word vector information of each input word is processed through the hidden layer of the detection network model to obtain the soft mask component information of each word, where the soft mask component information is the component information of each word obtained through the mask in the hidden layer.
In an embodiment, the detection network model includes a two-way gate recurrent neural network, and the two-way gate recurrent neural network includes a forward gate recurrent neural network and a backward gate recurrent neural network; obtaining the soft mask component information of each of the words according to the detection network model and the word vector information of each of the words includes: obtaining the first final hidden layer vector information corresponding to the word vector information of each of the words based on the forward gate recurrent neural network and the word vector information of each of the words; obtaining the second final hidden layer vector information corresponding to the word vector information of each of the words based on the backward gate recurrent neural network and the word vector information of each of the words; and combining the first final hidden layer vector information of each of the words and the second final hidden layer vector information of each of the words to obtain the soft mask component information of each of the words.
Exemplarily, the detection network model includes a two-way gate recurrent neural network, and the two-way gate recurrent neural network includes a forward gate recurrent neural network and a backward gate recurrent neural network, where the function of the two-way gate recurrent neural network is to obtain the context information of each word, so as to obtain the soft mask component information of each word. Exemplarily, the first final hidden layer vector information corresponding to the word vector information of each word is obtained through the forward gate recurrent neural network and the word vector information of each word. The forward gate recurrent neural network model includes an update gate, a reset gate, a candidate hidden state, and a hidden state.
The update gate controls the proportion of the output of the entire gate unit that is updated at the current layer. Through the update gate and the word vector information of each word, the first value output by the update gate is obtained. The first value lies between 0 and 1, where 1 means that the output is determined by the current hidden layer information, that is, it is completely updated, and 0 means that the current output is forgotten and the output is determined by the previous hidden layer information, so it does not need to be updated or does not need to be completely updated. For example, the first value corresponding to the update gate is obtained through the preset update gate formula z_t = σ(W_z·[h_{t-1}, x_t] + b_z), where z_t is the first value corresponding to the update gate, σ is the preset sigmoid activation function, h_{t-1} is the hidden output of the previous layer, x_t is the word vector information of the word, and W_z and b_z are preset constants.
The reset gate controls the proportion of the preceding context information passed from the previous layer that is utilized: 1 represents complete utilization, that is, no reset at all, and 0 represents that it is not fully utilized, that is, it needs to be reset. Through the reset gate and the word vector information of each word, the second value corresponding to the reset gate is obtained. For example, the second value corresponding to the reset gate is obtained through the preset reset gate formula r_t = σ(W_r·[h_{t-1}, x_t] + b_r), where r_t is the second value corresponding to the reset gate, σ is the preset sigmoid activation function, h_{t-1} is the hidden output of the previous layer, x_t is the word vector information of the word, and W_r and b_r are preset constants.
The candidate hidden state assists the subsequent hidden state calculation, and the information of the current candidate hidden state is obtained through the candidate hidden state, the second value, and the word vector information of each word. For example, if an element value in the reset gate is close to 0, the corresponding hidden state element is 0, that is, the hidden state of the previous time step is discarded; if the element value is close to 1, the hidden state of the previous time step is retained. Then, the result of the element-wise multiplication is concatenated with the input of the current time step, and the candidate hidden state is calculated through a fully connected layer with the tanh activation function, so that all of its elements lie in the range [-1, 1]. The candidate hidden state formula is h̃_t = tanh(W_h·[r_t Θ h_{t-1}, x_t] + b_h), where h̃_t is the information of the current candidate hidden state, r_t is the second value corresponding to the reset gate, h_{t-1} is the hidden output of the previous layer, x_t is the word vector information of the word, W_h and b_h are preset constants, and Θ is the element-wise multiplication symbol.
The hidden state is the candidate information output by the hidden layer of this layer, and the candidate information output by the hidden layer of this layer is obtained through the first value corresponding to the update gate and the information of the candidate hidden state. For example, through the preset hidden state formula h_t = z_t Θ h̃_t + (1 - z_t) Θ h_{t-1}, where h_t is the candidate information output by the hidden layer of this layer, z_t is the first value corresponding to the update gate, h_{t-1} is the hidden output of the previous layer, h̃_t is the information of the current candidate hidden state, and Θ is the element-wise multiplication symbol, the candidate information output by the hidden layer of this layer is obtained and taken as the first final hidden layer vector information.
The second final hidden layer vector information corresponding to the word vector information of each word is obtained through the backward gate recurrent neural network and the word vector information of each word. The backward gate recurrent neural network model includes an update gate, a reset gate, a candidate hidden state, and a hidden state.
The update gate controls the proportion of the output of the entire gate unit that is updated at the current layer. Through the update gate and the word vector information of each word, the first value output by the update gate is obtained. The first value lies between 0 and 1, where 1 means that the output is determined by the current hidden layer information, that is, it is completely updated, and 0 means that the current output is forgotten and the output is determined by the previous hidden layer information, so it does not need to be updated or does not need to be completely updated. For example, the first value corresponding to the update gate is obtained through the preset update gate formula z_t = σ(W_z·[h_{t-1}, x_t] + b_z), where z_t is the first value corresponding to the update gate, σ is the preset sigmoid activation function, h_{t-1} is the hidden output of the previous layer, x_t is the word vector information of the word, and W_z and b_z are preset constants.
The reset gate controls the proportion of the following context information passed from the previous layer that is utilized: 1 represents complete utilization, that is, no reset at all, and 0 represents that it is not fully utilized, that is, it needs to be reset. Through the reset gate and the word vector information of each word, the second value corresponding to the reset gate is obtained. For example, the second value corresponding to the reset gate is obtained through the preset reset gate formula r_t = σ(W_r·[h_{t-1}, x_t] + b_r), where r_t is the second value corresponding to the reset gate, σ is the preset sigmoid activation function, h_{t-1} is the hidden output of the previous layer, x_t is the word vector information of the word, and W_r and b_r are preset constants.
The candidate hidden state assists the subsequent hidden state calculation, and the information of the current candidate hidden state is obtained through the candidate hidden state, the second value, and the word vector information of each word. For example, if an element value in the reset gate is close to 0, the corresponding hidden state element is 0, that is, the hidden state of the previous time step is discarded; if the element value is close to 1, the hidden state of the previous time step is retained. Then, the result of the element-wise multiplication is concatenated with the input of the current time step, and the candidate hidden state is calculated through a fully connected layer with the tanh activation function, so that all of its elements lie in the range [-1, 1]. The candidate hidden state formula is h̃_t = tanh(W_h·[r_t Θ h_{t-1}, x_t] + b_h), where h̃_t is the information of the current candidate hidden state, r_t is the second value corresponding to the reset gate, h_{t-1} is the hidden output of the previous layer, x_t is the word vector information of the word, W_h and b_h are preset constants, and Θ is the element-wise multiplication symbol.
The hidden state is the candidate information output by the hidden layer of this layer, and the candidate information output by the hidden layer of this layer is obtained through the first value corresponding to the update gate and the information of the candidate hidden state. For example, through the preset hidden state formula h_t = z_t Θ h̃_t + (1 - z_t) Θ h_{t-1}, where h_t is the candidate information output by the hidden layer of this layer, z_t is the first value corresponding to the update gate, h_{t-1} is the hidden output of the previous layer, h̃_t is the information of the current candidate hidden state, and Θ is the element-wise multiplication symbol, the candidate information output by the hidden layer of this layer is obtained and taken as the second final hidden layer vector information.
在得到前向门递归神经网络的各个字词的第一最终隐层向量信息和后向门递归神经网络的各个字词的第二最终隐层向量信息后，将各个字词的第一最终隐层向量信息和各个字词的第二最终隐层向量信息进行合并，得到各个字词的软掩码分量信息。例如，通过合并公式$h_t=[\overrightarrow{h_t};\overleftarrow{h_t}]$，其中，$\overrightarrow{h_t}$为各个字词的第一最终隐层向量信息，$\overleftarrow{h_t}$为各个字词的第二最终隐层向量信息，$h_t$为各个字词的软掩码分量信息。
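前向与后向最终隐层向量信息的合并，可以参考如下基于PyTorch的示意性草图（此处用torch.nn.GRU的双向模式代替上文逐项展开的门递归计算，词向量维度768、隐层维度256等均为示例性假设，并非本申请的原始实现）：

```python
import torch
import torch.nn as nn

class SoftMaskDetector(nn.Module):
    """示意性的检测网络：双向门递归神经网络输出各字词的软掩码分量信息 h_t。"""
    def __init__(self, emb_dim=768, hidden=256):
        super().__init__()
        self.bigru = nn.GRU(emb_dim, hidden, batch_first=True, bidirectional=True)

    def forward(self, word_embeddings):        # 输入: (batch, seq_len, emb_dim) 的字词向量信息
        h, _ = self.bigru(word_embeddings)     # 输出: (batch, seq_len, 2*hidden)
        # 前向隐层与后向隐层在最后一维拼接，对应 h_t = [h_t(前向); h_t(后向)]
        return h

emb = torch.randn(2, 10, 768)                  # 2 个句子、每句 10 个字词
print(SoftMaskDetector()(emb).shape)           # torch.Size([2, 10, 512])
```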
子步骤S1022、基于各个所述字词的软掩码分量信息和第一预置激活函数,得到各个所述字词的软掩码修改概率。
示范性的,该第一预置激活函数处于第一预置激活层中,该第一预置激活函数为Softmax激活函数,该Softmax激活函数用于多于一个输出的神经元,保证输出神经元之和为1.0,一般输出的是小于1的概率值。将各个字词的软掩码分量信息分别输入到第一预置激活层中,通过该第一预置激活层中的Softmax激活函数,得到各个字词的软掩码修改概率。例如,将h t输入到第一预置激活层,通过该第一预置激活层中的Softmax激活函数对该h t进行计算,得到该h t对应的软掩码修改概率,该h t为任意字词中的一个软掩码分量信息。
子步骤S1023、根据所述各个所述字词的软掩码修改概率和各个所述字词的字词向量信息,得到各个所述字词的软掩码覆盖率向量信息。
示范性的，在得到各个字词的软掩码修改概率时，通过该各个字词的字词向量信息，得到各个字词的软掩码覆盖率向量信息，该软掩码覆盖率向量信息为掩码对各个字词的覆盖率的向量信息。例如，通过预置软掩码覆盖率向量信息公式$e_i'=p_i\cdot e_{mask}+(1-p_i)\cdot e_i$，其中，$e_i'$为各个字词的覆盖率的向量信息，$p_i$为各个字词的软掩码修改概率，$e_{mask}$为预置一个掩码的词向量，$e_i$为当前各个字词的字词向量信息。
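子步骤S1022与子步骤S1023的计算过程可用如下示意性草图表示（其中激活层权重W_p以及各维度均为示例性假设，并非本申请的原始实现）：

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)
seq_len, comp_dim, emb_dim = 6, 512, 768
h = rng.normal(size=(seq_len, comp_dim))             # 各字词的软掩码分量信息 h_t
W_p = rng.normal(scale=0.02, size=(comp_dim, 2))     # 示例性的第一预置激活层权重（修改/不修改两类）
p = softmax(h @ W_p)[:, 1]                           # 各字词的软掩码修改概率 p_i

e = rng.normal(size=(seq_len, emb_dim))              # 当前各字词的字词向量信息 e_i
e_mask = rng.normal(size=(emb_dim,))                 # 预置的掩码词向量 e_mask
e_soft = p[:, None] * e_mask + (1 - p)[:, None] * e  # e_i' = p_i*e_mask + (1-p_i)*e_i
print(e_soft.shape)                                  # (6, 768)
```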
子步骤S1024、基于各个所述字词的软掩码概率,获取所述检测网络模型的第一损失函数。
示范性的,在获取到各个字词的软掩码概率,通过第一预置损失函数和各个字词的软掩码概率,得到该检测网络模型对应的第一损失函数。例如,通过第一预置损失函数
$L_d=-\sum_{i=1}^{n}\log p_d(g_i\mid X)$，其中，X为预置给定序列，n为预置给定序列X的预置长度，$p_d(g_i\mid X)$为检测网络模型输出的第i个字词对应的软掩码概率，得到检测网络对应的第一损失函数。
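第一损失函数的计算可以用如下示意性草图表示（其中标签g_i表示第i个字词是否需要修改，示例数值仅作演示，属于示例性假设）：

```python
import numpy as np

def detection_loss(p_modify, labels, eps=1e-9):
    """L_d = -sum_i log p_d(g_i|X)：取每个字词在其真实标签 g_i 上的概率，再取负对数求和。"""
    p_modify = np.asarray(p_modify)                  # 各字词被判为"需修改"的软掩码概率
    labels = np.asarray(labels)                      # g_i 取值为 0 或 1
    p_of_label = np.where(labels == 1, p_modify, 1.0 - p_modify)
    return -np.sum(np.log(p_of_label + eps))

print(detection_loss([0.9, 0.1, 0.8], [1, 0, 1]))    # 约 0.434
```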
子步骤S1025、根据所述修改器网络模型和各个所述字词的软掩码覆盖率向量信息,得到各个所述字词对应目标字词的替换概率。
示范性的，将获取到的各个字词的软掩码覆盖率向量信息输入到修改器网络模型中，通过该修改器网络模型对输入的各个字词的软掩码覆盖率向量信息进行处理，输出各个字词对应目标字词的替换概率，目标字词为字词的替换字词，其中，该修改器网络模型包括注意力机制，且该注意力机制可以是点乘注意力机制，也可以是多头注意力机制。例如，通过该点乘注意力机制和/或多头注意力机制对输入的各个字词的软掩码覆盖率向量信息进行处理，得到各个字词对应的注意向量信息。该修改器网络模型还包括第二预置激活函数，该第二预置激活函数处于预置线性层中，该第二预置激活函数为Softmax激活函数，该Softmax激活函数用于多于一个输出的神经元，保证输出神经元之和为1.0，一般输出的是小于1的概率值。将该各个字词对应的注意向量信息输入到预置线性层中，通过该预置线性层中的Softmax激活函数对各个字词对应的注意向量信息进行计算，得到各个字词对应各个目标字词的替换概率。
在一实施例中，所述修改器网络模型包括注意力机制，所述注意力机制包括点乘注意力机制和多头注意力机制；所述根据所述修改器网络模型和各个所述字词的软掩码覆盖率向量信息，得到各个所述字词对应目标字词的替换概率，包括：根据所述点乘注意力机制和各个所述字词的软掩码覆盖率向量信息得到各个所述字词的点乘注意力向量信息；根据所述多头注意力机制和各个所述字词的点乘注意力向量信息，得到各个字词的多头注意力向量信息；基于各个字词的多头注意力向量信息和预置线性层，得到各个所述字词对应目标字词的替换概率。
示范性的,该修改器网络模型包括注意力机制,且该注意机制包括点乘注意力机制和多头注意力机制,该点乘注意力机制和多头注意力机制为12层,每一层包括至少一个该点乘注意力机制和至少一个多头注意力机制。
点乘注意力机制预先为各个字词的软掩码覆盖率向量设置三个向量，该三个向量为寻找向量Q(Query)、重要程度向量K(Key)和评分向量V(Value)，将寻找向量Q和重要程度向量K进行相乘，计算出各个字词的评分程度，再通过该各个字词的评分程度除以当前各个字词的预置字词维度的平方根，得到各个字词的重要程度分数，该重要程度分数为需要给字词放置多少的重要程度。例如，通过预置重要程度公式
$Score=\dfrac{QK^T}{\sqrt{d_k}}$，得到各个字词的重要程度分数，其中，Score为各个字词的重要程度分数，Q为各个字词的寻找向量，K为各个字词的重要程度向量，T为转置运算，$d_k$为当前各个字词的预置字词维度。在得到各个字词的重要程度分数时，对各个字词的重要程度分数进行归一化处理，使各个单词对当前字词的重要程度分数的总和为1，其中，各个单词为各个字词对应目标字词的替换字词。再将归一化处理后的重要程度分数乘以评分向量V，得到各个单词对当前字词的点乘注意力向量信息。例如，通过预置点乘注意力向量公式$Attention(Q,K,V)=\mathrm{softmax}\Big(\dfrac{QK^T}{\sqrt{d_k}}\Big)V$，得到各个单词对当前字词的点乘注意力分数，其中，Q为各个字词的寻找向量，K为各个字词的重要程度向量，V为各个字词的评分向量，T为转置运算，$d_k$为当前各个字词的预置字词维度。
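点乘注意力的打分、归一化与加权求和过程可写成如下示意性草图（维度均为示例性假设）：

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def dot_product_attention(Q, K, V):
    """Score = QK^T / sqrt(d_k)，逐行归一化（和为 1）后再乘以评分向量 V。"""
    d_k = Q.shape[-1]
    score = Q @ K.T / np.sqrt(d_k)     # 各字词之间的重要程度分数
    weights = softmax(score, axis=-1)  # 归一化后的重要程度分数
    return weights @ V                 # 点乘注意力向量信息

rng = np.random.default_rng(0)
seq_len, d_k = 5, 64
Q, K, V = (rng.normal(size=(seq_len, d_k)) for _ in range(3))
print(dot_product_attention(Q, K, V).shape)   # (5, 64)
```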
点乘注意力机制是通过三个不同的向量计算一句话中不同单词之间的重要程度。同时引入了多头注意力机制，多头注意力机制是多个不同的点乘注意力机制，其中多头注意力机制中的各个参数互不共享，同时使用不同的注意力机制去关注不同的信息，不同的注意力机制最终将从不同的角度对句子进行分析。每个点乘注意力机制都会获取一个词向量的输出，例如，有10个不同的点乘注意力机制，而词向量的维度为768，则最终获取到的向量大小为(768,10)。将该(768,10)的向量乘以预置权重向量(10,1)，即做一个注意力得分维度的加权平均，将其压缩为1个向量，最终就可以获得(768,1)的向量，并将该(768,1)的向量作为多头注意力机制的第一输出向量。
当获取到多头注意力机制的第一输出向量时，将该多头注意力机制的第一输出向量连接到一个前向反馈网络当中，修改器网络模型中使用了几层不同的注意力机制进行叠加，例如，当修改器网络模型使用了12层时，每一层都经过多头注意力机制加前向网络输入到下一层中，即先通过一个线性全连接输入Relu激活函数层，再通过线性全连接层输入下一层，该Relu激活函数层包括Relu激活函数。例如，含Relu激活函数的前向网络为$FN(X)=\max(0,XW_1+b_1)W_2+b_2$，得到多头注意力机制的第二输出向量，其中，$W_1$、$b_1$、$W_2$、$b_2$为预置参数，X为多头注意力机制输出的第一向量，$FN(X)$为多头注意力机制输出的第二向量。在获取到多头注意力机制输出的各个字词的第二向量时，将多头注意力机制输出的各个字词的第二向量与各个字词的字词向量信息相加，得到各个字词的残差向量，并将该残差向量作为各个字词的多头注意力向量信息。将得到的各个字词的多头注意力向量信息分别输入到预置线性层中，通过预置线性层对各个字词的多头注意力向量信息进行计算，得到各个字词对应各个目标字词的替换概率。
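多头注意力、含Relu的前向网络、残差连接以及最后的预置线性层的叠加方式，可以参考如下基于PyTorch的示意性草图（此处用torch.nn.MultiheadAttention代替逐头实现，层数、维度与词表大小21128均为示例性假设，并非本申请的原始实现）：

```python
import torch
import torch.nn as nn

class CorrectorLayer(nn.Module):
    """示意性的修改器网络单层：多头注意力 + 含Relu的前向网络 + 残差连接。"""
    def __init__(self, dim=768, heads=12, ffn_dim=3072):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.ffn = nn.Sequential(nn.Linear(dim, ffn_dim), nn.ReLU(), nn.Linear(ffn_dim, dim))

    def forward(self, x):                  # x: 各字词的软掩码覆盖率向量信息
        attn_out, _ = self.attn(x, x, x)   # 多头注意力的第一输出向量
        ffn_out = self.ffn(attn_out)       # FN(X) = max(0, XW1+b1)W2+b2
        return ffn_out + x                 # 与输入相加得到残差向量

class Corrector(nn.Module):
    """若干层叠加后接预置线性层 + Softmax，输出各字词对应各目标字词的替换概率。"""
    def __init__(self, vocab=21128, dim=768, layers=12):
        super().__init__()
        self.layers = nn.ModuleList(CorrectorLayer(dim) for _ in range(layers))
        self.out = nn.Linear(dim, vocab)

    def forward(self, e_soft):
        h = e_soft
        for layer in self.layers:
            h = layer(h)
        return torch.softmax(self.out(h), dim=-1)

probs = Corrector(layers=2)(torch.randn(1, 6, 768))   # 演示时只叠加 2 层
print(probs.shape)                                    # torch.Size([1, 6, 21128])
```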
子步骤S1026、基于各个所述字词对应目标字词的替换概率,获取所述修改器网络模 型的第二损失函数。
示范性的,在获取到各个字词对应目标字词的替换概率,通过第二预置损失函数和各个字词对应目标字词的替换概率,得到该修改器网络模型对应的第二损失函数。例如,通过第二预置损失函数
$L_c=-\sum_{i=1}^{n}\log p_c(y_i\mid X)$，其中，X为预置给定序列，n为预置给定序列X的预置长度，$p_c(y_i\mid X)$为修改器网络模型输出的第i个字词对应各个目标字词的替换概率，得到修改器网络对应的第二损失函数。
子步骤S1027、根据所述第一损失函数和所述第二损失函数,获取所述预置软掩码语言模型的第三损失函数。
示范性的，在得到检测网络模型的第一损失函数和修改器网络模型的第二损失函数时，通过第三预置损失函数公式，得到预置软掩码语言模型的第三损失函数。例如，第三预置损失函数公式为$L=\lambda\cdot L_d+(1-\lambda)\cdot L_c$，其中，$\lambda$为预置参数，$L_d$为当前检测网络模型的第一损失函数的值，$L_c$为当前修改器网络模型的第二损失函数的值，该$\lambda$的预置参数为0.8。
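第二损失函数与第三损失函数的组合可用如下示意性草图表示（λ取上文的0.8，p_c为各字词在其真实目标字词上的替换概率，示例数值仅作演示）：

```python
import numpy as np

def corrector_loss(p_target, eps=1e-9):
    """L_c = -sum_i log p_c(y_i|X)：对每个字词取其真实目标字词的替换概率，再取负对数求和。"""
    return -np.sum(np.log(np.asarray(p_target) + eps))

def total_loss(L_d, L_c, lam=0.8):
    """第三损失函数 L = λ*L_d + (1-λ)*L_c。"""
    return lam * L_d + (1 - lam) * L_c

L_d = 0.434                                # 沿用前文检测网络损失的示例数值
L_c = corrector_loss([0.7, 0.95, 0.6])
print(total_loss(L_d, L_c))                # λ=0.8 时的第三损失函数值
```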
步骤S103、基于损失函数更新所述预置软掩码语言模型的模型参数,并确定所述预置软掩码语言模型是否处于收敛状态。
示范性的,在得到损失函数时,基于该损失函数计算对应的梯度值,通过该梯度值对预置软掩码语言模型中的模型参数进行优化,例如,该模型参数包括变量参数,通过该梯度值对该预置软掩码语言模型中的变量参数进行优化。在通过损失函数更新预置软掩码语言模型的模型参数后,确定该预置软掩码语言模型是否处于收敛状态。例如,在得到对应的损失函数时,获取该损失函数对应的梯度值,将该梯度值与预置梯度值进行比对,若该梯度值小于或等于预置梯度值,则确定该预置软掩码语言模型处于收敛状态。或者,获取更新后的预置软掩码语言模型的变量参数,将该变量参数与预置变量参数进行比对,若该变量参数小于预置变量参数,则确定该预置软掩码语言模型处于收敛状态。
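基于梯度值判断模型是否处于收敛状态的过程，可以用如下示意性草图表达（模型、损失与预置梯度值均为示例性假设）：

```python
import torch

def is_converged(model, loss, grad_threshold=1e-3):
    """计算本次损失对应的梯度范数，小于等于预置梯度值则认为模型处于收敛状态。"""
    model.zero_grad()
    loss.backward()
    grad_norm = torch.sqrt(sum((p.grad ** 2).sum() for p in model.parameters() if p.grad is not None))
    return grad_norm.item() <= grad_threshold

# 用法示例：一个极简模型与损失，仅用于演示判断流程
model = torch.nn.Linear(4, 2)
loss = model(torch.randn(8, 4)).pow(2).mean()
print(is_converged(model, loss))
```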
在一实施例中，具体地，参照图3，步骤S103包括：子步骤S1031至子步骤S1033。
子步骤S1031、基于所述第一损失函数更新所述检测网络模型的模型参数。
示范性的，在得到第一损失函数$L_d=-\sum_{i=1}^{n}\log p_d(g_i\mid X)$后，其中X为预置给定序列，n为预置给定序列X的预置长度，$p_d(g_i\mid X)$为检测网络模型输出的第i个字词对应的软掩码修改概率，将该X、n、$p_d(g_i\mid X)$代入到该第一损失函数中，得到该当前检测网络模型的损失值$L_d$，基于该损失值$L_d$计算梯度并对该检测网络模型的模型参数进行优化。
子步骤S1032、基于所述第二损失函数更新所述修改器网络模型的模型参数。
示范性的，在得到第二损失函数$L_c=-\sum_{i=1}^{n}\log p_c(y_i\mid X)$后，其中X为预置给定序列，n为预置给定序列X的预置长度，$p_c(y_i\mid X)$为修改器网络模型输出的第i个字词对应各个目标字词的替换概率，将该X、n、$p_c(y_i\mid X)$代入到该第二损失函数中，得到该当前修改器网络模型的损失值$L_c$，基于该损失值$L_c$计算梯度并对该修改器网络模型的模型参数进行优化。
子步骤S1033、基于所述第三损失函数更新所述预置软掩码语言模型的模型参数。
示范性的，在得到第三损失函数$L=\lambda\cdot L_d+(1-\lambda)\cdot L_c$后，其中，$\lambda$为预置参数，$L_d$为当前检测网络模型的第一损失函数的值，$L_c$为当前修改器网络模型的第二损失函数的值，该$\lambda$的预置参数为0.8，将$\lambda$、损失值$L_d$、损失值$L_c$代入到该第三损失函数中，得到总损失值L，基于该损失值L计算梯度并对该预置软掩码语言模型的模型参数进行优化。
步骤S104、若确定所述预置软掩码语言模型处于收敛状态,则生成对应的文本纠错模型。
示范性的，若确定该预置软掩码语言模型处于收敛状态，则基于该预置软掩码语言模型生成对应的文本纠错模型，该文本纠错模型能识别出文本中的错别字，并预测该错别字对应的替换字来完成文本纠错。
在本申请实施例中，通过待修改文本训练预置软掩码语言模型，得到对应的损失函数，并通过该损失函数对该预置软掩码语言模型的模型参数进行优化，生成对应的文本纠错模型，通过软掩码对字词进行处理，实现了在不需要大量训练语料的情况下，不仅缩短模型的训练时长，还能对数据进行拟合，并提高了模型的准确率。
请参照图4，图4为本申请的实施例提供的一种基于软掩码的文本纠错模型识别方法的流程示意图。
如图4所示,该基于软掩码的文本纠错模型识别方法包括步骤S201至步骤S202。
步骤S201、获取待纠错文本。
示范性的，获取待纠错文本，该待纠错文本可以是含有字词错别的文本，且待纠错文本包括短句文本等。
步骤S202、基于文本纠错模型对所述待纠错文本进行字词纠错，获取对所述待纠错文本进行字词纠错后的文本，其中，所述文本纠错模型为上述的基于软掩码的文本纠错模型训练方法得到的。
示范性的，在获取到待纠错文本时，将该待纠错文本输入到文本纠错模型中，将该待纠错文本转换为对应各个字词的向量信息。该文本纠错模型包括检测网络模型和修改器网络模型，通过该检测网络模型对该各个字词的向量信息进行检测，确定该各个字词是否需要修改。例如，通过检测网络模型中的双向门递归神经网络模型对各个字词的向量信息进行处理，得到各个字词的软掩码修改概率，通过该各个字词的软掩码修改概率确定各个字词是否需要修改。在确定各个字词中需要修改的修改字词后，通过修改字词的软掩码修改概率和修改字词的向量信息，得到修改字词的软掩码覆盖率向量信息。通过修改器网络模型对修改字词进行处理，得到该修改字词对对应替换字词的替换率。例如，通过修改器网络模型中的点乘注意力机制和多头注意力机制对该修改字词的软掩码覆盖率向量信息进行处理，得到该修改字词对对应替换字词的替换率，通过该修改字词对对应替换字词的替换率，确定替换该修改字词的替换字词。将该替换字词替换该待纠错文本中的修改字词，得到纠错后的文本。
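上述识别流程（检测→软掩码→修改器→替换）可概括为如下示意性草图（其中词表、嵌入层、检测网络与修改器均为随机初始化的微型示例，仅用于演示数据流向，并非训练好的文本纠错模型）：

```python
import torch
import torch.nn as nn

# 示例性的微型词表与模块（真实场景应使用训练好的文本纠错模型）
id2char = ["[MASK]", "我", "门", "们", "来", "了"]
emb = nn.Embedding(len(id2char), 16)                             # 字词向量
detector = nn.GRU(16, 8, batch_first=True, bidirectional=True)   # 检测网络（双向门递归神经网络）
corrector = nn.Linear(16, len(id2char))                          # 修改器网络的输出线性层

def correct(token_ids, threshold=0.5):
    """检测修改概率 -> 软掩码 -> 修改器输出替换概率 -> 用替换概率最高的目标字词替换。"""
    e = emb(token_ids)                                 # 各字词的字词向量信息
    h, _ = detector(e)                                 # 软掩码分量信息
    p = torch.sigmoid(h.mean(dim=-1, keepdim=True))    # 示意性的软掩码修改概率
    e_soft = p * emb.weight[0] + (1 - p) * e           # 软掩码覆盖率向量信息（id 0 为 [MASK]）
    best = corrector(e_soft).softmax(-1).argmax(-1)[0] # 每个位置替换概率最高的目标字词
    ids = token_ids[0]
    return "".join(id2char[b.item()] if p[0, i, 0] > threshold else id2char[t.item()]
                   for i, (t, b) in enumerate(zip(ids, best)))

print(correct(torch.tensor([[1, 2, 4, 5]])))           # 例如输入"我门来了"，按模型输出替换其中的字词
```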
在本申请实施例中，通过文本纠错模型对待纠错文本进行字词纠错，得到字词纠错后的文本，通过该文本纠错模型中的检测网络模型和修改器网络模型，快速准确地得到字词纠错后的文本。
应理解,上述实施例中各步骤的序号的大小并不意味着执行顺序的先后,各过程的执行顺序应以其功能和内在逻辑确定,而不应对本申请实施例的实施过程构成任何限定。
请参照图5,图5为本申请实施例提供的一种基于软掩码的文本纠错模型训练装置的示意性框图。
如图5所示,该基于软掩码的文本纠错模型训练装置400,包括:获取及转换模块401、获取模块402、更新及确定模块403、生成模块404。
获取及转换模块401,用于获取待修改文本,并将所述待修改文本转换为各个字词的字词向量信息;
获取模块402,用于根据各个所述字词的字词向量信息训练预置软掩码语言模型,获取对应的损失函数;
更新及确定模块403,用于基于损失函数更新所述预置软掩码语言模型的模型参数,并确定所述预置软掩码语言模型是否处于收敛状态;
生成模块404,用于若确定所述预置软掩码语言模型处于收敛状态,则生成对应的文本纠错模型。
其中,获取模块402具体还用于:
根据所述检测网络模型和各个所述字词的字词向量信息,得到各个所述字词的软掩码分量信息;
基于各个所述字词的软掩码分量信息和第一预置激活函数,得到各个所述字词的软掩码修改概率;
根据所述各个所述字词的软掩码修改概率和各个所述字词的字词向量信息,得到各个所述字词的软掩码覆盖率向量信息;
基于各个所述字词的软掩码概率,获取所述检测网络模型的第一损失函数;
根据所述修改器网络模型和各个所述字词的软掩码覆盖率向量信息,得到各个所述字词对应目标字词的替换概率;
基于各个所述字词对应目标字词的替换概率,获取所述修改器网络模型的第二损失函数;
根据所述第一损失函数和所述第二损失函数,获取所述预置软掩码语言模型的第三损失函数。
其中,获取模块402具体还用于:
基于所述前向门递归神经网络和各个所述字词的字词向量信息,获取各个所述字词的字词向量信息对应的第一最终隐层向量信息;
基于所述后向门递归神经网络和各个所述字词的字词向量信息,获取各个所述字词的字词向量信息对应的第二最终隐层向量信息;
根据合并各个所述字词的第一最终隐层向量信息和各个所述字词的第二最终隐层向量信息,得到各个所述字词的软掩码分量信息。
其中,获取模块402具体还用于:
所述根据所述修改器网络模型和各个所述字词的软掩码覆盖率向量信息,得到各个所述字词对应目标字词的替换概率,包括:
根据所述点乘注意力机制和各个所述字词的软掩码覆盖率向量信息得到各个所述字词的点乘注意力向量信息;
根据所述多头注意力机制和各个所述字词的点乘注意力向量信息,得到各个字词的多头注意力向量信息;
基于各个字词的多头注意力向量信息和预置线性层,得到各个所述字词对应目标字词的替换概率。
其中,更新及确定模块403具体还用于:
基于所述第一损失函数更新所述检测网络模型的模型参数;
基于所述第二损失函数更新所述修改器网络模型的模型参数；
基于所述第三损失函数更新所述预置软掩码语言模型的模型参数。
需要说明的是，所属领域的技术人员可以清楚地了解到，为了描述的方便和简洁，上述描述的装置和各模块及单元的具体工作过程，可以参考前述基于软掩码的文本纠错模型训练方法实施例中的对应过程，在此不再赘述。
请参照图6,图6为本申请实施例提供的一种基于软掩码的文本纠错模型识别装置的示意性框图。
如图6所示,该种基于软掩码的文本纠错模型识别装置500,包括:第一获取模块501、第二获取模块502。
第一获取模块501,用于获取待纠错文本;
第二获取模块502，用于基于文本纠错模型对所述待纠错文本进行字词纠错，获取对所述待纠错文本进行字词纠错后的文本，其中，所述文本纠错模型为上述的基于软掩码的文本纠错模型训练方法得到的。
需要说明的是，所属领域的技术人员可以清楚地了解到，为了描述的方便和简洁，上述描述的装置和各模块及单元的具体工作过程，可以参考前述基于软掩码的文本纠错模型识别方法实施例中的对应过程，在此不再赘述。
所属领域的技术人员可以清楚地了解到,为了描述的方便和简洁,仅以上述各功能单元、模块的划分进行举例说明,实际应用中,可以根据需要而将上述功能分配由不同的功能单元、模块完成,即将所述装置的内部结构划分成不同的功能单元或模块,以完成以上描述的全部或者部分功能。实施例中的各功能单元、模块可以集成在一个处理单元中,也可以是各个单元单独物理存在,也可以两个或两个以上单元集成在一个单元中,上述集成的单元既可以采用硬件的形式实现,也可以采用软件功能单元的形式实现。另外,各功能单元、模块的具体名称也只是为了便于相互区分,并不用于限制本申请的保护范围。上述系统中单元、模块的具体工作过程,可以参考前述方法实施例中的对应过程,在此不再赘述。
上述实施例提供的装置可以实现为一种计算机程序的形式,该计算机程序可以在如图7所示的计算机设备上运行。
请参阅图7,图7为本申请实施例提供的一种计算机设备的结构示意性框图。该计算机设备可以为终端。
如图7所示,该计算机设备包括通过系统总线连接的处理器、存储器和网络接口,其中,存储器可以包括非易失性存储介质和内存储器。
非易失性存储介质可存储操作系统和计算机程序。该计算机程序包括程序指令,该程序指令被执行时,可使得处理器执行任意一种基于软掩码的文本纠错模型训练方法和基于软掩码的文本纠错模型识别方法。
处理器用于提供计算和控制能力,支撑整个计算机设备的运行。
内存储器为非易失性存储介质中的计算机程序的运行提供环境,该计算机程序被处理器执行时,可使得处理器执行任意一种基于软掩码的文本纠错模型训练方法和基于软掩码的文本纠错模型识别方法。
该网络接口用于进行网络通信,如发送分配的任务等。本领域技术人员可以理解,图7中示出的结构,仅仅是与本申请方案相关的部分结构的框图,并不构成对本申请方案所应用于其上的计算机设备的限定,具体的计算机设备可以包括比图中所示更多或更少的部件,或者组合某些部件,或者具有不同的部件布置。
应当理解的是,处理器可以是中央处理单元(Central Processing Unit,CPU),该处理器还可以是其他通用处理器、数字信号处理器(Digital Signal Processor,DSP)、专用集成电路(Application Specific Integrated Circuit,ASIC)、现场可编程门阵列(Field-Programmable Gate Array,FPGA)或者其他可编程逻辑器件、分立门或者晶体管逻辑器件、分立硬件组件等。其中,通用处理器可以是微处理器或者该处理器也可以是任何常规的处理器等。
其中,在一个实施例中,所述处理器用于运行存储在存储器中的计算机程序,以实现如下步骤:
获取待修改文本,并将所述待修改文本转换为各个字词的字词向量信息;
根据各个所述字词的字词向量信息训练预置软掩码语言模型,获取对应的损失函数;
基于损失函数更新所述预置软掩码语言模型的模型参数,并确定所述预置软掩码语言模型是否处于收敛状态;
若确定所述预置软掩码语言模型处于收敛状态,则生成对应的文本纠错模型。
在一个实施例中,所述预置软掩码语言模型包括检测网络模型和修改器网络模型,所述损失函数包括第一损失函数、第二损失函数和第三损失函数;所述根据各个所述字词的字词向量信息训练预置软掩码语言模型,获取对应的损失函数实现时,所述处理器用于实现:
根据所述检测网络模型和各个所述字词的字词向量信息,得到各个所述字词的软掩码分量信息;
基于各个所述字词的软掩码分量信息和第一预置激活函数,得到各个所述字词的软掩码修改概率;
根据所述各个所述字词的软掩码修改概率和各个所述字词的字词向量信息,得到各个所述字词的软掩码覆盖率向量信息;
基于各个所述字词的软掩码概率,获取所述检测网络模型的第一损失函数;
根据所述修改器网络模型和各个所述字词的软掩码覆盖率向量信息,得到各个所述字词对应目标字词的替换概率;
基于各个所述字词对应目标字词的替换概率,获取所述修改器网络模型的第二损失函数;
根据所述第一损失函数和所述第二损失函数,获取所述预置软掩码语言模型的第三损失函数。
在一个实施例中,所述检测网络模型包括双向门递归神经网络,所述双向门递归神经网络包括前向门递归神经网络和后向门递归神经网络;所述根据所述检测网络模型和各个所述字词的字词向量信息,得到各个所述字词的软掩码分量信息实现时,所述处理器用于实现:
基于所述前向门递归神经网络和各个所述字词的字词向量信息,获取各个所述字词的字词向量信息对应的第一最终隐层向量信息;
基于所述后向门递归神经网络和各个所述字词的字词向量信息,获取各个所述字词的字词向量信息对应的第二最终隐层向量信息;
根据合并各个所述字词的第一最终隐层向量信息和各个所述字词的第二最终隐层向量信息,得到各个所述字词的软掩码分量信息。
在一个实施例中,所述修改器网络模型包括注意力机制,所述注意力机制包括点乘注意力机制和多头注意力机制;所述根据所述修改器网络模型和各个所述字词的软掩码覆盖率向量信息,得到各个所述字词对应目标字词的替换概率实现时,所述处理器用于实现:
根据所述点乘注意力机制和各个所述字词的软掩码覆盖率向量信息得到各个所述字词的点乘注意力向量信息;
根据所述多头注意力机制和各个所述字词的点乘注意力向量信息,得到各个字词的多头注意力向量信息;
基于各个字词的多头注意力向量信息和预置线性层,得到各个所述字词对应目标字词的替换概率。
在一个实施例中,所述基于损失函数更新所述预置软掩码语言模型的模型参数实现时,所述处理器用于实现:
基于所述第一损失函数更新所述检测网络模型的模型参数;
基于所述第二损失函数更新所述修改器网络模型的模型参数；
基于所述第三损失函数更新所述预置软掩码语言模型的模型参数。
其中,在一个实施例中,所述处理器用于运行存储在存储器中的计算机程序,以实现如下步骤:
获取待纠错文本;
基于文本纠错模型对所述待纠错文本进行字词纠错，获取对所述待纠错文本进行字词纠错后的文本，其中，所述文本纠错模型为上述的基于软掩码的文本纠错模型训练方法得到的。
本申请实施例还提供一种计算机可读存储介质,所述计算机可读存储介质上存储有计算机程序,所述计算机程序中包括程序指令,所述程序指令被执行时所实现的方法可参照本申请基于软掩码的文本纠错模型训练方法和基于软掩码的文本纠错模型识别方法的各个实施例。
其中,所述计算机可读存储介质可以是非易失性,也可以是易失性。所述计算机可读存储介质可以是前述实施例所述的计算机设备的内部存储单元,例如所述计算机设备的硬盘或内存。所述计算机可读存储介质也可以是所述计算机设备的外部存储设备,例如所述计算机设备上配备的插接式硬盘,智能存储卡(Smart Media Card,SMC),安全数字(Secure Digital,SD)卡,闪存卡(Flash Card)等。
进一步地,所述计算机可读存储介质可主要包括存储程序区和存储数据区,其中,存储程序区可存储操作系统、至少一个功能所需的应用程序等;存储数据区可存储根据区块链节点的使用所创建的数据等。
本申请所指区块链是预置软掩码语言模型模型的存储、点对点传输、共识机制、加密算法等计算机技术的新型应用模式。区块链(Blockchain),本质上是一个去中心化的数据库,是一串使用密码学方法相关联产生的数据块,每一个数据块中包含了一批次网络交易的信息,用于验证其信息的有效性(防伪)和生成下一个区块。区块链可以包括区块链底层平台、平台产品服务层以及应用服务层等。
上述本申请实施例序号仅仅为了描述,不代表实施例的优劣。以上所述,仅为本申请的具体实施方式,但本申请的保护范围并不局限于此,任何熟悉本技术领域的技术人员在本申请揭露的技术范围内,可轻易想到各种等效的修改或替换,这些修改或替换都应涵盖在本申请的保护范围之内。因此,本申请的保护范围应以权利要求的保护范围为准。
以上所述实施例仅用以说明本申请的技术方案,而非对其限制;尽管参照前述实施例对本申请进行了详细的说明,本领域的普通技术人员应当理解:其依然可以对前述各实施例所记载的技术方案进行修改,或者对其中部分技术特征进行等同替换;而这些修改或者替换,并不使相应技术方案的本质脱离本申请各实施例技术方案的精神和范围,均应包含在本申请的保护范围之内。

Claims (20)

  1. 一种基于软掩码的文本纠错模型训练方法,其中,包括:
    获取待修改文本,并将所述待修改文本转换为各个字词的字词向量信息;
    根据各个所述字词的字词向量信息训练预置软掩码语言模型,获取对应的损失函数;
    基于损失函数更新所述预置软掩码语言模型的模型参数,并确定所述预置软掩码语言模型是否处于收敛状态;
    若确定所述预置软掩码语言模型处于收敛状态,则生成对应的文本纠错模型。
  2. 如权利要求1所述的基于软掩码的文本纠错模型训练方法,其中,所述预置软掩码语言模型包括检测网络模型和修改器网络模型,所述损失函数包括第一损失函数、第二损失函数和第三损失函数;
    所述根据各个所述字词的字词向量信息训练预置软掩码语言模型,获取对应的损失函数,包括:
    根据所述检测网络模型和各个所述字词的字词向量信息,得到各个所述字词的软掩码分量信息;
    基于各个所述字词的软掩码分量信息和第一预置激活函数,得到各个所述字词的软掩码修改概率;
    根据所述各个所述字词的软掩码修改概率和各个所述字词的字词向量信息,得到各个所述字词的软掩码覆盖率向量信息;
    基于各个所述字词的软掩码概率,获取所述检测网络模型的第一损失函数;
    根据所述修改器网络模型和各个所述字词的软掩码覆盖率向量信息,得到各个所述字词对应目标字词的替换概率;
    基于各个所述字词对应目标字词的替换概率,获取所述修改器网络模型的第二损失函数;
    根据所述第一损失函数和所述第二损失函数,获取所述预置软掩码语言模型的第三损失函数。
  3. 如权利要求2所述的基于软掩码的文本纠错模型训练方法,其中,所述检测网络模型包括双向门递归神经网络,所述双向门递归神经网络包括前向门递归神经网络和后向门递归神经网络;
    所述根据所述检测网络模型和各个所述字词的字词向量信息,得到各个所述字词的软掩码分量信息,包括:
    基于所述前向门递归神经网络和各个所述字词的字词向量信息,获取各个所述字词的字词向量信息对应的第一最终隐层向量信息;
    基于所述后向门递归神经网络和各个所述字词的字词向量信息,获取各个所述字词的字词向量信息对应的第二最终隐层向量信息;
    根据合并各个所述字词的第一最终隐层向量信息和各个所述字词的第二最终隐层向量信息,得到各个所述字词的软掩码分量信息。
  4. 如权利要求2所述的基于软掩码的文本纠错模型训练方法,其中,所述修改器网络模型包括注意力机制,所述注意力机制包括点乘注意力机制和多头注意力机制;
    所述根据所述修改器网络模型和各个所述字词的软掩码覆盖率向量信息,得到各个所述字词对应目标字词的替换概率,包括:
    根据所述点乘注意力机制和各个所述字词的软掩码覆盖率向量信息得到各个所述字词的点乘注意力向量信息;
    根据所述多头注意力机制和各个所述字词的点乘注意力向量信息,得到各个字词的多头注意力向量信息;
    基于各个字词的多头注意力向量信息和预置线性层,得到各个所述字词对应目标字词的替换概率。
  5. 如权利要求2所述的基于软掩码的文本纠错模型训练方法,其中,所述基于损失函数更新所述预置软掩码语言模型的模型参数,包括:
    基于所述第一损失函数更新所述检测网络模型的模型参数;
    基于所述第二损失函数更新所述修改器网络模型的模型参数；
    基于所述第三损失函数更新所述预置软掩码语言模型的模型参数。
  6. 一种基于软掩码的文本纠错模型识别方法,其中,所述方法包括:
    获取待纠错文本;
    基于文本纠错模型对所述待纠错文本进行字词纠错，获取对所述待纠错文本进行字词纠错后的文本，其中，所述文本纠错模型为如权利要求1-5所述的基于软掩码的文本纠错模型训练方法得到的。
  7. 一种基于软掩码的文本纠错模型训练装置,其中,包括:
    获取及转换模块,用于获取待修改文本,并将所述待修改文本转换为各个字词的字词向量信息;
    获取模块,用于根据各个所述字词的字词向量信息训练预置软掩码语言模型,获取对应的损失函数;
    更新及确定模块,用于基于损失函数更新所述预置软掩码语言模型的模型参数,并确定所述预置软掩码语言模型是否处于收敛状态;
    生成模块,用于若确定所述预置软掩码语言模型处于收敛状态,则生成对应的文本纠错模型。
  8. 如权利要求7所述的基于软掩码的文本纠错模型训练装置,其中,所述预置软掩码语言模型包括检测网络模型和修改器网络模型,所述损失函数包括第一损失函数、第二损失函数和第三损失函数;
    所述获取模块具体还用于:
    根据所述检测网络模型和各个所述字词的字词向量信息,得到各个所述字词的软掩码分量信息;
    基于各个所述字词的软掩码分量信息和第一预置激活函数,得到各个所述字词的软掩码修改概率;
    根据所述各个所述字词的软掩码修改概率和各个所述字词的字词向量信息,得到各个所述字词的软掩码覆盖率向量信息;
    基于各个所述字词的软掩码概率,获取所述检测网络模型的第一损失函数;
    根据所述修改器网络模型和各个所述字词的软掩码覆盖率向量信息,得到各个所述字词对应目标字词的替换概率;
    基于各个所述字词对应目标字词的替换概率,获取所述修改器网络模型的第二损失函数;
    根据所述第一损失函数和所述第二损失函数,获取所述预置软掩码语言模型的第三损失函数。
  9. 如权利要求8所述的基于软掩码的文本纠错模型训练装置,其中,所述检测网络模型包括双向门递归神经网络,所述双向门递归神经网络包括前向门递归神经网络和后向门递归神经网络;
    所述获取模块具体还用于:
    基于所述前向门递归神经网络和各个所述字词的字词向量信息,获取各个所述字词的字词向量信息对应的第一最终隐层向量信息;
    基于所述后向门递归神经网络和各个所述字词的字词向量信息,获取各个所述字词的字词向量信息对应的第二最终隐层向量信息;
    根据合并各个所述字词的第一最终隐层向量信息和各个所述字词的第二最终隐层向量信息,得到各个所述字词的软掩码分量信息。
  10. 如权利要求8所述的基于软掩码的文本纠错模型训练装置,其中,所述修改器网络模型包括注意力机制,所述注意力机制包括点乘注意力机制和多头注意力机制;
    所述获取模块具体还用于:
    根据所述点乘注意力机制和各个所述字词的软掩码覆盖率向量信息得到各个所述字词的点乘注意力向量信息;
    根据所述多头注意力机制和各个所述字词的点乘注意力向量信息,得到各个字词的多头注意力向量信息;
    基于各个字词的多头注意力向量信息和预置线性层,得到各个所述字词对应目标字词的替换概率。
  11. 如权利要求8所述的基于软掩码的文本纠错模型训练装置,其中,所述更新及确定模块具体还用于:
    基于所述第一损失函数更新所述检测网络模型的模型参数;
    基于所述第二损失函数更新所述修改器网络模型的模型参数；
    基于所述第三损失函数更新所述预置软掩码语言模型的模型参数。
  12. 一种基于软掩码的文本纠错模型识别装置,其中,包括:
    第一获取模块,用于获取待纠错文本;
    第二获取模块，用于基于文本纠错模型对所述待纠错文本进行字词纠错，获取对所述待纠错文本进行字词纠错后的文本，其中，所述文本纠错模型为如权利要求1-5所述的基于软掩码的文本纠错模型训练方法得到的。
  13. 一种计算机可读存储介质,所述计算机可读存储介质存储有计算机程序,其中,所述计算机程序被处理器执行时实现如下步骤:
    获取待修改文本,并将所述待修改文本转换为各个字词的字词向量信息;
    根据各个所述字词的字词向量信息训练预置软掩码语言模型,获取对应的损失函数;
    基于损失函数更新所述预置软掩码语言模型的模型参数,并确定所述预置软掩码语言模型是否处于收敛状态;
    若确定所述预置软掩码语言模型处于收敛状态,则生成对应的文本纠错模型。
  14. 一种计算机可读存储介质,所述计算机可读存储介质存储有计算机程序,其中,所述计算机程序被处理器执行时实现如下步骤:
    获取待纠错文本;
    基于文本纠错模型对所述待纠错文本进行字词纠错，获取对所述待纠错文本进行字词纠错后的文本，其中，所述文本纠错模型为如权利要求1-5所述的基于软掩码的文本纠错模型训练方法得到的。
  15. 一种计算机设备,包括存储器、处理器以及存储在所述存储器中并可在所述处理器上运行的计算机程序,其中,所述处理器执行所述计算机程序时实现如下步骤:
    获取待修改文本,并将所述待修改文本转换为各个字词的字词向量信息;
    根据各个所述字词的字词向量信息训练预置软掩码语言模型,获取对应的损失函数;
    基于损失函数更新所述预置软掩码语言模型的模型参数,并确定所述预置软掩码语言模型是否处于收敛状态;
    若确定所述预置软掩码语言模型处于收敛状态,则生成对应的文本纠错模型。
  16. 如权利要求15所述的计算机设备,其中,所述预置软掩码语言模型包括检测网络模型和修改器网络模型,所述损失函数包括第一损失函数、第二损失函数和第三损失函数;
    所述根据各个所述字词的字词向量信息训练预置软掩码语言模型,获取对应的损失函数,包括:
    根据所述检测网络模型和各个所述字词的字词向量信息,得到各个所述字词的软掩码分量信息;
    基于各个所述字词的软掩码分量信息和第一预置激活函数,得到各个所述字词的软掩码修改概率;
    根据所述各个所述字词的软掩码修改概率和各个所述字词的字词向量信息,得到各个所述字词的软掩码覆盖率向量信息;
    基于各个所述字词的软掩码概率,获取所述检测网络模型的第一损失函数;
    根据所述修改器网络模型和各个所述字词的软掩码覆盖率向量信息,得到各个所述字词对应目标字词的替换概率;
    基于各个所述字词对应目标字词的替换概率,获取所述修改器网络模型的第二损失函数;
    根据所述第一损失函数和所述第二损失函数,获取所述预置软掩码语言模型的第三损失函数。
  17. 如权利要求16所述的计算机设备,其中,所述检测网络模型包括双向门递归神经网络,所述双向门递归神经网络包括前向门递归神经网络和后向门递归神经网络;
    所述根据所述检测网络模型和各个所述字词的字词向量信息,得到各个所述字词的软掩码分量信息,包括:
    基于所述前向门递归神经网络和各个所述字词的字词向量信息,获取各个所述字词的字词向量信息对应的第一最终隐层向量信息;
    基于所述后向门递归神经网络和各个所述字词的字词向量信息,获取各个所述字词的字词向量信息对应的第二最终隐层向量信息;
    根据合并各个所述字词的第一最终隐层向量信息和各个所述字词的第二最终隐层向量信息,得到各个所述字词的软掩码分量信息。
  18. 如权利要求16所述的计算机设备,其中,所述修改器网络模型包括注意力机制,所述注意力机制包括点乘注意力机制和多头注意力机制;
    所述根据所述修改器网络模型和各个所述字词的软掩码覆盖率向量信息,得到各个所述字词对应目标字词的替换概率,包括:
    根据所述点乘注意力机制和各个所述字词的软掩码覆盖率向量信息得到各个所述字词的点乘注意力向量信息;
    根据所述多头注意力机制和各个所述字词的点乘注意力向量信息,得到各个字词的多头注意力向量信息;
    基于各个字词的多头注意力向量信息和预置线性层,得到各个所述字词对应目标字词的替换概率。
  19. 如权利要求16所述的计算机设备,其中,所述基于损失函数更新所述预置软掩码语言模型的模型参数,包括:
    基于所述第一损失函数更新所述检测网络模型的模型参数;
    基于所述第二损失函数更新所述修改器网络模型的模型参数；
    基于所述第三损失函数更新所述预置软掩码语言模型的模型参数。
  20. 一种计算机设备,包括存储器、处理器以及存储在所述存储器中并可在所述处理器上运行的计算机程序,其中,所述处理器执行所述计算机程序时实现如下步骤:
    获取待纠错文本;
    基于文本纠错模型对所述待纠错文本进行字词纠错，获取对所述待纠错文本进行字词纠错后的文本，其中，所述文本纠错模型为如权利要求1-5所述的基于软掩码的文本纠错模型训练方法得到的。
PCT/CN2021/084171 2020-12-11 2021-03-30 文本纠错模型训练方法、识别方法、装置及计算机设备 WO2022121178A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202011453441.9A CN112528634A (zh) 2020-12-11 2020-12-11 文本纠错模型训练、识别方法、装置、设备及存储介质
CN202011453441.9 2020-12-11

Publications (1)

Publication Number Publication Date
WO2022121178A1 true WO2022121178A1 (zh) 2022-06-16

Family

ID=74998820

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/084171 WO2022121178A1 (zh) 2020-12-11 2021-03-30 文本纠错模型训练方法、识别方法、装置及计算机设备

Country Status (2)

Country Link
CN (1) CN112528634A (zh)
WO (1) WO2022121178A1 (zh)


Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112528634A (zh) * 2020-12-11 2021-03-19 平安科技(深圳)有限公司 文本纠错模型训练、识别方法、装置、设备及存储介质
CN113343674B (zh) * 2021-07-09 2022-04-01 北京海泰方圆科技股份有限公司 生成文本纠错模型训练语料的方法、装置、设备及介质
CN113515931B (zh) * 2021-07-27 2023-07-21 中国平安人寿保险股份有限公司 文本纠错方法、装置、计算机设备及存储介质
CN113591475B (zh) * 2021-08-03 2023-07-21 美的集团(上海)有限公司 无监督可解释分词的方法、装置和电子设备
CN113705203A (zh) * 2021-09-02 2021-11-26 上海极链网络科技有限公司 文本纠错方法、装置、电子设备及计算机可读存储介质
CN114330512B (zh) * 2021-12-13 2024-04-26 腾讯科技(深圳)有限公司 数据处理方法、装置、电子设备及计算机可读存储介质


Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110196894A (zh) * 2019-05-30 2019-09-03 北京百度网讯科技有限公司 语言模型的训练方法和预测方法
CN111931490A (zh) * 2020-09-27 2020-11-13 平安科技(深圳)有限公司 文本纠错方法、装置及存储介质
CN112528634A (zh) * 2020-12-11 2021-03-19 平安科技(深圳)有限公司 文本纠错模型训练、识别方法、装置、设备及存储介质

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
SHAOHUA ZHANG; HAORAN HUANG; JICONG LIU; HANG LI: "Spelling Error Correction with Soft-Masked BERT", arXiv.org, Cornell University Library, 15 May 2020 (2020-05-15), XP081674179 *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116244432A (zh) * 2022-12-28 2023-06-09 北京百度网讯科技有限公司 语言模型的预训练方法、装置及电子设备
CN116244432B (zh) * 2022-12-28 2023-11-14 北京百度网讯科技有限公司 语言模型的预训练方法、装置及电子设备
CN115935957A (zh) * 2022-12-29 2023-04-07 广东南方网络信息科技有限公司 一种基于句法分析的句子语法纠错方法及系统
CN115935957B (zh) * 2022-12-29 2023-10-13 广东南方网络信息科技有限公司 一种基于句法分析的句子语法纠错方法及系统

Also Published As

Publication number Publication date
CN112528634A (zh) 2021-03-19

Similar Documents

Publication Publication Date Title
WO2022121178A1 (zh) 文本纠错模型训练方法、识别方法、装置及计算机设备
US20230021852A1 (en) Multi-Turn Dialogue Response Generation Via Mutual Information Maximization
CN108959246B (zh) 基于改进的注意力机制的答案选择方法、装置和电子设备
CN107704625B (zh) 字段匹配方法和装置
WO2021212749A1 (zh) 命名实体标注方法、装置、计算机设备和存储介质
US20190156817A1 (en) Slim embedding layers for recurrent neural language models
CN110825857B (zh) 多轮问答识别方法、装置、计算机设备及存储介质
WO2022116445A1 (zh) 文本纠错模型建立方法、装置、介质及电子设备
WO2022103682A1 (en) Face recognition from unseen domains via learning of semantic features
WO2021057884A1 (zh) 语句复述方法、训练语句复述模型的方法及其装置
CN111368037A (zh) 基于Bert模型的文本相似度计算方法和装置
CN111339775A (zh) 命名实体识别方法、装置、终端设备及存储介质
WO2023071581A1 (zh) 用于确定响应语句的方法、设备、装置和介质
CN116822651A (zh) 基于增量学习的大模型参数微调方法、装置、设备及介质
CN111027681B (zh) 时序数据处理模型训练方法、数据处理方法、装置及存储介质
Amidi et al. Vip cheatsheet: Recurrent neural networks
CN116956835A (zh) 一种基于预训练语言模型的文书生成方法
CN112835798B (zh) 聚类学习方法、测试步骤聚类方法及相关装置
WO2022116443A1 (zh) 语句判别方法、装置、设备及存储介质
CN117591547A (zh) 数据库的查询方法、装置、终端设备以及存储介质
CN111951785B (zh) 语音识别方法、装置及终端设备
CN112183072A (zh) 一种文本纠错方法、装置、电子设备及可读存储介质
CN116432608A (zh) 基于人工智能的文本生成方法、装置、计算机设备及介质
CN113177406B (zh) 文本处理方法、装置、电子设备和计算机可读介质
CN112507081B (zh) 相似句匹配方法、装置、计算机设备及存储介质

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21901894

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21901894

Country of ref document: EP

Kind code of ref document: A1