CN116822498B - Text error correction processing method, model processing method, device, equipment and medium


Info

Publication number
CN116822498B
CN116822498B
Authority
CN
China
Prior art keywords
text
model
layer
error correction
training
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202311100345.XA
Other languages
Chinese (zh)
Other versions
CN116822498A (en)
Inventor
陈东来
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Qianhai Huanrong Lianyi Information Technology Service Co Ltd
Original Assignee
Shenzhen Qianhai Huanrong Lianyi Information Technology Service Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Qianhai Huanrong Lianyi Information Technology Service Co Ltd
Priority to CN202311100345.XA
Publication of CN116822498A
Application granted
Publication of CN116822498B
Legal status: Active
Anticipated expiration


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/232 Orthographic correction, e.g. spell checking or vowelisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G06N3/0455 Auto-encoder networks; Encoder-decoder networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Character Discrimination (AREA)

Abstract

The application relates to the technical fields of deep learning and natural language processing, and discloses a text error correction processing method, a model processing method, a device, equipment, and a medium for reducing the misjudgment rate of text recognition. In the method, text correction is performed on a recognized text through the text correction module of a target text error correction model to obtain corrected text, where the target text error correction model is trained as follows: a training text is input into the text correction module of a model to be trained for text correction processing to obtain a text correction result; the training text is input into the wrong-word recognition module of the model to be trained for wrong-word probability recognition to obtain a wrong-word recognition result; the total model loss of the model to be trained is obtained according to the text correction result and the wrong-word recognition result; and the trained model whose total model loss meets a preset loss value is taken as the target text error correction model.

Description

Text error correction processing method, model processing method, device, equipment and medium
Technical Field
The present application relates to the field of natural language processing technologies, and in particular, to a text error correction processing method, a model processing method, a device, equipment, and a medium.
Background
Common text error correction methods include wrong-word dictionaries, edit distance, language models, and the like. Building a wrong-word dictionary carries a high labor cost and is only applicable to narrow vertical domains with a limited set of wrong words; edit distance uses fuzzy matching of similar character strings and can correct some common wrong words and ill-formed sentences by comparison against correct samples, but its generality is insufficient. Consequently, current academic and industrial research focuses mainly on error correction techniques based on language models.
The inventor has found through research that traditional text error correction techniques lead to a high misjudgment rate.
Disclosure of Invention
The application provides a text error correction processing method, a model processing method, a device, equipment, and a medium, to solve the technical problem that existing text error correction approaches cause a high misjudgment rate.
In a first aspect, a text error correction processing method is provided, and the method includes:
performing text correction on a recognized text through a target text error correction model to obtain corrected text, wherein the target text error correction model is trained in the following manner:
inputting a training text into a text correction module of a model to be trained for text correction processing to obtain a text correction result;
inputting the training text into a wrong-word recognition module of the model to be trained for wrong-word probability recognition to obtain a wrong-word recognition result;
obtaining a total model loss of the model to be trained according to the text correction result and the wrong-word recognition result;
and taking the trained model whose total model loss meets a preset loss value as the target text error correction model.
Further, the text correction module comprises a first coding layer, a second coding layer, a first BERT layer, a second BERT layer, and a first fully connected layer;
the first coding layer is configured to encode the training text to obtain a first coding vector, and the second coding layer is configured to encode the training text to obtain a second coding vector;
the first BERT layer is configured to convert the first coding vector to obtain a first matrix, and the second BERT layer is configured to convert the second coding vector to obtain a second matrix;
the first fully connected layer is configured to perform text correction according to the sum of the first matrix and the second matrix to obtain the text correction result;
wherein the first matrix has a size equal to the product of the dimension of the first coding vector and the token length of the training text, and the second matrix has a size equal to the product of the dimension of the second coding vector and the token length of the training text.
Further, the wrong-word recognition module comprises the second coding layer, the second BERT layer, and a second fully connected layer;
the second coding layer is configured to encode the training text to obtain the second coding vector;
the second BERT layer is configured to convert the second coding vector to obtain the second matrix;
the second fully connected layer is configured to perform wrong-word probability recognition according to the second matrix to obtain the wrong-word recognition result;
wherein the second matrix has a size equal to the product of the dimension of the second coding vector and the token length of the training text.
Further, the first BERT layer and the second BERT layer each comprise a two-layer Transformer structure, and the first coding layer and the second coding layer represent the same coding layer.
Further, the obtaining the total model loss of the model to be trained according to the text correction result and the wrong-word recognition result includes:
calculating a text correction loss value according to the text correction result and the training text;
calculating a wrong-word recognition loss value according to the wrong-word recognition result and the training text pre-labeled with wrong-word tags;
and calculating the total model loss of the model to be trained according to the text correction loss value and the wrong-word recognition loss value.
Further, the calculating the total model loss of the model to be trained according to the text correction loss value and the wrong-word recognition loss value includes:
taking a linear combination of the text correction loss value and the wrong-word recognition loss value as the total model loss.
In a second aspect, there is provided a model processing method, the method comprising:
inputting a training text into a text correction module of a model to be trained for text correction processing to obtain a text correction result;
inputting the training text into a wrong-word recognition module of the model to be trained for wrong-word probability recognition to obtain a wrong-word recognition result;
obtaining a total model loss of the model to be trained according to the text correction result and the wrong-word recognition result;
and iteratively training the model to be trained until the total model loss meets a preset loss value, thereby obtaining a target text error correction model.
In a third aspect, there is provided a model processing apparatus comprising:
an input module, configured to input a training text into the text correction module of the model to be trained for text correction processing to obtain a text correction result, and to input the training text into the wrong-word recognition module of the model to be trained for wrong-word probability recognition to obtain a wrong-word recognition result;
an obtaining module, configured to obtain the total model loss of the model to be trained according to the text correction result and the wrong-word recognition result;
and a training module, configured to iteratively train the model to be trained until the total model loss meets a preset loss value, thereby obtaining a target text error correction model.
In a fourth aspect, a computer device is provided, including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the steps of the text error correction processing method or the steps of the model processing method.
In a fifth aspect, a readable storage medium is provided, storing a computer program which, when executed by a processor, implements the steps of the text error correction processing method or the steps of the model processing method.
In the solutions provided by the application, two text processing network branches are constructed: one branch performs text correction and the other performs wrong-word recognition. Training is constrained by the combined loss of the two branches until both the text correction branch and the wrong-word recognition branch produce good outputs. Because the wrong-word recognition probability constrains the training process, the misjudgment rate of wrong-word correction is reduced.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the description of the embodiments of the present application will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic diagram of a system architecture of a text error correction processing system according to an embodiment of the present application;
FIG. 2 is a schematic flow chart of a model processing method according to an embodiment of the application;
FIG. 3 is a diagram of a model network architecture of a target text error correction model in accordance with one embodiment of the present application;
FIG. 4 is another model network architecture diagram of a target text error correction model in accordance with one embodiment of the present application;
FIG. 5 is a flow chart of a text error correction process according to an embodiment of the application;
FIG. 6 is a schematic diagram of a model processing apparatus according to an embodiment of the present application;
FIG. 7 is a schematic diagram of a computer device according to an embodiment of the application.
Detailed Description
The following description of the embodiments of the present application will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are some, but not all embodiments of the application. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
By way of example, the model processing method and the text error correction processing method provided by the embodiments of the application can be applied to a system as shown in FIG. 1, which includes a client and a server; the client communicates with the server through a wireless network. The server implements the model processing method provided by the application to obtain the required target text error correction model, and the client implements the text error correction processing method based on the target text error correction model obtained by the server, so that wrong-word misjudgments can be reduced.
In this embodiment, the client, also called the user side, refers to a program that corresponds to the server and provides local services for the user. The client may be installed on, but is not limited to, various personal computers, notebook computers, smartphones, tablet computers, and portable wearable devices. The server may be implemented as a stand-alone server or as a server cluster formed by multiple servers, which is not particularly limited.
It should be noted that the embodiments of the application cover both the model training process and the model application process; to facilitate understanding, the model training process is described first.
In one embodiment, as shown in FIG. 2, a model processing method is provided, illustrated here as applied to the system of FIG. 1, and includes the following steps:
S10: inputting the training text into the text correction module of the model to be trained for text correction processing to obtain a text correction result.
S20: inputting the training text into the wrong-word recognition module of the model to be trained for wrong-word probability recognition to obtain a wrong-word recognition result.
It should be appreciated that many document processing requirements exist in practice, for example scanned documents. Information processing of a scanned document generally includes preprocessing the picture, extracting characters by OCR, and extracting key information with a language model such as BERT. In the OCR step, however, factors such as lighting, focus, and handwriting can cause character errors during text recognition, which in turn affect the accuracy of the information extracted from the characters. A scheme that effectively corrects wrong characters in text is therefore highly desirable.
Common methods can be summarized as wrong-word dictionaries, edit distance, language models, and the like. Constructing a wrong-word dictionary carries a high labor cost and suits only text with a limited set of wrong words; edit distance uses fuzzy matching of similar character strings and can correct some common wrong words and expressions by comparison against correct samples, but its generality is insufficient. There are therefore also error correction techniques based on language models, including conventional n-gram LMs and DNN LMs, which can take either the character or the word as the granularity of error correction. Character-granularity semantics are relatively weak, so the misjudgment rate is higher than with word-granularity correction; word granularity, in turn, depends on the accuracy of the word segmentation model. To reduce the misjudgment rate, a CRF (conditional random field) layer is usually added at the output layer of the model, which avoids unreasonable wrong-word outputs by learning transition probabilities and a globally optimal path; however, relying on learned transition probabilities and a globally optimal path greatly increases the model burden, and the misjudgment rate is still not satisfactory.
In this regard, the embodiments of the application provide a new model framework based on deep learning technology for training a target text error correction model and reducing the misjudgment rate of text correction. For ease of understanding, the model in the course of training is referred to herein as the model to be trained. The model to be trained includes a text correction module and a wrong-word recognition module, two network branches each built as a deep learning network module: the text correction module performs text correction processing on text, and the wrong-word recognition module performs wrong-word probability recognition on text.
For training, a training set is first constructed, including a training sample set and a verification sample set; the training sample set contains a large number of training texts, and the verification sample set contains verification texts. After the training texts are obtained, the model to be trained is trained on them: a training text is input into the model to be trained, the text correction module performs text correction processing on it to obtain a text correction result, and the wrong-word recognition module performs wrong-word probability recognition on it to obtain a wrong-word recognition result.
S30: obtaining the total model loss of the model to be trained according to the text correction result and the wrong-word recognition result.
S40: taking the trained model whose total model loss meets a preset loss value as the target text error correction model.
In this embodiment, after the text correction result and the wrong-word recognition result of the training text are obtained, the total model loss of the model to be trained is obtained from the two results, and the model is trained under this joint loss constraint until its total loss meets a preset loss value; the trained model that meets the preset loss value is then taken as the target text error correction model.
It should be noted that the training process also involves other training parameter settings, such as the learning rate and the number of training iterations, which are not described here.
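To make the training flow concrete, the following is a minimal PyTorch sketch of one possible training loop under the joint-loss scheme described above; the model interface (returning correction logits and per-token error probabilities), the data fields, and all hyperparameter values are illustrative assumptions, not taken from the application.

```python
import torch
import torch.nn.functional as F

def train(model, dataloader, preset_loss=0.05, lr=2e-5, max_epochs=10, a=0.5):
    """Joint-training sketch: stops once the total model loss meets the preset loss value."""
    optimizer = torch.optim.AdamW(model.parameters(), lr=lr)
    for epoch in range(max_epochs):
        for token_ids, correct_ids, error_labels in dataloader:
            corr_logits, err_probs = model(token_ids)           # both branches in one pass
            l1 = F.cross_entropy(corr_logits.transpose(1, 2),   # text correction loss
                                 correct_ids)
            l2 = F.binary_cross_entropy(err_probs,              # wrong-word recognition loss
                                        error_labels.float())
            total = a * l1 + (1 - a) * l2                       # linear combination of the losses
            optimizer.zero_grad()
            total.backward()
            optimizer.step()
        # Stop once the (last-batch) total loss meets the preset loss value;
        # a validation-set criterion could be used instead.
        if total.item() <= preset_loss:
            break
    return model
```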
It can be seen that this embodiment provides a model processing method in which two text processing network branches are constructed: one branch performs text correction and the other performs wrong-word recognition, and training is constrained by the combined loss of the two branches until both the text correction branch and the wrong-word recognition branch produce good outputs. Relying on a BERT-based correction model alone easily leads to over-correction, for example replacing an already correct expression with a different expression that is also correct and has a high occurrence probability. Moreover, the BERT layer assumes that the context of each position is correct, so errors cannot be identified accurately when two or more errors occur together. By adding the wrong-word recognition module on the right, the method greatly reduces such over-correction.
In addition, in the method the wrong-word recognition probability is constrained during training. The right-hand wrong-word recognition branch can identify the wrong-word probability at every position in the text, and only positions with a high predicted error probability are processed, i.e. corrected, by the left-hand text processing branch, instead of learning transition probabilities and a globally optimal path as in the traditional scheme. The trained target text error correction model therefore over-corrects less, needs no search-and-optimization algorithm, and is simpler, lighter, and cheaper to compute.
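The position-gating idea can be illustrated with a short sketch: the right-hand branch scores every position, and only positions whose predicted error probability exceeds a threshold take the left-hand branch's correction. The model interface and the 0.5 threshold are assumptions for illustration, not values given in the application.

```python
import torch

def correct_text(model, token_ids, threshold=0.5):
    """Apply corrections only where the wrong-word branch flags a likely error."""
    model.eval()
    with torch.no_grad():
        corr_logits, err_probs = model(token_ids)   # correction logits, per-token error probs
    predicted = corr_logits.argmax(dim=-1)          # tokens proposed by the correction branch
    keep = err_probs < threshold                    # low error probability: keep the input token
    return torch.where(keep, token_ids, predicted)
```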
It should be noted that, in the foregoing embodiment, the model to be trained includes a text correction module and a wrong-word recognition module, both of which can be built with deep learning techniques. To reduce the amount of computation and improve model accuracy, the embodiments of the application provide a specific model architecture for each module, described below in turn.
As shown in FIG. 3, FIG. 3 is a schematic diagram of the network architecture of the model to be trained according to an embodiment of the application. The model to be trained includes a first coding layer, a second coding layer, a first BERT layer, a second BERT layer, a first fully connected layer, and a second fully connected layer. The first coding layer and the first BERT layer are connected in sequence, and the second coding layer, the second BERT layer, and the second fully connected layer are connected in sequence; notably, the second BERT layer is also connected, together with the first BERT layer, to the first fully connected layer. Illustratively, the first coding layer and the second coding layer may be Embedding layers or take other coding forms, which is not particularly limited. Based on the network framework of FIG. 3, the network is divided into a text correction module and a wrong-word recognition module, described separately below:
Text correction module
The text correction module comprises the first coding layer, the second coding layer, the first BERT layer, the second BERT layer, and the first fully connected layer;
the first coding layer is configured to encode the training text to obtain a first coding vector, and the second coding layer is configured to encode the training text to obtain a second coding vector;
the first BERT layer is configured to convert the first coding vector to obtain a first matrix, and the second BERT layer is configured to convert the second coding vector to obtain a second matrix;
the first fully connected layer is configured to perform text correction according to the sum of the first matrix and the second matrix to obtain the text correction result;
wherein the first matrix has a size equal to the product of the dimension of the first coding vector and the token length of the training text, and the second matrix has a size equal to the product of the dimension of the second coding vector and the token length of the training text.
The foregoing describes the connections and logic among the sub-modules of the text correction module. The training text input into the model to be trained is passed to the first coding layer, which encodes it, for example through an Embedding process, to obtain a coding vector that converts the text into a form the model can process; to distinguish it from the vector used in the wrong-word recognition path, it is denoted the first coding vector. The first coding vector output by the first coding layer is input into the first BERT layer, which converts it to produce the output of the first BERT layer, the first matrix, of size token length of the training text × Embedding dimension.
Likewise, the training text input into the model to be trained is passed to the second coding layer, which encodes it, for example through an Embedding process, to obtain a coding vector; to distinguish it from the vector used in the text correction path, it is denoted the second coding vector. The second coding vector output by the second coding layer is input into the second BERT layer, which converts it to produce the output of the second BERT layer, the second matrix, likewise of size token length of the training text × Embedding dimension.
Finally, the first matrix output by the first BERT layer and the second matrix output by the second BERT layer are input into the first fully connected layer, which performs text correction processing based on their sum to obtain the text correction result.
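As a hedged illustration of the data flow just described, the sketch below implements the text correction branch of FIG. 3 in PyTorch; the class names, hidden size, and vocabulary size are assumptions made for the example, not values from the application.

```python
import torch
import torch.nn as nn

class TwoLayerTransformerEncoder(nn.Module):
    """Stand-in for a BERT layer built from two Transformer blocks."""
    def __init__(self, hidden=256, heads=4):
        super().__init__()
        block = nn.TransformerEncoderLayer(d_model=hidden, nhead=heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(block, num_layers=2)

    def forward(self, x):                                 # x: (batch, token_len, hidden)
        return self.encoder(x)

class CorrectionBranch(nn.Module):
    def __init__(self, vocab_size=21128, hidden=256):
        super().__init__()
        self.embed1 = nn.Embedding(vocab_size, hidden)    # first coding layer
        self.embed2 = nn.Embedding(vocab_size, hidden)    # second coding layer
        self.bert1 = TwoLayerTransformerEncoder(hidden)   # first BERT layer
        self.bert2 = TwoLayerTransformerEncoder(hidden)   # second BERT layer
        self.fc = nn.Linear(hidden, vocab_size)           # first fully connected layer

    def forward(self, token_ids):                         # token_ids: (batch, token_len)
        m1 = self.bert1(self.embed1(token_ids))           # first matrix: token_len x hidden
        m2 = self.bert2(self.embed2(token_ids))           # second matrix: token_len x hidden
        return self.fc(m1 + m2)                           # correction logits over the vocabulary
```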
It should be noted that, in this embodiment, the branch that outputs the text correction result makes use of the BERT layer of the wrong-word recognition module; that is, the wrong-word recognition module not only serves its own branch but also feeds into the processing of the text correction module. The text correction module and the wrong-word recognition module thus interact, model resources are used to the fullest, and the goals of saving resources, fully utilizing resources, and keeping the model lightweight are achieved.
Wrong-word recognition module
The wrong-word recognition module comprises the second coding layer, the second BERT layer, and the second fully connected layer;
the second coding layer is configured to encode the training text to obtain the second coding vector;
the second BERT layer is configured to convert the second coding vector to obtain the second matrix;
the second fully connected layer is configured to perform wrong-word probability recognition according to the second matrix to obtain the wrong-word recognition result;
wherein the second matrix has a size equal to the product of the dimension of the second coding vector and the token length of the training text.
This part describes the connections and logic among the sub-modules of the wrong-word recognition module. The training text input into the model to be trained is likewise passed to the second coding layer, which encodes it, for example through an Embedding process, to obtain the second coding vector. The second coding vector output by the second coding layer is input into the second BERT layer, which converts it to produce the second matrix. Finally, the second matrix output by the second BERT layer is input into the second fully connected layer, which performs wrong-word probability recognition based on it to obtain the wrong-word recognition result. The wrong-word recognition result may be represented by 0 or 1, for example with 0 indicating a correct word and 1 indicating a wrong word; of course, other symbol representations may be used, and this is merely an example rather than a limitation.
The second fully connected layer is specifically configured to apply a linear transformation to the second matrix followed by a sigmoid, yielding the probability that each position is a wrong word; the matrix of size token length × Embedding dimension is thereby transformed into a matrix of size token length × 1.
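A minimal sketch of this head, under the same naming assumptions as the earlier sketches, could look as follows; the squeeze simply turns the token length × 1 matrix into a vector of per-token probabilities.

```python
import torch
import torch.nn as nn

class DetectionHead(nn.Module):
    """Second fully connected layer: linear transform plus sigmoid per token."""
    def __init__(self, hidden=256):
        super().__init__()
        self.fc = nn.Linear(hidden, 1)                 # maps embedding dim to a single score

    def forward(self, m2):                             # m2: (batch, token_len, hidden)
        return torch.sigmoid(self.fc(m2)).squeeze(-1)  # (batch, token_len), values in [0, 1]
```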
It should be noted that, to further reduce the model size and the amount of computation, further optimizations are possible on the basis of FIG. 3. As shown in FIG. 4, in one embodiment the text correction module and the wrong-word recognition module share a common Embedding layer, and the first BERT layer and the second BERT layer are shared as well. Illustratively, the first fully connected layer and the second fully connected layer may remain two separate fully connected layers, which is not particularly limited.
As an example, when the wrong-word recognition module and the text correction module share a common Embedding layer and the first BERT layer and the second BERT layer are the same BERT layer, the number of coding layers is reduced, the model complexity is simplified, and resource waste is reduced while the scheme is still realized.
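Under the same assumptions, the FIG. 4 variant can be sketched by letting both branches reuse one embedding layer and one BERT layer (the TwoLayerTransformerEncoder class from the sketch above); since the first and second matrices then coincide, the addition feeding the correction head reduces to a scaling of a single matrix.

```python
import torch
import torch.nn as nn

class SharedCorrector(nn.Module):
    """FIG. 4 sketch: common Embedding layer and shared BERT layer for both branches."""
    def __init__(self, vocab_size=21128, hidden=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden)      # common Embedding layer
        self.bert = TwoLayerTransformerEncoder(hidden)     # shared BERT layer
        self.fc_correct = nn.Linear(hidden, vocab_size)    # first fully connected layer
        self.fc_detect = nn.Linear(hidden, 1)              # second fully connected layer

    def forward(self, token_ids):
        m = self.bert(self.embed(token_ids))               # one matrix serves both heads
        logits = self.fc_correct(m + m)                    # the two matrices coincide here
        probs = torch.sigmoid(self.fc_detect(m)).squeeze(-1)
        return logits, probs                               # matches the training/inference sketches
```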
It should be noted that, since the BERT layer may be pre-trained, the overall training efficiency may also be improved.
It should be noted that, in the embodiments of the application, arrangements other than the BERT layer are also possible. For example, the text correction module may include the first coding layer, the second coding layer, a first Transformer model, a second Transformer model, and the first fully connected layer, and the wrong-word recognition module may include the second coding layer, the second Transformer model, and the second fully connected layer; that is, the BERT layers may be replaced with Transformer models. Illustratively, a two-layer Transformer model may be employed and shared between the two processing branches, without limitation. Since error correction is not a particularly complex task, a two-layer Transformer structure is simpler when enough training data is available, which benefits training and increases the speed of model prediction.
It should be understood that, in this embodiment, the number of layers of the first and second Transformer models may be set according to actual needs, for example, but not limited to, two, three, or more Transformer layers; under the model structure of the application, two Transformer layers work well, each layer comprising a multi-head self-attention mechanism, a feed-forward neural network, and the corresponding residual connections and layer normalization.
It should be noted that the Transformer model here consists of two Transformer modules arranged independently in the model to be trained, whereas the BERT layer refers to the Transformer modules used in the BERT model (Bidirectional Encoder Representations from Transformers), a pre-trained language representation model based on the Transformer. A Transformer module includes an encoder and a decoder; the encoder is responsible for encoding the input text sequence, and both the encoder and the decoder are composed of multiple identical layers, each containing a multi-head self-attention mechanism, a feed-forward neural network, and the corresponding residual connections and layer normalization. In the structure of FIG. 3, other similar deep learning modules may also be used in place of the Transformer, which is not particularly limited.
It can be seen that the Transformer module is a deep learning model based on the self-attention mechanism; the main principle of the BERT layer is to capture the global dependencies in the input text sequence through self-attention, so that the information in the input sequence is fully exploited. Compared with traditional recurrent neural networks (RNNs) and convolutional neural networks (CNNs), the BERT layer has significant advantages in handling long-range dependencies and in parallel computation; its processing details are not described further here.
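For reference, the scaled dot-product self-attention at the core of each Transformer layer takes the standard form below; this is textbook background rather than anything specific to the application.

```latex
\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{Q K^{\top}}{\sqrt{d_k}}\right) V
```

Here Q, K, and V are the query, key, and value projections of the input sequence and d_k is the key dimension; multi-head attention runs this computation in parallel over several projected subspaces and concatenates the results.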
It can be seen that this embodiment provides a specific network structure for the model to be trained, ensuring the feasibility of the scheme. At the same time, in the branch outputting the text correction result, the BERT layer of the wrong-word recognition module is reused; that is, the wrong-word recognition module not only serves its own branch but also feeds into the processing of the text correction module, so the two modules interact, model resources are used to the fullest, and the goals of saving and fully utilizing resources are achieved.
Moreover, the model to be trained provided by the embodiments of the application has a simple structure and a simple processing flow; it is a lightweight model that is easy to train, deploy, and apply.
It should be noted that, during training, the application obtains the total model loss of the model to be trained according to the text correction result and the wrong-word recognition result, and trains under this constraint, thereby reducing the misjudgment rate of the text error correction model. Step S30, obtaining the total model loss of the model to be trained according to the text correction result and the wrong-word recognition result, includes the following steps:
S31: calculating a text correction loss value according to the text correction result and the training text.
S32: calculating a wrong-word recognition loss value according to the wrong-word recognition result and the training text pre-labeled with wrong-word tags.
S33: calculating the total model loss of the model to be trained according to the text correction loss value and the wrong-word recognition loss value.
It should be understood that the text correction result is the output corrected text. In this embodiment, the loss between the text correction result and the corresponding training text is computed first and quantified as the text correction loss value, which can be understood as the output loss of the text correction module: it characterizes the difference between the output corrected text and the corresponding correct text, i.e. the correct text tokens are compared with the probabilities of the predicted text tokens. In a specific implementation, this loss may be computed in various ways, for example with a common loss function such as cross entropy, which is not particularly limited.
Similarly, in this embodiment the loss between the wrong-word recognition result and the corresponding labeled training text is computed first, which can be understood as the output loss of the wrong-word recognition module, to obtain the wrong-word recognition loss value of that network branch: it characterizes the difference between the output wrong-word recognition result and the labels of the corresponding labeled text, i.e. the correct 0/1 labels are compared with the predicted probability that the current position is a wrong word. Likewise, in a specific implementation this loss may be computed in various ways, for example with cross entropy, which is not particularly limited.
In this embodiment, a way of obtaining the total model loss of the model to be trained from the text correction result and the wrong-word recognition result is provided: the wrong-word recognition result is fed back into the text correction result to participate in training, rather than constraining the output of the text correction module alone, so that the constrained model outputs more accurate text correction results.
It should be noted that there are other ways of obtaining the total model loss from the text correction result and the wrong-word recognition result. For example, after the text correction loss value and the wrong-word recognition loss value are calculated, corresponding loss weights are assigned to them, and the total model loss of the model to be trained is calculated from the text correction loss value, the wrong-word recognition loss value, and the corresponding weights, where the weights of the two loss values can be configured according to actual requirements. In this way, by adjusting the weights or making fine adjustments during training, the model can reach the training cut-off condition without losing output quality, improving training efficiency.
For example, the total loss may be expressed as follows:
L = a · L1 + (1 - a) · L2
where L1 is the loss of the text correction module, L2 is the loss of the wrong-word recognition module, and a is a number between 0 and 1 representing the weight coefficient.
In one embodiment, step S33, calculating the total model loss of the model to be trained according to the text correction loss value and the wrong-word recognition loss value, includes: taking a linear combination of the text correction loss value and the wrong-word recognition loss value as the total model loss.
In this embodiment, a way of obtaining the total model loss is provided in which the linear combination of the text correction loss value and the wrong-word recognition loss value is used directly as the total model loss, ensuring the feasibility of the scheme. With this combined-loss approach, the balance between the loss terms can be adjusted with a hyperparameter, and the total loss is simple and fast to compute, so the training process is also simpler and faster. It should be noted that, besides linear combination, other joint-training combinations of the text correction loss value and the wrong-word recognition loss value are possible, which is not particularly limited.
The above embodiments thus provide a model processing method for training a target text error correction model usable for text correction, such that the target text error correction model has a lower misjudgment rate, a smaller amount of computation, and a simplified model structure.
In one embodiment, as shown in FIG. 5, a text error correction processing method is provided based on the target text error correction model, including the following steps:
S101: obtaining a recognized text.
S102: performing text correction on the recognized text through the target text error correction model to obtain the corrected text.
The text error correction processing method provided by the embodiments of the application can be applied to various scenarios involving recognition of electronic text, such as document scanning and text crawling, without particular limitation. The recognized text is the text that requires error correction; it can be obtained, for example, by scanning a paper document with a scanning device such as a mobile phone or a scanning gun. The text correction module of the target text error correction model then performs text correction on the recognized text to obtain the corrected text, which improves the accuracy of the output text, reduces text verification cost, improves user experience, and broadens the application scenarios.
In practical applications, the target text error correction model obtained after training on the server can be deployed in the cloud; when the client needs to correct text, it calls the cloud-deployed target text error correction model to perform text correction processing and obtain the required corrected text.
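As an illustration of this deployment pattern, a client could call a cloud-hosted correction service along the lines of the sketch below; the endpoint URL, the JSON schema, and the function name are entirely hypothetical, not part of the application.

```python
import requests

def correct_remote(text: str) -> str:
    """Send recognized text to a hypothetical cloud-deployed correction model."""
    resp = requests.post(
        "https://example.com/api/text-correction",  # placeholder URL, not a real service
        json={"text": text},                        # assumed request schema
        timeout=10,
    )
    resp.raise_for_status()
    return resp.json()["corrected_text"]            # assumed response field
```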
It should be further noted that, in an embodiment, only the text correction module of the target text error correction model may be used for the text correction application, while the wrong-word recognition module is used in subsequent model adjustment or update training, which is not particularly limited.
In this embodiment, text correction is performed on the recognized text through the text correction module of the target text error correction model to obtain the corrected text, which greatly reduces the misjudgment rate of the corrected text and improves correction quality; the target text error correction model is lightweight and simple, easy to deploy and apply, and of great practical value.
It should be understood that the sequence numbers of the steps in the foregoing embodiments do not imply an execution order; the execution order of the processes should be determined by their functions and internal logic, and does not constitute any limitation on the implementation of the embodiments of the application.
The foregoing describes the method part provided by the embodiments of the application; the apparatus, device, and medium parts are described below.
In an embodiment, a model processing apparatus is provided, corresponding one-to-one to the model processing method in the above embodiments. As shown in FIG. 6, the model processing apparatus includes an input module 101, an obtaining module 102, and a training module 103. The functional modules are described in detail as follows:
the input module 101 is configured to input a training text into the text correction module of the model to be trained for text correction processing to obtain a text correction result, and to input the training text into the wrong-word recognition module of the model to be trained for wrong-word probability recognition to obtain a wrong-word recognition result;
the obtaining module 102 is configured to obtain the total model loss of the model to be trained according to the text correction result and the wrong-word recognition result;
the training module 103 is configured to iteratively train the model to be trained until the total model loss meets a preset loss value, thereby obtaining the target text error correction model.
In an embodiment, the text correction module includes a first coding layer, a second coding layer, a first BERT layer, a second BERT layer, and a first fully connected layer;
the first coding layer is configured to encode the training text to obtain a first coding vector, and the second coding layer is configured to encode the training text to obtain a second coding vector;
the first BERT layer is configured to convert the first coding vector to obtain a first matrix, and the second BERT layer is configured to convert the second coding vector to obtain a second matrix;
the first fully connected layer is configured to perform text correction according to the sum of the first matrix and the second matrix to obtain the text correction result;
wherein the first matrix has a size equal to the product of the dimension of the first coding vector and the token length of the training text, and the second matrix has a size equal to the product of the dimension of the second coding vector and the token length of the training text.
In an embodiment, the wrong-word recognition module includes the second coding layer, the second BERT layer, and a second fully connected layer;
the second coding layer is configured to encode the training text to obtain the second coding vector;
the second BERT layer is configured to convert the second coding vector to obtain the second matrix;
the second fully connected layer is configured to perform wrong-word probability recognition according to the second matrix to obtain the wrong-word recognition result;
wherein the second matrix has a size equal to the product of the dimension of the second coding vector and the token length of the training text.
In an embodiment, the first BERT layer and the second BERT layer represent the same BERT layer, and the first coding layer and the second coding layer represent the same coding layer.
It can be seen that this embodiment provides a specific network structure for the model to be trained, ensuring the feasibility of the scheme; in the branch outputting the text correction result, the BERT layer of the wrong-word recognition module is reused, so that the wrong-word recognition module not only serves its own branch but also feeds into the processing of the text correction module. The two modules thus interact, model resources are used to the fullest, and resources are saved and fully utilized.
In one embodiment, the obtaining module 102 is configured to:
calculate a text correction loss value according to the text correction result and the training text;
calculate a wrong-word recognition loss value according to the wrong-word recognition result and the training text pre-labeled with wrong-word tags;
and calculate the total model loss of the model to be trained according to the text correction loss value and the wrong-word recognition loss value.
In an embodiment, the obtaining module 102 is further configured to:
take a linear combination of the text correction loss value and the wrong-word recognition loss value as the total model loss.
In this embodiment, a way of obtaining the total model loss of the model to be trained from the text correction result and the wrong-word recognition result is provided; the wrong-word recognition result is fed back into the text correction result to participate in training, rather than constraining the output of the text correction module alone, so that the constrained model outputs more accurate text correction results.
In summary, this embodiment provides a model processing apparatus in which two text processing network branches are constructed: one branch performs text correction and the other performs wrong-word recognition, and training is constrained by combining the losses of the branches until both the text correction branch and the wrong-word recognition branch produce good outputs. The wrong-word recognition probability is constrained during training instead of learning transition probabilities and a globally optimal path, so the trained target model over-corrects less, the text correction accuracy improves, no search algorithm is needed, and the apparatus is simpler with less computation.
In an embodiment, a text error correction apparatus is also provided, corresponding one-to-one to the text error correction method in the above embodiments, which is not described in detail here.
In this embodiment, text correction is performed on the recognized text through the text correction module of the target text error correction model to obtain the corrected text, which greatly reduces the misjudgment rate of the corrected text and improves correction quality; the target text error correction model is lightweight and simple, easy to deploy and apply, and of great practical value.
For specific limitations of the model processing apparatus or the text error correction apparatus, reference may be made to the above limitations of the model processing method or the text error correction method, which are not repeated here. The modules in the above apparatus may be implemented in whole or in part by software, hardware, or a combination thereof. The modules may be embedded in hardware independent of the processor in the computer device, or stored as software in the memory of the computer device, so that the processor can call and execute the operations corresponding to the modules.
In one embodiment, a computer device is provided, which may be a server. The computer device includes a processor, a memory, and a network interface connected by a system bus. The processor of the computer device provides computing and control capabilities. The memory of the computer device includes a storage medium and an internal memory. The storage medium stores an operating system and a computer program. The internal memory provides an environment for the operation of the operating system and the computer program in the storage medium. The network interface of the computer device communicates with the client over a network connection. The computer program, when executed by a processor, implements the model processing method.
In one embodiment, a computer device is provided, which may be a client. The client includes a processor, a memory, and a network interface connected by a system bus. The processor of the client provides computing and control capabilities. The memory of the client includes a storage medium and an internal memory. The storage medium stores an operating system and a computer program. The internal memory provides an environment for the operation of the operating system and the computer program in the storage medium. The network interface of the client communicates with the server over a network connection. The computer program, when executed by a processor, implements the text error correction method.
In one embodiment, as shown in fig. 7, a computer device is provided, including a memory, a processor, and a computer program stored on the memory and executable on the processor, the processor implementing the steps of the above model processing method or text error correction method when executing the computer program.
In one embodiment, a readable storage medium is provided, on which a computer program is stored which, when executed by a processor, implements the steps of the above-described model processing method or text error correction method.
Those skilled in the art will appreciate that implementing all or part of the above-described methods may be accomplished by way of a computer program stored on a computer readable storage medium, which when executed, may comprise the steps of the embodiments of the methods described above. Any reference to memory, storage, database, or other medium used in embodiments provided herein may include non-volatile and/or volatile memory.
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-described division of the functional units and modules is illustrated, and in practical application, the above-described functional distribution may be performed by different functional units and modules according to needs, i.e. the internal structure of the apparatus is divided into different functional units or modules to perform all or part of the above-described functions.
The above embodiments are only for illustrating the technical solution of the present application, and not for limiting the same; although the application has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present application, and are intended to be included in the scope of the present application.

Claims (9)

1. A method for text error correction processing, the method comprising:
performing text error correction on a recognized text through a target text error correction model to obtain a corrected text, the target text error correction model being trained in the following manner:
inputting a training text into a text error correction module of a model to be trained to perform text error correction processing, to obtain a text error correction result;
inputting the training text into a wrong-word recognition module of the model to be trained to perform wrong-word probability recognition, to obtain a wrong-word recognition result;
obtaining a total model loss of the model to be trained according to the text error correction result and the wrong-word recognition result;
taking the model whose total model loss meets a preset loss value after training as the target text error correction model;
wherein the text error correction module comprises a first encoding layer, a second encoding layer, a first BERT layer, a second BERT layer, and a first fully-connected layer;
the first encoding layer is used to encode the training text to obtain a first encoding vector, and the second encoding layer is used to encode the training text to obtain a second encoding vector;
the first BERT layer is used to convert the first encoding vector to obtain a first matrix; the second BERT layer is used to convert the second encoding vector to obtain a second matrix;
the first fully-connected layer is used to perform text error correction according to the sum of the first matrix and the second matrix, to obtain the text error correction result;
wherein the size of the first matrix is the product of the dimension of the first encoding vector and the token length of the training text, and the size of the second matrix is the product of the dimension of the second encoding vector and the token length of the training text.
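As a minimal illustrative sketch (not the patented implementation), the dual-branch structure recited in claim 1 might be expressed in PyTorch as follows. The Hugging Face transformers library, the class name DualBranchCorrector, and the token id tensors ids1 and ids2 standing in for the two encoding layers are all assumptions; the claims do not specify how the two encodings differ.

```python
# Minimal PyTorch sketch of the dual-branch text error correction module of
# claim 1. All names are hypothetical; both encoding layers are abstracted
# here as pre-tokenized id tensors.
import torch
import torch.nn as nn
from transformers import BertModel


class DualBranchCorrector(nn.Module):
    def __init__(self, vocab_size: int, bert_name: str = "bert-base-chinese"):
        super().__init__()
        # First and second BERT layers (claim 3 allows a single shared layer).
        self.bert1 = BertModel.from_pretrained(bert_name)
        self.bert2 = BertModel.from_pretrained(bert_name)
        # First fully-connected layer: maps each token position to vocabulary logits.
        self.fc1 = nn.Linear(self.bert1.config.hidden_size, vocab_size)

    def forward(self, ids1: torch.Tensor, ids2: torch.Tensor, mask: torch.Tensor):
        # Each branch produces a (token length x hidden dimension) matrix per sample.
        m1 = self.bert1(input_ids=ids1, attention_mask=mask).last_hidden_state
        m2 = self.bert2(input_ids=ids2, attention_mask=mask).last_hidden_state
        # Correction is predicted from the element-wise sum of the two matrices.
        return self.fc1(m1 + m2)
```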
2. The text error correction processing method of claim 1, wherein the wrong-word recognition module comprises the second encoding layer, the second BERT layer, and a second fully-connected layer;
the second encoding layer is used to encode the training text to obtain the second encoding vector;
the second BERT layer is used to convert the second encoding vector to obtain the second matrix;
the second fully-connected layer is used to perform wrong-word probability recognition according to the second matrix, to obtain the wrong-word recognition result;
wherein the size of the second matrix is the product of the dimension of the second encoding vector and the token length of the training text.
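Continuing the hypothetical sketch above, the wrong-word recognition module of claim 2 could reuse the second branch and add a second fully-connected layer producing a per-token wrong-word probability:

```python
# Hypothetical wrong-word recognition head for claim 2, reusing the second
# BERT layer from the correction module so the two modules share that branch.
class WrongWordDetector(nn.Module):
    def __init__(self, bert2: BertModel):
        super().__init__()
        self.bert2 = bert2  # shared second BERT layer
        # Second fully-connected layer: one logit per token position.
        self.fc2 = nn.Linear(bert2.config.hidden_size, 1)

    def forward(self, ids2: torch.Tensor, mask: torch.Tensor):
        m2 = self.bert2(input_ids=ids2, attention_mask=mask).last_hidden_state
        # Sigmoid converts each token's logit into a wrong-word probability.
        return torch.sigmoid(self.fc2(m2)).squeeze(-1)
```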
3. The text error correction processing method of claim 1, wherein the first BERT layer and the second BERT layer are the same BERT layer, and the first encoding layer and the second encoding layer are the same encoding layer.
4. The text error correction processing method of claim 1, wherein obtaining the total model loss of the model to be trained according to the text error correction result and the wrong-word recognition result comprises:
calculating a text error correction loss value according to the text error correction result and the training text;
calculating a wrong-word recognition loss value according to the wrong-word recognition result and the training text with pre-labeled error labels;
and calculating the total model loss of the model to be trained according to the text error correction loss value and the wrong-word recognition loss value.
5. The text error correction processing method of claim 4, wherein calculating the total model loss of the model to be trained according to the text error correction loss value and the wrong-word recognition loss value comprises:
taking a linear combination of the text error correction loss value and the wrong-word recognition loss value as the total model loss.
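A sketch of such a linear combination follows; the claims fix neither the individual loss functions nor the weights, so cross-entropy, binary cross-entropy, alpha, and beta are all assumptions:

```python
# Hypothetical total-loss computation for claim 5: a weighted linear
# combination of the correction loss and the wrong-word recognition loss.
import torch.nn.functional as F

def total_model_loss(corr_logits, target_ids, err_probs, err_labels,
                     alpha: float = 1.0, beta: float = 1.0):
    # Cross-entropy between per-token vocabulary logits and the correct text;
    # the transpose puts the class dimension where F.cross_entropy expects it.
    corr_loss = F.cross_entropy(corr_logits.transpose(1, 2), target_ids)
    # Binary cross-entropy between per-token wrong-word probabilities and the
    # pre-labeled error labels of claim 4.
    err_loss = F.binary_cross_entropy(err_probs, err_labels.float())
    return alpha * corr_loss + beta * err_loss
```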
6. A method of model processing, the method comprising:
inputting a training text into a text error correction module of a model to be trained to perform text error correction processing, to obtain a text error correction result;
inputting the training text into a wrong-word recognition module of the model to be trained to perform wrong-word probability recognition, to obtain a wrong-word recognition result;
obtaining a total model loss of the model to be trained according to the text error correction result and the wrong-word recognition result;
iteratively training the model to be trained until the total model loss meets a preset loss value, to obtain a target text error correction model;
wherein the text error correction module comprises a first encoding layer, a second encoding layer, a first BERT layer, a second BERT layer, and a first fully-connected layer;
the first encoding layer is used to encode the training text to obtain a first encoding vector, and the second encoding layer is used to encode the training text to obtain a second encoding vector;
the first BERT layer is used to convert the first encoding vector to obtain a first matrix; the second BERT layer is used to convert the second encoding vector to obtain a second matrix;
the first fully-connected layer is used to perform text error correction according to the sum of the first matrix and the second matrix, to obtain the text error correction result;
wherein the size of the first matrix is the product of the dimension of the first encoding vector and the token length of the training text, and the size of the second matrix is the product of the dimension of the second encoding vector and the token length of the training text.
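Tying the earlier sketches together, the iterative training of claim 6 might look like the loop below; the optimizer, learning rate, stopping check, and data loader format are illustrative assumptions rather than details from the patent:

```python
# Hypothetical training loop for claim 6: iterate until the total model loss
# meets a preset loss value. Loader batches, threshold, and optimizer settings
# are placeholders.
from torch.optim import AdamW

def train(model, detector, loader, preset_loss: float = 0.05, max_epochs: int = 20):
    # Deduplicate parameters in case the two modules share the second BERT layer.
    params = {id(p): p for m in (model, detector) for p in m.parameters()}
    opt = AdamW(params.values(), lr=2e-5)
    for _ in range(max_epochs):
        for ids1, ids2, mask, target_ids, err_labels in loader:
            loss = total_model_loss(model(ids1, ids2, mask), target_ids,
                                    detector(ids2, mask), err_labels)
            opt.zero_grad()
            loss.backward()
            opt.step()
        if loss.item() <= preset_loss:  # total model loss meets the preset value
            break
    return model  # serves as the target text error correction model
```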
7. A model processing apparatus, comprising:
an input module, used to input a training text into a text error correction module of a model to be trained to perform text error correction processing, to obtain a text error correction result, and to input the training text into a wrong-word recognition module of the model to be trained to perform wrong-word probability recognition, to obtain a wrong-word recognition result;
an obtaining module, used to obtain a total model loss of the model to be trained according to the text error correction result and the wrong-word recognition result;
a training module, used to iteratively train the model to be trained until the total model loss meets a preset loss value, to obtain a target text error correction model;
wherein the text error correction module comprises a first encoding layer, a second encoding layer, a first BERT layer, a second BERT layer, and a first fully-connected layer;
the first encoding layer is used to encode the training text to obtain a first encoding vector, and the second encoding layer is used to encode the training text to obtain a second encoding vector;
the first BERT layer is used to convert the first encoding vector to obtain a first matrix; the second BERT layer is used to convert the second encoding vector to obtain a second matrix;
the first fully-connected layer is used to perform text error correction according to the sum of the first matrix and the second matrix, to obtain the text error correction result;
wherein the size of the first matrix is the product of the dimension of the first encoding vector and the token length of the training text, and the size of the second matrix is the product of the dimension of the second encoding vector and the token length of the training text.
8. A computer device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the steps of the text error correction processing method according to any one of claims 1 to 5 or the steps of the model processing method according to claim 6 when executing the computer program.
9. A readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the steps of the text error correction processing method according to any one of claims 1 to 5 or the steps of the model processing method according to claim 6.
CN202311100345.XA 2023-08-30 2023-08-30 Text error correction processing method, model processing method, device, equipment and medium Active CN116822498B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311100345.XA CN116822498B (en) 2023-08-30 2023-08-30 Text error correction processing method, model processing method, device, equipment and medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311100345.XA CN116822498B (en) 2023-08-30 2023-08-30 Text error correction processing method, model processing method, device, equipment and medium

Publications (2)

Publication Number Publication Date
CN116822498A CN116822498A (en) 2023-09-29
CN116822498B true CN116822498B (en) 2023-12-01

Family

ID=88117043

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311100345.XA Active CN116822498B (en) 2023-08-30 2023-08-30 Text error correction processing method, model processing method, device, equipment and medium

Country Status (1)

Country Link
CN (1) CN116822498B (en)


Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104484322A * 2010-09-24 2015-04-01 National University of Singapore Methods and systems for automated text correction

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112329476A * 2020-11-11 2021-02-05 Beijing Jingdong Shangke Information Technology Co., Ltd. Text error correction method and device, equipment and storage medium
WO2022126897A1 * 2020-12-18 2022-06-23 Ping An Technology (Shenzhen) Co., Ltd. Text error correction method, apparatus, and device, and storage medium
CN114153971A * 2021-11-09 2022-03-08 Zhejiang University Error-containing Chinese text error correction, identification and classification equipment
WO2023093525A1 * 2021-11-23 2023-06-01 ZTE Corporation Model training method, Chinese text error correction method, electronic device, and storage medium
CN115796156A * 2022-12-16 2023-03-14 China Resources Digital Technology Co., Ltd. Text error correction method, device, equipment and medium
CN116127953A * 2023-04-18 2023-05-16 Zhejiang Lab Chinese spelling error correction method, device and medium based on contrast learning

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Research and Implementation of an Automatic Chinese Grammar Error Correction System; Wang Haochang; Zhou Jincheng; Enterprise Science and Technology & Development (Issue 02); pp. 89-92 *

Also Published As

Publication number Publication date
CN116822498A (en) 2023-09-29

Similar Documents

Publication Publication Date Title
WO2021212749A1 (en) Method and apparatus for labelling named entity, computer device, and storage medium
CN109190120B (en) Neural network training method and device and named entity identification method and device
KR102027141B1 (en) A program coding system based on artificial intelligence through voice recognition and a method thereof
WO2023160472A1 (en) Model training method and related device
CN108665506B (en) Image processing method, image processing device, computer storage medium and server
US20220300718A1 (en) Method, system, electronic device and storage medium for clarification question generation
CN111814496B (en) Text processing method, device, equipment and storage medium
CN113010635B (en) Text error correction method and device
CN107463928A (en) Word sequence error correction algorithm, system and its equipment based on OCR and two-way LSTM
CN112417092A (en) Intelligent text automatic generation system based on deep learning and implementation method thereof
CN112528643A (en) Text information extraction method and device based on neural network
CN116129902A (en) Cross-modal alignment-based voice translation method and system
CN115762489A (en) Data processing system and method of voice recognition model and voice recognition method
CN111522923A (en) Multi-round task type conversation state tracking method
CN114692624A (en) Information extraction method and device based on multitask migration and electronic equipment
CN114529908A (en) Offline handwritten chemical reaction type image recognition technology
CN116913278B (en) Voice processing method, device, equipment and storage medium
CN112307749A (en) Text error detection method and device, computer equipment and storage medium
CN112016299A (en) Method and device for generating dependency syntax tree by using neural network executed by computer
CN116822498B (en) Text error correction processing method, model processing method, device, equipment and medium
CN112464637A (en) Label-based optimization model training method, device, equipment and storage medium
CN116958700A (en) Image classification method based on prompt engineering and contrast learning
CN115906854A (en) Multi-level confrontation-based cross-language named entity recognition model training method
CN116702765A (en) Event extraction method and device and electronic equipment
CN115840820A (en) Small sample text classification method based on domain template pre-training

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant