WO2023173560A1 - RPA and AI based text error correction method, training method and related device thereof - Google Patents


Info

Publication number
WO2023173560A1
WO2023173560A1 (PCT/CN2022/091292)
Authority
WO
WIPO (PCT)
Prior art keywords: character, text, predicted, processed, characters
Application number: PCT/CN2022/091292
Other languages: French (fr), Chinese (zh)
Inventor: 王建周
Original Assignee: 来也科技(北京)有限公司
Application filed by 来也科技(北京)有限公司
Publication of WO2023173560A1

Classifications

    • G06V 30/10: Character recognition
    • G06F 40/30: Semantic analysis (handling natural language data)
    • G06V 30/12: Detection or correction of errors, e.g. by rescanning the pattern
    • G06V 30/19093: Proximity measures, i.e. similarity or distance measures
    • G06V 30/19147: Obtaining sets of training patterns; bootstrap methods, e.g. bagging or boosting
    • G06V 30/274: Syntactic or semantic context, e.g. balancing

Definitions

  • the present disclosure relates to the fields of Artificial Intelligence (AI for short) and Robotic Process Automation (RPA for short), and in particular to a text error correction method, training method and related equipment based on RPA and AI.
  • AI: Artificial Intelligence.
  • RPA: Robotic Process Automation.
  • RPA uses specific "robot software" to simulate human operations on computers and automatically execute process tasks according to rules.
  • AI is a technical science that studies and develops theories, methods, technologies and application systems for simulating, extending and expanding human intelligence.
  • In many scenarios, character information in images or PDF (Portable Document Format) documents needs to be recognized.
  • OCR is short for Optical Character Recognition.
  • the accuracy of the OCR recognition results may not be guaranteed.
  • For example, when the image size is small, the text in the image may be mistakenly recognized as a visually similar but incorrect string (in the original Chinese example, two near-identical strings that both translate as "account number").
  • Likewise, when the image is stamped with a seal, the accuracy of the OCR recognition results cannot be guaranteed.
  • For example, the word "car" in the image may be mistakenly recognized as "military", since the corresponding Chinese characters are visually similar.
  • the present disclosure aims to solve one of the technical problems in the related art, at least to a certain extent.
  • The present disclosure proposes a text error correction method, training method and related equipment based on RPA and AI, to correct the characters in text information after the text information is recognized based on OCR technology, thereby improving the accuracy and reliability of text information recognition results.
  • the first embodiment of the present disclosure proposes a text error correction method based on RPA and AI.
  • the method includes:
  • a target character is determined from the at least one replacement character, and the target character is used to replace the character to be processed in the predicted text to obtain the recognized text.
  • the second embodiment of the present disclosure provides a method for training the prediction model described in the first embodiment of the present disclosure, including:
  • Model parameters in the prediction model are adjusted based on the difference between the sample text and the output text.
  • the third embodiment of the present disclosure proposes a text error correction device based on RPA and AI, including:
  • a recognition module configured to perform character recognition on the image to be recognized based on the optical character recognition OCR model, to obtain the predicted text and the confidence of each predicted character in the predicted text;
  • a determination module configured to determine the characters to be processed from each of the predicted characters according to the confidence of each of the predicted characters
  • a masking module used to mask the characters to be processed in the predicted text
  • a prediction module used to use a prediction model to perform character prediction on the masked predicted text to obtain at least one replacement character corresponding to the character to be processed;
  • a replacement module configured to determine a target character from the at least one replacement character based on the similarity between the at least one replacement character and the character to be processed, and to use the target character to replace the character to be processed in the predicted text to obtain the recognized text.
  • the fourth embodiment of the present disclosure provides a device for training the prediction model described in the third embodiment of the present disclosure, including:
  • a masking module used to mask at least one sample character in the sample text to obtain the masked sample text
  • An input module for inputting the masked sample text into an initial prediction model, so as to use the prediction model to perform character prediction on the masked sample text to obtain output text;
  • An adjustment module configured to adjust model parameters in the prediction model based on the difference between the sample text and the output text.
  • the embodiment of the fifth aspect of the present disclosure provides an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor.
  • When the processor executes the computer program, the method described in the first or second embodiment of the present disclosure is implemented.
  • the sixth embodiment of the present disclosure provides a non-transitory computer-readable storage medium on which a computer program is stored.
  • When the computer program is executed by a processor, the method described in the first embodiment of the present disclosure is implemented, or the method described in the second embodiment of the present disclosure is implemented.
  • the seventh embodiment of the present disclosure proposes a computer program product, including a computer program.
  • When executed by a processor, the computer program implements the method described in the first embodiment of the present disclosure, or the method described in the second embodiment of the present disclosure.
  • Character recognition is performed on the image to be recognized to obtain the predicted text and the confidence of each predicted character in the predicted text; based on the confidence of each predicted character, the character to be processed is determined from the predicted characters; the character to be processed in the predicted text is masked, and a prediction model is used to perform character prediction on the masked predicted text to obtain at least one replacement character corresponding to the character to be processed; based on the similarity between the at least one replacement character and the character to be processed, a target character is determined from the at least one replacement character, and the target character is used to replace the character to be processed in the predicted text to obtain the recognized text.
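As a rough illustration (not the implementation disclosed in this application), the masking, prediction and replacement flow summarized above can be sketched in Python. The `predict_candidates` and `similarity` callables are hypothetical stand-ins for the prediction model and the character-shape similarity measure:

```python
def correct_text(pred_chars, confidences, predict_candidates, similarity, threshold=0.9):
    """Replace low-confidence OCR characters using a masked prediction model.

    pred_chars: sequence of recognized characters
    confidences: per-character OCR confidence values
    predict_candidates: fn(masked_chars, position) -> candidate replacement chars
    similarity: fn(candidate, original_char) -> glyph similarity score
    """
    chars = list(pred_chars)
    for pos, conf in enumerate(confidences):
        if conf >= threshold:          # keep confident characters as-is
            continue
        original = chars[pos]
        masked = chars[:pos] + ["[MASK]"] + chars[pos + 1:]
        candidates = predict_candidates(masked, pos)
        if candidates:                 # pick the candidate whose glyph is most
            chars[pos] = max(candidates, key=lambda c: similarity(c, original))
    return "".join(chars)              # similar to the OCR output
```

A character is only touched when its OCR confidence falls below the threshold, and among the model's candidates the one whose glyph most resembles the OCR output is chosen, mirroring the four steps described above.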
  • Figure 1 is a schematic diagram of the image to be recognized.
  • Figure 2 is a schematic diagram of the image to be recognized.
  • FIG. 3 is a schematic flowchart of a text error correction method based on RPA and AI provided by an embodiment of the present disclosure.
  • Figure 4 is a schematic flowchart of a text error correction method based on RPA and AI provided by an embodiment of the present disclosure.
  • Figure 5 is a schematic flowchart of a text error correction method based on RPA and AI provided by an embodiment of the present disclosure.
  • Figure 6 is a schematic flowchart of a text error correction method based on RPA and AI provided by an embodiment of the present disclosure.
  • Figure 7 is a schematic flowchart of a text error correction method based on RPA and AI provided by an embodiment of the present disclosure.
  • Figure 8 is a schematic flowchart of a text error correction method based on RPA and AI provided by an embodiment of the present disclosure.
  • Figure 9 is a schematic flowchart of a text error correction method based on RPA and AI provided by an embodiment of the present disclosure.
  • Figure 10 is a schematic diagram of the OCR recognition process in an embodiment of the present disclosure.
  • Figure 11 is a schematic flowchart of Chinese text line content recognition according to an embodiment of the present disclosure.
  • Figure 12 is a schematic structural diagram of a prediction model in an embodiment of the present disclosure.
  • Figure 13 is a schematic diagram of character similarity in an embodiment of the present disclosure.
  • Figure 14 is a schematic diagram of the image to be recognized.
  • Figure 15 is a schematic diagram of the image to be recognized.
  • Figure 16 is a schematic flowchart of a training method provided by an embodiment of the present disclosure.
  • Figure 17 is a schematic structural diagram of a text error correction device based on RPA and AI provided by an embodiment of the present disclosure.
  • Figure 18 is a schematic structural diagram of a training device provided by an embodiment of the present disclosure.
  • FIG. 19 illustrates a block diagram of an exemplary electronic device suitable for implementing embodiments of the present disclosure.
  • the first aspect is interference from stains, such as red stamps, ink marks, etc.;
  • the second aspect is confusion between characters with similar shapes;
  • the third aspect is font deformation caused by image deformation.
  • The first is to add semantic information during the OCR decoding process, ensuring that decoding uses not only image features but also semantic alignment with the preceding character.
  • The second is to add multi-task prediction of semantic information: characters on the image are randomly blocked with a mosaic.
  • The model then has two prediction networks (i.e., head networks): one predicts the characters displayed on the image, and the other predicts the characters blocked on the image, whereby the model can learn semantics-based error correction information.
  • The inventor found after many tests that, in order to improve the prediction effect of the model, the number of exposures of some long-tail characters in the large number of synthesized image samples needs to be increased: some rare characters must appear a sufficiently large number of times in the image samples, i.e., random text containing rare words must be added to the image samples. Moreover, in order to enable the model to distinguish similar words, multiple similar words need to be deliberately placed in one image sample. Both measures cause the text in a synthesized image sample to lack semantic coherence, so the first way above is not applicable.
  • this disclosure proposes a text error correction method, training method and related equipment based on RPA and AI.
  • RPA is short for Robotic Process Automation.
  • labor cost investment can be significantly reduced, existing office efficiency can be effectively improved, and work can be completed accurately, stably, and quickly.
  • AI is the abbreviation of Artificial Intelligence. It is a technical science that researches and develops theories, methods, technologies and application systems for simulating, extending and expanding human intelligence.
  • AI is the study of using computers to simulate certain human thinking processes and intelligent behaviors (such as learning, reasoning, thinking, planning, etc.). It has both hardware-level technology and software-level technology.
  • AI hardware technology generally includes technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, and big data processing; AI software technology mainly includes computer vision technology, speech recognition technology, Natural Language Processing (NLP) technology, machine learning/deep learning, big data processing technology, knowledge graph technology and other major directions.
  • NLP is short for Natural Language Processing.
  • OCR refers to the process in which electronic equipment examines characters printed on paper, determines their shapes by detecting dark and light patterns, and then uses character recognition methods to translate the shapes into computer text. That is, for printed characters, an optical method converts the text in a paper document into a black-and-white dot-matrix image file, and recognition software converts the text in the image into a text format for further editing and processing by word processing software.
  • OCR model is a pre-trained model that has learned the correspondence between the input image and the output text.
  • Image to be recognized refers to any image that needs to be recognized.
  • the image to be recognized can be an image containing invoice information, an image containing order information, etc.
  • Predicted text refers to the text information or OCR recognition results obtained by the OCR model for character recognition of the image to be recognized.
  • Confidence, also known as recognition probability or classification probability, refers to the probability value output by the OCR model.
  • Prediction model refers to a model obtained after training. The prediction model is used to predict characters of input text.
  • Mask character refers to the character used to mask text information.
  • The mask character can be a preset character, or it can be a random character; for example, a random character can be selected from a set dictionary and used as the mask character.
  • the "set dictionary” refers to a preset dictionary.
  • the set dictionary may include common or commonly used characters.
  • For example, the set dictionary can include 3900 common Chinese characters.
  • Target text refers to the text information output by the prediction model.
  • Sample text refers to the text information used to train the prediction model.
  • Specific characters refer to preset special characters.
  • The specific characters can be OOV (Out of Vocabulary) characters, or they can be other characters; this disclosure does not limit this.
  • Set encoding algorithm refers to a preset encoding algorithm.
  • the set encoding algorithm can be a four-corner encoding algorithm, or it can also be other encoding algorithms, and this disclosure does not limit this.
  • Constant refers to a preset threshold.
  • FIG. 3 is a schematic flowchart of a text error correction method based on RPA and AI provided by an embodiment of the present disclosure.
  • the present disclosure takes as an example that the text error correction method based on RPA and AI is configured in a text error correction device.
  • The text error correction device can be applied to any electronic device with computing capability.
  • the electronic device may be a personal computer, a mobile terminal, etc.
  • the mobile terminal is, for example, a mobile phone, a tablet computer, a personal digital assistant and other hardware devices with various operating systems.
  • the text error correction method based on RPA and AI can be applied to an RPA robot, where the RPA robot can run in any electronic device with computing capabilities.
  • the text error correction method based on RPA and AI can include the following steps:
  • Step 101 Based on the OCR model, perform character recognition on the image to be recognized to obtain the predicted text and the confidence of each predicted character in the predicted text.
  • the image to be recognized may be directly acquired.
  • the image to be recognized containing text information may be directly acquired.
  • the image to be recognized can also be obtained indirectly.
  • Documents containing text information in PDF or PSD format (PSD is a proprietary format of Adobe's graphic design software Photoshop) can be obtained, and the images containing text information can be extracted or intercepted from those documents.
  • The format of the image to be recognized can be JPG (i.e., JPEG, Joint Photographic Experts Group), PNG (Portable Network Graphics) or other image formats; this disclosure does not limit this.
  • the OCR model can be used to perform character recognition on the image to be recognized, and the predicted text and the confidence of each predicted character in the predicted text can be obtained.
  • Step 102 Determine the character to be processed from each predicted character based on the confidence level of each predicted character.
  • characters to be processed may be determined from each predicted character according to the confidence level of each predicted character, where the number of characters to be processed may be at least one.
  • For example, the confidence of each predicted character can be compared with a set confidence threshold, and the predicted characters with a confidence lower than the confidence threshold can be used as characters to be processed.
  • Alternatively, the predicted character with the lowest confidence may be used as the character to be processed.
  • Alternatively, the predicted characters can be sorted by confidence from small to large, and the target number of predicted characters ranked first can be selected as the characters to be processed.
  • The value of the target number is positively related to the length of the predicted text; that is, the longer the predicted text, the greater the value of the target number.
  • For example, the number of sentences contained in the predicted text can be determined, and the value of the target number can be determined based on the number of sentences.
  • The target number and the number of sentences are positively related; that is, the more sentences, the greater the value of the target number, and conversely, the fewer sentences, the smaller the value.
  • the characters to be processed can be determined from each predicted character based on different methods, which can improve the flexibility and applicability of the method.
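The three selection strategies described above (fixed confidence threshold, single lowest-confidence character, and a target number of lowest-confidence characters) might be sketched as follows; the function names are illustrative, not taken from the application:

```python
def chars_below_threshold(confidences, threshold):
    # Strategy 1: every character whose confidence falls below the set threshold
    return [i for i, c in enumerate(confidences) if c < threshold]

def least_confident(confidences):
    # Strategy 2: the single character with the lowest confidence
    return min(range(len(confidences)), key=lambda i: confidences[i])

def k_least_confident(confidences, k):
    # Strategy 3: the k lowest-confidence characters, where k (the "target
    # number") grows with the length or sentence count of the predicted text
    return sorted(range(len(confidences)), key=lambda i: confidences[i])[:k]
```

Each function returns character positions, so the caller can mask those positions in the predicted text before invoking the prediction model.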
  • Step 103 Mask the characters to be processed in the predicted text, and use a prediction model to perform character prediction on the masked predicted text to obtain at least one replacement character corresponding to the character to be processed.
  • In the present disclosure, the characters to be processed in the predicted text can be masked to obtain the masked predicted text, and the masked predicted text can be input into the prediction model, so that the prediction model performs character prediction on the masked predicted text to obtain at least one replacement character corresponding to the character to be processed.
  • Step 104 Determine the target character from the at least one replacement character based on the similarity between the at least one replacement character and the character to be processed, and use the target character to replace the character to be processed in the predicted text to obtain the recognized text.
  • In the present disclosure, the similarity between each replacement character and the character to be processed can be calculated, the target character can be determined from the replacement characters based on that similarity, and the target character can be used to replace the character to be processed in the predicted text to obtain the recognized text.
  • The similarity may be character similarity (or character shape similarity). That is, this disclosure considers the main causes of OCR recognition errors to be visually similar characters or interference between characters. Therefore, the character similarity (or character shape similarity) between each replacement character and the character to be processed can be calculated, the target character can be determined from the replacement characters according to that similarity, and the target character can be used to replace the character to be processed in the predicted text.
  • As an example, each replacement character can be encoded based on the set encoding algorithm to obtain a first encoding value for each replacement character, and the character to be processed can be encoded based on the same algorithm to obtain a second encoding value. The similarity between each replacement character and the character to be processed can then be determined based on the difference between that replacement character's first encoding value and the second encoding value.
  • For example, the set encoding algorithm can be the four-corner encoding algorithm. It can be understood that the more similar the glyphs of two Chinese characters are, the closer their four-corner encoding values will be. Based on this property, the difference between the first encoding value of each replacement character and the second encoding value of the character to be processed can be calculated, and the similarity between them determined from that difference.
  • similarity and difference have an inverse relationship, that is, the smaller the difference, the higher the similarity, and conversely, the larger the difference, the lower the similarity.
  • the above example only uses the four-corner encoding algorithm as the set encoding algorithm.
  • the set encoding algorithm can also be other encoding algorithms, and this disclosure does not limit this.
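As an illustrative sketch of the encoding-difference idea (the inverse relation between code difference and similarity), assuming the codes are digit strings such as five-digit four-corner codes:

```python
def encoding_similarity(first_code, second_code):
    """Map the numeric difference between two encoding values to a similarity
    score: a zero difference gives 1.0, and the score falls as the difference
    between the codes grows (the inverse relation described above)."""
    diff = abs(int(first_code) - int(second_code))
    return 1.0 / (1.0 + diff)
```

A per-digit variant (counting matching digit positions) would serve the same purpose; either way, the replacement character whose code differs least from that of the character to be processed scores highest.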
  • As another example, the replacement character can be drawn on a blank image to obtain a first image, and the character to be processed can be drawn on a blank image to obtain a second image.
  • the similarity between the first image and the second image can be calculated, and the similarity between the replacement character and the character to be processed is determined based on the similarity between the first image and the second image.
  • The similarity between the replacement character and the character to be processed has a positive relationship with the similarity between the first image and the second image; that is, the higher the similarity between the two images, the higher the similarity between the replacement character and the character to be processed.
  • As yet another example, feature extraction can be performed on the replacement character based on a feature extraction algorithm to obtain the feature vector of the replacement character, and feature extraction can be performed on the character to be processed to obtain the feature vector of the character to be processed.
  • the similarity between the replacement character and the character to be processed may be determined based on the similarity between the feature vector of the replacement character and the feature vector of the character to be processed.
  • the similarity between the replacement character and the character to be processed is positively related to the similarity between the feature vector of the replacement character and the feature vector of the character to be processed.
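For the feature-vector variant, the similarity between two glyph feature vectors is commonly measured with cosine similarity; below is a minimal pure-Python version (the feature extraction step itself is out of scope here and is assumed to have already produced the vectors):

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity between two glyph feature vectors: 1.0 for vectors
    pointing in the same direction, approaching 0.0 as they become unrelated.
    A higher score means the rendered characters look more alike."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)
```

The replacement character whose feature vector is most similar to that of the character to be processed would then be chosen as the target character.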
  • After the similarity between each replacement character and the character to be processed is determined, the target character can be determined from the replacement characters according to that similarity.
  • For example, the replacement character with the largest similarity can be used as the target character, and the target character can then be used to replace the character to be processed in the predicted text to obtain the recognized text.
  • The text error correction method based on RPA and AI in the embodiment of the present disclosure performs character recognition on the image to be recognized based on the OCR model to obtain the predicted text and the confidence of each predicted character; determines the characters to be processed from the predicted characters according to their confidence; masks the characters to be processed in the predicted text and uses a prediction model to perform character prediction on the masked predicted text, obtaining at least one replacement character corresponding to the character to be processed; and, based on the similarity between the at least one replacement character and the character to be processed, determines the target character from the at least one replacement character and uses it to replace the character to be processed in the predicted text to obtain the recognized text. In this way, the characters in the text information are corrected after OCR recognition, improving the accuracy and reliability of the recognition results.
  • the present disclosure also proposes a text error correction method based on RPA and AI.
  • Figure 4 is a schematic flowchart of another text error correction method based on RPA and AI provided by an embodiment of the present disclosure.
  • the text error correction method based on RPA and AI can include the following steps:
  • Step 201 Based on the OCR model, perform character recognition on the image to be recognized to obtain the predicted text and the confidence of each predicted character in the predicted text.
  • Step 202 Determine the character to be processed from each predicted character based on the confidence level of each predicted character.
  • steps 201 to 202 can refer to the execution process of any embodiment of the present disclosure, and will not be described again here.
  • Step 203 Determine the target position of the character to be processed in the predicted text.
  • the position of the character to be processed in the predicted text can be determined, which is recorded as the target position in the present disclosure.
  • For example, the target position can be the fourth character position.
  • Step 204 Obtain the masked character and replace the character to be processed at the target position in the predicted text with the masked character to obtain the masked predicted text.
  • the mask characters may be preset fixed characters, or the mask characters may be random characters, which the disclosure does not limit.
  • the masked character can be used to replace the character to be processed at the target position in the predicted text to obtain the masked predicted text.
  • Step 205 Input the masked predicted text into the prediction model, so as to use the prediction model to perform character prediction on the masked predicted text to obtain at least one target text.
  • The masked predicted text can be input into the prediction model, so that the prediction model performs character prediction on the masked predicted text to obtain at least one target text. That is to say, in this disclosure, the prediction model can use a method similar to machine translation to predict all characters in the entire text and obtain at least one target text.
  • For example, the target text output by the prediction model can be "There is a peach blossom tree in the southeast corner of the garden" and "There is a peach blossom tree in the southwest corner of the garden".
  • Step 206 Use at least one character at the target position in the target text as at least one replacement character.
  • In the present disclosure, the character at the target position in each of the at least one target text may be used as a replacement character.
  • the character at the fourth character position in each target text can be used as the replacement character.
  • the replacement characters can be "East" and "West”.
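Steps 203 to 206 can be sketched as follows; `predict_texts` is a hypothetical stand-in for the prediction model of step 205, returning one or more full target texts for a masked input:

```python
MASK = "[MASK]"

def mask_at(text, pos):
    # Steps 203-204: replace the character at the target position with the mask character
    return text[:pos] + MASK + text[pos + 1:]

def replacement_candidates(text, pos, predict_texts):
    # Step 205: the prediction model returns one or more full target texts;
    # Step 206: the character at the target position in each target text
    # becomes a replacement-character candidate.
    masked = mask_at(text, pos)
    return [t[pos] for t in predict_texts(masked)]
```

In the garden example above, the two target texts differ only at the target position, yielding the candidate set containing "East" and "West".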
  • Step 207 Determine the target character from the at least one replacement character based on the similarity between the at least one replacement character and the character to be processed, and use the target character to replace the character to be processed in the predicted text to obtain the recognized text.
  • step 207 can be referred to the execution process of any embodiment of the present disclosure, and will not be described again.
  • The text error correction method based on RPA and AI in the embodiment of the present disclosure determines the target position of the character to be processed in the predicted text; obtains the mask character and uses it to replace the character to be processed at the target position in the predicted text to obtain the masked predicted text; inputs the masked predicted text into the prediction model and uses the prediction model to perform character prediction on the masked predicted text to obtain at least one target text; and uses the character at the target position in the at least one target text as at least one replacement character. Therefore, using deep learning technology to predict the at least one replacement character can improve the accuracy and reliability of the prediction results.
  • the present disclosure also proposes a text error correction method.
  • Figure 5 is a schematic flowchart of another text error correction method based on RPA and AI provided by an embodiment of the present disclosure.
  • the text error correction method may also include the following steps:
  • Step 301 Obtain sample text.
  • the sample text can be obtained from an existing training set; or the sample text can be collected online, for example, through web crawler technology; or the sample text can be collected offline, for example, images of paper text content can be collected and each character in the images can then be recognized through OCR technology to obtain the sample text; or the sample text can be artificially synthesized, etc.
  • the embodiments of the present disclosure are not limited in this regard.
  • Step 302 Mask at least one sample character in the sample text to obtain a masked sample text.
  • mask characters may be used to mask at least one sample character in the sample text to obtain a masked sample text.
  • at least one sample character in the sample text can be replaced with a random character with a set first random probability, and/or at least one sample character in the sample text can be replaced with a fixed character with a set second random probability.
  • the first random probability and the second random probability may be the same or different, and this disclosure does not limit this.
  • if the current random probability does not match the first random probability, it is further determined whether the current random probability matches the second random probability; if the current random probability matches the second random probability, at least one sample character in the sample text is replaced with a fixed character; if the current random probability does not match the second random probability, no processing is performed.
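  • the two-probability masking described above can be sketched as follows; the probabilities, the stand-in vocabulary, and the "@" Mask character are illustrative assumptions, not values fixed by the disclosure:

```python
import random

# Illustrative first / second random probabilities (assumed, not from the text)
P_RANDOM, P_FIXED = 0.10, 0.10
VOCAB = list("abcdefghijklmnopqrstuvwxyz")  # stand-in character vocabulary
MASK = "@"                                   # stand-in fixed Mask character

def mask_sample(text, rng=random):
    out = []
    for ch in text:
        r = rng.random()
        if r < P_RANDOM:                      # matches the first probability
            out.append(rng.choice(VOCAB))     # replace with a random character
        elif r < P_RANDOM + P_FIXED:          # matches the second probability
            out.append(MASK)                  # replace with the fixed character
        else:
            out.append(ch)                    # leave the character unchanged
    return "".join(out)
```

Injecting a deterministic `rng` makes the behavior testable; in training, the module-level `random` source would be used.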
  • Step 303 Input the masked sample text into the initial prediction model, so as to use the prediction model to perform character prediction on the masked sample text to obtain output text.
  • the masked sample text can be input to the initial prediction model, so that the prediction model performs character prediction on the masked sample text to obtain output text.
  • Step 304 Adjust the model parameters in the prediction model based on the difference between the sample text and the output text.
  • model parameters in the prediction model can be adjusted based on the difference between the sample text and the output text.
  • the target loss value can be generated based on the difference between the sample text and the output text, where the target loss value has a positive relationship with the above difference; that is, the smaller the difference, the smaller the target loss value, and conversely, the greater the difference, the greater the target loss value.
  • the prediction model can be trained according to the target loss value, that is, the model parameters in the prediction model can be adjusted.
  • the prediction model can be trained based on the target loss value to minimize the target loss value.
  • the termination condition of model training can also be set.
  • the termination condition can also be that the number of training iterations reaches a set threshold, etc.; this disclosure does not limit this.
  • the text error correction method based on RPA and AI in the embodiment of the present disclosure obtains sample text; masks at least one sample character in the sample text to obtain a masked sample text; inputs the masked sample text into the initial prediction model, and uses the prediction model to perform character prediction on the masked sample text to obtain the output text; and adjusts the model parameters in the prediction model based on the difference between the sample text and the output text. Therefore, by pre-training the prediction model, the prediction effect of the prediction model can be improved; that is, using the trained prediction model to perform character prediction on the masked predicted text can improve the accuracy and reliability of the target text prediction results.
  • the present disclosure also proposes a text error correction method.
  • Figure 6 is a schematic flowchart of another text error correction method based on RPA and AI provided by an embodiment of the present disclosure.
  • the text error correction method may also include the following steps:
  • Step 401 Obtain sample text.
  • Step 402 Mask at least one sample character in the sample text to obtain a masked sample text.
  • Step 403 Input the masked sample text into the initial prediction model, so as to use the prediction model to perform character prediction on the masked sample text to obtain output text.
  • for steps 401 to 403, reference may be made to the execution process of any embodiment of the present disclosure, and details will not be repeated here.
  • Step 404 Generate a first loss value based on the difference in confidence distribution of characters between the sample text and the output text.
  • the difference in confidence distribution (or probability distribution) of characters between the sample text and the output text can be determined, and the first loss value is generated based on the above confidence distribution difference (or probability distribution difference).
  • the first loss value has a positive relationship with the difference in confidence distribution; that is, the smaller the difference in confidence distribution, the smaller the first loss value, and conversely, the greater the difference in confidence distribution, the greater the first loss value.
  • the first loss value may be a cross-entropy loss.
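  • a minimal sketch of the first loss value as a mean per-character cross-entropy, assuming one probability distribution per character position and one-hot targets (the function name and smoothing constant are illustrative assumptions):

```python
import math

def first_loss(pred_dists, target_ids):
    """pred_dists: one probability distribution per character position;
    target_ids: index of the true (sample) character at each position.
    Returns the mean cross-entropy over all positions."""
    total = 0.0
    for dist, t in zip(pred_dists, target_ids):
        total += -math.log(dist[t] + 1e-12)  # cross-entropy vs. one-hot target
    return total / len(target_ids)
```

The loss is smaller when the output distribution places more probability on the sample character, matching the positive relationship described above.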
  • Step 405 Determine the first position of at least one sample character to be masked in the sample text.
  • the position of at least one sample character to be masked in the sample text can be determined, which is recorded as the first position in the present disclosure.
  • Step 406 Generate a second loss value based on the difference between the sample character at the first position in the sample text and the output character at the first position in the output text.
  • a difference between a sample character at a first position in the sample text and an output character at a first position in the output text may be determined, and a second loss value is generated based on the difference.
  • the second loss value has a positive relationship with the above-mentioned difference, that is, the smaller the difference, the smaller the value of the second loss value, and conversely, the larger the difference, the greater the value of the second loss value.
  • the second loss value can also be a cross-entropy loss.
  • Step 407 Generate a target loss value based on the first loss value and the second loss value.
  • the target loss value may be generated based on the first loss value and the second loss value.
  • the target loss value and the first loss value have a positive relationship
  • the target loss value and the second loss value also have a positive relationship.
  • the first loss value and the second loss value can be weighted according to the first weight of the first loss value and the second weight of the second loss value to obtain a weighted result, and the target loss value is determined based on the weighted result, where the target loss value has a positive relationship with the weighted result.
  • the second weight may be greater than the first weight.
  • Step 408 Adjust the model parameters in the prediction model according to the target loss value.
  • the prediction model can be trained according to the target loss value, that is, the model parameters in the prediction model can be adjusted.
  • the prediction model can be trained based on the target loss value to minimize the target loss value.
  • the termination condition of model training can also be set.
  • the termination condition can also be that the number of training iterations reaches a set threshold, etc.; this disclosure does not limit this.
  • the text error correction method based on RPA and AI in the embodiment of the present disclosure can improve the prediction effect of the prediction model by pre-training the prediction model; that is, using the trained prediction model to perform character prediction on the masked predicted text can improve the accuracy and reliability of the target text prediction results.
  • the set dictionary can be obtained, and it can be determined whether each character in the masked sample text is located in the set dictionary. If each character in the masked sample text is located in the set dictionary, the masked sample text can be directly input into the initial prediction model, so that the prediction model can be used to perform character prediction on the masked sample text to obtain the output text.
  • the target loss value may be the weighted result of the first loss value and the second loss value; that is, the first loss value and the second loss value may be weighted according to the first weight of the first loss value and the second weight of the second loss value to obtain the target loss value.
  • the prediction model is used to perform character prediction on the replaced sample text to obtain the output text.
  • the target loss value is not only obtained based on the weighted result of the first loss value and the second loss value, but also needs to be obtained based on specific characters.
  • FIG. 7 is a schematic flowchart of another text error correction method based on RPA and AI provided by an embodiment of the present disclosure.
  • step 407 may include the following steps:
  • Step 501 Determine the second position of the specific character in the sample text.
  • the position of the specific character in the sample text can be determined, which is denoted as the second position in the present disclosure.
  • Step 502 Generate a third loss value based on the difference between the specific character and the output character located at the second position in the output text.
  • the third loss value may be generated based on the difference between the specific character and the output character located at the second position in the output text.
  • the third loss value has a positive relationship with the above-mentioned difference, that is, the smaller the difference, the smaller the value of the third loss value, and conversely, the greater the difference, the greater the value of the third loss value.
  • Step 503 Weight the first loss value and the second loss value according to the first weight of the first loss value and the second weight of the second loss value to obtain a fourth loss value, where the second weight is greater than the first weight.
  • the first loss value and the second loss value can be weighted according to the first weight of the first loss value and the second weight of the second loss value to obtain a weighted result, and the weighted result is used as the fourth loss value.
  • Step 504 Generate a target loss value based on the difference between the fourth loss value and the third loss value.
  • the target loss value can be generated according to the difference between the fourth loss value and the third loss value, where the target loss value is positively related to the above difference; that is, the smaller the difference, the smaller the target loss value, and conversely, the greater the difference, the greater the target loss value.
  • the third loss value can be subtracted from the fourth loss value to obtain a fifth loss value, and the fifth loss value can be amplified to obtain the target loss value. That is to say, the third loss value, corresponding to the specific characters the model does not need to pay attention to, can be removed from the fourth loss value; and to avoid the situation where fewer effective characters in the output text lead to a lower target loss value, the fifth loss value can be amplified in the present disclosure.
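  • the computation in Steps 503-504 can be sketched as follows; the weights and the amplification factor are illustrative assumptions (the disclosure only requires the second weight to exceed the first):

```python
def target_loss(l1, l2, l3, w1=0.3, w2=0.7, amplify=2.0):
    """l1: first loss (whole-sentence cross-entropy);
    l2: second loss (masked-character cross-entropy);
    l3: third loss (specific/OOV characters the model need not attend to)."""
    l4 = w1 * l1 + w2 * l2   # fourth loss value: weighted sum, with w2 > w1
    l5 = l4 - l3             # fifth loss value: remove specific-character loss
    return amplify * l5      # amplify to offset having fewer effective chars
```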
  • the text error correction method based on RPA and AI in the embodiment of the present disclosure can improve the prediction effect of the prediction model by pre-training the prediction model; that is, using the trained prediction model to perform character prediction on the masked predicted text can improve the accuracy and reliability of the target text prediction results.
  • the present disclosure also proposes a text error correction method based on RPA and AI.
  • FIG. 8 is a schematic flowchart of another text error correction method based on RPA and AI provided by an embodiment of the present disclosure.
  • the text error correction method based on RPA and AI can include the following steps:
  • Step 601 Based on the OCR model, perform character recognition on the image to be recognized to obtain the predicted text and the confidence of each predicted character in the predicted text.
  • Step 602 Determine the character to be processed from each predicted character based on the confidence level of each predicted character.
  • for steps 601 to 602, reference may be made to the execution process of any embodiment of the present disclosure, and details will not be repeated here.
  • Step 603 Use masked characters to replace characters to be processed in the predicted text to obtain masked predicted text.
  • the mask characters may be preset fixed characters, or the mask characters may be random characters, which the disclosure does not limit.
  • masked characters can be used to replace characters to be processed in the predicted text to obtain masked predicted text.
  • Step 604 Input the masked predicted text into the prediction model, so as to use the prediction model to predict the masked characters in the masked predicted text to obtain at least one replacement character.
  • the masked predicted text can be input to the prediction model, so that the prediction model predicts the masked characters in the masked predicted text to obtain at least one replacement character. That is to say, in the present disclosure, the prediction model can predict only the masked characters, similar to the cloze task.
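  • the cloze-style prediction of only the masked character can be illustrated with a toy model; a real implementation would use the trained Transformer prediction model, whereas the neighbor-count scoring below is purely an illustrative stand-in:

```python
from collections import Counter

def cloze_candidates(masked, corpus, mask="@", k=3):
    """Score candidates for the single masked slot by how often each
    character appears between the same left/right neighbors in a corpus."""
    i = masked.index(mask)
    left = masked[i - 1] if i > 0 else ""
    right = masked[i + 1] if i + 1 < len(masked) else ""
    counts = Counter()
    for text in corpus:
        for j, ch in enumerate(text):
            l = text[j - 1] if j > 0 else ""
            r = text[j + 1] if j + 1 < len(text) else ""
            if l == left and r == right:
                counts[ch] += 1
    return [c for c, _ in counts.most_common(k)]
```

This mirrors the cloze task shape: only the masked position is predicted, and the top-k scores become the replacement characters.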
  • the prediction model can be trained in the following manner: obtaining sample text.
  • the sample text in the above Figure 5 to Figure 7 can be marked as the first sample text
  • the sample text in this embodiment can be marked as the second sample text
  • at least one second sample character in the second sample text can be masked to obtain the masked second sample text
  • input the masked second sample text into the initial prediction model, so that the prediction model predicts at least one masked second sample character to obtain at least one recognized character; in the present disclosure, the model parameters in the prediction model can then be adjusted based on the difference between the at least one recognized character and the at least one second sample character.
  • a target loss value can be generated based on the difference between the at least one recognized character and the at least one second sample character, where the target loss value has a positive relationship with the above difference; that is, the smaller the difference, the smaller the target loss value.
  • the prediction model can be trained according to the target loss value, that is, the model parameters in the prediction model can be adjusted.
  • the prediction model can be trained based on the target loss value to minimize the target loss value.
  • the termination condition of model training can also be set.
  • the termination condition can also be that the number of training iterations reaches a set threshold, etc.; this disclosure does not limit this.
  • Step 605 Determine the target character from the at least one replacement character based on the similarity between the at least one replacement character and the character to be processed, and use the target character to replace the character to be processed in the predicted text to obtain the recognized text.
  • for step 605, reference may be made to the execution process of any embodiment of the present disclosure, and details will not be repeated here.
  • the text error correction method based on RPA and AI in the embodiment of the present disclosure uses masked characters to replace the characters to be processed in the predicted text to obtain the masked predicted text; and inputs the masked predicted text into the prediction model, so as to use the prediction model to predict the masked characters in the masked predicted text and obtain at least one replacement character. Therefore, using deep learning technology to predict at least one replacement character can improve the accuracy and reliability of the prediction results. In addition, predicting replacement characters in a different manner from the embodiment shown in Figure 2 can improve the flexibility and applicability of the method.
  • the present disclosure also proposes a text error correction method.
  • Figure 9 is a schematic flowchart of another text error correction method based on RPA and AI provided by an embodiment of the present disclosure.
  • the text error correction method based on RPA and AI can include the following steps:
  • Step 701 Use the feature extraction branch in the OCR model to extract features from the image to be recognized to obtain the first feature map.
  • the feature extraction branch in the OCR model can be used to extract features of the image to be recognized to obtain the first feature map.
  • the feature extraction branch can be a backbone network such as a CNN (Convolutional Neural Network) or ViT (Vision Transformer); features are extracted from the image to be recognized through the above backbone network to obtain the first feature map.
  • the image to be recognized may be tilted, deformed or flipped to a certain extent, and the above situations will affect the reliability of subsequent model recognition results. Therefore, in a possible implementation of the embodiment of the present disclosure, in order to improve the accuracy of subsequent model recognition results, after obtaining the image to be recognized, the tilt angle of the image to be recognized can be corrected.
  • angle prediction can be performed on the image to be recognized, and the tilt angle of the image to be recognized can be determined.
  • an image classification model can be used to predict the angle of the image to be recognized and determine the tilt angle of the image to be recognized.
  • angle prediction of the image to be recognized can be performed based on a corner point detection algorithm to determine the tilt angle of the image to be recognized.
  • for such an irregular quadrilateral region, the conventional target detection algorithm is ineffective.
  • the key point detection algorithm can be used to detect the four corner points of the irregular quadrilateral. Then the quadrilateral is extracted based on the four corner points, so that the tilt angle of the image to be recognized can be determined based on the extracted quadrilateral.
  • after determining the tilt angle of the image to be recognized, the image to be recognized can be rotated according to the above tilt angle, so that the angle of an image that is tilted or flipped can be corrected, improving the reliability of subsequent image recognition results.
  • Step 702 Use the fusion branch in the OCR model to fuse the first feature map and the location map to obtain a second feature map.
  • Each element in the location map corresponds to each element in the first feature map.
  • the elements in the location map are used to indicate the coordinates, in the image to be recognized, of the corresponding elements in the first feature map.
  • the image to be recognized can be position-encoded to obtain a position map, in which each element in the position map corresponds one-to-one to each element in the first feature map, and each element in the position map is used to indicate the coordinates of the corresponding element in the first feature map in the image to be recognized.
  • the fusion branch in the OCR model can be used to fuse the first feature map and the position map to obtain the second feature map.
  • the first feature map and the corresponding position map can be spliced to obtain the second feature map.
  • the first feature map and the corresponding position map can be spliced to obtain a spliced feature map, and the spliced feature map can be input into the convolution layer to fuse to obtain the second feature map.
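  • the splicing of the first feature map with the position map in the fusion branch can be sketched as follows; the (C, H, W) layout and the normalized two-channel coordinate encoding are illustrative assumptions, and in practice a convolution layer would follow to fuse the spliced feature map:

```python
import numpy as np

def fuse_with_position(feature_map):
    """feature_map: (C, H, W) array -> (C + 2, H, W) array.
    Builds a two-channel position map (x, y coordinates, normalized to
    [0, 1]) of the same spatial size and splices it along the channels."""
    c, h, w = feature_map.shape
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    pos = np.stack([xs / max(w - 1, 1), ys / max(h - 1, 1)])
    return np.concatenate([feature_map, pos.astype(feature_map.dtype)], axis=0)
```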
  • Step 703 Use the feature transformation branch in the OCR model to perform feature transformation on the second feature map to obtain a third feature map.
  • the feature transformation branch in the OCR model can be used to perform feature transformation on the second feature map to obtain the third feature map.
  • Step 704 Use the prediction branch in the OCR model to decode the third feature map to obtain the predicted text and the confidence corresponding to each predicted character in the predicted text.
  • the prediction branch in the OCR model can be used to decode the third feature map to obtain the predicted text and the confidence corresponding to each predicted character in the predicted text.
  • Step 705 Determine the character to be processed from each predicted character based on the confidence level of each predicted character.
  • Step 706 Mask the characters to be processed in the predicted text, and use a prediction model to perform character prediction on the masked predicted text to obtain at least one replacement character corresponding to the character to be processed.
  • Step 707 Determine a target character from at least one replacement character based on the similarity between the at least one replacement character and the character to be processed, and use the target character to replace the character to be processed in the predicted text to obtain the recognized text.
  • steps 705 to 707 can be referred to the execution process of any embodiment of the present disclosure, and will not be described again.
  • error correction can be used as a post-processing model that is not coupled to OCR recognition.
  • the inventor uses the OCR recognition process as shown in Figure 10 to exemplify the recognition of character information in the image.
  • the OCR recognition process mainly includes: rotating the image; detecting the text rows (columns) on the image; performing character recognition on the content of the text rows (columns); restoring the coordinate information of each recognized character; and outputting each character, the coordinates of each character, and the confidence of each character.
  • a post-processing module can be added after the stage of text line content identification and before the coordinate information restoration stage, and the above-mentioned post-processing module can be used to correct the text line content.
  • the model structure used in this stage of text line content recognition can be shown in Figure 11.
  • the feature extraction branch, such as a backbone network like CNN or ViT, can be used to extract features from the image to be recognized to obtain the first feature map; through the fusion branch, the extracted first feature map is fused with the position map to obtain a second feature map; through the feature transformation branch, feature transformation is performed on the second feature map to obtain a third feature map.
  • the feature transformation branch may include a conversion branch (used to convert the second feature map into one-dimensional features (Reshape to 1-D) or two-dimensional features (Reshape to 2-D)), a Transformer (used to continue feature transformation on the converted features to obtain a feature sequence), and an MLP (Multi-Layer Perceptron) (used to perform feature transformation on the feature sequence to obtain the third feature map); through the prediction branch, such as CTC (Connectionist Temporal Classification), the third feature map is decoded to obtain the predicted text and the confidence corresponding to each predicted character in the predicted text.
  • multi-task training, such as center-loss and rdrop (Regularized Dropout, a regularization method), can also be used.
  • the text error correction method based on RPA and AI in the embodiment of the present disclosure can improve the accuracy and reliability of the recognition results by using deep learning technology to perform OCR recognition on the image to be recognized.
  • the first category is replacement (Substitution); for example, "Today I feel so happy" is replaced with "Today I feel very happy".
  • the second category is insertion and deletion (Insert & Delete); for example, "Today I feel happy" is corrected by insertion to "Today I feel very happy", and "Today I feel very very happy" is corrected by deletion to "Today I feel very happy".
  • the third category is local paraphrasing (Local Paraphrasing, to a minimal extent); for example, "I feel today very happy" is rewritten as "I feel very happy today".
  • if a supervised model is used to correct text errors, a large number of annotated sample pairs needs to be used to train the above-mentioned supervised model before error correction.
  • the annotated sample pairs include incorrect results and accurate results of OCR recognition.
  • the training process is: training the above-mentioned supervised model based on the difference between the incorrect results and the accurate results of the annotated sample pairs.
  • the OCR recognition process includes multiple stages, and each stage requires a GPU (Graphics Processing Unit).
  • the post-processing module (or error correction model) can be lightweight
  • the model can be executed on the CPU (Central Processing Unit). Therefore, lightweight unsupervised models and solutions can be used to correct OCR recognition results.
  • the specific implementation plan is:
  • the first category is sequence tasks, such as machine translation; simply put, sentences with typos are translated into accurate sentences.
  • the second category is to detect errors and correct them. To put it simply, it is to detect possible errors in the sentence, and if a typo is detected, the typo will be corrected.
  • the second type of error correction has a better effect; for example, the soft-masked BERT (Bidirectional Encoder Representations from Transformers) model can achieve better correction results.
  • the above-mentioned second type of design model can be used to divide the entire error correction task into three parts:
  • the first part is: error word detection.
  • the last layer of the OCR model is an N-way classification task (N is the number of recognizable characters). From the perspective of the loss function of the classification network, if a predicted character is more accurate, the confidence (or softmax recognition probability) p of that character in the classification task will also be higher.
  • the inventor used the OCR model to test the test text in the test set and obtained: the confidence or recognition probability of each character.
  • the confidence threshold or probability threshold
  • the optimal confidence threshold or optimal probability threshold
  • below the optimal confidence threshold (or optimal probability threshold) f, the recognition accuracy of the remaining characters is low, and the improvement is not significant.
  • the confidence (or recognition probability) p of a character is used as prior knowledge to detect incorrect words, without the need for an additional error detection model. Specifically, when the confidence (or recognition probability) p of a character is higher than the optimal confidence threshold (or optimal probability threshold) f, the character is considered to be recognized correctly; conversely, when the confidence (or recognition probability) p of the character is not higher than the optimal confidence threshold (or optimal probability threshold) f, the character is considered to be recognized incorrectly.
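  • the threshold-based error word detection above can be sketched as follows; the function name and the example threshold f = 0.9 are illustrative, not values fixed by the disclosure:

```python
def detect_error_positions(chars, confidences, f=0.9):
    """Flag positions whose confidence p is not higher than the optimal
    threshold f as characters to be processed (possible errors)."""
    return [i for i, (c, p) in enumerate(zip(chars, confidences)) if p <= f]
```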
  • the second part is: correct word recall.
  • Corrective word recall can be illustrated from offline model training and online model calling respectively.
  • Model design: after statistical analysis, in the vast majority of error scenarios recognized by OCR, a sentence has only one erroneous character; and due to the lack of supervised annotation sample pairs, in the present disclosure, the MLM (Masked Language Modeling) method can be used to implement model training based on self-supervised training tasks.
  • the first approach is to predict only the characters removed by the Mask, a task similar to cloze; the second is to predict all characters of the entire sentence, similar to machine translation. Experiments show that the second approach predicts better. For example, for the training corpus "There is a peach blossom tree in the southeast corner of the garden" (recorded as sample text in this disclosure), the following tasks can be designed (the left side is the model input, the right side is the model output):
  • the prediction model can use a stack of six standard TransformerEncoderLayer encoding layers.
  • the maximum character length of the model input is 32 characters.
  • the model parameter head_num is selected as 6.
  • the dimension of the character feature vector embedding is 128; the prediction model is shown on the right side of Figure 12.
  • Multi-Head Attention can include 6 layers of attention coding layers.
  • the coding sequence (up to 32 characters) is input into the prediction model.
  • the last layer of the prediction model can output a 32×128 coding vector; finally, the output features of the 6th attention encoding layer, i.e., 32 feature vectors, are classified through softmax into 32 predicted characters.
  • Multi-Head Attention projects Q, K, and V through multiple (e.g., h) different linear transformations and then splices the different attention results; Scaled Dot-Product Attention performs the attention calculation; the results of the attention calculation are subjected to non-linear transformation through the concatenation layer (Concat) and the linear layer (Linear).
  • the structure of Scaled Dot-Product Attention can be shown in the left part of Figure 12.
  • Q, K, and V are the three matrices obtained by performing matrix operations on the input of the attention layer and the model parameters respectively.
  • d_k represents the normalization factor
  • T represents the transpose operation of the matrix.
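  • the Scaled Dot-Product Attention described above, Attention(Q, K, V) = softmax(QK^T / sqrt(d_k))V, can be sketched in numpy as follows (a single-head sketch; the linear projections of Multi-Head Attention are omitted):

```python
import numpy as np

def scaled_dot_product_attention(q, k, v):
    """q, k, v: (n_q, d_k), (n_k, d_k), (n_k, d_v) matrices."""
    d_k = q.shape[-1]
    scores = q @ k.T / np.sqrt(d_k)                  # similarity, scaled by d_k
    scores -= scores.max(axis=-1, keepdims=True)     # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over the keys
    return weights @ v                               # weighted sum of values
```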
  • the reason why the prediction model uses the Transformer as its basic structure is that the advantage of the Transformer is capturing the correlation between long-distance characters through the multi-head self-attention mechanism (Multi-Head Self-Attention).
  • the multi-head attention mechanism Multi-HeadAttention can be shown in the right part of Figure 12.
  • Training data: for the training corpus, multiple news articles can be crawled from multiple data sources, and the above news data can be split into more than 200 million pieces of training corpus (recorded as sample text in this disclosure).
  • the sample text can be randomly masked during the training process: any character in the sample text is replaced with a random character with a first random probability (such as 10%), and any character in the sample text is replaced with a Mask character (i.e., a fixed character) with a second random probability (such as 10%).
  • the 3900 most common Chinese characters can be used to form a set dictionary; to improve the accuracy of the model's prediction results, numbers, English characters, punctuation, and Chinese characters with strong semantic universality, such as the quantifiers "thousand" and "hundred", are replaced with specific characters, such as OOV characters.
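The masking and dictionary handling described above can be sketched as follows; the token names, probabilities, and toy dictionary are assumptions for illustration only:

```python
import random

MASK = "[MASK]"
OOV = "[OOV]"

def mask_sample(chars, dictionary, p_random=0.1, p_mask=0.1, rng=None):
    # For each character: replace it with a random dictionary character with
    # probability p_random, or with the fixed MASK token with probability
    # p_mask; characters outside the set dictionary become the OOV token.
    rng = rng or random.Random(0)
    vocab = sorted(dictionary)
    out = []
    for ch in chars:
        if ch not in dictionary:
            out.append(OOV)
            continue
        r = rng.random()
        if r < p_random:
            out.append(rng.choice(vocab))
        elif r < p_random + p_mask:
            out.append(MASK)
        else:
            out.append(ch)
    return out

# "#" stands in for an out-of-dictionary character.
sample = list("abcde#")
masked = mask_sample(sample, dictionary=set("abcdefgh"))
```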
  • Loss function design: a cross-entropy loss function is used, and the value of the loss function can be determined in the following way. The cross-entropy loss consists of two parts: one part is the cross-entropy loss of all predicted characters in the entire sentence (recorded as the first loss value in this disclosure), and the other part is the cross-entropy loss of the masked characters (recorded as the second loss value in this disclosure). Because the second part matters more in actual use, the second part of the loss is given a higher weight.
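A minimal sketch of this two-part weighted cross-entropy; the weights (1.0 for the whole sentence, 2.0 for masked positions) are illustrative placeholders, not values from the disclosure:

```python
import math

def cross_entropy(probs, target_ids):
    # Mean negative log-likelihood of the target ids under the predicted
    # per-character probability distributions.
    return -sum(math.log(p[t]) for p, t in zip(probs, target_ids)) / len(target_ids)

def total_loss(probs, targets, masked_positions, w_all=1.0, w_masked=2.0):
    # First part: cross-entropy over every predicted character in the sentence.
    loss_all = cross_entropy(probs, targets)
    # Second part: cross-entropy over the masked positions only, weighted
    # higher (w_masked > w_all) because correction at masked positions
    # matters most in the final use.
    masked_probs = [probs[i] for i in masked_positions]
    masked_targets = [targets[i] for i in masked_positions]
    loss_masked = cross_entropy(masked_probs, masked_targets)
    return w_all * loss_all + w_masked * loss_masked

# Three characters over a two-character vocabulary; position 2 is masked.
probs = [[0.7, 0.3], [0.2, 0.8], [0.5, 0.5]]
targets = [0, 1, 0]
loss = total_loss(probs, targets, masked_positions=[2])
```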
  • the characters whose confidence or recognition probability p is lower than f in the OCR recognition results can be replaced with OOV characters. If the sentence contains more than 32 characters, the beginning and end of the sentence can be truncated to ensure that the masked characters are in the middle of the sentence.
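A sketch of this input-preparation step, assuming a confidence floor of 0.5 and a 32-character window (both illustrative values):

```python
def prepare_input(chars, confidences, mask_pos, max_len=32, conf_floor=0.5):
    # Replace low-confidence OCR characters with the OOV token, then truncate
    # the sentence to max_len characters so that the masked character stays
    # near the middle of the window.
    OOV = "[OOV]"
    out = [OOV if c < conf_floor else ch for ch, c in zip(chars, confidences)]
    if len(out) <= max_len:
        return out
    half = max_len // 2
    start = min(max(0, mask_pos - half), len(out) - max_len)
    return out[start:start + max_len]

# 40-character toy sentence with one low-confidence character at index 3.
chars = [chr(ord("a") + i % 26) for i in range(40)]
confs = [0.9] * 40
confs[3] = 0.1
window = prepare_input(chars, confs, mask_pos=20)
```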
  • although the OCR recognition results include all predicted characters, this disclosure only recalls candidates at low-probability or low-confidence positions, ignoring predicted characters at other positions; for example, the top 20 (Top20) predicted characters may be recalled at each such position.
  • the third part is: sorting of correction candidates.
  • the entire recall part is a purely semantic model; if only the Top 1 candidate recalled by the model were used for correction, the accuracy of the correction results could not be guaranteed.
  • for example, if the input of the model is "There is a peach blossom tree in the south corner of the garden [mask]", either of the predicted characters "east" and "west" may be wrong, so a sorting module with prior knowledge is needed to select the most accurate character.
  • character similarity can be used as a sorting index.
  • character similarity can be calculated through the following three methods:
  • the first is the Chinese character four-corner encoding algorithm: the encoding value of each character is calculated based on the four-corner encoding algorithm, and the similarity between characters is determined based on their encoding values.
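A sketch of code-based similarity under this idea. The codes below are made-up placeholders purely for illustration; real four-corner codes come from a published lookup table, and the similarity measure here (fraction of agreeing corner digits) is one plausible choice, not the disclosure's exact metric:

```python
# Placeholder four-corner codes -- illustrative only, not real table values.
FOUR_CORNER = {"账": "7785", "帐": "7722", "东": "4090"}

def four_corner_similarity(a, b, table=FOUR_CORNER):
    # Similarity = fraction of corner digits on which the two codes agree.
    ca, cb = table[a], table[b]
    return sum(x == y for x, y in zip(ca, cb)) / len(ca)
```

With these placeholder codes, the visually similar pair 账/帐 scores higher than the dissimilar pair 账/东.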
  • the second is image similarity: each character can be rendered in regular script, Song typeface, and other fonts onto a 128×128 picture with a white background, and the similarity between the pictures is calculated as the character similarity; alternatively, features can be extracted from the pictures and the image similarity calculated from the extracted feature vectors. The character similarity can thus be determined based on the image similarity.
  • the third is the OCR feature vector.
  • the feature vector of each character can be determined based on the softmax matrix of the last layer in the OCR model, so that similarity calculation can be performed based on the feature vector of the character.
  • the above matrix is an N ⁇ D matrix, where N is the number of characters and D is the vector dimension. It can be considered that the D-dimensional vector in each row of the matrix is the representation vector or feature vector of the corresponding character.
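Treating each row of that N×D matrix as a character's representation vector, the similarity can be computed, for example, as cosine similarity. The three-dimensional vectors below are made-up placeholders (real rows would be D-dimensional softmax-layer weights):

```python
import math

def cosine_similarity(u, v):
    # Character similarity as the cosine of the angle between feature vectors.
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

# Hypothetical rows of the N x D softmax-layer matrix: each row is treated
# as the representation vector of one character.
features = {
    "东": [0.9, 0.1, 0.2],
    "冬": [0.85, 0.15, 0.25],
    "西": [0.1, 0.9, 0.3],
}
```

Under these placeholder vectors, the visually confusable pair 东/冬 is closer than 东/西.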
  • the character similarity shown in Figure 13 can be obtained.
  • the third column in Figure 13 is the character shape similarity.
  • the third kind is more effective than the first and second kinds.
  • the character similarity between the recalled characters and the characters to be corrected can be determined based on the character shape similarity. If the character similarity is higher than the set threshold, the character with the highest character similarity can be selected as the target character.
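A sketch of this thresholded selection, with a hypothetical threshold of 0.6 and made-up similarity scores:

```python
def pick_target(candidates, similarity, threshold=0.6):
    # candidates: recalled replacement characters, each with a similarity to
    # the character to be corrected. Keep only those at or above the
    # threshold and pick the one with the highest similarity; return None
    # if no candidate qualifies (i.e., no correction is applied).
    scored = [(c, similarity[c]) for c in candidates if similarity[c] >= threshold]
    if not scored:
        return None
    return max(scored, key=lambda cs: cs[1])[0]

# Illustrative similarity scores for three recalled candidates.
sims = {"东": 0.95, "西": 0.40, "冬": 0.70}
target = pick_target(["东", "西", "冬"], sims)
```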
  • the inventor tested the model on a large-scale test set and found that on the basis of the already high F1 index, the F1 index could still be improved by more than 0.03%.
  • the above are various embodiments corresponding to the text error correction method.
  • the present disclosure also proposes a method for training the prediction model in any of the above method embodiments.
  • Figure 16 is a schematic flowchart of a training method provided by an embodiment of the present disclosure.
  • the training method may include the following steps:
  • Step 801 Obtain sample text.
  • Step 802 Mask at least one sample character in the sample text to obtain a masked sample text.
  • Step 803 Input the masked sample text into the initial prediction model, so as to use the prediction model to perform character prediction on the masked sample text to obtain output text.
  • Step 804 Adjust the model parameters in the prediction model based on the difference between the sample text and the output text.
  • the first loss value can be generated based on the difference in confidence distribution of characters between the sample text and the output text; the first position in the sample text of the at least one masked sample character is determined; a second loss value is generated based on the difference between the sample character at the first position in the sample text and the output character at the first position in the output text; a target loss value is generated according to the first loss value and the second loss value; and the model parameters in the prediction model are adjusted based on the target loss value.
  • the set dictionary can be obtained; whether each character in the masked sample text is located in the set dictionary is determined; if the masked sample text contains a first character that is not located in the set dictionary, the first character in the masked sample text is replaced with a specific character; the prediction model is then used to perform character prediction on the replaced sample text to obtain the output text.
  • a second position of the first character in the sample text may be determined; a third loss value may be generated based on the difference between the specific character and the output character located at the second position in the output text; according to the first weight of the first loss value and the second weight of the second loss value, the first loss value and the second loss value are weighted to obtain a fourth loss value, where the second weight is greater than the first weight; and a target loss value is generated based on the difference between the fourth loss value and the third loss value.
  • the training method of the embodiment of the present disclosure obtains sample text; masks at least one sample character in the sample text to obtain the masked sample text; inputs the masked sample text into the initial prediction model, so as to use the prediction model to perform character prediction on the masked sample text to obtain the output text; and adjusts the model parameters in the prediction model based on the difference between the sample text and the output text. Therefore, by training the prediction model in advance, the prediction effect of the prediction model can be improved.
  • the present disclosure also provides a text error correction device based on RPA and AI. Since the text error correction device based on RPA and AI provided by the embodiments of the present disclosure corresponds to the text error correction method based on RPA and AI provided by the above-mentioned embodiments of Figures 3 to 9, the implementation of the text error correction method based on RPA and AI is also applicable to the text error correction device provided by the embodiments of the present disclosure, and will not be described in detail here.
  • Figure 17 is a schematic structural diagram of a text error correction device based on RPA and AI provided by an embodiment of the present disclosure.
  • the text error correction device 1700 based on RPA and AI may include: an identification module 1710, a determination module 1720, a mask module 1730, a prediction module 1740, and a replacement module 1750.
  • the recognition module 1710 is used to perform character recognition on the image to be recognized based on the optical character recognition OCR model, so as to obtain the predicted text and the confidence of each predicted character in the predicted text.
  • the determination module 1720 is used to determine the character to be processed from each predicted character according to the confidence of each predicted character.
  • Masking module 1730 is used to mask the characters to be processed in the predicted text.
  • the prediction module 1740 is configured to use a prediction model to perform character prediction on the masked predicted text to obtain at least one replacement character corresponding to the character to be processed.
  • the replacement module 1750 is configured to determine a target character from at least one replacement character based on the similarity between the at least one replacement character and the character to be processed, and use the target character to replace the character to be processed in the predicted text to obtain the recognized text.
  • the text error correction device 1700 based on RPA and AI can be applied to RPA robots.
  • the mask module 1730 is used to: determine the target position of the character to be processed in the predicted text; obtain the mask character, and use the mask character to replace the target position in the predicted text to obtain the masked predicted text.
  • the prediction module 1740 is used to: input the masked predicted text into the prediction model, and use the prediction model to perform character prediction on the masked predicted text to obtain at least one target text; and use the characters located at the target position in the at least one target text as the at least one replacement character.
  • the prediction model is trained through the following modules:
  • the acquisition module is used to obtain sample text.
  • the masking module is also used to mask at least one sample character in the sample text to obtain the masked sample text.
  • the input module is used to input the masked sample text into the initial prediction model, so as to use the prediction model to perform character prediction on the masked sample text to obtain the output text.
  • the adjustment module is used to adjust the model parameters in the prediction model based on the difference between the sample text and the output text.
  • the adjustment module is configured to: generate a first loss value based on the difference in confidence distribution of characters between the sample text and the output text; determine the first position in the sample text of the at least one masked sample character; generate a second loss value based on the difference between the sample character at the first position in the sample text and the output character at the first position in the output text; generate a target loss value based on the first loss value and the second loss value; and adjust the model parameters in the prediction model based on the target loss value.
  • the input module is used to: obtain a set dictionary; determine whether each character in the masked sample text is in the set dictionary; if the masked sample text contains a first character that is not in the set dictionary, replace the first character in the masked sample text with a specific character; and input the replaced sample text into the prediction model, so as to use the prediction model to perform character prediction on the replaced sample text to obtain the output text.
  • the adjustment module is configured to: determine the second position of the first character in the sample text; generate a third loss value based on the difference between the specific character and the output character located at the second position in the output text; weight the first loss value and the second loss value according to the first weight of the first loss value and the second weight of the second loss value to obtain a fourth loss value, where the second weight is greater than the first weight; and generate a target loss value based on the difference between the fourth loss value and the third loss value.
  • the text error correction device 1700 based on RPA and AI may also include:
  • the first processing module is used to: encode the at least one replacement character based on a set encoding algorithm to obtain a first encoding value of the at least one replacement character; encode the character to be processed based on the set encoding algorithm to obtain a second encoding value of the character to be processed; and determine the similarity between the at least one replacement character and the character to be processed based on the difference between the first encoding value and the second encoding value.
  • the text error correction device 1700 based on RPA and AI may also include:
  • the second processing module is used to: for each replacement character, draw the replacement character to obtain a first image; draw the character to be processed to obtain a second image; and determine the similarity between the replacement character and the character to be processed based on the similarity between the first image and the second image.
  • the text error correction device 1700 based on RPA and AI may also include:
  • the third processing module is used to: perform feature extraction on the at least one replacement character to obtain a feature vector of the at least one replacement character; perform feature extraction on the character to be processed to obtain a feature vector of the character to be processed; and determine the similarity between the at least one replacement character and the character to be processed based on the similarity between the feature vector of the at least one replacement character and the feature vector of the character to be processed.
  • the recognition module 1710 is used to: use the feature extraction branch in the OCR model to extract features of the image to be recognized to obtain a first feature map; use the fusion branch in the OCR model to fuse the first feature map and a position map to obtain a second feature map, where each element in the position map corresponds one-to-one to each element in the first feature map, and the elements in the position map are used to indicate the coordinates, in the image to be recognized, of the corresponding elements in the first feature map; use the feature transformation branch in the OCR model to perform feature transformation on the second feature map to obtain a third feature map; and use the prediction branch in the OCR model to decode the third feature map to obtain the predicted text and the confidence corresponding to each predicted character in the predicted text.
  • the text error correction device 1700 based on RPA and AI may also include:
  • the fourth processing module is used to predict the angle of the image to be recognized, determine the tilt angle of the image to be recognized, and perform rotation processing on the image to be recognized based on the tilt angle.
  • the determination module 1720 is configured to: use predicted characters whose confidence is lower than a confidence threshold as the characters to be processed; or, use the predicted character with the lowest confidence as the character to be processed; or, sort the predicted characters in ascending order of confidence and select the first target number of predicted characters as the characters to be processed, where the value of the target number is positively related to the length of the predicted text.
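The determination module's selection strategies can be sketched as follows, assuming (consistent with the recall description earlier in the disclosure) that low-confidence characters are the ones selected for processing; all names and values are illustrative:

```python
def chars_to_process(confidences, conf_threshold=None, top_k=None):
    # Returns the indices of predicted characters to treat as characters to
    # be processed, using one of three strategies:
    # (1) every character whose confidence is below a threshold,
    # (2) the k lowest-confidence characters (k growing with text length), or
    # (3) by default, the single lowest-confidence character.
    indexed = list(enumerate(confidences))
    if conf_threshold is not None:
        return [i for i, c in indexed if c < conf_threshold]
    if top_k is not None:
        lowest = sorted(indexed, key=lambda ic: ic[1])[:top_k]
        return sorted(i for i, _ in lowest)
    return [min(indexed, key=lambda ic: ic[1])[0]]

# Illustrative per-character OCR confidences.
confs = [0.95, 0.30, 0.88, 0.45]
```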
  • the text error correction device based on RPA and AI in the embodiment of the present disclosure performs character recognition on the image to be recognized based on the OCR model to obtain the predicted text and the confidence of each predicted character in the predicted text; determines the characters to be processed from the predicted characters according to the confidence of each predicted character; masks the characters to be processed in the predicted text, and uses a prediction model to perform character prediction on the masked predicted text to obtain at least one replacement character corresponding to the character to be processed; determines the target character from the at least one replacement character according to the similarity between the at least one replacement character and the character to be processed; and uses the target character to replace the character to be processed in the predicted text to obtain the recognized text.
  • the present disclosure also provides a training device. Since the training device provided by the embodiment of the present disclosure corresponds to the training method provided by the above-mentioned embodiment of FIG. 16, the implementation of the training method The method is also applicable to the training device provided in the embodiment of the present disclosure, and will not be described in detail in the embodiment of the present disclosure.
  • Figure 18 is a schematic structural diagram of a training device provided by an embodiment of the present disclosure.
  • the training device 1800 may include: an acquisition module 1810, a mask module 1820, an input module 1830, and an adjustment module 1840.
  • the acquisition module 1810 is used to acquire sample text.
  • the masking module 1820 is used to mask at least one sample character in the sample text to obtain a masked sample text.
  • the input module 1830 is used to input the masked sample text into the initial prediction model, so as to use the prediction model to perform character prediction on the masked sample text to obtain output text.
  • the adjustment module 1840 is used to adjust the model parameters in the prediction model based on the difference between the sample text and the output text.
  • the adjustment module 1840 is configured to: generate a first loss value based on the difference in confidence distribution of characters between the sample text and the output text; determine the first position in the sample text of the at least one masked sample character; generate a second loss value based on the difference between the sample character at the first position in the sample text and the output character at the first position in the output text; generate a target loss value based on the first loss value and the second loss value; and adjust the model parameters in the prediction model according to the target loss value.
  • the input module 1830 is used to: obtain a set dictionary; determine whether each character in the masked sample text is located in the set dictionary; if the masked sample text contains a first character that is not in the set dictionary, replace the first character in the masked sample text with a specific character; and input the replaced sample text into the prediction model, so as to use the prediction model to perform character prediction on the replaced sample text to obtain the output text.
  • the adjustment module 1840 is configured to: determine the second position of the first character in the sample text; generate a third loss value based on the difference between the specific character and the output character located at the second position in the output text; weight the first loss value and the second loss value according to the first weight of the first loss value and the second weight of the second loss value to obtain a fourth loss value, where the second weight is greater than the first weight; and generate a target loss value based on the difference between the fourth loss value and the third loss value.
  • the training device of the embodiment of the present disclosure obtains sample text; masks at least one sample character in the sample text to obtain a masked sample text; inputs the masked sample text into the initial prediction model, so as to use the prediction model to perform character prediction on the masked sample text to obtain the output text; and adjusts the model parameters in the prediction model based on the difference between the sample text and the output text. Therefore, by training the prediction model in advance, the prediction effect of the prediction model can be improved.
  • An embodiment of the present disclosure also provides an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor.
  • when the processor executes the computer program, any one of the foregoing methods is implemented.
  • Embodiments of the present disclosure also provide a non-transitory computer-readable storage medium on which a computer program is stored.
  • when the computer program is executed by a processor, the text error correction method based on RPA and AI as described in any of the foregoing method embodiments is implemented, or the training method as described in any of the foregoing method embodiments is implemented.
  • Embodiments of the present disclosure also provide a computer program product.
  • when the instructions in the computer program product are executed by a processor, the text error correction method based on RPA and AI as described in any of the foregoing method embodiments is implemented, or the training method as described in any of the foregoing method embodiments is implemented.
  • FIG. 19 illustrates a block diagram of an exemplary electronic device suitable for implementing embodiments of the present disclosure.
  • the electronic device 12 shown in FIG. 19 is only an example and should not bring any limitations to the functions and scope of use of the embodiments of the present disclosure.
  • electronic device 12 is embodied in the form of a general computing device.
  • Components of the electronic device 12 may include, but are not limited to: one or more processors or processing units 16, system memory 28, and a bus 18 connecting various system components (including memory 28 and processing unit 16).
  • Bus 18 represents one or more of several types of bus structures, including a memory bus or memory controller, a peripheral bus, a graphics accelerated port, a processor, or a local bus using any of a variety of bus structures.
  • these architectures include, but are not limited to, the Industry Standard Architecture (ISA) bus, the Micro Channel Architecture (MCA) bus, the Enhanced ISA bus, the Video Electronics Standards Association (VESA) local bus, and the Peripheral Component Interconnect (PCI) bus.
  • Electronic device 12 typically includes a variety of computer system readable media. These media can be any available media that can be accessed by electronic device 12, including volatile and nonvolatile media, removable and non-removable media.
  • the memory 28 may include computer system readable media in the form of volatile memory, such as random access memory (Random Access Memory; hereinafter referred to as: RAM) 30 and/or cache memory 32.
  • Electronic device 12 may further include other removable/non-removable, volatile/non-volatile computer system storage media.
  • storage system 34 may be used to read and write to non-removable, non-volatile magnetic media (not shown in Figure 19, commonly referred to as a "hard drive”).
  • Although not shown in Figure 19, a disk drive for reading and writing a removable non-volatile magnetic disk (e.g., a "floppy disk") and an optical disk drive for reading and writing a removable non-volatile optical disk (e.g., a CD-ROM, DVD-ROM, or other optical media) may also be provided.
  • Memory 28 may include at least one program product having a set (eg, at least one) of program modules configured to perform the functions of embodiments of the present disclosure.
  • a program/utility 40 having a set of (at least one) program modules 42 may be stored, for example, in memory 28; each of these examples, or some combination thereof, may include an implementation of a network environment.
  • Program modules 42 generally perform functions and/or methods in the embodiments described in this disclosure.
  • Electronic device 12 may also communicate with one or more external devices 14 (e.g., a keyboard, a pointing device, a display 24, etc.), with one or more devices that enable a user to interact with electronic device 12, and/or with any device (e.g., a network card, a modem, etc.) that enables electronic device 12 to communicate with one or more other computing devices. Such communication may occur through input/output (I/O) interfaces 22.
  • the electronic device 12 can also communicate, through the network adapter 20, with one or more networks, such as a local area network (LAN), a wide area network (WAN), and/or a public network such as the Internet.
  • network adapter 20 communicates with other modules of electronic device 12 via bus 18 .
  • other hardware and/or software modules may be used in conjunction with electronic device 12, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems, etc.
  • the processing unit 16 executes programs stored in the memory 28 to perform various functional applications and data processing, such as implementing the methods mentioned in the previous embodiments.
  • references to the terms "one embodiment," "some embodiments," "an example," "specific examples," or "some examples" mean that specific features, structures, materials, or characteristics described in connection with the embodiment or example are included in at least one embodiment or example of the present disclosure. In this specification, the schematic expressions of these terms do not necessarily refer to the same embodiment or example. Furthermore, the specific features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. In addition, those skilled in the art may combine different embodiments or examples, and the features of different embodiments or examples, described in this specification, provided they are not inconsistent with each other.
  • first and second are used for descriptive purposes only and cannot be understood as indicating or implying relative importance or implicitly indicating the quantity of indicated technical features. Therefore, features defined as “first” and “second” may explicitly or implicitly include at least one of these features.
  • “plurality” means at least two, such as two, three, etc., unless otherwise expressly and specifically limited.
  • a "computer-readable medium” may be any device that can contain, store, communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
  • A non-exhaustive list of computer-readable media includes the following: an electrical connection with one or more wires (electronic device), a portable computer diskette (magnetic device), a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a fiber optic device, and a portable compact disc read-only memory (CDROM).
  • the computer-readable medium may even be paper or another suitable medium on which the program is printed, since the paper or other medium may be, for example, optically scanned and then edited, interpreted, or otherwise processed in a suitable manner as necessary to obtain the program electronically, which is then stored in computer memory.
  • various parts of the present disclosure may be implemented in hardware, software, firmware, or combinations thereof.
  • various steps or methods may be implemented in software or firmware stored in a memory and executed by a suitable instruction execution system.
  • For example, if implemented in hardware, as in another embodiment, it may be implemented by any one of the following technologies known in the art, or a combination thereof: a discrete logic circuit with logic gates for implementing logic functions on data signals, an application-specific integrated circuit with suitable combinational logic gates, a programmable gate array (PGA), a field programmable gate array (FPGA), etc.
  • the program can be stored in a computer-readable storage medium.
  • each functional unit in various embodiments of the present disclosure may be integrated into one processing module, each unit may exist physically alone, or two or more units may be integrated into one module.
  • the above integrated modules can be implemented in the form of hardware or software function modules. If the integrated module is implemented in the form of a software function module and sold or used as an independent product, it can also be stored in a computer-readable storage medium.
  • the storage media mentioned above can be read-only memory, magnetic disks or optical disks, etc.

Abstract

An RPA and AI based text error correction method, comprising: on the basis of an OCR model, performing character recognition on an image to be recognized, so as to obtain a predicted text and a confidence coefficient of each predicted character in the predicted text; determining a character to be processed among the predicted characters according to the confidence coefficient of each predicted character; masking in the predicted text the character to be processed; performing character prediction on the masked predicted text by using a prediction model, so as to obtain at least one substitute character corresponding to the character to be processed; determining a target character among the at least one substitute character according to the similarity between the at least one substitute character and the character to be processed; and using the target character to substitute for the character to be processed in the predicted text so as to obtain a recognized text.

Description

基于RPA和AI的文本纠错方法、训练方法及其相关设备Text error correction methods, training methods and related equipment based on RPA and AI
相关申请的交叉引用Cross-references to related applications
本申请基于申请号为202210261001.6、申请日为2022年03月16日的中国专利申请提出,并要求该中国专利申请的优先权,该中国专利申请的全部内容在此引入本申请作为参考。This application is filed based on a Chinese patent application with application number 202210261001.6 and a filing date of March 16, 2022, and claims the priority of the Chinese patent application. The entire content of the Chinese patent application is hereby incorporated by reference into this application.
技术领域Technical field
本公开涉及人工智能(Artificial Intelligence,简称AI)和机器人流程自动化(Robotic Process Automation,简称RPA)领域,尤其涉及一种基于RPA和AI的文本纠错方法、训练方法及其相关设备。The present disclosure relates to the fields of Artificial Intelligence (AI for short) and Robotic Process Automation (RPA for short), and in particular to a text error correction method, training method and related equipment based on RPA and AI.
Background
RPA uses dedicated "robot software" to simulate human operations on a computer and to automatically execute process tasks according to rules.
AI is a technical science that studies and develops theories, methods, techniques, and application systems for simulating, extending, and expanding human intelligence.
Enterprises often need to recognize and extract character information from images or PDF (Portable Document Format) documents, for example invoice information or order information contained in an image. In the related art, character information can be extracted from an image or document based on Optical Character Recognition (OCR) technology.
However, the accuracy of OCR results is limited by image size and cannot always be guaranteed; for example, when the image is small, the text "账号" in the image may be misrecognized as the similar-looking "帐号". Likewise, when a seal is stamped over the image, accuracy cannot be guaranteed; for example, the character "车" in the image may be misrecognized as "军".
Summary
The present disclosure aims to solve, at least to some extent, one of the technical problems in the related art.
To this end, the present disclosure proposes an RPA- and AI-based text error correction method, a training method, and related devices, so that after text information is recognized based on OCR technology, the characters in the text information can be corrected, thereby improving the accuracy and reliability of the text recognition results.
An embodiment of a first aspect of the present disclosure proposes an RPA- and AI-based text error correction method, the method comprising:
performing character recognition on an image to be recognized based on an Optical Character Recognition (OCR) model, to obtain a predicted text and a confidence for each predicted character in the predicted text;
determining a character to be processed from among the predicted characters according to the confidence of each predicted character;
masking the character to be processed in the predicted text, and performing character prediction on the masked predicted text using a prediction model, to obtain at least one replacement character corresponding to the character to be processed; and
determining a target character from among the at least one replacement character according to the similarity between the at least one replacement character and the character to be processed, and replacing the character to be processed in the predicted text with the target character, to obtain a recognized text.
An embodiment of a second aspect of the present disclosure proposes a method for training the prediction model of the embodiment of the first aspect, comprising:
obtaining a sample text;
masking at least one sample character in the sample text to obtain a masked sample text;
inputting the masked sample text into an initial prediction model, so as to perform character prediction on the masked sample text using the prediction model, to obtain an output text; and
adjusting model parameters of the prediction model according to the difference between the sample text and the output text.
An embodiment of a third aspect of the present disclosure proposes an RPA- and AI-based text error correction apparatus, comprising:
a recognition module, configured to perform character recognition on an image to be recognized based on an OCR model, to obtain a predicted text and a confidence for each predicted character in the predicted text;
a determination module, configured to determine a character to be processed from among the predicted characters according to the confidence of each predicted character;
a masking module, configured to mask the character to be processed in the predicted text;
a prediction module, configured to perform character prediction on the masked predicted text using a prediction model, to obtain at least one replacement character corresponding to the character to be processed; and
a replacement module, configured to determine a target character from among the at least one replacement character according to the similarity between the at least one replacement character and the character to be processed, and to replace the character to be processed in the predicted text with the target character, to obtain a recognized text.
An embodiment of a fourth aspect of the present disclosure proposes an apparatus for training the prediction model of the embodiment of the third aspect, comprising:
an obtaining module, configured to obtain a sample text;
a masking module, configured to mask at least one sample character in the sample text to obtain a masked sample text;
an input module, configured to input the masked sample text into an initial prediction model, so as to perform character prediction on the masked sample text using the prediction model, to obtain an output text; and
an adjustment module, configured to adjust model parameters of the prediction model according to the difference between the sample text and the output text.
An embodiment of a fifth aspect of the present disclosure proposes an electronic device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein when the processor executes the computer program, the method of the embodiment of the first aspect, or the method of the embodiment of the second aspect, is implemented.
An embodiment of a sixth aspect of the present disclosure proposes a non-transitory computer-readable storage medium having a computer program stored thereon, wherein when the computer program is executed by a processor, the method of the embodiment of the first aspect, or the method of the embodiment of the second aspect, is implemented.
An embodiment of a seventh aspect of the present disclosure proposes a computer program product comprising a computer program, wherein when the computer program is executed by a processor, the method of the embodiment of the first aspect, or the method of the embodiment of the second aspect, is implemented.
The technical solutions provided by the embodiments of the present disclosure include the following beneficial effects:
Character recognition is performed on an image to be recognized based on an OCR model, to obtain a predicted text and a confidence for each predicted character in the predicted text; a character to be processed is determined from among the predicted characters according to the confidences; the character to be processed is masked in the predicted text, and character prediction is performed on the masked predicted text using a prediction model, to obtain at least one replacement character corresponding to the character to be processed; a target character is determined from among the at least one replacement character according to the similarity between the at least one replacement character and the character to be processed, and the character to be processed in the predicted text is replaced with the target character, to obtain a recognized text. Thus, after text information is recognized based on OCR technology, the characters in the text information are corrected, which improves the accuracy and reliability of the recognition results. In addition, no manual correction of the characters is required, which frees up human resources, reduces labor costs, and improves the applicability of the method.
Additional aspects and advantages of the present disclosure will be set forth in part in the following description, and in part will become apparent from the description or may be learned by practice of the present disclosure.
Brief Description of the Drawings
The above and/or additional aspects and advantages of the present disclosure will become apparent and readily understood from the following description of the embodiments in conjunction with the accompanying drawings, in which:
FIG. 1 is a schematic diagram of an image to be recognized.
FIG. 2 is a schematic diagram of an image to be recognized.
FIG. 3 is a schematic flowchart of an RPA- and AI-based text error correction method provided by an embodiment of the present disclosure.
FIG. 4 is a schematic flowchart of an RPA- and AI-based text error correction method provided by an embodiment of the present disclosure.
FIG. 5 is a schematic flowchart of an RPA- and AI-based text error correction method provided by an embodiment of the present disclosure.
FIG. 6 is a schematic flowchart of an RPA- and AI-based text error correction method provided by an embodiment of the present disclosure.
FIG. 7 is a schematic flowchart of an RPA- and AI-based text error correction method provided by an embodiment of the present disclosure.
FIG. 8 is a schematic flowchart of an RPA- and AI-based text error correction method provided by an embodiment of the present disclosure.
FIG. 9 is a schematic flowchart of an RPA- and AI-based text error correction method provided by an embodiment of the present disclosure.
FIG. 10 is a schematic diagram of the OCR recognition process in an embodiment of the present disclosure.
FIG. 11 is a schematic flowchart of text-line content recognition in an embodiment of the present disclosure.
FIG. 12 is a schematic structural diagram of the prediction model in an embodiment of the present disclosure.
FIG. 13 is a schematic diagram of character similarity in an embodiment of the present disclosure.
FIG. 14 is a schematic diagram of an image to be recognized.
FIG. 15 is a schematic diagram of an image to be recognized.
FIG. 16 is a schematic flowchart of a training method provided by an embodiment of the present disclosure.
FIG. 17 is a schematic structural diagram of an RPA- and AI-based text error correction apparatus provided by an embodiment of the present disclosure.
FIG. 18 is a schematic structural diagram of a training apparatus provided by an embodiment of the present disclosure.
FIG. 19 is a block diagram of an exemplary electronic device suitable for implementing embodiments of the present disclosure.
Detailed Description
Embodiments of the present disclosure are described in detail below, examples of which are illustrated in the accompanying drawings, in which identical or similar reference numerals throughout denote identical or similar elements or elements having identical or similar functions. The embodiments described below with reference to the drawings are exemplary, are intended to explain the present disclosure, and are not to be construed as limiting the present disclosure.
In the related art, although Optical Character Recognition (OCR) models have achieved high values on comprehensive metrics (such as the F1 score, the harmonic mean of precision and recall), recognition accuracy remains low on some long-tail cases, owing to the limitations of vision-based deep learning OCR models. Typical scenarios include the following:
First, stain interference, such as red seals or ink marks;
Second, similar-shaped characters, such as "戍" versus "戌", "成", and "戊"; and
Third, glyph deformation caused by image distortion.
For example, as shown in FIG. 1, when the characters in the image are too small, the character "账" is easily misrecognized as the similar-looking "帐"; as another example, as shown in FIG. 2, when a red seal interferes with the image, the character "车" in the image is easily misrecognized as "军".
In the related art, commonly used solutions to the above problems mainly include the following:
In the first solution, semantic information is added to the OCR decoding process, ensuring that decoding not only uses image features but is also semantically aligned with the preceding character.
In the second solution, a multi-task objective for predicting semantic information is added. Characters in the image are randomly occluded in a mosaic fashion, and the model has two prediction (head) networks: one predicts the characters visible in the image, and the other predicts the occluded characters. In this way, the model can learn semantics-based error correction information.
Regarding these solutions, the inventor found through repeated testing that, to improve the prediction performance of the model, the exposure of some long-tail characters must be increased in the large number of synthesized image samples. For example, some rare characters need to appear a sufficient number of times in the image samples, which requires adding random text containing rare characters to the samples; moreover, to enable the model to distinguish similar-shaped characters, multiple similar-shaped characters must deliberately be made to appear in a single image sample. This causes the text in the synthesized image samples to lack semantic coherence, so the first solution is not applicable.
In addition, in real image samples, the position of each individual character is generally not annotated in order to reduce annotation cost, which makes it impossible to accurately mask a character at a given position in a real image sample; therefore, the second solution is not applicable either.
In view of the above problems, the present disclosure proposes an RPA- and AI-based text error correction method, a training method, and related devices.
The RPA- and AI-based text error correction method, training method, and related devices of the embodiments of the present disclosure are described below with reference to the accompanying drawings. Before the embodiments are described in detail, commonly used technical terms are first introduced for ease of understanding:
"RPA" is short for Robotic Process Automation, which provides professional and comprehensive process automation solutions for enterprises and individuals. RPA uses dedicated "robot software" to simulate human operations on a computer and to automatically execute process tasks according to rules. That is, an RPA robot can quickly and accurately collect data from a user interface by simulating the user's mouse and keyboard operations, process the data based on clear logical rules, and then quickly and accurately enter it into another system or interface. This can significantly reduce labor costs, effectively improve office efficiency, and complete work accurately, stably, and quickly.
"AI" is short for Artificial Intelligence, a technical science that studies and develops theories, methods, techniques, and application systems for simulating, extending, and expanding human intelligence. AI studies how to make computers simulate certain human thinking processes and intelligent behaviors (such as learning, reasoning, thinking, and planning), and involves both hardware-level and software-level technologies. AI hardware technologies generally include sensors, dedicated AI chips, cloud computing, distributed storage, and big data processing; AI software technologies mainly include computer vision, speech recognition, Natural Language Processing (NLP), machine learning/deep learning, big data processing, and knowledge graph technologies.
"OCR" refers to the process in which an electronic device examines characters printed on paper, determines their shapes by detecting dark and light patterns, and then translates the shapes into computer text using character recognition methods; that is, a technology that optically converts the text in a paper document into a black-and-white dot-matrix image file and uses recognition software to convert the text in the image into a text format for further editing by word processing software.
An "OCR model" is a pre-trained model that has learned the correspondence between input images and output text.
An "image to be recognized" is any image that needs to be recognized; for example, an image containing invoice information or order information.
"Predicted text" is the text information, i.e., the OCR recognition result, obtained by the OCR model performing character recognition on the image to be recognized.
"Confidence", also called recognition probability or classification probability, is the probability value output by the OCR model.
A "prediction model" is a trained model used to perform character prediction on input text.
A "mask character" is a character used to mask text information. The mask character may be a preset character, or a random character, for example a character randomly drawn from a set dictionary.
A "set dictionary" is a preset dictionary, which may include common or commonly used characters. For example, a set dictionary of Chinese characters may include 3,900 common Chinese characters.
"Target text" is the text information output by the prediction model.
"Sample text" is the text information used to train the prediction model.
A "specific character" is a preset special character; for example, it may be an OOV (Out of Vocabulary) character, or another character, which is not limited by the present disclosure.
A "set encoding algorithm" is a preset encoding algorithm; for example, it may be the four-corner encoding algorithm, or another encoding algorithm, which is not limited by the present disclosure.
A "confidence threshold" is a preset threshold.
FIG. 3 is a schematic flowchart of an RPA- and AI-based text error correction method provided by an embodiment of the present disclosure.
In one possible implementation of the embodiments of the present disclosure, the RPA- and AI-based text error correction method is described as being configured in a text error correction apparatus, which can be applied to any electronic device with computing capability.
The electronic device may be a personal computer, a mobile terminal, or the like; the mobile terminal is, for example, a mobile phone, a tablet computer, a personal digital assistant, or another hardware device with an operating system.
In another possible implementation of the embodiments of the present disclosure, the RPA- and AI-based text error correction method may be applied to an RPA robot, which can run in any electronic device with computing capability.
As shown in FIG. 3, the RPA- and AI-based text error correction method may include the following steps:
Step 101: performing character recognition on an image to be recognized based on an OCR model, to obtain a predicted text and a confidence for each predicted character in the predicted text.
In the embodiments of the present disclosure, the image to be recognized may be obtained directly, for example by directly obtaining an image containing text information. Alternatively, it may be obtained indirectly; for example, a document containing text information in PDF, PSD (the proprietary format of Adobe's graphic design software Photoshop), or another format may be obtained, and the image to be recognized may be extracted or cropped from the document.
The image to be recognized may be in JPG (or JPEG, Joint Photographic Experts Group), PNG (Portable Network Graphics), or another image format, which is not limited by the present disclosure.
In the embodiments of the present disclosure, the OCR model may be used to perform character recognition on the image to be recognized, to obtain the predicted text and the confidence of each predicted character in the predicted text.
Step 102: determining a character to be processed from among the predicted characters according to the confidence of each predicted character.
In the embodiments of the present disclosure, the character to be processed may be determined from among the predicted characters according to the confidence of each predicted character, and the number of characters to be processed may be at least one.
In one possible implementation of the embodiments of the present disclosure, the confidence of each predicted character may be compared with a set confidence threshold, and a predicted character whose confidence is below the confidence threshold may be taken as a character to be processed, since a low confidence indicates that the OCR model is unsure of the character and the character is a likely recognition error.
In another possible implementation of the embodiments of the present disclosure, the predicted character with the lowest confidence may be taken as the character to be processed.
In yet another possible implementation of the embodiments of the present disclosure, the predicted characters may be sorted in ascending order of confidence, and the first target number of predicted characters in this ordering may be selected as the characters to be processed.
The value of the target number is positively correlated with the length of the predicted text; that is, the longer the predicted text, the larger the target number.
As an example, considering that in most OCR error scenarios a sentence contains only one erroneous character, the number of sentences contained in the predicted text may be determined, and the target number may be determined according to the number of sentences. The target number is positively related to the number of sentences: the more sentences there are, the larger the target number; conversely, the fewer sentences there are, the smaller the target number.
In summary, the characters to be processed can be determined from the predicted characters in different ways, which improves the flexibility and applicability of the method.
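The selection strategies above can be sketched as follows. This is an illustrative sketch rather than the patent's implementation; the list-of-characters interface and the one-candidate-per-sentence heuristic for the target number are assumptions made for the example:

```python
import re

def select_chars_to_process(chars, confidences, threshold=None):
    """Pick indices of predicted characters that are candidates for correction.

    chars: predicted characters; confidences: per-character confidence from
    the OCR model. If `threshold` is given, select every character whose
    confidence falls below it; otherwise select the k lowest-confidence
    characters, with k set to one candidate per sentence (heuristic: most
    OCR errors involve a single character per sentence).
    """
    if threshold is not None:
        return [i for i, c in enumerate(confidences) if c < threshold]
    text = "".join(chars)
    # Rough sentence count based on common CJK/ASCII sentence delimiters.
    k = max(1, len(re.findall(r"[。！？.!?]", text)))
    order = sorted(range(len(chars)), key=lambda i: confidences[i])
    return sorted(order[:k])

chars = list("他买了一辆新军。")
conf = [0.99, 0.98, 0.97, 0.96, 0.95, 0.99, 0.42, 0.99]
print(select_chars_to_process(chars, conf))                  # -> [6]
print(select_chars_to_process(chars, conf, threshold=0.9))   # -> [6]
```

Either strategy flags the low-confidence "军" (index 6) for masking in Step 103.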
Step 103: masking the character to be processed in the predicted text, and performing character prediction on the masked predicted text using a prediction model, to obtain at least one replacement character corresponding to the character to be processed.
In the embodiments of the present disclosure, the character to be processed in the predicted text may be masked to obtain a masked predicted text, and the masked predicted text may be input into the prediction model, so that the prediction model performs character prediction on the masked predicted text and outputs at least one replacement character corresponding to the character to be processed.
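A minimal sketch of this masking-and-prediction step is given below. The `[MASK]` token and the `predict_top_k` stub are illustrative assumptions: in a real system the stub would be a trained masked-character prediction model, and the toy probability table here merely stands in for that model's output distribution:

```python
MASK = "[MASK]"

def mask_text(text, index, mask_token=MASK):
    """Replace the character at `index` with the mask token."""
    return text[:index] + mask_token + text[index + 1:]

def predict_top_k(masked_text, k=3):
    """Toy stand-in for the prediction model: return k candidate characters
    for the masked position with pseudo-probabilities. A real system would
    feed `masked_text` through a trained masked-language prediction model."""
    toy_distribution = {"车": 0.6, "东": 0.2, "轮": 0.1, "军": 0.05}
    ranked = sorted(toy_distribution.items(), key=lambda kv: -kv[1])
    return ranked[:k]

predicted_text = "他买了一辆新军。"   # "军" is the suspected misrecognition
masked = mask_text(predicted_text, 6)
print(masked)                        # -> 他买了一辆新[MASK]。
print(predict_top_k(masked))
```

The top-k candidates returned here are the "at least one replacement character" passed on to Step 104.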
Step 104: determining a target character from among the at least one replacement character according to the similarity between the at least one replacement character and the character to be processed, and replacing the character to be processed in the predicted text with the target character, to obtain a recognized text.
In the embodiments of the present disclosure, the similarity between each replacement character and the character to be processed may be calculated, the target character may be determined from among the replacement characters according to these similarities, and the character to be processed in the predicted text may be replaced with the target character, to obtain the recognized text.
The similarity may be a character similarity (or character shape similarity). That is, considering that OCR recognition errors are mainly caused by similar-shaped characters or by interference on the characters, the character (shape) similarity between each replacement character and the character to be processed may be calculated, the target character may be determined from among the replacement characters according to this similarity, and the character to be processed in the predicted text may be replaced with the target character.
作为一种可能的实现方式,可以基于设定编码算法,对各替换字符进行编码,得到各替换字符的第一编码值,并基于设定编码算法,对待处理字符进行编码,得到待处理字符的第二编码值,从而在本公开中,可以基于各替换字符的第一编码值和第二编码值之间的差异,确定各替换字符与待处理字符的相似度。As a possible implementation method, each replacement character can be encoded based on the set encoding algorithm to obtain the first encoding value of each replacement character, and based on the set encoding algorithm, the character to be processed can be encoded to obtain the value of the character to be processed. The second encoding value, so in the present disclosure, the similarity between each replacement character and the character to be processed can be determined based on the difference between the first encoding value and the second encoding value of each replacement character.
作为一种示例,以字符为中文字符进行示例性说明,设定编码算法可以四角编码算法,可以理解的是,字形越相似的中文字符,四角编码值越接近,基于上述特性,可以计算各替换字符的第一编码值和第二编码值之间的差异,根据各替换字符的第一编码值和第二编码值之间的差异,确定各替换字符与待处理字符的相似度。其中,相似度与差异为反向关系,即差异越小,相似度越高,反之,差异越大,相似度越低。As an example, let's take Chinese characters as illustrative examples. The encoding algorithm can be a four-corner encoding algorithm. It can be understood that the more similar the glyphs of Chinese characters are, the closer the four-corner encoding values will be. Based on the above characteristics, each replacement can be calculated. The difference between the first encoding value and the second encoding value of the character determines the similarity between each replacement character and the character to be processed based on the difference between the first encoding value and the second encoding value of each replacement character. Among them, similarity and difference have an inverse relationship, that is, the smaller the difference, the higher the similarity, and conversely, the larger the difference, the lower the similarity.
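As a minimal sketch of this idea, the similarity can be derived from per-position digit agreement of two codes. The codes here are treated as opaque digit strings supplied by the caller; a real system would obtain them from a four-corner lookup table.

```python
def four_corner_similarity(code_a: str, code_b: str) -> float:
    """Similarity from per-position digit agreement of two four-corner codes.

    A smaller difference between codes yields a higher similarity, matching
    the inverse relationship between difference and similarity described above.
    """
    if not code_a or not code_b:
        return 0.0
    matches = sum(a == b for a, b in zip(code_a, code_b))
    return matches / max(len(code_a), len(code_b))
```

Any other glyph-sensitive encoding could be substituted, as long as similar glyphs map to nearby codes.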
It should be noted that the four-corner encoding algorithm above is only an example; in practice, the set encoding algorithm may be any other encoding algorithm, and the present disclosure places no limitation on this.
As another possible implementation, for each replacement character, a first image may be obtained by drawing that replacement character (for example, on a blank image), and a second image may be obtained by drawing the character to be processed (likewise, for example, on a blank image). The similarity between the first image and the second image can then be computed, and the similarity between the replacement character and the character to be processed determined from it.
Here, the similarity between the replacement character and the character to be processed is positively related to the similarity between the first image and the second image: the higher the similarity between the two images, the higher the similarity between the two characters; conversely, the lower the image similarity, the lower the character similarity.
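A minimal sketch of the image-comparison idea, using hand-made binary bitmaps in place of actually rendered glyphs (real rendering would additionally require a CJK-capable font):

```python
def bitmap_similarity(img_a, img_b) -> float:
    """Jaccard overlap of the 'ink' pixels of two equal-sized binary bitmaps.

    img_a and img_b are lists of rows of 0/1 pixels, standing in for the first
    image (drawn replacement character) and the second image (drawn character
    to be processed). A higher overlap means a higher character similarity.
    """
    inter = union = 0
    for row_a, row_b in zip(img_a, img_b):
        for pa, pb in zip(row_a, row_b):
            inter += 1 if (pa and pb) else 0
            union += 1 if (pa or pb) else 0
    return inter / union if union else 1.0
```

Jaccard overlap is one illustrative choice; any image-similarity metric (pixel correlation, perceptual hashing, etc.) fits the scheme described above.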
As yet another possible implementation, for each replacement character, feature extraction may be performed on that replacement character using a feature-extraction algorithm to obtain its feature vector, and likewise on the character to be processed to obtain its feature vector. The similarity between the replacement character and the character to be processed can then be determined from the similarity between the two feature vectors.
Here, the similarity between the replacement character and the character to be processed is positively related to the similarity between the replacement character's feature vector and the feature vector of the character to be processed.
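A common choice of vector similarity here is cosine similarity; the sketch below assumes the feature vectors have already been extracted by whatever feature-extraction algorithm is in use.

```python
import math

def cosine_similarity(vec_a, vec_b) -> float:
    """Cosine similarity between two feature vectors; higher means the
    two glyphs' extracted features are more alike."""
    dot = sum(a * b for a, b in zip(vec_a, vec_b))
    norm_a = math.sqrt(sum(a * a for a in vec_a))
    norm_b = math.sqrt(sum(b * b for b in vec_b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)
```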
It should be noted that the similarity calculations above are only illustrative; the present disclosure does not limit how the similarity is computed, as long as the similarity between each replacement character and the character to be processed can be obtained.
In this embodiment, after the similarities between the replacement characters and the character to be processed are determined, the target character can be selected from the replacement characters according to those similarities; for example, the replacement character with the greatest similarity may be taken as the target character, which is then used to replace the character to be processed in the predicted text to obtain the recognized text.
For example, suppose the predicted text is "花园的乐南角有颗桃花树" and the character to be processed is "乐". After masking "乐" with the mask character "开", the masked predicted text is "花园的开南角有颗桃花树". After the prediction model performs character prediction on the masked predicted text, the replacement characters obtained are "东" and "西". Since the character similarity between "东" and "乐" is higher than that between "西" and "乐", "东" is taken as the target character and used to replace the character to be processed in the predicted text, yielding the recognized text "花园的东南角有颗桃花树".
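The selection-and-replacement step of this example can be sketched as follows; the similarity scores passed in are illustrative values, not outputs of any particular similarity metric.

```python
def apply_correction(predicted: str, position: int, candidates: dict) -> str:
    """Pick the candidate replacement character with the highest similarity to
    the character to be processed and substitute it at the target position.

    candidates maps each replacement character to its similarity score.
    """
    target = max(candidates, key=candidates.get)
    return predicted[:position] + target + predicted[position + 1:]
```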
In the RPA- and AI-based text error correction method of this embodiment, character recognition is performed on the image to be recognized based on an OCR model to obtain a predicted text and the confidence of each predicted character in it; the character to be processed is determined from the predicted characters according to their confidences; the character to be processed in the predicted text is masked, and a prediction model performs character prediction on the masked predicted text to obtain at least one replacement character corresponding to the character to be processed; a target character is determined from the at least one replacement character according to its similarity to the character to be processed, and the target character is used to replace the character to be processed in the predicted text to obtain the recognized text. Thus, after the text is recognized by OCR, the characters in it are corrected, improving the accuracy and reliability of the recognition result. In addition, no manual correction of the characters is required, which frees human resources, reduces labor costs, and broadens the applicability of the method.
To clearly explain how the at least one replacement character corresponding to the character to be processed is obtained in the above embodiments, the present disclosure further proposes an RPA- and AI-based text error correction method.
Figure 4 is a schematic flowchart of another RPA- and AI-based text error correction method provided by an embodiment of the present disclosure.
As shown in Figure 4, the RPA- and AI-based text error correction method may include the following steps:
Step 201: based on the OCR model, perform character recognition on the image to be recognized to obtain a predicted text and the confidence of each predicted character in it.
Step 202: determine the character to be processed from the predicted characters according to their confidences.
For the execution of steps 201 and 202, refer to the execution process of any embodiment of the present disclosure; details are not repeated here.
Step 203: determine the target position of the character to be processed in the predicted text.
In this embodiment, the position of the character to be processed in the predicted text can be determined; it is referred to herein as the target position.
For example, if the predicted text is "花园的乐南角有颗桃花树" and the character to be processed is "乐", the target position is the fourth character position.
Step 204: obtain a mask character, and replace the character to be processed at the target position in the predicted text with the mask character to obtain the masked predicted text.
In this embodiment, the mask character may be a preset fixed character, or it may be a random character; the present disclosure places no limitation on this.
In this embodiment, the mask character can be used to replace the character to be processed at the target position in the predicted text, yielding the masked predicted text.
Continuing the example above, if the mask character is "开", the character "乐" is masked with "开", and the resulting masked predicted text is "花园的开南角有颗桃花树".
Step 205: input the masked predicted text into the prediction model so that the prediction model performs character prediction on it, obtaining at least one target text.
In this embodiment, the masked predicted text can be input into the prediction model, which performs character prediction on it to obtain at least one target text. That is, the prediction model can, in a manner similar to machine translation, predict all the characters of the entire text to obtain at least one target text.
Continuing the example, the target texts output by the prediction model may be "花园的东南角有颗桃花树" and "花园的西南角有颗桃花树".
Step 206: take the character at the target position in each of the at least one target text as a replacement character.
In this embodiment, the character at the target position in each of the at least one target text may be taken as a replacement character.
Continuing the example, the character at the fourth character position of each target text is taken as a replacement character; here, the replacement characters are "东" and "西".
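Steps 204 and 206 for this example can be sketched as follows; "开" is the mask character used in the example, and any preset fixed or random character could take its place.

```python
MASK_CHAR = "开"  # mask character from the example; any preset or random character works

def mask_at(text: str, position: int) -> str:
    """Step 204: replace the character at the target position with the mask character."""
    return text[:position] + MASK_CHAR + text[position + 1:]

def replacement_chars(target_texts, position):
    """Step 206: collect the character at the target position of each target text."""
    return [t[position] for t in target_texts]
```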
Step 207: determine the target character from the at least one replacement character according to its similarity to the character to be processed, and use the target character to replace the character to be processed in the predicted text to obtain the recognized text.
For the execution of step 207, refer to the execution process of any embodiment of the present disclosure; details are not repeated here.
In the RPA- and AI-based text error correction method of this embodiment, the target position of the character to be processed in the predicted text is determined; a mask character is obtained and used to replace the character to be processed at the target position, yielding the masked predicted text; the masked predicted text is input into the prediction model, which performs character prediction on it to obtain at least one target text; and the character at the target position in each target text is taken as a replacement character. Predicting the at least one replacement character with deep learning in this way improves the accuracy and reliability of the prediction results.
To clearly explain how the prediction model in the above embodiments is trained, the present disclosure further proposes a text error correction method.
Figure 5 is a schematic flowchart of another RPA- and AI-based text error correction method provided by an embodiment of the present disclosure.
As shown in Figure 5, building on the embodiment shown in Figure 4, the text error correction method may further include the following steps:
Step 301: obtain sample text.
In this embodiment, the sample text may be obtained from an existing training set; collected online, for example via web-crawler technology; collected offline, for example by capturing images of paper documents and then recognizing the characters in the images with OCR; or synthesized artificially; and so on. The embodiments of the present disclosure place no limitation on this.
Step 302: mask at least one sample character in the sample text to obtain the masked sample text.
In this embodiment, a mask character may be used to mask at least one sample character in the sample text, yielding the masked sample text.
As an example, at least one sample character in the sample text may be replaced with a random character at a set first random probability, and/or replaced with a fixed character at a set second random probability. The first and second random probabilities may be the same or different; the present disclosure places no limitation on this.
For example, it may be determined whether the current random probability matches the first random probability. If it does, at least one sample character in the sample text is replaced with a random character, and it is then determined whether the current random probability matches the second random probability: if so, at least one sample character in the sample text is further replaced with a fixed character; if not, no further processing is performed.
If the current random probability does not match the first random probability, it is then determined whether it matches the second random probability: if so, at least one sample character in the sample text is replaced with a fixed character; if not, no processing is performed.
Taking both the first and second random probabilities as 10/100 = 10% for illustration, suppose a probability matches when the selected random number falls within 30-40 out of 0-100. When masking the sample text, a random number is drawn; if it lies within 30-40, the current random probability matches the first random probability, and at least one sample character in the sample text is replaced with a random character. Another random number is then drawn; if it also lies within 30-40, the current random probability matches the second random probability, and at least one sample character in the sample text is further replaced with a fixed character.
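One per-character reading of this masking scheme can be sketched as below; the probabilities, fixed character, and random-character pool are illustrative stand-ins (a system for Chinese text would draw random replacements from its character vocabulary).

```python
import random

def mask_sample_text(text, p_random=0.1, p_fixed=0.1,
                     fixed_char="M", random_pool="abcdefghij"):
    """Replace each character with a random character with probability p_random,
    or with the fixed character with probability p_fixed; otherwise keep it.
    """
    out = []
    for ch in text:
        r = random.random()
        if r < p_random:
            out.append(random.choice(random_pool))
        elif r < p_random + p_fixed:
            out.append(fixed_char)
        else:
            out.append(ch)
    return "".join(out)
```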
Step 303: input the masked sample text into the initial prediction model so that the prediction model performs character prediction on the masked sample text, obtaining the output text.
In this embodiment, the masked sample text can be input into the initial prediction model, which performs character prediction on it to obtain the output text.
Step 304: adjust the model parameters of the prediction model according to the difference between the sample text and the output text.
In this embodiment, the model parameters of the prediction model can be adjusted according to the difference between the sample text and the output text.
As an example, a target loss value may be generated from the difference between the sample text and the output text, where the target loss value is positively related to the difference: the smaller the difference, the smaller the target loss value; conversely, the larger the difference, the larger the target loss value.
The prediction model can then be trained according to the target loss value, that is, its model parameters are adjusted, for example so as to minimize the target loss value. It should be noted that minimizing the target loss value is only one example of a termination condition for training; in practice, other termination conditions may be set, such as the number of training iterations reaching a set threshold, and the present disclosure places no limitation on this.
In the RPA- and AI-based text error correction method of this embodiment, sample text is obtained; at least one sample character in it is masked to obtain the masked sample text; the masked sample text is input into the initial prediction model, which performs character prediction on it to obtain the output text; and the model parameters of the prediction model are adjusted according to the difference between the sample text and the output text. Training the prediction model in advance thus improves its predictive performance: using the trained prediction model to perform character prediction on the masked predicted text improves the accuracy and reliability of the target-text predictions.
To clearly explain how the prediction model is trained in any embodiment of the present disclosure, the present disclosure further proposes a text error correction method.
Figure 6 is a schematic flowchart of another RPA- and AI-based text error correction method provided by an embodiment of the present disclosure.
As shown in Figure 6, building on the embodiment shown in Figure 4, the text error correction method may further include the following steps:
Step 401: obtain sample text.
Step 402: mask at least one sample character in the sample text to obtain the masked sample text.
Step 403: input the masked sample text into the initial prediction model so that the prediction model performs character prediction on the masked sample text, obtaining the output text.
For the execution of steps 401 to 403, refer to the execution process of any embodiment of the present disclosure; details are not repeated here.
Step 404: generate a first loss value according to the difference in character confidence distributions between the sample text and the output text.
In this embodiment, the difference between the confidence distributions (or probability distributions) of the characters of the sample text and of the output text can be determined, and the first loss value generated from that difference.
The first loss value is positively related to the confidence-distribution difference: the smaller the difference, the smaller the first loss value; conversely, the larger the difference, the larger the first loss value. For example, the first loss value may be a cross-entropy loss.
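For reference, cross-entropy over character distributions has exactly the positive relationship described above: it is smallest when the predicted distribution matches the reference and grows as they diverge. A minimal sketch over a single character position:

```python
import math

def cross_entropy(reference_dist, predicted_dist, eps=1e-12):
    """Cross-entropy between a reference character distribution and the model's
    predicted confidence distribution for one position; near zero when the
    distributions agree, larger as they diverge."""
    return -sum(p * math.log(max(q, eps))
                for p, q in zip(reference_dist, predicted_dist))
```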
Step 405: determine the first position, in the sample text, of the at least one masked sample character.
In this embodiment, the position in the sample text of the at least one masked sample character can be determined; it is referred to herein as the first position.
Step 406: generate a second loss value according to the difference between the sample character at the first position in the sample text and the output character at the first position in the output text.
In this embodiment, the difference between the sample character at the first position in the sample text and the output character at the first position in the output text can be determined, and the second loss value generated from that difference.
The second loss value is positively related to that difference: the smaller the difference, the smaller the second loss value; conversely, the larger the difference, the larger the second loss value. For example, the second loss value may also be a cross-entropy loss.
Step 407: generate a target loss value according to the first loss value and the second loss value.
In this embodiment, the target loss value can be generated from the first and second loss values; the target loss value is positively related to the first loss value and also positively related to the second loss value.
As an example, the first and second loss values may be weighted by a first weight and a second weight, respectively, to obtain a weighted result, and the target loss value determined from that result, the target loss value being positively related to the weighted result.
It should be noted that during training of the prediction model, more attention is generally paid to the loss on the masked characters; therefore, in the present disclosure, the second weight may be greater than the first weight.
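The weighting just described can be sketched in one line; the weight values below are illustrative, chosen only to satisfy the stated constraint that the second weight exceeds the first.

```python
def target_loss(first_loss, second_loss, first_weight=0.3, second_weight=0.7):
    """Weighted combination of the whole-sequence loss (first) and the
    masked-position loss (second); second_weight > first_weight so the
    masked characters dominate the training signal."""
    return first_weight * first_loss + second_weight * second_loss
```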
Step 408: adjust the model parameters of the prediction model according to the target loss value.
In this embodiment, the prediction model can be trained according to the target loss value, that is, its model parameters are adjusted, for example so as to minimize the target loss value. It should be noted that minimizing the target loss value is only one example of a termination condition for training; in practice, other termination conditions may be set, such as the number of training iterations reaching a set threshold, and the present disclosure places no limitation on this.
By training the prediction model in advance, the RPA- and AI-based text error correction method of this embodiment improves the model's predictive performance: using the trained prediction model to perform character prediction on the masked predicted text improves the accuracy and reliability of the target-text predictions.
In one possible implementation of the embodiments of the present disclosure, to improve the accuracy of the model's predictions, characters in the sample text with very generic semantics, such as digits, English characters, punctuation, and Chinese characters like the quantifiers 千 (thousand) and 百 (hundred), may be replaced with a specific character (for example, an OOV character), so that the model need not attend to these specific characters. Specifically, for step 205 in the above embodiment, a set dictionary may be obtained and each character of the masked sample text checked against it. If every character of the masked sample text is in the set dictionary, the masked sample text can be input directly into the initial prediction model for character prediction to obtain the output text. In this case, the target loss value may be the weighted result of the first and second loss values, that is, the first and second loss values are weighted by the first and second weights to obtain the target loss value.
If, however, the masked sample text contains a first character that is not in the set dictionary, that first character can be replaced with the specific character, and the replaced sample text input into the prediction model for character prediction to obtain the output text. In this case, the target loss value is obtained not only from the weighted result of the first and second loss values but also from the specific character. This process is described in detail below with reference to Figure 7.
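The dictionary check and placeholder substitution can be sketched as below; the placeholder character "□" is an illustrative stand-in for whatever specific OOV character the system reserves.

```python
def replace_oov(text, dictionary, specific_char="□"):
    """Replace every character not found in the set dictionary with a specific
    placeholder character, so the model need not attend to such characters."""
    return "".join(ch if ch in dictionary else specific_char for ch in text)
```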
Figure 7 is a schematic flowchart of another RPA- and AI-based text error correction method provided by an embodiment of the present disclosure.
As shown in Figure 7, building on the embodiment shown in Figure 6, step 407 may include the following steps:
Step 501: determine the second position of the first character in the sample text.
In this embodiment, when the masked sample text contains a first character that is not in the set dictionary, the position of the first character in the sample text can be determined; it is referred to herein as the second position.
Step 502: generate a third loss value according to the difference between the specific character and the output character at the second position in the output text.
In this embodiment, the third loss value can be generated from the difference between the specific character and the output character at the second position in the output text. The third loss value is positively related to that difference: the smaller the difference, the smaller the third loss value; conversely, the larger the difference, the larger the third loss value.
步骤503,根据第一损失值的第一权重和第二损失值的第二权重,将第一损失值和第二损失值进行加权,以得到第四损失值,其中,第二权重大于第一权重。Step 503: Weight the first loss value and the second loss value according to the first weight of the first loss value and the second weight of the second loss value to obtain a fourth loss value, where the second weight is greater than the first Weights.
在本公开实施例中,可以根据第一损失值的第一权重和第二损失值的第二权重,将第一损失值和第二损失值进行加权,得到加权结果,将加权结果作为第四损失值。In the embodiment of the present disclosure, the first loss value and the second loss value can be weighted according to the first weight of the first loss value and the second weight of the second loss value to obtain a weighted result, and the weighted result is used as the fourth loss value.
步骤504,根据第四损失值和第三损失值之间的差异,生成目标损失值。Step 504: Generate a target loss value based on the difference between the fourth loss value and the third loss value.
在本公开实施例中,可以根据第四损失值和第三损失值之间的差异,生成目标损失值,其中,目标损失值与上述差异成正向关系,即差异越小,目标损失值的取值越小,反之,差异越大,目标损失值的取值越大。In the embodiment of the present disclosure, the target loss value can be generated according to the difference between the fourth loss value and the third loss value, where the target loss value is positively related to the above difference, that is, the smaller the difference, the smaller the target loss value is. The smaller the value, conversely, the greater the difference, the greater the target loss value.
作为一种示例,可以将第四损失值减去第三损失值,得到第五损失值,将第五损失值进行放大处理,以得到目标损失值。也就是说,可以从第四损失值中,将模型无需关注的特定字符对应的第三损失值进行去除,且为了避免输出文本中有效字符较少,而导致目标损失值的取值较低的情况发生,本公开中,可以将第五损失值进行放大处理。As an example, the third loss value can be subtracted from the fourth loss value to obtain the fifth loss value, and the fifth loss value can be amplified to obtain the target loss value. That is to say, the third loss value corresponding to the specific characters that the model does not need to pay attention to can be removed from the fourth loss value, and in order to avoid having fewer effective characters in the output text, resulting in a lower target loss value. If this happens, in this disclosure, the fifth loss value can be amplified.
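The loss combination in steps 503 and 504 can be sketched as follows. This is a minimal illustration: the weight values, the amplification factor, and the form of the per-task losses are not fixed by the disclosure and are chosen here only for demonstration.

```python
def target_loss(first_loss, second_loss, third_loss,
                w1=0.3, w2=0.7, scale=2.0):
    """Combine the per-task losses as described in steps 503-504.

    w1 / w2: first and second weights (the disclosure requires w2 > w1);
    scale:   amplification factor for the fifth loss value (assumed value).
    """
    fourth_loss = w1 * first_loss + w2 * second_loss  # step 503: weighted sum
    fifth_loss = fourth_loss - third_loss             # remove loss of the OOV-like characters
    return scale * fifth_loss                         # amplify to obtain the target loss
```

The amplification step compensates for sentences in which few effective (non-OOV) characters contribute to the loss.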
According to the RPA- and AI-based text error correction method of this embodiment of the present disclosure, training the prediction model in advance can improve its prediction effect; that is, using the trained prediction model to perform character prediction on the masked predicted text can improve the accuracy and reliability of the target text prediction results.
To clearly explain how the at least one replacement character corresponding to the character to be processed is obtained in the above embodiments of the present disclosure, the present disclosure further proposes an RPA- and AI-based text error correction method.
Figure 8 is a schematic flowchart of another RPA- and AI-based text error correction method provided by an embodiment of the present disclosure.
As shown in Figure 8, the RPA- and AI-based text error correction method may include the following steps:
Step 601: Perform character recognition on the image to be recognized based on the OCR model to obtain the predicted text and the confidence of each predicted character in the predicted text.
Step 602: Determine the character to be processed from the predicted characters according to the confidence of each predicted character.
For the execution of steps 601 and 602, refer to the execution process of any embodiment of the present disclosure; details are not repeated here.
Step 603: Replace the character to be processed in the predicted text with a mask character to obtain the masked predicted text.
In this embodiment of the present disclosure, the mask character may be a preset fixed character or a random character; the present disclosure does not limit this.
In this embodiment of the present disclosure, the character to be processed in the predicted text may be replaced with the mask character to obtain the masked predicted text.
Step 604: Input the masked predicted text into the prediction model, so that the prediction model predicts the mask character in the masked predicted text to obtain at least one replacement character.
In this embodiment of the present disclosure, the masked predicted text may be input into the prediction model, so that the prediction model predicts the mask character in the masked predicted text to obtain at least one replacement character. In other words, in the present disclosure, the prediction model may predict only the masked characters, similar to a cloze task.
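Steps 603 and 604 amount to replacing the low-confidence character with a mask token and asking the model to fill the blank. A minimal sketch of the masking step (the `[MASK]` token string is an assumption; the disclosure allows either a fixed or a random mask character):

```python
MASK = "[MASK]"  # assumed fixed mask token

def mask_predicted_text(predicted_text, position):
    """Replace the to-be-processed character at `position` with the mask token."""
    chars = list(predicted_text)
    chars[position] = MASK
    return "".join(chars)

# e.g. if OCR read "今天我感到飞常高兴" and the character at index 5 has low confidence:
masked = mask_predicted_text("今天我感到飞常高兴", 5)
# the prediction model is then given `masked` and predicts candidates for the blank
```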
In a possible implementation of this embodiment of the present disclosure, the prediction model may be trained as follows. A sample text is obtained; to distinguish it from the sample text in the above embodiments, the sample text in Figures 5 to 7 above may be denoted the first sample text, and the sample text in this embodiment may be denoted the second sample text. At least one second sample character in the second sample text may be masked to obtain the masked second sample text, and the masked second sample text may be input into the initial prediction model, so that the prediction model predicts the at least one masked second sample character to obtain at least one recognized character. Thus, in the present disclosure, the model parameters of the prediction model may be adjusted according to the difference between the at least one recognized character and the at least one second sample character.
As an example, a target loss value may be generated according to the difference between the at least one recognized character and the at least one second sample character, where the target loss value is positively related to this difference: the smaller the difference, the smaller the target loss value; conversely, the larger the difference, the larger the target loss value. Thus, in the present disclosure, the prediction model may be trained according to the target loss value, that is, the model parameters of the prediction model may be adjusted. For example, the prediction model may be trained according to the target loss value so as to minimize the target loss value. It should be noted that the above merely takes minimization of the target loss value as the termination condition of model training by way of example; in practical applications, other termination conditions may also be set, for example, the number of training iterations reaching a set threshold, and so on; the present disclosure does not limit this.
Step 605: Determine the target character from the at least one replacement character according to the similarity between each replacement character and the character to be processed, and replace the character to be processed in the predicted text with the target character to obtain the recognized text.
For the execution of step 605, refer to the execution process of any embodiment of the present disclosure; details are not repeated here.
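Step 605 selects, among the model's candidate characters, the one most similar to the original OCR character. The similarity measure is not fixed by the disclosure (for Chinese OCR it is often a glyph-shape similarity); the sketch below therefore takes a caller-supplied `similarity` function as an assumption.

```python
def pick_target_character(candidates, char_to_process, similarity):
    """Return the candidate most similar to the original (possibly wrong) character."""
    return max(candidates, key=lambda c: similarity(c, char_to_process))

def correct_text(predicted_text, position, candidates, similarity):
    """Replace the character at `position` with the most similar candidate."""
    target = pick_target_character(candidates, predicted_text[position], similarity)
    return predicted_text[:position] + target + predicted_text[position + 1:]
```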
According to the RPA- and AI-based text error correction method of this embodiment of the present disclosure, the character to be processed in the predicted text is replaced with a mask character to obtain the masked predicted text, and the masked predicted text is input into the prediction model, so that the prediction model predicts the mask character in the masked predicted text to obtain at least one replacement character. Thus, by using deep learning technology to predict the at least one replacement character, the accuracy and reliability of the prediction results can be improved. In addition, predicting replacement characters in a manner different from the embodiment shown in Figure 2 can improve the flexibility and applicability of the method.
To clearly explain how character recognition is performed on the image to be recognized in the above embodiments of the present disclosure, the present disclosure further proposes a text error correction method.
Figure 9 is a schematic flowchart of another RPA- and AI-based text error correction method provided by an embodiment of the present disclosure.
As shown in Figure 9, the RPA- and AI-based text error correction method may include the following steps:
Step 701: Use the feature extraction branch of the OCR model to perform feature extraction on the image to be recognized to obtain a first feature map.
In this embodiment of the present disclosure, the feature extraction branch of the OCR model may be used to perform feature extraction on the image to be recognized to obtain the first feature map. For example, the feature extraction branch may be a backbone network such as a CNN (Convolutional Neural Network) or a ViT (Vision Transformer), and feature extraction is performed on the image to be recognized through this backbone network to obtain the first feature map.
It should be noted that the image to be recognized may be tilted, deformed, or flipped to some extent, all of which affect the reliability of subsequent model recognition results. Therefore, in a possible implementation of this embodiment of the present disclosure, in order to improve the accuracy of subsequent model recognition results, the tilt angle of the image to be recognized may be corrected after the image is obtained.
Specifically, angle prediction may be performed on the image to be recognized to determine its tilt angle.
As an example, when the tilt angle of the image to be recognized is large, an image classification model may be used to perform angle prediction on the image to be recognized to determine its tilt angle.
As another example, when the tilt angle of the image to be recognized is small, angle prediction may be performed on the image to be recognized based on a corner detection algorithm to determine its tilt angle.
It should be understood that when the image to be recognized is tilted, what needs to be detected is an irregular quadrilateral, so conventional object detection algorithms fail. A keypoint detection algorithm may be used to detect the four corner points of the irregular quadrilateral, and the quadrilateral is then extracted from the four corner points, so that the tilt angle of the image to be recognized can be determined from the extracted quadrilateral.
It should be noted that, in practical applications, the above two examples may also be combined to predict the tilt angle of the image to be recognized.
In this embodiment of the present disclosure, after the tilt angle of the image to be recognized is determined, the image to be recognized may be rotated according to the tilt angle. In this way, angle correction can be performed on a tilted or flipped image to be recognized, improving the reliability of subsequent image recognition results.
Step 702: Use the fusion branch of the OCR model to fuse the first feature map with a position map to obtain a second feature map, where each element of the position map corresponds one-to-one to an element of the first feature map, and each element of the position map indicates the coordinates, in the image to be recognized, of the corresponding element of the first feature map.
In this embodiment of the present disclosure, positional encoding may be performed on the image to be recognized to obtain the position map, where each element of the position map corresponds one-to-one to an element of the first feature map, and each element of the position map indicates the coordinates, in the image to be recognized, of the corresponding element of the first feature map.
In this embodiment of the present disclosure, the fusion branch of the OCR model may be used to fuse the first feature map with the position map to obtain the second feature map. For example, the first feature map may be concatenated with the corresponding position map to obtain the second feature map. Alternatively, the first feature map may be concatenated with the corresponding position map to obtain a concatenated feature map, and the concatenated feature map may be input into a convolutional layer to obtain the second feature map by fusion.
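The fusion in step 702 can be sketched as channel-wise concatenation of per-element coordinates onto the feature map. The sketch below uses plain Python lists; normalizing the (x, y) coordinates to [0, 1] is an assumption, since the disclosure only requires that each position-map element indicate the corresponding coordinates.

```python
def position_map(height, width):
    """Per-element (x, y) coordinates, normalized to [0, 1] (normalization assumed)."""
    return [[(x / max(width - 1, 1), y / max(height - 1, 1))
             for x in range(width)] for y in range(height)]

def fuse(feature_map, pos_map):
    """Concatenate the two coordinate channels onto each feature vector
    (the simple concatenation variant of the fusion branch)."""
    return [[list(feat) + list(pos)
             for feat, pos in zip(feat_row, pos_row)]
            for feat_row, pos_row in zip(feature_map, pos_map)]
```

In the alternative variant described above, the concatenated map would additionally pass through a convolutional layer.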
Step 703: Use the feature transformation branch of the OCR model to perform feature transformation on the second feature map to obtain a third feature map.
In this embodiment of the present disclosure, the feature transformation branch of the OCR model may be used to perform feature transformation on the second feature map to obtain the third feature map.
Step 704: Use the prediction branch of the OCR model to decode the third feature map to obtain the predicted text and the confidence corresponding to each predicted character in the predicted text.
In this embodiment of the present disclosure, the prediction branch of the OCR model may be used to decode the third feature map to obtain the predicted text and the confidence corresponding to each predicted character in the predicted text.
Step 705: Determine the character to be processed from the predicted characters according to the confidence of each predicted character.
Step 706: Mask the character to be processed in the predicted text, and use the prediction model to perform character prediction on the masked predicted text to obtain at least one replacement character corresponding to the character to be processed.
Step 707: Determine the target character from the at least one replacement character according to the similarity between each replacement character and the character to be processed, and replace the character to be processed in the predicted text with the target character to obtain the recognized text.
In this embodiment of the present disclosure, for the execution of steps 705 to 707, refer to the execution process of any embodiment of the present disclosure; details are not repeated here.
As an example, error correction may be implemented as a post-processing model decoupled from OCR recognition. For example, the inventors illustrate the recognition of character information in an image using the OCR recognition pipeline shown in Figure 10. As shown in Figure 10, the OCR recognition pipeline mainly includes: rotating the image; detecting the text rows (or columns) in the image; performing character recognition on the content of the text rows (or columns); restoring the coordinate information of each recognized character; and outputting each character, its coordinates, and its confidence.
Analysis shows that long-tail recognition errors mainly occur in the text-line content recognition stage in Figure 10. Therefore, a post-processing module may be added after the text-line content recognition stage and before the coordinate information restoration stage, and this post-processing module may be used to correct the text-line content.
The model structure used in the text-line content recognition stage may be as shown in Figure 11. A feature extraction branch, such as a CNN or ViT backbone network, performs feature extraction on the image to be recognized to obtain the first feature map. A fusion branch fuses the extracted first feature map with the position map to obtain the second feature map. A feature transformation branch performs feature transformation on the second feature map to obtain the third feature map; for example, the feature transformation branch may include a reshape branch (which converts the second feature map to one-dimensional features, Reshape to 1-D, or two-dimensional features, Reshape to 2-D), a Transformer (which further transforms the reshaped features to obtain a sequence feature), and an MLP (Multi-Layer Perceptron, which performs feature transformation on the feature sequence to obtain the third feature map). A prediction branch, such as CTC (Connectionist Temporal Classification), decodes the third feature map to obtain the predicted text and the confidence corresponding to each predicted character in the predicted text.
In one embodiment, in the final stage of model training, multi-task training, such as center-loss and R-Drop (Regularized Dropout, a regularization method), may be enabled to improve the recognition accuracy of the model.
According to the RPA- and AI-based text error correction method of this embodiment of the present disclosure, using deep learning technology to perform OCR recognition on the image to be recognized can improve the accuracy and reliability of the recognition results.
It should be noted that, in the related art, text error correction may be implemented with dictionary-based lookup algorithms, traditional Bayesian machine learning algorithms, or deep learning algorithms. In text error correction in the broad sense, there are generally three types of problems to solve:
The first type: substitution. For example, replacing "今天我感到飞长高兴" with "今天我感到非常高兴" ("Today I feel very happy").
The second type: insertion and deletion (Insert & Delete). For example, completing "今天我感到常高兴" to "今天我感到非常高兴"; or, conversely, reducing "今天我感到非非常高兴" to "今天我感到非常高兴".
The third type: local paraphrasing (to a minimal extent). For example, rewriting "今天我非常感到高兴" as "今天我感到非常高兴".
However, considering that the main causes of OCR recognition errors are interference and misrecognition caused by visually similar characters, the problems to be solved are mainly of the first type above. In addition, since error correction is a general scenario, dictionary-based solutions are not applicable, and a model-based solution may be adopted.
If a supervised model is used for text error correction, a large number of annotated sample pairs need to be used to train the supervised model before error correction, where each annotated sample pair includes an erroneous OCR recognition result and the corresponding accurate result. Specifically, the training process is: training the supervised model according to the difference between the erroneous result and the accurate result in each annotated sample pair.
However, in the above approach, in order to improve the prediction effect of the model, enough annotated sample pairs need to be produced to train the model, and the annotation cost is high. Therefore, a supervised model trained on annotated sample pairs is also not suitable.
As can be seen from Figure 10, the OCR recognition pipeline includes multiple stages, each of which requires a GPU (graphics processing unit). To save on expensive GPUs, the post-processing module (also called the error correction model) may be a lightweight model executable on a CPU (Central Processing Unit). Therefore, a lightweight unsupervised model and scheme may be used to correct the OCR recognition results.
The specific implementation scheme is as follows:
The models commonly used for text error correction in the related art basically fall into the following two categories:
The first category: sequence tasks, such as machine translation. Simply put, a sentence containing typos is "translated" into the accurate sentence.
The second category: error detection plus correction. Simply put, characters in a sentence that may be wrong are detected, and if a typo is detected, it is corrected.
On a pure error correction task, the second category performs better. For example, the Soft-Masked BERT (Bidirectional Encoder Representations from Transformers) model achieves good error correction results, because the additional typo detection task can effectively alleviate the false-recall problem in error correction.
Therefore, in the present disclosure, the second category of design may be adopted, dividing the entire error correction task into three parts:
The first part: erroneous character detection.
In the OCR task, the last layer of the OCR model is an N-way classification task (N is the number of recognizable characters). From the perspective of the classification network's loss function, the more accurately a character is predicted, the higher its confidence (or softmax recognition probability) p in the classification task.
The inventors used the OCR model to test the test texts in a test set and obtained the confidence (recognition probability) of each character. As the confidence threshold (also called the probability threshold) is gradually lowered from 1.0, the recognition accuracy of the remaining characters can be improved. From this, an optimal confidence threshold (also called the optimal probability threshold) f can be inferred, below which the recognition accuracy of the remaining characters no longer improves much.
In the present disclosure, in the OCR error correction scenario, the character confidence (recognition probability) p is used as prior knowledge to detect erroneous characters, so no additional error detection model is needed. Specifically, if the confidence (recognition probability) p of a character is higher than the optimal confidence threshold (optimal probability threshold) f, the character is considered correctly recognized; conversely, if the confidence (recognition probability) p of a character is not higher than the optimal confidence threshold (optimal probability threshold) f, the character is considered incorrectly recognized.
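The detection rule above is a simple threshold test over the OCR output. A minimal sketch follows; the threshold f is whatever value was inferred from the test set offline, and 0.9 below is only a placeholder, not a value from the disclosure.

```python
def detect_suspect_positions(chars, confidences, f=0.9):
    """Return indices of characters whose confidence p is not above threshold f.

    `f` stands for the optimal confidence (probability) threshold inferred
    offline; the default 0.9 is a placeholder for illustration.
    """
    return [i for i, p in enumerate(confidences[:len(chars)]) if p <= f]
```

Each returned index then enters the correction-character recall stage described next.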
If a character is detected as incorrectly recognized, correction-character recall may be performed for that character.
The second part: correction-character recall.
Correction-character recall can be described in terms of offline model training and online model invocation, respectively.
Offline model training:
Model design: statistical analysis shows that in the vast majority of OCR error scenarios, a sentence contains only one erroneous character. Moreover, because supervised annotated sample pairs are lacking, the present disclosure may adopt the MLM (Masked Language Modeling) approach and train the model with a self-supervised training task.
For model design, two schemes are possible: the first predicts only the masked characters, similar to a cloze task; the second predicts all characters of the entire sentence, similar to machine translation. Experiments show that the second scheme predicts better. For example, for the training corpus "花园的东南角有颗桃花树" ("There is a peach blossom tree in the southeast corner of the garden"; denoted as the sample text in the present disclosure), the following tasks can be designed (the left side is the model input, the right side is the model output):
花园的[mask (fixed character)]南角有颗桃花树 -> 花园的东南角有颗桃花树 (the character "东" replaced by the fixed mask character -> the complete correct sentence);
花园的[random character]南角有颗桃花树 -> 花园的东南角有颗桃花树 (the character "东" replaced by a random character -> the complete correct sentence);
In the model architecture design, to reduce output latency, a non-autoregressive task may be adopted: the decoder part of the BERT-style model is discarded and only the encoder part is used. Balancing prediction performance and prediction effect, in the present disclosure, as shown in Figure 12, the prediction model may be a standard stack of six TransformerEncoderLayer encoding layers; the maximum input length is 32 characters, the model parameter head_num is set to 6, and the dimension of the character feature vector (embedding) is 128. That is, the prediction model (Multi-Head Attention, right part of Figure 12) may include six attention encoding layers; an encoded sequence (up to 32 characters) is input into the prediction model, the last layer of the prediction model outputs a 32×128 encoding matrix, and finally the output features of the sixth attention encoding layer are passed through softmax classification on each of the 32 feature vectors to obtain 32 predicted characters.
Multi-Head Attention projects Q, K, and V through multiple (e.g., h) different linear transformations and then concatenates the different attention results; attention is computed with Scaled Dot-Product Attention, and the attention results are passed through a concatenation layer (Concat) and a linear layer (Linear) for non-linear transformation. The structure of Scaled Dot-Product Attention may be as shown in the left part of Figure 12. In Scaled Dot-Product Attention, Q and K undergo matrix multiplication (MatMul), the result is scaled (Scale) and masked (Mask), the masked result is passed through the softmax activation function, and the output of the activation function is matrix-multiplied with V, which can be expressed by the following formula:
Attention(Q, K, V) = softmax(QK^T / √d_k) V
Here, Q, K and V are the three matrices obtained by matrix operations between the input of the attention layer and the model parameters, d_k is the dimension of the key vectors and serves as the normalization factor, and T denotes matrix transposition. When d_k is small, scaled and unscaled dot-product attention perform similarly; when d_k is large, unscaled dot products grow in magnitude and degrade performance, and scaling by √d_k reduces this effect while retaining the speed advantage of dot-product attention.
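The attention computation just described can be sketched end to end in NumPy (a toy single-head version; the 32×128 shapes follow the dimensions stated earlier, and treating the mask as a boolean keep-matrix is an assumption of this sketch):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V, mask=None):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V, with optional masking."""
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)            # MatMul + Scale
    if mask is not None:
        scores = np.where(mask, scores, -1e9)  # Mask: blocked positions get ~zero weight
    weights = softmax(scores)                  # SoftMax
    return weights @ V                         # MatMul with V

rng = np.random.default_rng(0)
Q = rng.standard_normal((32, 128))
K = rng.standard_normal((32, 128))
V = rng.standard_normal((32, 128))
out = scaled_dot_product_attention(Q, K, V)
assert out.shape == (32, 128)
```

Setting blocked scores to a large negative number before SoftMax drives their attention weights toward zero, which is what the Mask step in Figure 12 accomplishes.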
It should be noted that the prediction model takes the Transformer as its basic structure because the Transformer's multi-head self-attention mechanism (MultiHeadSelfAttention, part of Multi-Head Attention) can capture correlations between characters over long distances. For example, the Multi-Head Attention mechanism can be as shown in the right part of Figure 12.
Training data: for the training corpus, a large amount of news data can be crawled from multiple data sources and split into more than 200 million training sentences (referred to as sample texts in the present disclosure). Unlike approaches that mask the sample texts before model training, in the present disclosure the sample texts can be randomly masked during training: with a first random probability (e.g., 10%), an arbitrary character in a sample text is replaced with a random character, and with a second random probability (e.g., 10%), an arbitrary character in the sample text is replaced with the Mask character (i.e., a fixed character).
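The dynamic (per-pass) masking described above can be sketched as follows; the two 10% probabilities follow the text, while the `[MASK]` token string and the small character pool are illustrative stand-ins:

```python
import random

MASK = "[MASK]"
CHAR_POOL = "的一是在有了不和人这"  # stand-in for the character dictionary

def dynamic_mask(text, p_random=0.1, p_mask=0.1, rng=random):
    """Re-applied on every pass: each character is independently replaced
    with a random character (p_random) or the fixed Mask character (p_mask)."""
    out = []
    for ch in text:
        r = rng.random()
        if r < p_random:
            out.append(rng.choice(CHAR_POOL))  # first random probability
        elif r < p_random + p_mask:
            out.append(MASK)                   # second random probability
        else:
            out.append(ch)
    return out

masked = dynamic_mask("花园的东南角有颗桃花树")
assert len(masked) == len("花园的东南角有颗桃花树")
```

Because the mask is re-drawn on every pass, the same sample text yields different corrupted versions across epochs, unlike masking fixed once before training.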
In addition, for the sample texts, in order to reduce the model size, the 3900 most common Chinese characters can be used to form the set dictionary; in order to improve the accuracy of the model's predictions, digits, English characters, punctuation, and Chinese characters with highly generic semantics (such as the quantifiers 千 "thousand" and 百 "hundred") in the sample texts can be replaced with a specific character, such as the OOV character.
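A sketch of this vocabulary normalization; the dictionary contents and the `[OOV]` token string are illustrative (the disclosure only specifies that the 3900 most common characters form the set dictionary):

```python
SET_DICT = set("花园的东南角有颗桃树提通用")  # stand-in for the 3900 most common characters
OOV = "[OOV]"

def normalize(text):
    """Map digits, English letters, punctuation and other out-of-dictionary
    characters to the single OOV token to keep the model vocabulary small."""
    return [ch if ch in SET_DICT else OOV for ch in text]

assert normalize("花园有3棵桃树!") == ["花", "园", "有", OOV, OOV, "桃", "树", OOV]
```

Collapsing semantically generic symbols into one token keeps the output softmax at roughly 3900 classes, which is the stated model-size motivation.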
Loss function design: a cross-entropy loss is adopted, and its value can be determined as follows. The cross-entropy loss consists of two parts: one part is the cross-entropy loss over all predicted characters of the whole sentence (referred to as the first loss value in the present disclosure), and the other part is the cross-entropy loss over the masked characters (referred to as the second loss value in the present disclosure). Because the final application cares more about the second part, the second part of the loss is given a higher weight.
During loss computation, the loss predicted at OOV characters (referred to as the third loss value in the present disclosure) needs to be removed, and the remaining loss is scaled up in proportion to the number of valid characters, mainly to avoid the loss being too low when there are few valid characters.
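Putting the two preceding paragraphs together, the loss can be sketched as below; the specific weight values and the boolean position masks are assumptions of this sketch (the disclosure only states that the masked-character part is weighted higher, OOV positions are removed, and the remainder is scaled up by the valid-character count):

```python
import numpy as np

def masked_ce_loss(log_probs, targets, mask_pos, oov_pos, w_all=1.0, w_mask=2.0):
    """Per-character cross entropy: masked positions get a higher weight,
    OOV-target positions are removed, and the result is rescaled by the
    valid-character count. w_all < w_mask is illustrative."""
    n = len(targets)
    ce = -log_probs[np.arange(n), targets]      # per-position cross entropy
    weight = np.where(mask_pos, w_mask, w_all)  # masked ("second") part weighted higher
    valid = ~oov_pos                            # drop the OOV ("third") loss
    loss = (ce * weight * valid).sum()
    return loss * n / max(valid.sum(), 1)       # scale up by the valid-character ratio

rng = np.random.default_rng(0)
logits = rng.standard_normal((8, 20))
log_probs = logits - np.log(np.exp(logits).sum(-1, keepdims=True))
targets = rng.integers(0, 20, size=8)
mask_pos = np.array([0, 1, 0, 0, 1, 0, 0, 0], dtype=bool)
oov_pos = np.array([0, 0, 1, 0, 0, 0, 0, 1], dtype=bool)
loss = masked_ce_loss(log_probs, targets, mask_pos, oov_pos)
assert loss > 0
```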
Online invocation: at inference time, characters in the OCR recognition result whose confidence or recognition probability p is lower than f can be replaced with the OOV character. If a sentence contains more than 32 characters, it can be truncated at both ends to ensure that the masked character lies in the middle of the sentence. Although the OCR recognition result contains predictions for all characters, the present disclosure only uses the predicted characters at low-probability or low-confidence positions as recall candidates (for example, recalling the Top-20 predicted characters) and ignores the predicted characters at other positions.
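The online preprocessing described above can be sketched as follows; the threshold f, the `[OOV]` token string, and centering the 32-character window on the first low-confidence position are illustrative choices:

```python
OOV, MAX_LEN = "[OOV]", 32

def prepare_online_input(chars, confidences, f=0.9):
    """Replace low-confidence OCR characters with OOV, then truncate the
    sentence at both ends so the first low-confidence position stays near
    the middle of a 32-character window."""
    seq = [OOV if p < f else c for c, p in zip(chars, confidences)]
    if len(seq) <= MAX_LEN:
        return seq
    try:
        center = seq.index(OOV)        # first suspicious position
    except ValueError:
        center = len(seq) // 2
    start = min(max(center - MAX_LEN // 2, 0), len(seq) - MAX_LEN)
    return seq[start:start + MAX_LEN]

chars = list("x" * 50)
conf = [0.95] * 50
conf[40] = 0.3                          # one suspicious character
seq = prepare_online_input(chars, conf)
assert len(seq) == 32 and OOV in seq
```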
The third part is: ranking of correction characters.
As can be seen from the above, the recall stage is a purely semantic model, and using only the Top-1 character recalled by the model for correction cannot guarantee the accuracy of the correction result. For example, if the model input is "花园的[mask]南角有颗桃花树" ("there is a peach blossom tree in the [mask]-south corner of the garden"), either of the predicted characters 东 ("east") and 西 ("west") could be wrong, so a ranking module with prior knowledge is needed to select the most accurate character.
As mentioned above, the main causes of OCR recognition errors are visually similar characters and interference on the characters, so in the present disclosure the character similarity can be used as the ranking metric.
The character similarity can be computed by the following three methods:
First, the Chinese character four-corner encoding algorithm. The encoding value of each character is computed based on the four-corner encoding algorithm, and the similarity between characters is determined from the encoding values.
Second, image similarity. Each character can be rendered, in fonts such as KaiTi or SongTi, to fill a 128×128 image with a white background, and the pairwise image similarity is then computed as the character similarity; alternatively, features can be extracted from the images and the image similarity computed from the extracted feature vectors. The character similarity can thus be determined from the image similarity.
Third, OCR feature vectors. The feature vector of each character can be determined from the softmax matrix of the last layer of the OCR model, and the similarity can then be computed from these feature vectors. This matrix is an N×D matrix, where N is the number of characters and D is the vector dimension; each D-dimensional row can be regarded as the representation vector, or feature vector, of the corresponding character. With this third method, the character similarities shown in Figure 13 can be obtained, where the third column of Figure 13 is the character shape similarity.
Experiments show that the third method outperforms the first and the second. The character similarity between a recalled character and the character to be corrected can be determined from the character shape similarity; if the character similarity is higher than a set threshold, the character with the highest similarity can be selected as the target character.
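The third (feature-vector) method and the threshold rule above can be sketched as follows; the toy 4×3 matrix, its values, and the 0.5 threshold are invented for illustration (in practice the rows come from the OCR model's last-layer N×D softmax matrix):

```python
import numpy as np

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def pick_target(embeddings, char_to_row, candidates, bad_char, threshold=0.5):
    """Rank recalled candidates by shape similarity to the misrecognized
    character, using rows of the embedding matrix as character feature
    vectors; accept a candidate only if it clears the set threshold."""
    bad_vec = embeddings[char_to_row[bad_char]]
    scored = [(cosine(embeddings[char_to_row[c]], bad_vec), c) for c in candidates]
    best_score, best_char = max(scored)
    return best_char if best_score >= threshold else None

# Toy 4x3 "softmax matrix": rows are character representations (invented values).
emb = np.array([[1.0, 0.1, 0.0],   # 提
                [0.9, 0.2, 0.1],   # 摆 (shape-similar to 提)
                [0.0, 1.0, 0.0],   # 东
                [0.0, 0.0, 1.0]])  # 西
rows = {"提": 0, "摆": 1, "东": 2, "西": 3}
assert pick_target(emb, rows, ["提", "东", "西"], "摆") == "提"
```

Cosine similarity over the representation rows serves here as the character-shape similarity; when no candidate clears the threshold, no correction is made.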
After the model went online, the inventor tested it on a large-scale test set and found that, even on top of an already high F1 score, the model still improved the F1 score by more than 0.03%.
As one example, as shown in Figure 14, when detecting a text line in an image, if part of a character in the text line is cropped, causing "提" to be misrecognized as "摆", the above model can correct it normally.
As another example, as shown in Figure 15, when a red stamp in the image interferes with recognition, causing "通用" to be misrecognized as "涌用", the above model can likewise correct it normally.
The above are the embodiments of the text error correction method. The present disclosure further proposes a method for training the prediction model used in any of the above method embodiments.
Figure 16 is a schematic flowchart of a training method provided by an embodiment of the present disclosure.
As shown in Figure 16, the training method may include the following steps:
Step 801: obtain a sample text.
Step 802: mask at least one sample character in the sample text to obtain a masked sample text.
Step 803: input the masked sample text into the initial prediction model, so that the prediction model performs character prediction on the masked sample text to obtain an output text.
Step 804: adjust the model parameters of the prediction model according to the difference between the sample text and the output text.
In a possible implementation of the embodiments of the present disclosure, a first loss value can be generated according to the difference between the character confidence distributions of the sample text and the output text; a first position, in the sample text, of the at least one masked sample character is determined; a second loss value is generated according to the difference between the sample character at the first position in the sample text and the output character at the first position in the output text; a target loss value is generated according to the first loss value and the second loss value; and the model parameters of the prediction model are adjusted according to the target loss value.
In a possible implementation of the embodiments of the present disclosure, a set dictionary can be obtained; it is determined whether each character in the masked sample text is in the set dictionary; when the masked sample text contains a first character that is not in the set dictionary, the first character in the masked sample text is replaced with a specific character; and the prediction model performs character prediction on the replaced sample text to obtain the output text.
In a possible implementation of the embodiments of the present disclosure, a second position of the first character in the sample text can be determined; a third loss value is generated according to the difference between the specific character and the output character at the second position in the output text; the first loss value and the second loss value are weighted according to a first weight of the first loss value and a second weight of the second loss value to obtain a fourth loss value, the second weight being greater than the first weight; and the target loss value is generated according to the difference between the fourth loss value and the third loss value.
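One literal reading of this loss combination, as a sketch (the weight values are illustrative; the disclosure only requires the second weight to exceed the first, and the "difference" between the fourth and third loss values is interpreted here as subtraction):

```python
def target_loss(first_loss, second_loss, third_loss, w1=1.0, w2=2.0):
    """Weight the whole-sentence loss (first) and the masked-character loss
    (second) with w2 > w1 into a fourth loss value, then subtract the
    OOV-character loss (third) to obtain the target loss value."""
    assert w2 > w1
    fourth_loss = w1 * first_loss + w2 * second_loss
    return fourth_loss - third_loss

assert target_loss(1.0, 2.0, 0.5) == 4.5
```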
It should be noted that the explanations of the RPA and AI based text error correction method in any of the foregoing embodiments also apply to this embodiment; the implementation principles are similar and are not repeated here.
In the training method of the embodiments of the present disclosure, a sample text is obtained; at least one sample character in the sample text is masked to obtain a masked sample text; the masked sample text is input into the initial prediction model, so that the prediction model performs character prediction on the masked sample text to obtain an output text; and the model parameters of the prediction model are adjusted according to the difference between the sample text and the output text. By training the prediction model in advance in this way, the prediction performance of the prediction model can be improved.
Corresponding to the RPA and AI based text error correction methods provided in the embodiments of Figures 3 to 9 above, the present disclosure further provides an RPA and AI based text error correction apparatus. Since the apparatus provided in the embodiments of the present disclosure corresponds to the methods provided in the embodiments of Figures 3 to 9, the implementations of those methods are also applicable to the apparatus and are not described in detail here.
Figure 17 is a schematic structural diagram of an RPA and AI based text error correction apparatus provided by an embodiment of the present disclosure.
As shown in Figure 17, the RPA and AI based text error correction apparatus 1700 may include: a recognition module 1710, a determination module 1720, a masking module 1730, a prediction module 1740, and a replacement module 1750.
The recognition module 1710 is configured to perform character recognition on an image to be recognized based on an optical character recognition (OCR) model, to obtain a predicted text and the confidence of each predicted character in the predicted text.
The determination module 1720 is configured to determine a character to be processed from the predicted characters according to the confidence of each predicted character.
The masking module 1730 is configured to mask the character to be processed in the predicted text.
The prediction module 1740 is configured to perform character prediction on the masked predicted text using a prediction model, to obtain at least one replacement character corresponding to the character to be processed.
The replacement module 1750 is configured to determine a target character from the at least one replacement character according to the similarity between the at least one replacement character and the character to be processed, and replace the character to be processed in the predicted text with the target character, to obtain a recognized text.
In a possible implementation of the embodiments of the present disclosure, the RPA and AI based text error correction apparatus 1700 can be applied to an RPA robot.
In a possible implementation of the embodiments of the present disclosure, the masking module 1730 is configured to: determine a target position of the character to be processed in the predicted text; and obtain a mask character and replace the character to be processed at the target position in the predicted text with the mask character, to obtain the masked predicted text.
The prediction module 1740 is configured to: input the masked predicted text into the prediction model, so that the prediction model performs character prediction on the masked predicted text to obtain at least one target text; and take the character at the target position in the at least one target text as the at least one replacement character.
In a possible implementation of the embodiments of the present disclosure, the prediction model is trained by the following modules:
an obtaining module, configured to obtain a sample text;
the masking module, further configured to mask at least one sample character in the sample text to obtain a masked sample text;
an input module, configured to input the masked sample text into the initial prediction model, so that the prediction model performs character prediction on the masked sample text to obtain an output text; and
an adjustment module, configured to adjust the model parameters of the prediction model according to the difference between the sample text and the output text.
In a possible implementation of the embodiments of the present disclosure, the adjustment module is configured to: generate a first loss value according to the difference between the character confidence distributions of the sample text and the output text; determine a first position, in the sample text, of the at least one masked sample character; generate a second loss value according to the difference between the sample character at the first position in the sample text and the output character at the first position in the output text; generate a target loss value according to the first loss value and the second loss value; and adjust the model parameters of the prediction model according to the target loss value.
In a possible implementation of the embodiments of the present disclosure, the input module is configured to: obtain a set dictionary; determine whether each character in the masked sample text is in the set dictionary; when the masked sample text contains a first character that is not in the set dictionary, replace the first character in the masked sample text with a specific character; and input the replaced sample text into the prediction model, so that the prediction model performs character prediction on the replaced sample text to obtain the output text.
In a possible implementation of the embodiments of the present disclosure, the adjustment module is configured to: determine a second position of the first character in the sample text; generate a third loss value according to the difference between the specific character and the output character at the second position in the output text; weight the first loss value and the second loss value according to a first weight of the first loss value and a second weight of the second loss value to obtain a fourth loss value, the second weight being greater than the first weight; and generate the target loss value according to the difference between the fourth loss value and the third loss value.
In a possible implementation of the embodiments of the present disclosure, the RPA and AI based text error correction apparatus 1700 may further include:
a first processing module, configured to: encode the at least one replacement character based on a set encoding algorithm to obtain a first encoding value of the at least one replacement character; encode the character to be processed based on the set encoding algorithm to obtain a second encoding value of the character to be processed; and determine the similarity between the at least one replacement character and the character to be processed according to the difference between the first encoding value of the at least one replacement character and the second encoding value.
In a possible implementation of the embodiments of the present disclosure, the RPA and AI based text error correction apparatus 1700 may further include:
a second processing module, configured to: for each replacement character, render the replacement character to obtain a first image; render the character to be processed to obtain a second image; and determine the similarity between the replacement character and the character to be processed according to the similarity between the first image and the second image.
In a possible implementation of the embodiments of the present disclosure, the RPA and AI based text error correction apparatus 1700 may further include:
a third processing module, configured to: perform feature extraction on the at least one replacement character to obtain a feature vector of the at least one replacement character; perform feature extraction on the character to be processed to obtain a feature vector of the character to be processed; and determine the similarity between the at least one replacement character and the character to be processed according to the similarity between the feature vector of the at least one replacement character and the feature vector of the character to be processed.
In a possible implementation of the embodiments of the present disclosure, the recognition module 1710 is configured to: perform feature extraction on the image to be recognized using a feature extraction branch of the OCR model to obtain a first feature map; fuse the first feature map with a position map using a fusion branch of the OCR model to obtain a second feature map, where the elements of the position map correspond one-to-one to the elements of the first feature map, and each element of the position map indicates the coordinates, in the image to be recognized, of the corresponding element of the first feature map; perform feature transformation on the second feature map using a feature transformation branch of the OCR model to obtain a third feature map; and decode the third feature map using a prediction branch of the OCR model to obtain the predicted text and the confidence corresponding to each predicted character in the predicted text.
In a possible implementation of the embodiments of the present disclosure, the RPA and AI based text error correction apparatus 1700 may further include:
a fourth processing module, configured to perform angle prediction on the image to be recognized to determine a tilt angle of the image to be recognized, and rotate the image to be recognized according to the tilt angle.
In a possible implementation of the embodiments of the present disclosure, the determination module 1720 is configured to: take a predicted character whose confidence is higher than a confidence threshold as the character to be processed; or take the predicted character with the highest confidence as the character to be processed; or sort the predicted characters by confidence in descending order and select a target number of front-ranked predicted characters as the characters to be processed, where the value of the target number is positively correlated with the length of the predicted text.
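The third selection strategy can be sketched as stated (the proportionality between the target count and the text length is modeled with an illustrative ratio constant):

```python
def select_to_process(pred_chars, confidences, ratio=0.5):
    """Sort the predicted characters by confidence in descending order and
    take a target number of front-ranked ones; the target count grows with
    the length of the predicted text (ratio is an illustrative constant)."""
    k = max(1, round(len(pred_chars) * ratio))
    order = sorted(range(len(pred_chars)), key=lambda i: confidences[i], reverse=True)
    return [pred_chars[i] for i in order[:k]]

chars = ["花", "园", "提", "摆"]
conf = [0.9, 0.7, 0.8, 0.6]
assert select_to_process(chars, conf) == ["花", "提"]
```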
With the RPA and AI based text error correction apparatus of the embodiments of the present disclosure, character recognition is performed on an image to be recognized based on an OCR model to obtain a predicted text and the confidence of each predicted character in the predicted text; a character to be processed is determined from the predicted characters according to the confidence of each predicted character; the character to be processed in the predicted text is masked, and a prediction model performs character prediction on the masked predicted text to obtain at least one replacement character corresponding to the character to be processed; a target character is determined from the at least one replacement character according to the similarity between the at least one replacement character and the character to be processed, and the character to be processed in the predicted text is replaced with the target character to obtain a recognized text. In this way, after text information is recognized based on OCR technology, the characters in the text information are corrected, which can improve the accuracy and reliability of the recognition result. In addition, since no manual correction of the characters is required, human resources can be freed, labor costs can be reduced, and the applicability of the method is improved.
Corresponding to the training method provided in the embodiment of Figure 16 above, the present disclosure further provides a training apparatus. Since the training apparatus provided in the embodiments of the present disclosure corresponds to the training method provided in the embodiment of Figure 16, the implementations of the training method are also applicable to the training apparatus and are not described in detail here.
Figure 18 is a schematic structural diagram of a training apparatus provided by an embodiment of the present disclosure.
As shown in Figure 18, the training apparatus 1800 may include: an obtaining module 1810, a masking module 1820, an input module 1830, and an adjustment module 1840.
The obtaining module 1810 is configured to obtain a sample text.
The masking module 1820 is configured to mask at least one sample character in the sample text to obtain a masked sample text.
The input module 1830 is configured to input the masked sample text into the initial prediction model, so that the prediction model performs character prediction on the masked sample text to obtain an output text.
The adjustment module 1840 is configured to adjust the model parameters of the prediction model according to the difference between the sample text and the output text.
In a possible implementation of the embodiments of the present disclosure, the adjustment module 1840 is configured to: generate a first loss value according to the difference between the character confidence distributions of the sample text and the output text; determine a first position, in the sample text, of the at least one masked sample character; generate a second loss value according to the difference between the sample character at the first position in the sample text and the output character at the first position in the output text; generate a target loss value according to the first loss value and the second loss value; and adjust the model parameters of the prediction model according to the target loss value.
In a possible implementation of the embodiments of the present disclosure, the input module 1830 is configured to: obtain a set dictionary; determine whether each character in the masked sample text is in the set dictionary; when the masked sample text contains a first character that is not in the set dictionary, replace the first character in the masked sample text with a specific character; and input the replaced sample text into the prediction model, so that the prediction model performs character prediction on the replaced sample text to obtain the output text.
In a possible implementation of the embodiments of the present disclosure, the adjustment module 1840 is configured to: determine a second position of the first character in the sample text; generate a third loss value according to the difference between the specific character and the output character at the second position in the output text; weight the first loss value and the second loss value according to a first weight of the first loss value and a second weight of the second loss value to obtain a fourth loss value, the second weight being greater than the first weight; and generate the target loss value according to the difference between the fourth loss value and the third loss value.
With the training apparatus of the embodiments of the present disclosure, a sample text is obtained; at least one sample character in the sample text is masked to obtain a masked sample text; the masked sample text is input into the initial prediction model, so that the prediction model performs character prediction on the masked sample text to obtain an output text; and the model parameters of the prediction model are adjusted according to the difference between the sample text and the output text. By training the prediction model in advance in this way, the prediction performance of the prediction model can be improved.
本公开实施例还提出一种电子设备，包括存储器、处理器及存储在存储器上并可在处理器上运行的计算机程序，所述处理器执行所述计算机程序时，实现如前述任一方法实施例所述的基于RPA和AI的文本纠错方法，或者，实现如前述任一方法实施例所述的训练方法。An embodiment of the present disclosure further provides an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor. When the processor executes the computer program, the RPA- and AI-based text error correction method described in any of the foregoing method embodiments, or the training method described in any of the foregoing method embodiments, is implemented.
本公开实施例还提出一种非临时性计算机可读存储介质，其上存储有计算机程序，该计算机程序被处理器执行时实现如前述任一方法实施例所述的基于RPA和AI的文本纠错方法，或者，实现如前述任一方法实施例所述的训练方法。An embodiment of the present disclosure further provides a non-transitory computer-readable storage medium on which a computer program is stored. When the computer program is executed by a processor, the RPA- and AI-based text error correction method described in any of the foregoing method embodiments, or the training method described in any of the foregoing method embodiments, is implemented.
本公开实施例还提出一种计算机程序产品，当所述计算机程序产品中的指令被处理器执行时，实现如前述任一方法实施例所述的基于RPA和AI的文本纠错方法，或者，实现如前述任一方法实施例所述的训练方法。An embodiment of the present disclosure further provides a computer program product. When the instructions in the computer program product are executed by a processor, the RPA- and AI-based text error correction method described in any of the foregoing method embodiments, or the training method described in any of the foregoing method embodiments, is implemented.
图19示出了适于用来实现本公开实施方式的示例性电子设备的框图。图19显示的电子设备12仅仅是一个示例，不应对本公开实施例的功能和使用范围带来任何限制。FIG. 19 illustrates a block diagram of an exemplary electronic device suitable for implementing embodiments of the present disclosure. The electronic device 12 shown in FIG. 19 is only an example and should not impose any limitation on the functions and scope of use of the embodiments of the present disclosure.
如图19所示,电子设备12以通用计算设备的形式表现。电子设备12的组件可以包括但不限于:一个或者多个处理器或者处理单元16,系统存储器28,连接不同系统组件(包括存储器28和处理单元16)的总线18。As shown in Figure 19, electronic device 12 is embodied in the form of a general computing device. Components of the electronic device 12 may include, but are not limited to: one or more processors or processing units 16, system memory 28, and a bus 18 connecting various system components (including memory 28 and processing unit 16).
总线18表示几类总线结构中的一种或多种，包括存储器总线或者存储器控制器，外围总线，图形加速端口，处理器或者使用多种总线结构中的任意总线结构的局域总线。举例来说，这些体系结构包括但不限于工业标准体系结构（Industry Standard Architecture；以下简称：ISA）总线，微通道体系结构（Micro Channel Architecture；以下简称：MCA）总线，增强型ISA总线、视频电子标准协会（Video Electronics Standards Association；以下简称：VESA）局域总线以及外围组件互连（Peripheral Component Interconnection；以下简称：PCI）总线。The bus 18 represents one or more of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, a processor, or a local bus using any of a variety of bus architectures. By way of example, these architectures include, but are not limited to, the Industry Standard Architecture (ISA) bus, the Micro Channel Architecture (MCA) bus, the Enhanced ISA bus, the Video Electronics Standards Association (VESA) local bus, and the Peripheral Component Interconnect (PCI) bus.
电子设备12典型地包括多种计算机系统可读介质。这些介质可以是任何能够被电子设备12访问的可用介质,包括易失性和非易失性介质,可移动的和不可移动的介质。 Electronic device 12 typically includes a variety of computer system readable media. These media can be any available media that can be accessed by electronic device 12, including volatile and nonvolatile media, removable and non-removable media.
存储器28可以包括易失性存储器形式的计算机系统可读介质，例如随机存取存储器（Random Access Memory；以下简称：RAM）30和/或高速缓存存储器32。电子设备12可以进一步包括其它可移动/不可移动的、易失性/非易失性计算机系统存储介质。仅作为举例，存储系统34可以用于读写不可移动的、非易失性磁介质（图19未显示，通常称为“硬盘驱动器”）。尽管图19中未示出，可以提供用于对可移动非易失性磁盘（例如“软盘”）读写的磁盘驱动器，以及对可移动非易失性光盘（例如：光盘只读存储器（Compact Disc Read Only Memory；以下简称：CD-ROM）、数字多功能只读光盘（Digital Video Disc Read Only Memory；以下简称：DVD-ROM）或者其它光介质）读写的光盘驱动器。在这些情况下，每个驱动器可以通过一个或者多个数据介质接口与总线18相连。存储器28可以包括至少一个程序产品，该程序产品具有一组（例如至少一个）程序模块，这些程序模块被配置以执行本公开各实施例的功能。The memory 28 may include computer-system-readable media in the form of volatile memory, such as a random access memory (RAM) 30 and/or a cache memory 32. The electronic device 12 may further include other removable/non-removable, volatile/non-volatile computer-system storage media. By way of example only, the storage system 34 may be used to read from and write to non-removable, non-volatile magnetic media (not shown in FIG. 19, commonly referred to as a "hard drive"). Although not shown in FIG. 19, a magnetic disk drive for reading from and writing to removable non-volatile magnetic disks (e.g., "floppy disks") may be provided, as may an optical disc drive for reading from and writing to removable non-volatile optical discs (e.g., a Compact Disc Read-Only Memory (CD-ROM), a Digital Video Disc Read-Only Memory (DVD-ROM), or other optical media). In these cases, each drive may be connected to the bus 18 through one or more data media interfaces. The memory 28 may include at least one program product having a set (e.g., at least one) of program modules configured to perform the functions of the embodiments of the present disclosure.
具有一组（至少一个）程序模块42的程序/实用工具40，可以存储在例如存储器28中，这样的程序模块42包括但不限于操作系统、一个或者多个应用程序、其它程序模块以及程序数据，这些示例中的每一个或某种组合中可能包括网络环境的实现。程序模块42通常执行本公开所描述的实施例中的功能和/或方法。A program/utility 40 having a set (at least one) of program modules 42 may be stored, for example, in the memory 28. Such program modules 42 include, but are not limited to, an operating system, one or more application programs, other program modules, and program data; each of these examples, or some combination thereof, may include an implementation of a network environment. The program modules 42 generally perform the functions and/or methods of the embodiments described in the present disclosure.
电子设备12也可以与一个或多个外部设备14（例如键盘、指向设备、显示器24等）通信，还可与一个或者多个使得用户能与该电子设备12交互的设备通信，和/或与使得该电子设备12能与一个或多个其它计算设备进行通信的任何设备（例如网卡，调制解调器等等）通信。这种通信可以通过输入/输出（I/O）接口22进行。并且，电子设备12还可以通过网络适配器20与一个或者多个网络（例如局域网（Local Area Network；以下简称：LAN），广域网（Wide Area Network；以下简称：WAN）和/或公共网络，例如因特网）通信。如图所示，网络适配器20通过总线18与电子设备12的其它模块通信。应当明白，尽管图中未示出，可以结合电子设备12使用其它硬件和/或软件模块，包括但不限于：微代码、设备驱动器、冗余处理单元、外部磁盘驱动阵列、RAID系统、磁带驱动器以及数据备份存储系统等。The electronic device 12 may also communicate with one or more external devices 14 (e.g., a keyboard, a pointing device, a display 24, etc.), with one or more devices that enable a user to interact with the electronic device 12, and/or with any device (e.g., a network card, a modem, etc.) that enables the electronic device 12 to communicate with one or more other computing devices. Such communication may occur through an input/output (I/O) interface 22. Moreover, the electronic device 12 may also communicate with one or more networks (e.g., a local area network (LAN), a wide area network (WAN), and/or a public network such as the Internet) through a network adapter 20. As shown, the network adapter 20 communicates with the other modules of the electronic device 12 via the bus 18. It should be understood that, although not shown in the figure, other hardware and/or software modules may be used in conjunction with the electronic device 12, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems.
处理单元16通过运行存储在存储器28中的程序,从而执行各种功能应用以及数据处理,例如实现前述实施例中提及的方法。The processing unit 16 executes programs stored in the memory 28 to perform various functional applications and data processing, such as implementing the methods mentioned in the previous embodiments.
在本说明书的描述中，参考术语“一个实施例”、“一些实施例”、“示例”、“具体示例”、或“一些示例”等的描述意指结合该实施例或示例描述的具体特征、结构、材料或者特点包含于本公开的至少一个实施例或示例中。在本说明书中，对上述术语的示意性表述不必须针对的是相同的实施例或示例。而且，描述的具体特征、结构、材料或者特点可以在任一个或多个实施例或示例中以合适的方式结合。此外，在不相互矛盾的情况下，本领域的技术人员可以将本说明书中描述的不同实施例或示例以及不同实施例或示例的特征进行结合和组合。In the description of this specification, reference to the terms "one embodiment", "some embodiments", "an example", "a specific example", or "some examples" means that a specific feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present disclosure. In this specification, schematic expressions of the above terms do not necessarily refer to the same embodiment or example. Furthermore, the specific features, structures, materials, or characteristics described may be combined in a suitable manner in any one or more embodiments or examples. In addition, those skilled in the art may combine the different embodiments or examples described in this specification, and the features of those different embodiments or examples, provided they do not contradict each other.
此外,术语“第一”、“第二”仅用于描述目的,而不能理解为指示或暗示相对重要性或者隐含指明所指示的技术特征的数量。由此,限定有“第一”、“第二”的特征可以明示或者隐含地包括至少一个该特征。在本公开的描述中,“多个”的含义是至少两个,例如两个,三个等,除非另有明确具体的限定。In addition, the terms “first” and “second” are used for descriptive purposes only and cannot be understood as indicating or implying relative importance or implicitly indicating the quantity of indicated technical features. Therefore, features defined as "first" and "second" may explicitly or implicitly include at least one of these features. In the description of the present disclosure, "plurality" means at least two, such as two, three, etc., unless otherwise expressly and specifically limited.
流程图中或在此以其他方式描述的任何过程或方法描述可以被理解为，表示包括一个或更多个用于实现定制逻辑功能或过程的步骤的可执行指令的代码的模块、片段或部分，并且本公开的优选实施方式的范围包括另外的实现，其中可以不按所示出或讨论的顺序，包括根据所涉及的功能按基本同时的方式或按相反的顺序，来执行功能，这应被本公开的实施例所属技术领域的技术人员所理解。Any process or method description in a flowchart, or otherwise described herein, may be understood as representing a module, segment, or portion of code that includes one or more executable instructions for implementing steps of a custom logical function or process, and the scope of the preferred embodiments of the present disclosure includes additional implementations in which functions may be performed out of the order shown or discussed, including in a substantially simultaneous manner or in the reverse order, depending on the functions involved. This should be understood by those skilled in the art to which the embodiments of the present disclosure belong.
在流程图中表示或在此以其他方式描述的逻辑和/或步骤，例如，可以被认为是用于实现逻辑功能的可执行指令的定序列表，可以具体实现在任何计算机可读介质中，以供指令执行系统、装置或设备（如基于计算机的系统、包括处理器的系统或其他可以从指令执行系统、装置或设备取指令并执行指令的系统）使用，或结合这些指令执行系统、装置或设备而使用。就本说明书而言，"计算机可读介质"可以是任何可以包含、存储、通信、传播或传输程序以供指令执行系统、装置或设备或结合这些指令执行系统、装置或设备而使用的装置。计算机可读介质的更具体的示例（非穷尽性列表）包括以下：具有一个或多个布线的电连接部（电子装置），便携式计算机盘盒（磁装置），随机存取存储器（RAM），只读存储器（ROM），可擦除可编辑只读存储器（EPROM或闪速存储器），光纤装置，以及便携式光盘只读存储器（CDROM）。另外，计算机可读介质甚至可以是可在其上打印所述程序的纸或其他合适的介质，因为可以例如通过对纸或其他介质进行光学扫描，接着进行编辑、解译或必要时以其他合适方式进行处理来以电子方式获得所述程序，然后将其存储在计算机存储器中。The logic and/or steps represented in the flowcharts, or otherwise described herein, may, for example, be considered a sequenced list of executable instructions for implementing logical functions, and may be embodied in any computer-readable medium for use by, or in combination with, an instruction execution system, apparatus, or device (such as a computer-based system, a system including a processor, or another system that can fetch instructions from the instruction execution system, apparatus, or device and execute them). For the purposes of this specification, a "computer-readable medium" may be any means that can contain, store, communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. More specific examples (a non-exhaustive list) of computer-readable media include: an electrical connection with one or more wires (an electronic device), a portable computer diskette (a magnetic device), a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a fiber-optic device, and a portable compact disc read-only memory (CD-ROM). In addition, the computer-readable medium may even be paper or another suitable medium on which the program can be printed, because the program can be obtained electronically, for example by optically scanning the paper or other medium and then editing, interpreting, or otherwise processing it in a suitable manner if necessary, and then stored in a computer memory.
应当理解，本公开的各部分可以用硬件、软件、固件或它们的组合来实现。在上述实施方式中，多个步骤或方法可以用存储在存储器中且由合适的指令执行系统执行的软件或固件来实现。如，如果用硬件来实现和在另一实施方式中一样，可用本领域公知的下列技术中的任一项或他们的组合来实现：具有用于对数据信号实现逻辑功能的逻辑门电路的离散逻辑电路，具有合适的组合逻辑门电路的专用集成电路，可编程门阵列（PGA），现场可编程门阵列（FPGA）等。It should be understood that the various parts of the present disclosure may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, multiple steps or methods may be implemented in software or firmware stored in a memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, they may be implemented by any of the following technologies known in the art, or a combination thereof: a discrete logic circuit having logic gates for implementing logic functions on data signals, an application-specific integrated circuit having suitable combinational logic gates, a programmable gate array (PGA), a field-programmable gate array (FPGA), and so on.
本技术领域的普通技术人员可以理解实现上述实施例方法携带的全部或部分步骤是可以通过程序来指令相关的硬件完成，所述的程序可以存储于一种计算机可读存储介质中，该程序在执行时，包括方法实施例的步骤之一或其组合。Those of ordinary skill in the art will understand that all or part of the steps of the methods of the above embodiments can be performed by instructing the relevant hardware through a program. The program can be stored in a computer-readable storage medium and, when executed, includes one of the steps of the method embodiments or a combination thereof.
此外,在本公开各个实施例中的各功能单元可以集成在一个处理模块中,也可以是各个单元单独物理存在,也可以两个或两个以上单元集成在一个模块中。上述集成的模块既可以采用硬件的形式实现,也可以采用软件功能模块的形式实现。所述集成的模块如果以软件功能模块的形式实现并作为独立的产品销售或使用时,也可以存储在一个计算机可读取存储介质中。In addition, each functional unit in various embodiments of the present disclosure may be integrated into one processing module, each unit may exist physically alone, or two or more units may be integrated into one module. The above integrated modules can be implemented in the form of hardware or software function modules. If the integrated module is implemented in the form of a software function module and sold or used as an independent product, it can also be stored in a computer-readable storage medium.
上述提到的存储介质可以是只读存储器，磁盘或光盘等。尽管上面已经示出和描述了本公开的实施例，可以理解的是，上述实施例是示例性的，不能理解为对本公开的限制，本领域的普通技术人员在本公开的范围内可以对上述实施例进行变化、修改、替换和变型。The storage medium mentioned above may be a read-only memory, a magnetic disk, an optical disc, or the like. Although the embodiments of the present disclosure have been shown and described above, it should be understood that the above embodiments are illustrative and should not be construed as limiting the present disclosure; those of ordinary skill in the art can make changes, modifications, substitutions, and variations to the above embodiments within the scope of the present disclosure.

Claims (21)

  1. 一种基于机器人流程自动化RPA和人工智能AI的文本纠错方法,包括:A text error correction method based on robotic process automation RPA and artificial intelligence AI, including:
    基于光学字符识别OCR模型,对待识别图像进行字符识别,以得到预测文本以及所述预测文本中各预测字符的置信度;Based on the optical character recognition OCR model, perform character recognition on the image to be recognized to obtain the predicted text and the confidence of each predicted character in the predicted text;
    根据各所述预测字符的置信度,从各所述预测字符中确定待处理字符;Determine the character to be processed from each of the predicted characters according to the confidence of each of the predicted characters;
    将所述预测文本中的所述待处理字符进行掩码,并采用预测模型对掩码后的预测文本进行字符预测,以得到所述待处理字符对应的至少一个替换字符;Mask the characters to be processed in the predicted text, and use a prediction model to perform character prediction on the masked predicted text to obtain at least one replacement character corresponding to the character to be processed;
    根据所述至少一个替换字符与所述待处理字符的相似度，从所述至少一个替换字符中确定目标字符，并利用所述目标字符替换所述预测文本中的所述待处理字符，以得到识别文本。Determining a target character from the at least one replacement character according to the similarity between the at least one replacement character and the character to be processed, and replacing the character to be processed in the predicted text with the target character to obtain recognized text.
  2. 根据权利要求1所述的方法,其中,所述方法由RPA机器人执行。The method of claim 1, wherein the method is performed by an RPA robot.
  3. 根据权利要求1或2所述的方法，其中，所述将所述预测文本中的所述待处理字符进行掩码，并采用预测模型对掩码后的预测文本进行字符预测以得到所述待处理字符对应的至少一个替换字符，包括：The method according to claim 1 or 2, wherein masking the character to be processed in the predicted text, and using a prediction model to perform character prediction on the masked predicted text to obtain at least one replacement character corresponding to the character to be processed, comprises:
    确定所述待处理字符在所述预测文本中的目标位置;Determine the target position of the character to be processed in the predicted text;
    获取掩码字符,并利用所述掩码字符替换所述预测文本中所述目标位置处的所述待处理字符,以得到掩码后的预测文本;Obtain a masked character and replace the character to be processed at the target position in the predicted text with the masked character to obtain a masked predicted text;
    将所述掩码后的预测文本输入至所述预测模型,以采用所述预测模型对所述掩码后的预测文本进行字符预测,以得到至少一个目标文本;Input the masked predicted text into the prediction model to use the prediction model to perform character prediction on the masked predicted text to obtain at least one target text;
    将所述至少一个目标文本中所述目标位置处的字符,作为所述至少一个替换字符。The character at the target position in the at least one target text is used as the at least one replacement character.
  4. 根据权利要求1至3中任一项所述的方法,其中,所述预测模型通过以下步骤训练得到:The method according to any one of claims 1 to 3, wherein the prediction model is trained through the following steps:
    获取样本文本;Get sample text;
    对所述样本文本中的至少一个样本字符进行掩码,得到掩码后的样本文本;Mask at least one sample character in the sample text to obtain a masked sample text;
    将所述掩码后的样本文本输入至初始的预测模型,以采用所述预测模型对所述掩码后的样本文本进行字符预测,得到输出文本;Input the masked sample text into an initial prediction model to use the prediction model to perform character prediction on the masked sample text to obtain output text;
    根据所述样本文本和所述输出文本之间的差异,对所述预测模型中的模型参数进行调整。Model parameters in the prediction model are adjusted based on the difference between the sample text and the output text.
  5. 根据权利要求1-4中任一项所述的方法，其中，所述根据所述至少一个替换字符与所述待处理字符的相似度，从所述至少一个替换字符中确定目标字符之前，所述方法还包括：The method according to any one of claims 1-4, wherein, before determining the target character from the at least one replacement character according to the similarity between the at least one replacement character and the character to be processed, the method further comprises:
    基于设定编码算法,对所述至少一个替换字符进行编码,以得到所述至少一个替换字符的第一编码值;Encoding the at least one replacement character based on a set encoding algorithm to obtain a first encoding value of the at least one replacement character;
    基于所述设定编码算法,对所述待处理字符进行编码,以得到所述待处理字符的第二编码值;Based on the set encoding algorithm, encode the character to be processed to obtain a second encoding value of the character to be processed;
    根据所述至少一个替换字符的第一编码值和所述第二编码值之间的差异,确定所述至少一个替换字符与所述待处理字符的相似度。The similarity between the at least one replacement character and the character to be processed is determined based on the difference between the first encoding value and the second encoding value of the at least one replacement character.
  6. 根据权利要求1-4中任一项所述的方法，其中，所述根据所述至少一个替换字符与所述待处理字符的相似度，从所述至少一个替换字符中确定目标字符之前，所述方法还包括：The method according to any one of claims 1-4, wherein, before determining the target character from the at least one replacement character according to the similarity between the at least one replacement character and the character to be processed, the method further comprises:
    针对每个所述替换字符,根据所述替换字符进行绘制,得到第一图像;For each replacement character, draw according to the replacement character to obtain a first image;
    根据所述待处理字符进行绘制,得到第二图像;Draw according to the character to be processed to obtain a second image;
    根据所述第一图像和所述第二图像之间的相似度,确定所述替换字符与所述待处理字符之间的相似度。The similarity between the replacement character and the character to be processed is determined based on the similarity between the first image and the second image.
  7. 根据权利要求1-4中任一项所述的方法，其中，所述根据所述至少一个替换字符与所述待处理字符的相似度，从所述至少一个替换字符中确定目标字符之前，所述方法还包括：The method according to any one of claims 1-4, wherein, before determining the target character from the at least one replacement character according to the similarity between the at least one replacement character and the character to be processed, the method further comprises:
    对所述至少一个替换字符进行特征提取,得到所述至少一个替换字符的特征向量;Perform feature extraction on the at least one replacement character to obtain a feature vector of the at least one replacement character;
    对所述待处理字符进行特征提取,得到所述待处理字符的特征向量;Perform feature extraction on the characters to be processed to obtain the feature vectors of the characters to be processed;
    根据所述至少一个替换字符的特征向量和所述待处理字符的特征向量之间的相似度,确定所述至少一个替换字符与所述待处理字符的相似度。The similarity between the at least one replacement character and the character to be processed is determined based on the similarity between the feature vector of the at least one replacement character and the feature vector of the character to be processed.
  8. 根据权利要求1-7中任一项所述的方法,其中,所述基于光学字符识别OCR模型,对待识别图像进行字符识别,包括:The method according to any one of claims 1 to 7, wherein the character recognition of the image to be recognized based on the optical character recognition OCR model includes:
    采用所述OCR模型中的特征提取分支,对所述待识别图像进行特征提取,以得到第一特征图;Using the feature extraction branch in the OCR model, perform feature extraction on the image to be recognized to obtain a first feature map;
    采用所述OCR模型中的融合分支，将所述第一特征图与位置图进行融合，以得到第二特征图，其中，所述位置图中各元素与所述第一特征图中各元素一一对应，所述位置图中的元素，用于指示所述第一特征图中对应元素在所述待识别图像中的坐标；Using the fusion branch in the OCR model, fusing the first feature map with a position map to obtain a second feature map, wherein the elements in the position map correspond one-to-one to the elements in the first feature map, and an element in the position map is used to indicate the coordinates, in the image to be recognized, of the corresponding element in the first feature map;
    采用所述OCR模型中的特征变换分支,将所述第二特征图进行特征变换,得到第三特征图;Use the feature transformation branch in the OCR model to perform feature transformation on the second feature map to obtain a third feature map;
    采用所述OCR模型中的预测分支,对所述第三特征图进行解码,以得到所述预测文本和所述预测文本中各所述预测字符对应的置信度。The prediction branch in the OCR model is used to decode the third feature map to obtain the predicted text and the confidence corresponding to each predicted character in the predicted text.
  9. 根据权利要求8所述的方法,其中,所述采用所述OCR模型中的特征提取分支,对所述待识别图像进行特征提取,以得到第一特征图之前,所述方法还包括:The method according to claim 8, wherein before using the feature extraction branch in the OCR model to perform feature extraction on the image to be recognized to obtain the first feature map, the method further includes:
    对所述待识别图像进行角度预测,确定所述待识别图像的倾斜角度;Perform angle prediction on the image to be recognized and determine the tilt angle of the image to be recognized;
    根据所述倾斜角度,对所述待识别图像进行旋转处理。The image to be recognized is rotated according to the tilt angle.
  10. 根据权利要求1-9中任一项所述的方法,其中,所述根据各所述预测字符的置信度,从各所述预测字符中确定待处理字符,包括:The method according to any one of claims 1 to 9, wherein determining the characters to be processed from each of the predicted characters according to the confidence of each of the predicted characters includes:
    将置信度高于置信度阈值的预测字符,作为所述待处理字符;Use predicted characters with a confidence level higher than the confidence threshold as the characters to be processed;
    或者,or,
    将置信度最大的预测字符,作为所述待处理字符;Use the predicted character with the highest confidence as the character to be processed;
    或者,or,
    将各所述预测字符按照置信度的取值由大至小排序，选取排序在前的目标个数的预测字符，作为所述待处理字符，其中，所述目标个数的取值与所述预测文本的长度正相关。Sorting the predicted characters in descending order of confidence, and selecting the top target number of predicted characters as the characters to be processed, wherein the value of the target number is positively correlated with the length of the predicted text.
  11. 一种用于对权利要求1-10中任一项中所述的预测模型进行训练的方法,包括:A method for training the prediction model described in any one of claims 1-10, comprising:
    获取样本文本;Get sample text;
    对所述样本文本中的至少一个样本字符进行掩码,得到掩码后的样本文本;Mask at least one sample character in the sample text to obtain a masked sample text;
    将所述掩码后的样本文本输入至初始的预测模型,以采用所述预测模型对所述掩码后的样本文本进行字符预测,得到输出文本;Input the masked sample text into an initial prediction model to use the prediction model to perform character prediction on the masked sample text to obtain output text;
    根据所述样本文本和所述输出文本之间的差异,对所述预测模型中的模型参数进行调整。Model parameters in the prediction model are adjusted based on the difference between the sample text and the output text.
  12. 根据权利要求11所述的方法,其中,所述根据所述样本文本和所述输出文本之间的差异,对所述预测模型中的模型参数进行调整,包括:The method of claim 11, wherein adjusting model parameters in the prediction model based on the difference between the sample text and the output text includes:
    根据所述样本文本和所述输出文本之间的字符的置信度分布差异,生成第一损失值;Generate a first loss value based on the difference in confidence distribution of characters between the sample text and the output text;
    确定进行掩码的所述至少一个样本字符在所述样本文本中的第一位置;Determine the first position of the at least one sample character to be masked in the sample text;
    根据所述样本文本中的所述第一位置处的样本字符和所述输出文本中所述第一位置处的输出字符之间的差异,生成第二损失值;generating a second loss value based on the difference between the sample character at the first position in the sample text and the output character at the first position in the output text;
    根据所述第一损失值和所述第二损失值,生成目标损失值;Generate a target loss value according to the first loss value and the second loss value;
    根据所述目标损失值,对所述预测模型中的模型参数进行调整。According to the target loss value, the model parameters in the prediction model are adjusted.
  13. 根据权利要求11或12所述的方法，其中，所述将所述掩码后的样本文本输入至初始的预测模型，以采用所述预测模型对所述掩码后的样本文本进行字符预测，得到输出文本，包括：The method according to claim 11 or 12, wherein inputting the masked sample text into an initial prediction model, so as to use the prediction model to perform character prediction on the masked sample text to obtain the output text, comprises:
    获取设定字典;Get the settings dictionary;
    判断所述掩码后的样本文本中的各字符是否位于所述设定字典中;Determine whether each character in the masked sample text is located in the set dictionary;
    在所述掩码后的样本文本中存在未位于所述设定字典中的第一字符的情况下,将所述掩码后的样本文本中的所述第一字符替换为特定字符;If there is a first character in the masked sample text that is not in the set dictionary, replace the first character in the masked sample text with a specific character;
    将替换后的样本文本输入至所述预测模型,以采用所述预测模型对替换后的样本文本进行字符预测,得到所述输出文本。The replaced sample text is input into the prediction model, so that the prediction model is used to perform character prediction on the replaced sample text to obtain the output text.
  14. 根据权利要求13所述的方法,其中,所述根据所述第一损失值和所述第二损失值,生成目标损失值,包括:The method according to claim 13, wherein generating a target loss value according to the first loss value and the second loss value includes:
    确定所述第一字符在所述样本文本中的第二位置;Determine the second position of the first character in the sample text;
    根据所述特定字符和所述输出文本中位于所述第二位置处的输出字符之间的差异,生成第三损失值;generating a third loss value based on the difference between the specific character and the output character located at the second position in the output text;
    根据所述第一损失值的第一权重和所述第二损失值的第二权重，将所述第一损失值和所述第二损失值进行加权，以得到第四损失值，其中，所述第二权重大于所述第一权重；Weighting the first loss value and the second loss value according to a first weight of the first loss value and a second weight of the second loss value to obtain a fourth loss value, wherein the second weight is greater than the first weight;
    根据所述第四损失值和所述第三损失值之间的差异,生成所述目标损失值。The target loss value is generated based on the difference between the fourth loss value and the third loss value.
  15. 一种基于机器人流程自动化RPA和人工智能AI的文本纠错装置,包括:A text error correction device based on robotic process automation RPA and artificial intelligence AI, including:
    识别模块,用于基于光学字符识别OCR模型,对待识别图像进行字符识别,以得到预测文本以及所述预测文本中各预测字符的置信度;A recognition module, configured to perform character recognition on the image to be recognized based on the optical character recognition OCR model, to obtain the predicted text and the confidence of each predicted character in the predicted text;
    确定模块,用于根据各所述预测字符的置信度,从各所述预测字符中确定待处理字符;A determination module, configured to determine the characters to be processed from each of the predicted characters according to the confidence of each of the predicted characters;
    掩码模块,用于将所述预测文本中的所述待处理字符进行掩码;A masking module, used to mask the characters to be processed in the predicted text;
    预测模块,用于采用预测模型对掩码后的预测文本进行字符预测,以得到所述待处理字符对应的至少一个替换字符;A prediction module, configured to use a prediction model to perform character prediction on the masked predicted text to obtain at least one replacement character corresponding to the character to be processed;
    替换模块，用于根据所述至少一个替换字符与所述待处理字符的相似度，从所述至少一个替换字符中确定目标字符，并利用所述目标字符替换所述预测文本中的所述待处理字符，以得到识别文本。A replacement module, configured to determine a target character from the at least one replacement character according to the similarity between the at least one replacement character and the character to be processed, and to replace the character to be processed in the predicted text with the target character to obtain recognized text.
  16. 根据权利要求15所述的装置,其中,所述装置应用于RPA机器人。The device of claim 15, wherein the device is applied to an RPA robot.
  17. 一种用于对权利要求15或16中所述的预测模型进行训练的装置,包括:A device for training the prediction model described in claim 15 or 16, comprising:
    获取模块,用于获取样本文本;Obtain module, used to obtain sample text;
    掩码模块,用于对所述样本文本中的至少一个样本字符进行掩码,得到掩码后的样本文本;A masking module, used to mask at least one sample character in the sample text to obtain the masked sample text;
    输入模块,用于将所述掩码后的样本文本输入至初始的预测模型,以采用所述预测模型对所述掩码后的样本文本进行字符预测,得到输出文本;An input module for inputting the masked sample text into an initial prediction model, so as to use the prediction model to perform character prediction on the masked sample text to obtain output text;
    调整模块,用于根据所述样本文本和所述输出文本之间的差异,对所述预测模型中的模型参数进行调整。An adjustment module, configured to adjust model parameters in the prediction model based on the difference between the sample text and the output text.
  18. 根据权利要求17所述的装置,其中,所述调整模块,用于:The device according to claim 17, wherein the adjustment module is used for:
    根据所述样本文本和所述输出文本之间的字符的置信度分布差异,生成第一损失值;Generate a first loss value based on the difference in confidence distribution of characters between the sample text and the output text;
    确定进行掩码的所述至少一个样本字符在所述样本文本中的第一位置;Determine the first position of the at least one sample character to be masked in the sample text;
    根据所述样本文本中的所述第一位置处的样本字符和所述输出文本中所述第一位置处的输出字符之间的差异,生成第二损失值;generating a second loss value based on the difference between the sample character at the first position in the sample text and the output character at the first position in the output text;
    根据所述第一损失值和所述第二损失值,生成目标损失值;Generate a target loss value according to the first loss value and the second loss value;
    根据所述目标损失值,对所述预测模型中的模型参数进行调整。According to the target loss value, the model parameters in the prediction model are adjusted.
  19. 一种电子设备，包括存储器、处理器及存储在存储器上并可在处理器上运行的计算机程序，所述处理器执行所述计算机程序时，实现如权利要求1-10中任一项所述的方法，或者，实现如权利要求11-14中任一项所述的方法。An electronic device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein, when the processor executes the computer program, the method according to any one of claims 1-10, or the method according to any one of claims 11-14, is implemented.
  20. 一种非临时性计算机可读存储介质,其上存储有计算机程序,其中,该计算机程序被处理器执行时实现如权利要求1-10中任一项所述的方法,或者,实现如权利要求11-14中任一项所述的方法。A non-transitory computer-readable storage medium with a computer program stored thereon, wherein the computer program implements the method as claimed in any one of claims 1-10 when executed by a processor, or implements the method as claimed in any one of claims 1-10 The method described in any one of 11-14.
  21. 一种计算机程序产品,包括计算机程序,所述计算机程序在被处理器执行时实现根据权利要求1-10中任一项所述的方法,或者,实现根据权利要求11-14中任一所述的方法。A computer program product, comprising a computer program that, when executed by a processor, implements the method according to any one of claims 1-10, or implements the method according to any one of claims 11-14 Methods.
PCT/CN2022/091292 2022-03-16 2022-05-06 Rpa and ai based text error correction method, training method and related device thereof WO2023173560A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210261001.6 2022-03-16
CN202210261001.6A CN114863429A (en) 2022-03-16 2022-03-16 Text error correction method and training method based on RPA and AI and related equipment thereof

Publications (1)

Publication Number Publication Date
WO2023173560A1 true WO2023173560A1 (en) 2023-09-21

Family

ID=82627433

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/091292 WO2023173560A1 (en) 2022-03-16 2022-05-06 Rpa and ai based text error correction method, training method and related device thereof

Country Status (2)

Country Link
CN (1) CN114863429A (en)
WO (1) WO2023173560A1 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117290487A (en) * 2023-10-27 2023-12-26 知学云(北京)科技股份有限公司 Automatic scrolling method based on large language model, electronic equipment and storage medium
CN117765133A (en) * 2024-02-22 2024-03-26 青岛海尔科技有限公司 Correction method and device for generated text, storage medium and electronic equipment

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115061679B (en) * 2022-08-08 2022-11-11 杭州实在智能科技有限公司 Offline RPA element picking method and system
CN115204151A (en) * 2022-09-15 2022-10-18 华东交通大学 Chinese text error correction method, system and readable storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20210166093A1 (en) * 2019-12-02 2021-06-03 UiPath, Inc. Training optical character detection and recognition models for robotic process automation
CN113095067A (en) * 2021-03-03 2021-07-09 北京邮电大学 OCR error correction method, device, electronic equipment and storage medium
CN113743415A (en) * 2021-08-05 2021-12-03 杭州远传新业科技有限公司 Method, system, electronic device and medium for identifying and correcting image text
CN113792741A (en) * 2021-09-17 2021-12-14 平安普惠企业管理有限公司 Character recognition method, device, equipment and storage medium



Also Published As

Publication number Publication date
CN114863429A (en) 2022-08-05

Similar Documents

Publication Publication Date Title
WO2023173560A1 (en) Rpa and ai based text error correction method, training method and related device thereof
US10558893B2 (en) Systems and methods for recognizing characters in digitized documents
CN108829683B (en) Hybrid label learning neural network model and training method and device thereof
CN111737991B (en) Text sentence breaking position identification method and system, electronic equipment and storage medium
Wang et al. Stroke constrained attention network for online handwritten mathematical expression recognition
WO2023134402A1 (en) Calligraphy character recognition method based on siamese convolutional neural network
CN116127953B (en) Chinese spelling error correction method, device and medium based on contrast learning
CN114255159A (en) Handwritten text image generation method and device, electronic equipment and storage medium
CN111046771A (en) Training method of network model for recovering writing track
CN112016638A (en) Method, device and equipment for identifying steel bar cluster and storage medium
WO2023093525A1 (en) Model training method, chinese text error correction method, electronic device, and storage medium
CN113344826A (en) Image processing method, image processing device, electronic equipment and storage medium
Shan et al. Robust encoder-decoder learning framework towards offline handwritten mathematical expression recognition based on multi-scale deep neural network
JP7172351B2 (en) Character string recognition device and character string recognition program
CN113743101B (en) Text error correction method, apparatus, electronic device and computer storage medium
CN111008624A (en) Optical character recognition method and method for generating training sample for optical character recognition
WO2024055864A1 (en) Training method and apparatus for implementing ia classification model using rpa and ai
CN112488111B (en) Indication expression understanding method based on multi-level expression guide attention network
Al Ghamdi A novel approach to printed Arabic optical character recognition
CN108829896B (en) Reply information feedback method and device
CN110929013A (en) Image question-answer implementation method based on bottom-up entry and positioning information fusion
US11687700B1 (en) Generating a structure of a PDF-document
CN115546801A (en) Method for extracting paper image data features of test document
Sheng et al. End-to-end chinese image text recognition with attention model
CN116012656B (en) Sample image generation method and image processing model training method and device

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22931587

Country of ref document: EP

Kind code of ref document: A1