WO2023173560A1

WO2023173560A1 - Rpa and ai based text error correction method, training method and related device thereof

Info

Publication number: WO2023173560A1
Application number: PCT/CN2022/091292
Authority: WO
Inventors: 王建周
Original assignee: 来也科技(北京)有限公司
Priority date: 2022-03-16
Filing date: 2022-05-06
Publication date: 2023-09-21
Also published as: CN114863429A

Abstract

A RPA and AI based text error correction method, comprising: on the basis of an OCR model, performing character recognition on an image to be recognized, so as to obtain a predicted text and a confidence coefficient of each predicted character in the predicted text; determining a character to be processed among the predicted characters according to the confidence coefficient of each predicted character; masking in the predicted text the character to be processed; performing character prediction on the masked predicted text by using a prediction model, so as to obtain at least one substitute character corresponding to the character to be processed; determining a target character among the at least one substitute character according to the similarity between the at least one substitute character and the character to be processed; and using the target character to substitute for the character to be processed in the predicted text so as to obtain a recognized text.

Description

Text error correction methods, training methods and related equipment based on RPA and AI

Cross-references to related applications

This application is filed based on a Chinese patent application with application number 202210261001.6 and a filing date of March 16, 2022, and claims the priority of the Chinese patent application. The entire content of the Chinese patent application is hereby incorporated by reference into this application.

Technical field

The present disclosure relates to the fields of Artificial Intelligence (AI for short) and Robotic Process Automation (RPA for short), and in particular to a text error correction method, training method and related equipment based on RPA and AI.

Background technique

RPA uses specific "robot software" to simulate human operations on computers and automatically execute process tasks according to rules.

AI is a technical science that studies and develops theories, methods, technologies and application systems for simulating, extending and expanding human intelligence.

For enterprises, it may often be necessary to identify and extract character information in images or PDF (Portable Document Format) documents, for example, to identify and extract invoice information, order information, etc. in images. In related technologies, character information in images or documents can be extracted based on Optical Character Recognition (OCR) technology.

However, limited by the size of the image, the accuracy of the OCR recognition results may not be guaranteed. For example, when the image size is small, the text "account number" in the image may be mistakenly recognized as "account number". In addition, when the image is stamped with a seal, the accuracy of the OCR recognition results cannot be guaranteed. For example, the word "car" in the image may be mistakenly recognized as "military".

Contents of the invention

The present disclosure aims to solve one of the technical problems in the related art, at least to a certain extent.

To this end, the present disclosure proposes a text error correction method, training method and related equipment based on RPA and AI to correct the characters in the text information after identifying the text information based on OCR technology, thereby improving the text Accuracy and reliability of information identification results.

The first embodiment of the present disclosure proposes a text error correction method based on RPA and AI. The method includes:

Based on the optical character recognition OCR model, perform character recognition on the image to be recognized to obtain the predicted text and the confidence of each predicted character in the predicted text;

Determine the character to be processed from each of the predicted characters according to the confidence of each of the predicted characters;

Mask the characters to be processed in the predicted text, and use a prediction model to perform character prediction on the masked predicted text to obtain at least one replacement character corresponding to the character to be processed;

According to the similarity between the at least one replacement character and the character to be processed, a target character is determined from the at least one replacement character, and the target character is used to replace the character to be processed in the predicted text to obtain Identify text.

The second embodiment of the present disclosure provides a method for training the prediction model described in the first embodiment of the present disclosure, including:

Get sample text;

Mask at least one sample character in the sample text to obtain a masked sample text;

Input the masked sample text into an initial prediction model to use the prediction model to perform character prediction on the masked sample text to obtain output text;

Model parameters in the prediction model are adjusted based on the difference between the sample text and the output text.

The third embodiment of the present disclosure proposes a text error correction device based on RPA and AI, including:

A recognition module, configured to perform character recognition on the image to be recognized based on the optical character recognition OCR model, to obtain the predicted text and the confidence of each predicted character in the predicted text;

A determination module, configured to determine the characters to be processed from each of the predicted characters according to the confidence of each of the predicted characters;

A masking module, used to mask the characters to be processed in the predicted text;

A prediction module, used to use a prediction model to perform character prediction on the masked predicted text to obtain at least one replacement character corresponding to the character to be processed;

a replacement module, configured to determine a target character from the at least one replacement character based on the similarity between the at least one replacement character and the to-be-processed character, and use the target character to replace the to-be-processed character in the predicted text. Process characters to get recognized text.

The fourth embodiment of the present disclosure provides a device for training the prediction model described in the third embodiment of the present disclosure, including:

Obtain module, used to obtain sample text;

A masking module, used to mask at least one sample character in the sample text to obtain the masked sample text;

An input module for inputting the masked sample text into an initial prediction model, so as to use the prediction model to perform character prediction on the masked sample text to obtain output text;

An adjustment module, configured to adjust model parameters in the prediction model based on the difference between the sample text and the output text.

The embodiment of the fifth aspect of the present disclosure provides an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor. When the processor executes the computer program, the present disclosure is implemented. The method described in the above-mentioned embodiment of the first aspect, or the method described in the above-mentioned embodiment of the second aspect of the present disclosure.

The sixth embodiment of the present disclosure provides a non-transitory computer-readable storage medium on which a computer program is stored. When the computer program is executed by a processor, the method described in the first embodiment of the disclosure is implemented. Alternatively, implement the method as described in the embodiment of the second aspect of the present disclosure.

The seventh embodiment of the present disclosure proposes a computer program product, including a computer program. When executed by a processor, the computer program implements the method described in the first embodiment of the present disclosure, or implements the method as described in the first aspect of the present disclosure. The method described in the embodiment of the second aspect above.

The technical solutions provided by the embodiments of the present disclosure include the following beneficial effects:

Based on the OCR model, character recognition is performed on the image to be recognized to obtain the predicted text and the confidence of each predicted character in the predicted text; based on the confidence of each predicted character, the character to be processed is determined from each predicted character; the characters in the predicted text are The characters to be processed are masked, and a prediction model is used to predict the characters of the masked predicted text to obtain at least one replacement character corresponding to the character to be processed; based on the similarity between the at least one replacement character and the character to be processed, from at least one Determine the target character among the replacement characters, and use the target character to replace the character to be processed in the predicted text to obtain the recognized text. Therefore, after the text information is recognized based on OCR technology, the characters in the text information are corrected, which can improve the accuracy and reliability of the text information recognition results. In addition, there is no need to manually correct characters in text information, which can free up human resources, reduce labor costs, and improve the applicability of this method.

Additional aspects and advantages of the disclosure will be set forth in part in the description which follows, and in part will be obvious from the description, or may be learned by practice of the disclosure.

Description of the drawings

The above and/or additional aspects and advantages of the present disclosure will become apparent and readily understood from the following description of the embodiments in conjunction with the accompanying drawings, in which:

Figure 1 is a schematic diagram of the image to be recognized.

Figure 2 is a schematic diagram of the image to be recognized.

FIG. 3 is a schematic flowchart of a text error correction method based on RPA and AI provided by an embodiment of the present disclosure.

Figure 4 is a schematic flowchart of a text error correction method based on RPA and AI provided by an embodiment of the present disclosure.

Figure 5 is a schematic flowchart of a text error correction method based on RPA and AI provided by an embodiment of the present disclosure.

Figure 6 is a schematic flowchart of a text error correction method based on RPA and AI provided by an embodiment of the present disclosure.

Figure 7 is a schematic flowchart of a text error correction method based on RPA and AI provided by an embodiment of the present disclosure.

Figure 8 is a schematic flowchart of a text error correction method based on RPA and AI provided by an embodiment of the present disclosure.

Figure 9 is a schematic flowchart of a text error correction method based on RPA and AI provided by an embodiment of the present disclosure.

Figure 10 is a schematic diagram of the OCR recognition process in an embodiment of the present disclosure.

Figure 11 is a schematic flowchart of Chinese text line content recognition according to an embodiment of the present disclosure.

Figure 12 is a schematic structural diagram of a prediction model in an embodiment of the present disclosure.

Figure 13 is a schematic diagram of character similarity in an embodiment of the present disclosure.

Figure 14 is a schematic diagram of the image to be recognized.

Figure 15 is a schematic diagram of the image to be recognized.

Figure 16 is a schematic flowchart of a training method provided by an embodiment of the present disclosure.

Figure 17 is a schematic structural diagram of a text error correction device based on RPA and AI provided by an embodiment of the present disclosure.

Figure 18 is a schematic structural diagram of a training device provided by an embodiment of the present disclosure.

19 illustrates a block diagram of an exemplary electronic device suitable for implementing embodiments of the present disclosure.

Detailed ways

Embodiments of the present disclosure are described in detail below, examples of which are illustrated in the accompanying drawings, wherein the same or similar reference numerals throughout represent the same or similar elements or elements having the same or similar functions. The embodiments described below with reference to the accompanying drawings are exemplary and intended to explain the present disclosure and are not to be construed as limitations of the present disclosure.

In related technologies, although the comprehensive indicators (such as the F1 indicator, where the F1 indicator is the harmonic average of the precision rate and the recall rate) of the optical character recognition (Optical Character Recognition, OCR) model have achieved high index values, however, Due to the shortcomings of the OCR model of visual deep learning, there is still a problem of low accuracy of recognition results on some long-tail problems. Typical scenarios include the following aspects:

The first aspect is interference from stains, such as red stamps, ink marks, etc.;

Secondly, the shape is similar to characters, such as "戍" and "戌,成,五";

The third aspect is font deformation caused by image deformation.

For example, as shown in Figure 1, when the characters in the image are too small, it is easier to misidentify the word "Account" as the word "Account"; for another example, as shown in Figure 2, there is red seal interference on the image. At this time, the word "car" in the image is easily misidentified as "military".

In response to the above problems, commonly used solutions in related technologies mainly include the following:

The first is to add semantic information during the OCR decoding process to ensure that not only image features are used during decoding, but also semantic alignment with the previous character is achieved.

The second is to add multi-task prediction of semantic information. That is, characters are randomly blocked on the image through mosaic. The model finally has two prediction networks (i.e., head network). One prediction network is used to predict the characters displayed on the image, and the other prediction network is used to predict the characters blocked on the image. , whereby the model can learn semantic-based error correction information.

In view of the above method, the inventor found after many tests that in order to improve the prediction effect of the model, in a large number of synthesized image samples, the number of exposures of some long-tail characters needs to be increased. For example, some rare characters need to appear enough in the image samples. A large number of times, that is, it is necessary to add random text containing rare words to the image sample, and in order to enable the model to distinguish some similar words, multiple similar words need to be deliberately allowed to appear in one image sample, which will result in a synthetic image sample The text in lacks semantic coherence. Therefore, the first way above will not apply.

In addition, in real image samples, in order to reduce the labeling cost of the sample, the position of a single character is generally not labeled. This will result in the inability to accurately mask the character at a certain position in the real image sample. Therefore, , the second method above will not be applicable.

In response to the above problems, this disclosure proposes a text error correction method, training method and related equipment based on RPA and AI.

The text error correction method, training method and related equipment based on RPA and AI according to the embodiments of the present disclosure will be described below with reference to the accompanying drawings. Before describing the embodiments of the present disclosure in detail, in order to facilitate understanding, common technical terms are first introduced:

"RPA", the abbreviation of Robotic Process Automation, provides professional and comprehensive process automation solutions for enterprises and individuals. RPA uses specific "robot software" to simulate human operations on computers and automatically execute process tasks according to rules. That is, the RPA robot can quickly and accurately collect data from the user operation interface by simulating the user's mouse and keyboard operations, process the data based on clear logical rules, and then quickly and accurately input it into another system or interface. As a result, labor cost investment can be significantly reduced, existing office efficiency can be effectively improved, and work can be completed accurately, stably, and quickly.

"AI" is the abbreviation of Artificial Intelligence. It is a technical science that researches and develops theories, methods, technologies and application systems for simulating, extending and expanding human intelligence. AI is the study of using computers to simulate certain human thinking processes and intelligent behaviors (such as learning, reasoning, thinking, planning, etc.). It has both hardware-level technology and software-level technology. AI hardware technology generally includes technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, and big data processing; AI software technology mainly includes computer vision technology, speech recognition technology, and Natural Language Processing (NLP). technology and machine learning/deep learning, big data processing technology, knowledge graph technology and other major directions.

"OCR" refers to the process in which electronic equipment checks characters printed on paper, determines their shape by detecting dark and light patterns, and then uses character recognition methods to translate the shape into computer text; that is, for printed characters, optical This method converts the text in the paper document into a black and white dot matrix image file, and uses recognition software to convert the text in the image into a text format for further editing and processing by word processing software.

"OCR model" is a pre-trained model that has learned the correspondence between the input image and the output text.

"Image to be recognized" refers to any image that needs to be recognized. For example, the image to be recognized can be an image containing invoice information, an image containing order information, etc.

"Predicted text" refers to the text information or OCR recognition results obtained by the OCR model for character recognition of the image to be recognized.

"Confidence", also known as recognition probability or classification probability, refers to the probability value output by the OCR model.

"Prediction model" refers to a model obtained after training. The prediction model is used to predict characters of input text.

"Mask character" refers to the character used to mask text information. The mask character can be a preset character, or the mask character can also be a random character. For example, it can be selected from a set dictionary. Get a random character as a mask character.

The "set dictionary" refers to a preset dictionary. For example, the set dictionary may include common or commonly used characters. For example, taking the setting dictionary to include Chinese characters as an example, the setting dictionary can include 3900 common Chinese characters.

"Target text" refers to the text information output by the prediction model.

"Sample text" refers to the text information used to train the prediction model.

"Specific characters" refer to preset special characters. For example, the specific characters can be OOV (Out of Vocabulary, outside the vocabulary) characters, or they can also be other characters. This disclosure does not Make restrictions.

"Set encoding algorithm" refers to a preset encoding algorithm. For example, the set encoding algorithm can be a four-corner encoding algorithm, or it can also be other encoding algorithms, and this disclosure does not limit this.

"Confidence threshold" refers to a preset threshold.

In a possible implementation provided by the embodiments of the present disclosure, the present disclosure takes as an example that the text error correction method based on RPA and AI is configured in a text error correction device. The text error correction device can be applied to any device with computing power in electronic devices.

The electronic device may be a personal computer, a mobile terminal, etc. The mobile terminal is, for example, a mobile phone, a tablet computer, a personal digital assistant and other hardware devices with various operating systems.

In another possible implementation of the embodiment of the present disclosure, the text error correction method based on RPA and AI can be applied to an RPA robot, where the RPA robot can run in any electronic device with computing capabilities.

As shown in Figure 3, the text error correction method based on RPA and AI can include the following steps:

Step 101: Based on the OCR model, perform character recognition on the image to be recognized to obtain the predicted text and the confidence of each predicted character in the predicted text.

In the embodiment of the present disclosure, the image to be recognized may be directly acquired. For example, the image to be recognized containing text information may be directly acquired. Alternatively, the image to be recognized can also be obtained indirectly. For example, documents in PDF, PSD (PSD is a special format of Adobe's graphic design software Photoshop) containing text information can be obtained, and the documents containing text information can be extracted or intercepted from the above documents. The image of text information to be recognized.

Among them, the format of the image to be recognized can be JPG (or JPEG (Joint Photographic Experts Group, Joint Photographic Experts Group)), PNG (Portable Network Graphics, portable network graphics) and other image formats, and this disclosure does not limit this.

In the embodiment of the present disclosure, the OCR model can be used to perform character recognition on the image to be recognized, and the predicted text and the confidence of each predicted character in the predicted text can be obtained.

Step 102: Determine the character to be processed from each predicted character based on the confidence level of each predicted character.

In this embodiment of the present disclosure, characters to be processed may be determined from each predicted character according to the confidence level of each predicted character, where the number of characters to be processed may be at least one.

In a possible implementation of the embodiment of the present disclosure, the confidence of each predicted character can be compared with a set confidence threshold, and the predicted characters with a confidence higher than the confidence threshold can be used as characters to be processed. .

In another possible implementation of the embodiment of the present disclosure, the predicted character with the highest confidence level may be used as the character to be processed.

In another possible implementation of the embodiment of the present disclosure, the predicted characters can be sorted from large to small according to the confidence value, and the target number of predicted characters ranked first are selected as the characters to be processed.

Among them, the value of the number of targets is positively related to the length of the predicted text, that is, the longer the length of the predicted text, the greater the value of the number of targets.

As an example, considering that in most error scenarios recognized by OCR, a sentence has only one wrong character, therefore, in this disclosure, the number of sentences contained in the predicted text can be determined, and the target number is determined based on the number of sentences. value. Among them, the number of targets and the number of statements are positively related, that is, the more the number of statements, the greater the value of the target number, and conversely, the fewer the number of statements, the smaller the value of the target number.

In summary, the characters to be processed can be determined from each predicted character based on different methods, which can improve the flexibility and applicability of the method.

Step 103: Mask the characters to be processed in the predicted text, and use a prediction model to perform character prediction on the masked predicted text to obtain at least one replacement character corresponding to the character to be processed.

In the embodiment of the present disclosure, the characters to be processed in the predicted text can be masked to obtain the masked predicted text, and the masked predicted text can be input into the prediction model to use the prediction model to predict the masked text. Predict text to perform character prediction and obtain at least one replacement character corresponding to the character to be processed.

Step 104: Determine the target character from the at least one replacement character based on the similarity between the at least one replacement character and the character to be processed, and use the target character to replace the character to be processed in the predicted text to obtain the recognized text.

In the embodiment of the present disclosure, the similarity between each replacement character and the character to be processed can be calculated, and the target character is determined from each replacement character based on the similarity between each replacement character and the character to be processed, and the target character is used to predict the replacement The characters to be processed in the text to get the recognized text.

Wherein, the similarity may be character similarity (or character shape similarity). That is to say, in this disclosure, it is considered that the main reasons for OCR recognition errors are: similar characters or interference between characters. Therefore, in this disclosure, the character similarity (or character shape) between each replacement character and the character to be processed can be calculated. Similarity), according to the character similarity (or character shape similarity), determine the target character from each replacement character, and use the target character to replace the character to be processed in the predicted text.

As a possible implementation method, each replacement character can be encoded based on the set encoding algorithm to obtain the first encoding value of each replacement character, and based on the set encoding algorithm, the character to be processed can be encoded to obtain the value of the character to be processed. The second encoding value, so in the present disclosure, the similarity between each replacement character and the character to be processed can be determined based on the difference between the first encoding value and the second encoding value of each replacement character.

As an example, let's take Chinese characters as illustrative examples. The encoding algorithm can be a four-corner encoding algorithm. It can be understood that the more similar the glyphs of Chinese characters are, the closer the four-corner encoding values will be. Based on the above characteristics, each replacement can be calculated. The difference between the first encoding value and the second encoding value of the character determines the similarity between each replacement character and the character to be processed based on the difference between the first encoding value and the second encoding value of each replacement character. Among them, similarity and difference have an inverse relationship, that is, the smaller the difference, the higher the similarity, and conversely, the larger the difference, the lower the similarity.

It should be noted that the above example only uses the four-corner encoding algorithm as the set encoding algorithm. In actual application, the set encoding algorithm can also be other encoding algorithms, and this disclosure does not limit this.

As another possible implementation, for each replacement character, the replacement character can be drawn according to the replacement character to obtain the first image. For example, the replacement character can be drawn on a blank image to obtain the first image, and the first image can be obtained according to the character to be processed. Draw to obtain the second image. For example, you can also draw the character to be processed on a blank image to obtain the second image. In the present disclosure, the similarity between the first image and the second image can be calculated, and the similarity between the replacement character and the character to be processed is determined based on the similarity between the first image and the second image.

Wherein, the similarity between the replacement character and the character to be processed has a positive relationship with the similarity between the first image and the second image, that is, the higher the similarity between the first image and the second image, the higher the similarity between the replacement character and the character to be processed. The higher the similarity between the character and the character to be processed, and conversely, the lower the similarity between the first image and the second image, the lower the similarity between the replacement character and the character to be processed.

As another possible implementation method, for each replacement character, feature extraction can be performed on the replacement character based on the feature extraction algorithm to obtain the feature vector corresponding to the replacement character, and feature extraction is performed on the character to be processed to obtain the character to be processed. eigenvector. In the present disclosure, the similarity between the replacement character and the character to be processed may be determined based on the similarity between the feature vector of the replacement character and the feature vector of the character to be processed.

Among them, the similarity between the replacement character and the character to be processed is positively related to the similarity between the feature vector of the replacement character and the feature vector of the character to be processed.

It should be noted that the above calculation method of similarity is only an illustrative explanation, but the present disclosure is not limited thereto, that is, the present disclosure does not limit the calculation method of similarity, as long as the similarity between each replacement character and the character to be processed can be obtained That’s it.

In the embodiment of the present disclosure, after determining the similarity between the replacement character and the character to be processed, the target character can be determined from each replacement character according to the similarity between the replacement character and the character to be processed. For example, the similarity can be The largest replacement character is used as the target character, so that the target character can be used to replace the character to be processed in the predicted text to obtain the recognized text.

For example, let's take the predicted text as "There is a peach blossom tree in the south corner of the garden" and the character to be processed is "乐". After using the mask character "开" to mask the word "乐", the masked result obtained is The coded predicted text is "There is a peach blossom tree in the southern corner of the garden." After the prediction model performs character prediction on the masked predicted text, the replacement characters obtained are "East" and "West". Since "East" and " The character similarity between "乐" and "乐" is higher than the character similarity between "西" and "乐". Therefore, "东" can be used as the target character, and the target character can be used to replace the "to-be-processed character" in the predicted text. The obtained recognition text is "There is a peach blossom tree in the southeast corner of the garden."

The text error correction method based on RPA and AI in the embodiment of the present disclosure performs character recognition on the image to be recognized based on the OCR model to obtain the predicted text and the confidence of each predicted character in the predicted text; according to the confidence of each predicted character, Determine the characters to be processed from each predicted character; mask the characters to be processed in the predicted text, and use a prediction model to predict the characters of the masked predicted text to obtain at least one replacement character corresponding to the character to be processed; according to The similarity between at least one replacement character and the to-be-processed character is determined, the target character is determined from the at least one replacement character, and the target character is used to replace the to-be-processed character in the predicted text to obtain the recognized text. Therefore, after the text information is recognized based on OCR technology, the characters in the text information are corrected, which can improve the accuracy and reliability of the text information recognition results. In addition, there is no need to manually correct characters in text information, which can free up human resources, reduce labor costs, and improve the applicability of this method.

In order to clearly explain how to obtain at least one replacement character corresponding to the character to be processed in the above embodiments of the present disclosure, the present disclosure also proposes a text error correction method based on RPA and AI.

Figure 4 is a schematic flowchart of another text error correction method based on RPA and AI provided by an embodiment of the present disclosure.

As shown in Figure 4, the text error correction method based on RPA and AI can include the following steps:

Step 201: Based on the OCR model, perform character recognition on the image to be recognized to obtain the predicted text and the confidence of each predicted character in the predicted text.

Step 202: Determine the character to be processed from each predicted character based on the confidence level of each predicted character.

The execution process of steps 201 to 202 can refer to the execution process of any embodiment of the present disclosure, and will not be described again here.

Step 203: Determine the target position of the character to be processed in the predicted text.

In the embodiment of the present disclosure, the position of the character to be processed in the predicted text can be determined, which is recorded as the target position in the present disclosure.

For example, if the predicted text is "There is a peach blossom tree in the southern corner of the garden" and the character to be processed is "乐", the target position can be: the fourth character position.

Step 204: Obtain the masked character and replace the character to be processed at the target position in the predicted text with the masked character to obtain the masked predicted text.

In the embodiment of the disclosure, the mask characters may be preset fixed characters, or the mask characters may be random characters, which the disclosure does not limit.

In the embodiment of the present disclosure, the masked character can be used to replace the character to be processed at the target position in the predicted text to obtain the masked predicted text.

Still using the above example, assuming that the mask character is "开", you can use the mask character "开" to mask the word "乐", and get the predicted text after masking as "There is a star in the south corner of the garden." Peach Blossom Tree".

Step 205: Input the masked predicted text into the prediction model, so as to use the prediction model to perform character prediction on the masked predicted text to obtain at least one target text.

In the embodiment of the present disclosure, the masked predicted text can be input to the prediction model, so that the prediction model performs character prediction on the masked predicted text to obtain at least one target text. That is to say, in this disclosure, the prediction model can use a similar method to machine translation to predict all characters in the entire text and obtain at least one target text.

Still using the above example, the target text output by the prediction model can be "There is a peach blossom tree in the southeast corner of the garden" and "There is a peach blossom tree in the southwest corner of the garden."

Step 206: Use at least one character at the target position in the target text as at least one replacement character.

In this embodiment of the present disclosure, a character at a target position in at least one target text may be used as at least one replacement character.

Still using the above example, the character at the fourth character position in each target text can be used as the replacement character. For example, the replacement characters can be "East" and "West".

Step 207: Determine the target character from the at least one replacement character based on the similarity between the at least one replacement character and the character to be processed, and use the target character to replace the character to be processed in the predicted text to obtain the recognized text.

The execution process of step 207 can be referred to the execution process of any embodiment of the present disclosure, and will not be described again.

The text error correction method based on RPA and AI in the embodiment of the present disclosure determines the target position of the character to be processed in the predicted text; obtains the mask character, and uses the mask character to replace the character to be processed at the target position in the predicted text, to obtain the masked predicted text; input the masked predicted text into the prediction model, and use the prediction model to perform character prediction on the masked predicted text to obtain at least one target text; convert at least one target text into the target The character at position, as at least one replacement character. Therefore, using deep learning technology to predict at least one replacement character can improve the accuracy and reliability of prediction results.

In order to clearly explain how the prediction model in the above embodiment is trained, the present disclosure also proposes a text error correction method.

Figure 5 is a schematic flowchart of another text error correction method based on RPA and AI provided by an embodiment of the present disclosure.

As shown in Figure 5, based on the embodiment shown in Figure 4, the text error correction method may also include the following steps:

Step 301: Obtain sample text.

In the embodiment of the present disclosure, the sample text can be obtained from the existing training set, or the sample text can also be collected online, for example, the sample text can be collected online through web crawler technology, or the sample text can also be collected offline, such as Images of paper text content can be collected, and then each character in the image can be identified through OCR technology to obtain sample text, or the sample text can also be artificially synthesized, etc. The embodiments of the present disclosure are not limited to this .

Step 302: Mask at least one sample character in the sample text to obtain a masked sample text.

In the embodiment of the present disclosure, mask characters may be used to mask at least one sample character in the sample text to obtain a masked sample text.

As an example, at least one sample character in the sample text can be replaced with a random character with a set first random probability, and/or at least one sample character in the sample text can be replaced with a set second random probability. Characters are replaced with fixed characters. The first random probability and the second random probability may be the same or different, and this disclosure does not limit this.

For example, it can be determined whether the current random probability matches the first random probability. If the current random probability matches the first random probability, then at least one sample character in the sample text is replaced with a random character, and it is determined whether the current random probability matches the second random probability. Random probability matching. If the current random probability matches the second random probability, continue to replace at least one sample character in the sample text with a fixed character. If the current random probability does not match the second random probability, no processing will be performed.

If the current random probability does not match the first random probability, continue to determine whether the current random probability matches the second random probability. If the current random probability matches the second random probability, replace at least one sample character in the sample text with a fixed character, if the current random probability does not match the second random probability, no processing will be performed.

As an example, let the second random probability and the first random probability be both 10/100=10%. Assume that the currently selected random number hits 30-40 out of 0-100, then determine the current random probability and the first random probability. and second random probability match. When masking the sample text, you can randomly select a random number and determine whether the random number is between 30 and 40. If so, determine that the current random probability matches the first random probability, and at least one sample in the sample text can be masked. Characters are replaced with random characters, and continue to select a random number to determine whether the random number is between 30 and 40. If so, determine that the current random probability matches the second random probability, and you can continue to replace at least one sample character in the sample text with Fixed characters.

Step 303: Input the masked sample text into the initial prediction model, so as to use the prediction model to perform character prediction on the masked sample text to obtain output text.

In the embodiment of the present disclosure, the masked sample text can be input to the initial prediction model, so that the prediction model performs character prediction on the masked sample text to obtain output text.

Step 304: Adjust the model parameters in the prediction model based on the difference between the sample text and the output text.

In embodiments of the present disclosure, model parameters in the prediction model can be adjusted based on the difference between the sample text and the output text.

As an example, the target loss value can be generated based on the difference between the sample text and the output text, where the target loss value has a positive relationship with the above difference, that is, the smaller the difference, the smaller the target loss value, and vice versa. , the greater the difference, the greater the target loss value.

Therefore, in the present disclosure, the prediction model can be trained according to the target loss value, that is, the model parameters in the prediction model can be adjusted. For example, the prediction model can be trained based on the target loss value to minimize the target loss value. It should be noted that the above example only uses the termination condition of model training as the minimization of the target loss value. In actual application, other termination conditions can also be set. For example, the termination condition can also be when the number of training times reaches the set number. Thresholds, etc., this disclosure does not limit this.

The text error correction method based on RPA and AI in the embodiment of the present disclosure obtains sample text; masks at least one sample character in the sample text to obtain a masked sample text; and inputs the masked sample text into The initial prediction model uses the prediction model to predict the characters of the masked sample text to obtain the output text; based on the difference between the sample text and the output text, the model parameters in the prediction model are adjusted. Therefore, by pre-training the prediction model, the prediction effect of the prediction model can be improved. That is, using the trained prediction model to predict the characters of the masked prediction text can improve the accuracy and reliability of the target text prediction results. .

In order to clearly illustrate how the prediction model is trained in any embodiment of the present disclosure, the present disclosure also proposes a text error correction method.

Figure 6 is a schematic flowchart of another text error correction method based on RPA and AI provided by an embodiment of the present disclosure.

As shown in Figure 6, based on the embodiment shown in 4, the text error correction method may also include the following steps:

Step 401: Obtain sample text.

Step 402: Mask at least one sample character in the sample text to obtain a masked sample text.

Step 403: Input the masked sample text into the initial prediction model, so as to use the prediction model to perform character prediction on the masked sample text to obtain output text.

The execution process of steps 401 to 403 can refer to the execution process of any embodiment of the present disclosure, and will not be described again here.

Step 404: Generate a first loss value based on the difference in confidence distribution of characters between the sample text and the output text.

In an embodiment of the present disclosure, the difference in confidence distribution (or probability distribution) of characters between the sample text and the output text can be determined, and the first loss value is generated based on the above confidence distribution difference (or probability distribution difference).

Among them, the first loss value has a positive relationship with the difference in confidence distribution. That is, the smaller the difference in confidence distribution, the smaller the value of the first loss value. On the contrary, the greater the difference in confidence distribution, the smaller the value of the first loss value. The bigger. For example, the first loss value may be cross-entropy loss Loss.

Step 405: Determine the first position of at least one sample character to be masked in the sample text.

In the embodiment of the present disclosure, the position of at least one sample character to be masked in the sample text can be determined, which is recorded as the first position in the present disclosure.

Step 406: Generate a second loss value based on the difference between the sample character at the first position in the sample text and the output character at the first position in the output text.

In an embodiment of the present disclosure, a difference between a sample character at a first position in the sample text and an output character at a first position in the output text may be determined, and a second loss value is generated based on the difference.

Among them, the second loss value has a positive relationship with the above-mentioned difference, that is, the smaller the difference, the smaller the value of the second loss value, and conversely, the larger the difference, the greater the value of the second loss value. For example, the second loss value can also be cross entropy Loss.

Step 407: Generate a target loss value based on the first loss value and the second loss value.

In the embodiment of the present disclosure, the target loss value may be generated based on the first loss value and the second loss value. Among them, the target loss value and the first loss value have a positive relationship, and the target loss value and the second loss value also have a positive relationship.

As an example, the first loss value and the second loss value can be weighted according to the first weight of the first loss value and the second weight of the second loss value to obtain a weighted result, and the target loss value is determined based on the weighted result, Among them, the target loss value has a positive relationship with the weighted result.

It should be noted that during the training process of the prediction model, more attention is generally paid to the loss value of the masked characters. Therefore, in the present disclosure, the second weight may be greater than the first weight.

Step 408: Adjust the model parameters in the prediction model according to the target loss value.

In the embodiment of the present disclosure, the prediction model can be trained according to the target loss value, that is, the model parameters in the prediction model can be adjusted. For example, the prediction model can be trained based on the target loss value to minimize the target loss value. It should be noted that the above example only uses the termination condition of model training as the minimization of the target loss value. In actual application, other termination conditions can also be set. For example, the termination condition can also be when the number of training times reaches the set number. Thresholds, etc., this disclosure does not limit this.

The text error correction method based on RPA and AI in the embodiment of the present disclosure can improve the prediction effect of the prediction model by pre-training the prediction model, that is, using the trained prediction model to perform character prediction on the masked prediction text. It can improve the accuracy and reliability of target text prediction results.

In a possible implementation of the embodiment of the present disclosure, in order to improve the accuracy of the model prediction results, numbers, English characters, punctuation points, and Chinese characters such as the quantifiers thousand and hundred in the sample text can be Characters are replaced with specific characters (such as OOV characters), so that the model does not need to pay attention to these specific characters. Specifically, for step 205 in the above embodiment, the setting dictionary can be obtained, and it can be determined whether each character in the masked sample text is located in the setting dictionary. If each character in the masked sample text is located in By setting the dictionary, the masked sample text can be directly input into the initial prediction model, so that the prediction model can be used to predict the characters of the masked sample text to obtain the output text. At this time, the target loss value may be the weighted result of the first loss value and the second loss value, that is, the first loss value and the second loss value may be combined according to the first weight of the first loss value and the second weight of the second loss value. The loss values are weighted to obtain the target loss value.

If there is a first character in the masked sample text that is not in the set dictionary, you can replace the first character in the masked sample text with a specific character, and input the replaced sample text into the prediction The model is used to predict characters of the replaced sample text using a prediction model to obtain the output text. At this time, the target loss value is not only obtained based on the weighted result of the first loss value and the second loss value, but also needs to be obtained based on specific characters. The above process will be described in detail below with reference to Figure 7.

FIG. 7 is a schematic flowchart of another text error correction method based on RPA and AI provided by an embodiment of the present disclosure.

As shown in Figure 7, based on the embodiment shown in Figure 6, step 407 may include the following steps:

Step 501: Determine the second position of the first character in the sample text.

In the embodiment of the present disclosure, when there is a first character in the masked sample text that is not in the set dictionary, the position of the first character in the sample text can be determined, which is denoted as the first character in the present disclosure. Two positions.

Step 502: Generate a third loss value based on the difference between the specific character and the output character located at the second position in the output text.

In embodiments of the present disclosure, the third loss value may be generated based on the difference between the specific character and the output character located at the second position in the output text. Among them, the third loss value has a positive relationship with the above-mentioned difference, that is, the smaller the difference, the smaller the value of the third loss value, and conversely, the greater the difference, the greater the value of the third loss value.

Step 503: Weight the first loss value and the second loss value according to the first weight of the first loss value and the second weight of the second loss value to obtain a fourth loss value, where the second weight is greater than the first Weights.

In the embodiment of the present disclosure, the first loss value and the second loss value can be weighted according to the first weight of the first loss value and the second weight of the second loss value to obtain a weighted result, and the weighted result is used as the fourth loss value.

Step 504: Generate a target loss value based on the difference between the fourth loss value and the third loss value.

In the embodiment of the present disclosure, the target loss value can be generated according to the difference between the fourth loss value and the third loss value, where the target loss value is positively related to the above difference, that is, the smaller the difference, the smaller the target loss value is. The smaller the value, conversely, the greater the difference, the greater the target loss value.

As an example, the third loss value can be subtracted from the fourth loss value to obtain the fifth loss value, and the fifth loss value can be amplified to obtain the target loss value. That is to say, the third loss value corresponding to the specific characters that the model does not need to pay attention to can be removed from the fourth loss value, and in order to avoid having fewer effective characters in the output text, resulting in a lower target loss value. If this happens, in this disclosure, the fifth loss value can be amplified.

FIG. 8 is a schematic flowchart of another text error correction method based on RPA and AI provided by an embodiment of the present disclosure.

As shown in Figure 8, the text error correction method based on RPA and AI can include the following steps:

Step 601: Based on the OCR model, perform character recognition on the image to be recognized to obtain the predicted text and the confidence of each predicted character in the predicted text.

Step 602: Determine the character to be processed from each predicted character based on the confidence level of each predicted character.

The execution process of steps 601 to 602 can refer to the execution process of any embodiment of the present disclosure, and will not be described again here.

Step 603: Use masked characters to replace characters to be processed in the predicted text to obtain masked predicted text.

In the embodiment of the present disclosure, masked characters can be used to replace characters to be processed in the predicted text to obtain masked predicted text.

Step 604: Input the masked predicted text into the prediction model, so as to use the prediction model to predict the masked characters in the masked predicted text to obtain at least one replacement character.

In the embodiment of the present disclosure, the masked predicted text can be input to the prediction model, so that the prediction model predicts the masked characters in the masked predicted text to obtain at least one replacement character. That is to say, in the present disclosure, the prediction model can predict only the masked characters, similar to the cloze task.

In a possible implementation of the embodiment of the present disclosure, the prediction model can be trained in the following manner: obtaining sample text. In order to distinguish it from the sample text in the above embodiment, the sample text in the above Figure 5 to Figure 7 can be Marked as the first sample text, the sample text in this embodiment can be marked as the second sample text, and at least one second sample character in the second sample text can be masked to obtain the masked second sample text , and input the masked second sample text into the initial prediction model, so that the prediction model predicts at least one masked second sample character to obtain at least one recognized character, so that in this disclosure, it can be based on at least The difference between a recognized character and at least one first sample character adjusts model parameters in the prediction model.

As an example, a target loss value can be generated based on the difference between at least one recognized character and at least one first sample character, where the target loss value has a positive relationship with the above-mentioned difference, that is, the smaller the difference, the smaller the target loss value. The smaller the value, conversely, the greater the difference, the greater the value of the target loss value. Therefore, in the present disclosure, the prediction model can be trained according to the target loss value, that is, the model parameters in the prediction model can be adjusted. For example, the prediction model can be trained based on the target loss value to minimize the target loss value. It should be noted that the above example only uses the termination condition of model training as the minimization of the target loss value. In actual application, other termination conditions can also be set. For example, the termination condition can also be when the number of training times reaches the set number. Thresholds, etc., this disclosure does not limit this.

Step 605: Determine the target character from the at least one replacement character based on the similarity between the at least one replacement character and the character to be processed, and use the target character to replace the character to be processed in the predicted text to obtain the recognized text.

The execution process of step 605 can be referred to the execution process of any embodiment of the present disclosure, and will not be described again here.

The text error correction method based on RPA and AI in the embodiment of the present disclosure uses masked characters to replace the characters to be processed in the predicted text to obtain the masked predicted text; input the masked predicted text into the prediction model , to use a prediction model to predict the masked characters in the masked prediction text, and obtain at least one replacement character. Therefore, using deep learning technology to predict at least one replacement character can improve the accuracy and reliability of prediction results. In addition, predicting replacement characters in a different manner from the embodiment shown in Figure 2 can improve the flexibility and applicability of the method.

In order to clearly explain how character recognition is performed on the image to be recognized in the above embodiments of the present disclosure, the present disclosure also proposes a text error correction method.

Figure 9 is a schematic flowchart of another text error correction method based on RPA and AI provided by an embodiment of the present disclosure.

As shown in Figure 9, the text error correction method based on RPA and AI can include the following steps:

Step 701: Use the feature extraction branch in the OCR model to extract features from the image to be recognized to obtain the first feature map.

In the embodiment of the present disclosure, the feature extraction branch in the OCR model can be used to extract features of the image to be recognized to obtain the first feature map. For example, the feature extraction branch can be a backbone network such as CNN (Convolutional Neural Network) or VIT (Vision Transformer). Features are extracted from the image to be identified through the above backbone network to obtain the first feature map.

It should be noted that the image to be recognized may be tilted, deformed or flipped to a certain extent, and the above situations will affect the reliability of subsequent model recognition results. Therefore, in a possible implementation of the embodiment of the present disclosure, in order to improve the accuracy of subsequent model recognition results, after obtaining the image to be recognized, the tilt angle of the image to be recognized can be corrected.

Specifically, angle prediction can be performed on the image to be recognized, and the tilt angle of the document image to be processed can be determined.

As an example, when the tilt angle of the image to be recognized is large, an image classification model can be used to predict the angle of the image to be recognized and determine the tilt angle of the image to be recognized.

As another example, when the tilt angle of the image to be recognized is small, angle prediction of the image to be recognized can be performed based on a corner point detection algorithm to determine the tilt angle of the image to be recognized.

It should be understood that when the image to be recognized is tilted, what needs to be detected is an irregular quadrilateral, so the conventional target detection algorithm is invalid. The key point detection algorithm can be used to detect the four corner points of the irregular quadrilateral. Then the quadrilateral is extracted based on the four corner points, so that the tilt angle of the image to be recognized can be determined based on the extracted quadrilateral.

It should be noted that in practical applications, the above two examples can also be combined to predict the tilt angle of the image to be recognized.

In the embodiment of the present disclosure, after determining the tilt angle of the image to be recognized, the image to be recognized can be rotated according to the above tilt angle, so that the angle of the image to be recognized that is tilted or flipped can be corrected. Improve the reliability of subsequent image recognition results.

Step 702: Use the fusion branch in the OCR model to fuse the first feature map and the location map to obtain a second feature map. Each element in the location map corresponds to each element in the first feature map. The location map The elements in are used to indicate the coordinates of the corresponding elements in the first feature map in the image to be recognized.

In the embodiment of the present disclosure, the position of the image to be recognized can be coded to obtain a position map, in which each element in the position map corresponds to each element in the first feature map one-to-one, and each element in the position map is used to Indicates the coordinates of the corresponding element in the first feature map in the image to be recognized.

In the embodiment of the present disclosure, the fusion branch in the OCR model can be used to fuse the first feature map and the position map to obtain the second feature map. For example, the first feature map and the corresponding position map can be spliced to obtain the second feature map. Alternatively, the first feature map and the corresponding position map can be spliced to obtain a spliced feature map, and the spliced feature map can be input into the convolution layer to fuse to obtain the second feature map.

Step 703: Use the feature transformation branch in the OCR model to perform feature transformation on the second feature map to obtain a third feature map.

In the embodiment of the present disclosure, the feature transformation branch in the OCR model can be used to perform feature transformation on the second feature map to obtain the third feature map.

Step 704: Use the prediction branch in the OCR model to decode the third feature map to obtain the predicted text and the confidence corresponding to each predicted character in the predicted text.

In the embodiment of the present disclosure, the prediction branch in the OCR model can be used to decode the third feature map to obtain the predicted text and the confidence corresponding to each predicted character in the predicted text.

Step 705: Determine the character to be processed from each predicted character based on the confidence level of each predicted character.

Step 706: Mask the characters to be processed in the predicted text, and use a prediction model to perform character prediction on the masked predicted text to obtain at least one replacement character corresponding to the character to be processed.

Step 707: Determine a target character from at least one replacement character based on the similarity between the at least one replacement character and the character to be processed, and use the target character to replace the character to be processed in the predicted text to obtain the recognized text.

In the embodiment of the present disclosure, the execution process of steps 705 to 707 can be referred to the execution process of any embodiment of the present disclosure, and will not be described again.

As an example, error correction can be used as a post-processing model that is not coupled to OCR recognition. For example, the inventor uses the OCR recognition process as shown in Figure 10 to exemplify the recognition of character information in the image. As shown in Figure 10, the OCR recognition process mainly includes: rotating the image; detecting the text rows (columns) on the image; performing character recognition on the content of the text rows (columns); and restoring the coordinate information of each character recognized. ; Output each character, the coordinates of each character and the confidence of each character.

After analysis, it can be determined that the main reason for the long-tail recognition error is the text line content recognition stage in Figure 10. Therefore, a post-processing module can be added after the stage of text line content identification and before the coordinate information restoration stage, and the above-mentioned post-processing module can be used to correct the text line content.

The model structure used in this stage of text line content recognition can be shown in Figure 11. The feature extraction branch, such as a backbone network such as CNN or VIT, can be used to extract features of the image to be recognized to obtain the first feature map; through the fusion branch, the The extracted first feature map is fused with the position map to obtain a second feature map; through the feature transformation branch, feature transformation is performed on the second feature map to obtain a third feature map. For example, the feature transformation branch may include a transformation branch (the The conversion branch is used to convert the second feature map into one-dimensional features (Reshape to 1-D) or two-dimensional features (Reshape to 2-D)), and Transformer (used to continue feature transformation on the converted features to obtain features sequence feature) and MLP (Multi Layer Perceptron, multi-layer perceptron) (used to perform feature transformation on the feature sequence to obtain the third feature map); through prediction branches, such as CTC (Connectionist temporal classification, neural network-based time series class Classification), decode the third feature map to obtain the predicted text and the confidence corresponding to each predicted character in the predicted text.

In one embodiment, in the final stage of model training, multi-task training, such as center-loss and rdrop (Regularized Dropout, a regularization method), can be turned on to improve the accuracy of model recognition.

The text error correction method based on RPA and AI in the embodiment of the present disclosure can improve the accuracy and reliability of the recognition results by using deep learning technology to perform OCR recognition on the image to be recognized.

It should be noted that in related technologies, dictionary-based search algorithms, traditional Bayesian-based machine learning algorithms, and deep learning algorithms can be used to correct text errors. In general text error correction, there are generally three types of problems that need to be solved:

The first category, replace SubStitution. For example, replace "Today I feel so happy" with "Today I feel very happy."

The second category is Insert & Delete. For example, add "Today I feel very happy" to "Today I feel very happy"; for another example, delete "Today I feel very happy" to "Today I feel very happy".

The third category is rewriting Local Paraphrasing (to a minimum extent). For example, change "I feel very happy today" to "I feel very happy today."

However, considering that the main cause of OCR recognition errors is: interference or misrecognition caused by similar words, therefore, the problems that need to be solved are mainly the first type of problems mentioned above. In addition, since error correction is a general scenario, dictionary-based solutions are not suitable and model-based solutions can be used.

If a supervised model is used to correct text errors, a large number of annotated sample pairs need to be used to train the above-mentioned supervised model before error correction. The annotated sample pairs include incorrect results and accurate results of OCR recognition. Specifically, the training process is: training the above-mentioned supervised model based on the difference between the incorrect results and the accurate results of the labeled sample alignment.

However, in the above method, in order to improve the prediction effect of the model, it is necessary to generate enough labeled sample pairs to train the model, and the labeling cost is high. Therefore, supervised models that use labeled sample pairs for training are not suitable.

As can be seen from Figure 10, the OCR recognition process includes multiple stages, each stage requires a GPU (graphics processing unit, image processing unit). In order to save expensive GPU, the post-processing module (or error correction model) can be lightweight The model can be executed on the CPU (Central Processing Unit). Therefore, lightweight unsupervised models and solutions can be used to correct OCR recognition results.

The specific implementation plan is:

Considering that the commonly used models for text error correction in related technologies are basically divided into the following two categories:

The first category is sequence tasks, such as machine translation. Simply put, it is to translate sentences with typos into accurate sentences.

The second category is to detect errors and correct them. To put it simply, it is to detect possible errors in the sentence, and if a typo is detected, the typo will be corrected.

On a single error correction task, the second type of error correction effect is better, such as the soft-masked BERT (Bidirectional Encoder Representations from Transformers, bidirectional encoder representation from the converter) model, which can achieve better correction The reason is that the additional typo detection task can effectively alleviate the false recall problem in error correction.

Therefore, in this disclosure, the above-mentioned second type of design model can be used to divide the entire error correction task into three parts:

The first part is: error word detection.

In the OCR task, the last layer of the OCR model is an N classification task (N is the number of characters recognized). From the perspective of the loss function of the classification network, if the predicted characters are more accurate, the confidence of the character in the classification task will be The degree (or softmax recognition probability) p will also be higher.

The inventor used the OCR model to test the test text in the test set and obtained: the confidence or recognition probability of each character. When the confidence threshold (or probability threshold) is gradually decreased from 1.0, the remaining characters The recognition accuracy can be improved. From this, it can be inferred that the optimal confidence threshold (or optimal probability threshold) f is the recognition accuracy of the remaining characters when it is lower than the optimal confidence threshold (or optimal probability threshold) f. The improvement is not big.

In this disclosure, in the OCR error correction scenario, the confidence level (or recognition probability) p of the character is used as a priori knowledge to detect incorrect words, without the need for an additional error detection model. Specifically, when the confidence level (or recognition probability) p of a character is higher than the optimal confidence threshold (or optimal probability threshold) f, the character is considered to be recognized correctly. On the contrary, when the confidence level (or recognition probability) of the character ( If the recognition probability) p is not higher than the optimal confidence threshold (or the optimal probability threshold) f, it is considered that the character recognition is incorrect.

When a certain character is recognized incorrectly, the character can be recalled as a corrective word.

The second part is: correct word recall.

Corrective word recall can be illustrated from offline model training and online model calling respectively.

Among them, offline model training:

Model design: After statistical analysis, in the vast majority of error scenarios recognized by OCR, a sentence has only one error character, and due to the lack of supervised annotation sample pairs, in this disclosure, MLM (Mask Language Modeling) can be used , mask language model) method to implement model training based on self-supervised training tasks.

For model design, there are two options: the first is to predict only the characters removed by the Mask, a task similar to cloze; the second is to predict all characters of the entire sentence, similar to machine translation. After experiments, it can be found that the second kind of prediction effect is better. For example, for the training corpus "There is a peach blossom tree in the southeast corner of the garden" (recorded as sample text in this disclosure), the following tasks can be designed (the left side is the model input, the right side is the model output):

There is a peach blossom tree in the south corner of [mask (fixed character)] of the garden -> there is a peach blossom tree in the southeast corner of the garden;

There is a peach blossom tree in the south corner of [random character] of the garden -> there is a peach blossom tree in the southeast corner of the garden;

In the model architecture design, in order to reduce the output delay, non-autoregressive tasks can be used, the decoding Decoder part of the BERT model is discarded, and only the encoding Encoder part is used to comprehensively predict the performance and prediction effect. In this disclosure, as shown in Figure 12, The prediction model can use the standard six-layer encoding layer TransformerEncoderLayer stack. The maximum character length of the model input is 32 bits. The model parameter head_num is selected as 6. The dimension of the character feature vector embedding is 128, that is, the prediction model (as shown on the right side of Figure 12). Multi-Head Attention) can include 6 layers of attention coding layers. The coding sequence (up to 32 bits) is input into the prediction model. The last layer of the prediction model can output a 32×128 coding vector, and finally the 6th layer can be The output features of the attention encoding layer are classified into 32 predicted characters through softmax on 32 feature vectors.

Among them, Multi-Head Attention projects Q, K, and V through multiple (e.g., h) different linear transformations, and then splices different attention results; the splicing results are scaled dot product (Scaled Dot). -Product Attention) performs attention calculation; the results of attention calculation are subjected to non-linear transformation processing through the connection layer (Concat) and the linear layer (Liner). Among them, the structure of Scaled Dot-Product Attention can be shown in the left part of Figure 12. In Scaled Dot-Product Attention, Q and K are subjected to matrix dot multiplication (MatMul), the result of the matrix multiplication is scaled (Scale) and masked (Mask), and the masked result is processed through the activation function SoftMax. Processing, the matrix dot multiplication of the processing result of the activation function and V can be expressed as the following formula:

Among them, Q, K, and V are the three matrices obtained by performing matrix operations on the input of the attention layer and the model parameters respectively. d _k represents the normalization factor, and T represents the transpose operation of the matrix. When d _k is small, the results are similar to the dot product. When d _k is large, the performance is better without scaling, but the dot product is calculated faster and the impact can be reduced after scaling.

It should be noted that the reason why the prediction model uses a model with Transformer as the basic structure is that the advantage of Transformer is to capture the correlation between long-distance characters through the multi-head self-attention mechanism MultiHeadSelfAttention in the multi-HeadAttention mechanism. For example, the multi-head attention mechanism Multi-HeadAttention can be shown in the right part of Figure 12.

Training data: On the training corpus, multiple news data can be captured from multiple data sources, and the above multiple news data can be split into more than 200 million pieces of training corpus (recorded as sample text in this disclosure). Different from the method of masking the sample text before model training, in this disclosure, the sample text can be randomly masked during the training process, and the sample text can be masked with a first random probability (such as 10%). Any character is replaced with a random character, and any character in the sample text is replaced with a Mask character (i.e., a fixed character) with a second random probability (such as 10%).

In addition, in the sample text, in order to reduce the model size, the most common 3900 Chinese characters can be used to form a set dictionary; in order to improve the accuracy of the model prediction results, the numbers, English characters, punctuation, and Characters with strong semantic universality in Chinese characters, such as the quantifiers thousand and hundred, are replaced with specific characters, such as OOV characters.

Loss function Loss design: Use cross-entropy loss function, in which the value of the loss function can be determined in the following way: Cross-entropy Loss consists of two parts, one part is the cross-entropy Loss of all predicted characters in the entire sentence (note in this disclosure is the first loss value), and part of it is the cross-entropy Loss of the masked characters (recorded as the second loss value in this disclosure). Because in the final use, more attention is paid to the second part, so the weight of the second part Loss will be more high.

During the loss calculation process, it is necessary to remove the loss predicted by OOV characters (recorded as the third loss value in this disclosure), and amplify the remaining loss proportionally according to the number of effective characters, mainly to avoid having fewer effective characters, resulting in lower loss. situation occurs.

Among them, online calling: When calling online, the characters whose confidence or recognition probability p is lower than f in the OCR recognition results can be replaced with OOV characters. If the sentence contains more than 32 characters, the beginning and end of the sentence can be Truncate to ensure that the masked characters are in the middle of the sentence. Although the OCR recognition results include all predicted characters, this disclosure only uses low-probability or low-confidence predicted characters as recalls, ignoring predicted characters in other positions, such as recalling 20 (Top20) predicted characters.

The third part is: Correcting word sorting.

As can be seen from the above, the entire recall part is a purely semantic model. Only the Top 1 recalled by the model are used for correction. The accuracy of the correction results cannot be guaranteed. For example, the input of the model is: "There is a peach blossom tree in the south corner of the garden [mask]" , the predicted characters "East and West" may be wrong no matter which one is used, so a sorting module with prior knowledge is needed to select the most accurate characters.

As mentioned before, the main cause of OCR recognition errors is similar characters or character interference, so in this disclosure, character similarity can be used as a sorting index.

Among them, character similarity can be calculated through the following three methods:

The first is the Chinese character four-corner encoding algorithm. Based on the Chinese character four-corner encoding algorithm, the encoding value of the character is calculated, and the similarity between characters is determined based on the encoding value of the character.

The second type is image similarity. You can use regular fonts, Song fonts and other fonts to fill each character on a 128*128 picture with a white background, and then calculate the similarity between the pictures as the character similarity. Alternatively, you can extract features from the pictures and extract them based on the extracted characters. Feature vector to calculate image similarity. Thus, the character similarity can be determined based on the image similarity.

The third type, OCR feature vector. The feature vector of each character can be determined based on the softmax matrix of the last layer in the OCR model, so that similarity calculation can be performed based on the feature vector of the character. The above matrix is an N×D matrix, where N is the number of characters and D is the vector dimension. It can be considered that the D-dimensional vector in each row of the matrix is the representation vector or feature vector of the corresponding character. Through the third method, the character similarity shown in Figure 13 can be obtained. Among them, the third column in Figure 13 is the character shape similarity.

After experiments, it can be seen that the third kind is more effective than the first and second kinds. The character similarity between the recalled characters and the characters to be corrected can be determined based on the character shape similarity. If the character similarity is higher than the set threshold, the character with the highest character similarity can be selected as the target character.

After the model went online, the inventor tested the model on a large-scale test set and found that on the basis of the already high F1 index, the F1 index could still be improved by more than 0.03%.

As an example, as shown in Figure 14, when detecting a text line in an image, if part of the text line character is cropped, causing "ti" to be mistakenly recognized as "pendulum", then through the above model, Normal correction can be achieved.

As another example, as shown in Figure 15, when there is red stamp interference in the image, causing "general" to be misidentified as "general", normal correction can be achieved through the above model.

The above are various embodiments corresponding to the text error correction method. The present disclosure also proposes a method for training the prediction model in any of the above method embodiments.

As shown in Figure 16, the training method may include the following steps:

Step 801: Obtain sample text.

Step 801: Mask at least one sample character in the sample text to obtain a masked sample text.

Step 802: Input the masked sample text into the initial prediction model, so as to use the prediction model to perform character prediction on the masked sample text to obtain output text.

Step 803: Adjust the model parameters in the prediction model based on the difference between the sample text and the output text.

In a possible implementation of the embodiment of the present disclosure, the first loss value can be generated based on the difference in confidence distribution of characters between the sample text and the output text; it is determined that at least one sample character to be masked is in the sample text the first position; generate a second loss value based on the difference between the sample character at the first position in the sample text and the output character at the first position in the output text; according to the first loss value and the second loss value, Generate a target loss value; adjust the model parameters in the prediction model based on the target loss value.

In a possible implementation of the embodiment of the present disclosure, the setting dictionary can be obtained; it is determined whether each character in the masked sample text is located in the setting dictionary; if there is a character in the masked sample text that is not located in the setting dictionary, When the first character in the dictionary is specified, replace the first character in the masked sample text with a specific character; use a prediction model to predict characters in the replaced sample text to obtain the output text.

In a possible implementation of the embodiment of the present disclosure, a second position of the first character in the sample text may be determined; and a third character may be generated based on the difference between the specific character and the output character located at the second position in the output text. Three loss values; according to the first weight of the first loss value and the second weight of the second loss value, the first loss value and the second loss value are weighted to obtain a fourth loss value, where the second weight is greater than the third loss value. One weight; generate a target loss value based on the difference between the fourth loss value and the third loss value.

It should be noted that the explanation of the text error correction method based on RPA and AI in any of the foregoing embodiments also applies to this embodiment, and the implementation principles are similar and will not be described again here.

The training method of the embodiment of the present disclosure obtains sample text; masks at least one sample character in the sample text to obtain the masked sample text; and inputs the masked sample text into the initial prediction model to adopt The prediction model performs character prediction on the masked sample text to obtain the output text; based on the difference between the sample text and the output text, the model parameters in the prediction model are adjusted. Therefore, by training the prediction model in advance, the prediction effect of the prediction model can be improved.

Corresponding to the text error correction method based on RPA and AI provided by the above embodiments of FIG. 3 to FIG. 9 , the present disclosure also provides a text error correction device based on RPA and AI. Since the RPA and AI based text error correction method provided by the embodiment of the present disclosure The text error correction device corresponds to the text error correction method based on RPA and AI provided by the above-mentioned embodiments of Figures 3 to 9. Therefore, the implementation of the text error correction method based on RPA and AI is also applicable to the text error correction method provided by the embodiment of the present disclosure. The text error correction device based on RPA and AI will not be described in detail in the embodiment of this disclosure.

As shown in Figure 17, the text error correction device 1700 based on RPA and AI may include: an identification module 1710, a determination module 1720, a mask module 1730, a prediction module 1740, and a replacement module 1750.

Among them, the recognition module 1710 is used to perform character recognition on the image to be recognized based on the optical character recognition OCR model, so as to obtain the predicted text and the confidence of each predicted character in the predicted text.

The determination module 1720 is used to determine the character to be processed from each predicted character according to the confidence of each predicted character.

Masking module 1730 is used to mask the characters to be processed in the predicted text.

The prediction module 1740 is configured to use a prediction model to perform character prediction on the masked predicted text to obtain at least one replacement character corresponding to the character to be processed.

The replacement module 1750 is configured to determine a target character from at least one replacement character based on the similarity between the at least one replacement character and the character to be processed, and use the target character to replace the character to be processed in the predicted text to obtain the recognized text.

In a possible implementation of the embodiment of the present disclosure, the text error correction device 1700 based on RPA and AI can be applied to RPA robots.

In a possible implementation of the embodiment of the present disclosure, the mask module 1730 is used to: determine the target position of the character to be processed in the predicted text; obtain the mask character, and use the mask character to replace the target position in the predicted text to obtain the masked predicted text.

The prediction module 1740 is used to: input the masked predicted text into the prediction model, and use the prediction model to perform character prediction on the masked predicted text to obtain at least one target text; characters as at least one replacement character.

In a possible implementation of the embodiment of the present disclosure, the prediction model is trained through the following modules:

Get module, used to get sample text.

The masking module is also used to mask at least one sample character in the sample text to obtain the masked sample text.

The input module is used to input the masked sample text into the initial prediction model, so as to use the prediction model to perform character prediction on the masked sample text to obtain the output text.

The adjustment module is used to adjust the model parameters in the prediction model based on the difference between the sample text and the output text.

In a possible implementation of the embodiment of the present disclosure, the adjustment module is configured to: generate a first loss value based on the difference in confidence distribution of characters between the sample text and the output text; determine at least one sample for masking the first position of the character in the sample text; generate a second loss value based on the difference between the sample character at the first position in the sample text and the output character at the first position in the output text; generate a second loss value based on the first loss value and The second loss value generates a target loss value; based on the target loss value, the model parameters in the prediction model are adjusted.

In a possible implementation of the embodiment of the present disclosure, the input module is used to: obtain a setting dictionary; determine whether each character in the masked sample text is in the setting dictionary; If there is a first character that is not in the set dictionary, replace the first character in the masked sample text with a specific character; input the replaced sample text into the prediction model to use the prediction model to replace Perform character prediction on the final sample text to obtain the output text.

In a possible implementation of the embodiment of the present disclosure, the adjustment module is configured to: determine the second position of the first character in the sample text; according to the difference between the specific character and the output character located at the second position in the output text The difference of , generates a third loss value; according to the first weight of the first loss value and the second weight of the second loss value, the first loss value and the second loss value are weighted to obtain the fourth loss value, where, The second weight is greater than the first weight; a target loss value is generated based on the difference between the fourth loss value and the third loss value.

In a possible implementation of the embodiment of the present disclosure, the text error correction device 1700 based on RPA and AI may also include:

The first processing module is used to: encode at least one replacement character based on the set encoding algorithm to obtain the first encoding value of the at least one replacement character; encode the character to be processed based on the set encoding algorithm to obtain the to-be-processed character. The second encoding value of the character; determine the similarity between the at least one replacement character and the character to be processed based on the difference between the first encoding value and the second encoding value of the at least one replacement character.

The second processing module is used to: for each replacement character, draw according to the replacement character to obtain the first image; draw according to the character to be processed to obtain the second image; and draw based on the similarity between the first image and the second image , determine the similarity between the replacement character and the character to be processed.

The third processing module is used to: perform feature extraction on at least one replacement character to obtain a feature vector of at least one replacement character; perform feature extraction on the character to be processed to obtain a feature vector of the character to be processed; and based on the sum of the feature vectors of at least one replacement character and The similarity between the feature vectors of the characters to be processed determines the similarity between at least one replacement character and the character to be processed.

In a possible implementation of the embodiment of the present disclosure, the recognition module 1710 is used to: use the feature extraction branch in the OCR model to extract features of the image to be recognized to obtain the first feature map; use the fusion in the OCR model Branch, fuse the first feature map and the position map to obtain the second feature map, where each element in the position map corresponds one-to-one to each element in the first feature map, and the elements in the position map are used to indicate the first The coordinates of the corresponding elements in the feature map in the image to be recognized; use the feature transformation branch in the OCR model to perform feature transformation on the second feature map to obtain the third feature map; use the prediction branch in the OCR model to transform the third feature map Decode to obtain the predicted text and the confidence corresponding to each predicted character in the predicted text.

The fourth processing module is used to predict the angle of the image to be recognized, determine the tilt angle of the image to be recognized, and perform rotation processing on the image to be recognized based on the tilt angle.

In a possible implementation of the embodiment of the present disclosure, the determination module 1720 is configured to: use predicted characters with a confidence level higher than a confidence threshold as characters to be processed; or, use predicted characters with the highest confidence level as characters to be processed. Process characters; or, sort each predicted character according to the confidence value from large to small, and select the target number of predicted characters ranked first as the characters to be processed, where the value of the target number is the same as the predicted text. Length is positively related.

The text error correction device based on RPA and AI in the embodiment of the present disclosure performs character recognition on the image to be recognized based on the OCR model to obtain the predicted text and the confidence of each predicted character in the predicted text; according to the confidence of each predicted character, Determine the characters to be processed from each predicted character; mask the characters to be processed in the predicted text, and use a prediction model to predict the characters of the masked predicted text to obtain at least one replacement character corresponding to the character to be processed; according to The similarity between at least one replacement character and the to-be-processed character is determined, the target character is determined from the at least one replacement character, and the target character is used to replace the to-be-processed character in the predicted text to obtain the recognized text. Therefore, after the text information is recognized based on OCR technology, the characters in the text information are corrected, which can improve the accuracy and reliability of the text information recognition results. In addition, there is no need to manually correct characters in text information, which can free up human resources, reduce labor costs, and improve the applicability of this method.

Corresponding to the training method provided by the above-mentioned embodiment of FIG. 16, the present disclosure also provides a training device. Since the training device provided by the embodiment of the present disclosure corresponds to the training method provided by the above-mentioned embodiment of FIG. 16, the implementation of the training method The method is also applicable to the training device provided in the embodiment of the present disclosure, and will not be described in detail in the embodiment of the present disclosure.

As shown in Figure 18, the training device 1800 may include: an acquisition module 1810, a mask module 1820, an input module 1830, and an adjustment module 1840.

Among them, the acquisition module 1810 is used to acquire sample text.

The masking module 1820 is used to mask at least one sample character in the sample text to obtain a masked sample text.

The input module 1830 is used to input the masked sample text into the initial prediction model, so as to use the prediction model to perform character prediction on the masked sample text to obtain output text.

The adjustment module 1840 is used to adjust the model parameters in the prediction model based on the difference between the sample text and the output text.

In a possible implementation of the embodiment of the present disclosure, the adjustment module 1840 is configured to: generate a first loss value based on the difference in confidence distribution of characters between the sample text and the output text; determine at least one masking method the first position of the sample character in the sample text; generating a second loss value based on the difference between the sample character at the first position in the sample text and the output character at the first position in the output text; based on the first loss value and the second loss value to generate a target loss value; adjust the model parameters in the prediction model according to the target loss value.

In a possible implementation of the embodiment of the present disclosure, the input module 1830 is used to: obtain a setting dictionary; determine whether each character in the masked sample text is located in the setting dictionary; If there is a first character in the text that is not in the set dictionary, replace the first character in the masked sample text with a specific character; input the replaced sample text into the prediction model to use the prediction model to predict the text. Character prediction is performed on the replaced sample text to obtain the output text.

In a possible implementation of the embodiment of the present disclosure, the adjustment module 1840 is configured to: determine the second position of the first character in the sample text; The difference between , generates a third loss value; according to the first weight of the first loss value and the second weight of the second loss value, the first loss value and the second loss value are weighted to obtain the fourth loss value, where , the second weight is greater than the first weight; a target loss value is generated based on the difference between the fourth loss value and the third loss value.

The text error correction device based on RPA and AI in the embodiment of the present disclosure obtains sample text; masks at least one sample character in the sample text to obtain a masked sample text; and inputs the masked sample text into The initial prediction model uses the prediction model to predict the characters of the masked sample text to obtain the output text; based on the difference between the sample text and the output text, the model parameters in the prediction model are adjusted. Therefore, by training the prediction model in advance, the prediction effect of the prediction model can be improved.

An embodiment of the present disclosure also provides an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor. When the processor executes the computer program, any one of the foregoing methods is implemented. The text error correction method based on RPA and AI as described in the example, or implement the training method as described in any of the foregoing method embodiments.

Embodiments of the present disclosure also provide a non-transitory computer-readable storage medium on which a computer program is stored. When the computer program is executed by a processor, the text correction based on RPA and AI as described in any of the foregoing method embodiments is implemented. wrong method, or implement the training method as described in any of the foregoing method embodiments.

Embodiments of the present disclosure also provide a computer program product. When the instruction processor in the computer program product is executed, the text error correction method based on RPA and AI as described in any of the foregoing method embodiments is implemented, or, the text error correction method based on RPA and AI is implemented. The training method as described in any of the aforementioned method embodiments.

19 illustrates a block diagram of an exemplary electronic device suitable for implementing embodiments of the present disclosure. The electronic device 12 shown in FIG. 19 is only an example and should not bring any limitations to the functions and scope of use of the embodiments of the present disclosure.

As shown in Figure 19, electronic device 12 is embodied in the form of a general computing device. Components of the electronic device 12 may include, but are not limited to: one or more processors or processing units 16, system memory 28, and a bus 18 connecting various system components (including memory 28 and processing unit 16).

Bus 18 represents one or more of several types of bus structures, including a memory bus or memory controller, a peripheral bus, a graphics accelerated port, a processor, or a local bus using any of a variety of bus structures. For example, these architectures include but are not limited to Industry Standard Architecture (hereinafter referred to as: ISA) bus, Micro Channel Architecture (Micro Channel Architecture; hereafter referred to as: MAC) bus, enhanced ISA bus, video electronics Standards Association (Video Electronics Standards Association; hereinafter referred to as: VESA) local bus and Peripheral Component Interconnection (hereinafter referred to as: PCI) bus.

Electronic device 12 typically includes a variety of computer system readable media. These media can be any available media that can be accessed by electronic device 12, including volatile and nonvolatile media, removable and non-removable media.

The memory 28 may include computer system readable media in the form of volatile memory, such as random access memory (Random Access Memory; hereinafter referred to as: RAM) 30 and/or cache memory 32. Electronic device 12 may further include other removable/non-removable, volatile/non-volatile computer system storage media. By way of example only, storage system 34 may be used to read and write to non-removable, non-volatile magnetic media (not shown in Figure 19, commonly referred to as a "hard drive"). Although not shown in FIG. 19, a disk drive for reading and writing a removable non-volatile disk (e.g., a "floppy disk") and a removable non-volatile optical disk (e.g., a compact disk read-only memory) may be provided. Disc Read Only Memory (hereinafter referred to as: CD-ROM), Digital Video Disc Read Only Memory (hereinafter referred to as: DVD-ROM) or other optical media) read and write optical disc drives. In these cases, each drive may be connected to bus 18 through one or more data media interfaces. Memory 28 may include at least one program product having a set (eg, at least one) of program modules configured to perform the functions of embodiments of the present disclosure.

A program/utility 40 having a set of (at least one) program modules 42, including but not limited to an operating system, one or more application programs, other program modules, and program data, may be stored, for example, in memory 28 , each of these examples or some combination may include the implementation of a network environment. Program modules 42 generally perform functions and/or methods in the embodiments described in this disclosure.

Electronic device 12 may also communicate with one or more external devices 14 (e.g., keyboard, pointing device, display 24, etc.), may also communicate with one or more devices that enable a user to interact with electronic device 12, and/or with Any device (eg, network card, modem, etc.) that enables the electronic device 12 to communicate with one or more other computing devices. This communication may occur through input/output (I/O) interface 22. Moreover, the electronic device 12 can also communicate with one or more networks (such as a local area network (Local Area Network; hereinafter referred to as: LAN), a wide area network (Wide Area Network; hereinafter referred to as: WAN)) and/or a public network, such as the Internet, through the network adapter 20 ) communication. As shown, network adapter 20 communicates with other modules of electronic device 12 via bus 18 . It should be understood that, although not shown in the figures, other hardware and/or software modules may be used in conjunction with electronic device 12, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives And data backup storage system, etc.

The processing unit 16 executes programs stored in the memory 28 to perform various functional applications and data processing, such as implementing the methods mentioned in the previous embodiments.

In the description of this specification, reference to the terms "one embodiment," "some embodiments," "an example," "specific examples," or "some examples" or the like means that specific features are described in connection with the embodiment or example. , structures, materials, or features are included in at least one embodiment or example of the present disclosure. In this specification, the schematic expressions of the above terms are not necessarily directed to the same embodiment or example. Furthermore, the specific features, structures, materials or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. Furthermore, those skilled in the art may combine and combine different embodiments or examples and features of different embodiments or examples described in this specification unless they are inconsistent with each other.

In addition, the terms “first” and “second” are used for descriptive purposes only and cannot be understood as indicating or implying relative importance or implicitly indicating the quantity of indicated technical features. Therefore, features defined as "first" and "second" may explicitly or implicitly include at least one of these features. In the description of the present disclosure, "plurality" means at least two, such as two, three, etc., unless otherwise expressly and specifically limited.

Any process or method descriptions in flowcharts or otherwise described herein may be understood to represent modules, segments, or portions of code that include one or more executable instructions for implementing customized logical functions or steps of the process. , and the scope of the preferred embodiments of the present disclosure includes additional implementations in which functions may be performed out of the order shown or discussed, including in a substantially simultaneous manner or in the reverse order, depending on the functionality involved, which shall It should be understood by those skilled in the art to which embodiments of the present disclosure belong.

The logic and/or steps represented in the flowcharts or otherwise described herein, for example, may be considered a sequenced list of executable instructions for implementing the logical functions, and may be embodied in any computer-readable medium, For use by, or in combination with, instruction execution systems, devices or devices (such as computer-based systems, systems including processors or other systems that can fetch instructions from and execute instructions from the instruction execution system, device or device) or equipment. For the purposes of this specification, a "computer-readable medium" may be any device that can contain, store, communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. More specific examples (non-exhaustive list) of computer readable media include the following: electrical connections with one or more wires (electronic device), portable computer disk cartridges (magnetic device), random access memory (RAM), Read-only memory (ROM), erasable and programmable read-only memory (EPROM or flash memory), fiber optic devices, and portable compact disc read-only memory (CDROM). Additionally, the computer-readable medium may even be paper or other suitable medium on which the program may be printed, as the paper or other medium may be optically scanned, for example, and subsequently edited, interpreted, or otherwise suitable as necessary. process to obtain the program electronically and then store it in computer memory.

It should be understood that various parts of the present disclosure may be implemented in hardware, software, firmware, or combinations thereof. In the above embodiments, various steps or methods may be implemented in software or firmware stored in a memory and executed by a suitable instruction execution system. For example, if it is implemented in hardware, as in another embodiment, it can be implemented by any one of the following technologies known in the art or their combination: discrete logic gate circuits with logic functions for implementing data signals; Logic circuits, application specific integrated circuits with suitable combinational logic gates, programmable gate arrays (PGA), field programmable gate arrays (FPGA), etc.

Those of ordinary skill in the art can understand that all or part of the steps involved in implementing the methods of the above embodiments can be completed by instructing relevant hardware through a program. The program can be stored in a computer-readable storage medium. The program can be stored in a computer-readable storage medium. When executed, one of the steps of the method embodiment or a combination thereof is included.

In addition, each functional unit in various embodiments of the present disclosure may be integrated into one processing module, each unit may exist physically alone, or two or more units may be integrated into one module. The above integrated modules can be implemented in the form of hardware or software function modules. If the integrated module is implemented in the form of a software function module and sold or used as an independent product, it can also be stored in a computer-readable storage medium.

The storage media mentioned above can be read-only memory, magnetic disks or optical disks, etc. Although the embodiments of the present disclosure have been shown and described above, it can be understood that the above-mentioned embodiments are illustrative and should not be construed as limitations of the present disclosure. Those of ordinary skill in the art can make modifications to the above-mentioned embodiments within the scope of the present disclosure. The embodiments are subject to changes, modifications, substitutions and variations.

Claims

A text error correction method based on robotic process automation RPA and artificial intelligence AI, including:

Based on the optical character recognition OCR model, perform character recognition on the image to be recognized to obtain the predicted text and the confidence of each predicted character in the predicted text;

Determine the character to be processed from each of the predicted characters according to the confidence of each of the predicted characters;

Mask the characters to be processed in the predicted text, and use a prediction model to perform character prediction on the masked predicted text to obtain at least one replacement character corresponding to the character to be processed;

According to the similarity between the at least one replacement character and the character to be processed, a target character is determined from the at least one replacement character, and the target character is used to replace the character to be processed in the predicted text to obtain Identify text.
The method of claim 1, wherein the method is performed by an RPA robot.
The method according to claim 1 or 2, wherein the characters to be processed in the predicted text are masked, and a prediction model is used to perform character prediction on the masked predicted text to obtain the characters to be processed. Process at least one replacement character corresponding to the character, including:

Determine the target position of the character to be processed in the predicted text;

Obtain a masked character and replace the character to be processed at the target position in the predicted text with the masked character to obtain a masked predicted text;

Input the masked predicted text into the prediction model to use the prediction model to perform character prediction on the masked predicted text to obtain at least one target text;

The character at the target position in the at least one target text is used as the at least one replacement character.
The method according to any one of claims 1 to 3, wherein the prediction model is trained through the following steps:

Get sample text;

Mask at least one sample character in the sample text to obtain a masked sample text;

Input the masked sample text into an initial prediction model to use the prediction model to perform character prediction on the masked sample text to obtain output text;

Model parameters in the prediction model are adjusted based on the difference between the sample text and the output text.
The method according to any one of claims 1-4, wherein before determining the target character from the at least one replacement character according to the similarity between the at least one replacement character and the character to be processed, the The above methods also include:

Encoding the at least one replacement character based on a set encoding algorithm to obtain a first encoding value of the at least one replacement character;

Based on the set encoding algorithm, encode the character to be processed to obtain a second encoding value of the character to be processed;

The similarity between the at least one replacement character and the character to be processed is determined based on the difference between the first encoding value and the second encoding value of the at least one replacement character.
The method according to any one of claims 1-4, wherein before determining the target character from the at least one replacement character according to the similarity between the at least one replacement character and the character to be processed, the The above methods also include:

For each replacement character, draw according to the replacement character to obtain a first image;

Draw according to the character to be processed to obtain a second image;

The similarity between the replacement character and the character to be processed is determined based on the similarity between the first image and the second image.
The method according to any one of claims 1-4, wherein before determining the target character from the at least one replacement character according to the similarity between the at least one replacement character and the character to be processed, the The above methods also include:

Perform feature extraction on the at least one replacement character to obtain a feature vector of the at least one replacement character;

Perform feature extraction on the characters to be processed to obtain the feature vectors of the characters to be processed;

The similarity between the at least one replacement character and the character to be processed is determined based on the similarity between the feature vector of the at least one replacement character and the feature vector of the character to be processed.
The method according to any one of claims 1 to 7, wherein the character recognition of the image to be recognized based on the optical character recognition OCR model includes:

Using the feature extraction branch in the OCR model, perform feature extraction on the image to be recognized to obtain a first feature map;

Using the fusion branch in the OCR model, the first feature map and the location map are fused to obtain a second feature map, where each element in the location map is the same as each element in the first feature map. One correspondence, the elements in the position map are used to indicate the coordinates of the corresponding elements in the first feature map in the image to be recognized;

Use the feature transformation branch in the OCR model to perform feature transformation on the second feature map to obtain a third feature map;

The prediction branch in the OCR model is used to decode the third feature map to obtain the predicted text and the confidence corresponding to each predicted character in the predicted text.
The method according to claim 8, wherein before using the feature extraction branch in the OCR model to perform feature extraction on the image to be recognized to obtain the first feature map, the method further includes:

Perform angle prediction on the image to be recognized and determine the tilt angle of the image to be recognized;

The image to be recognized is rotated according to the tilt angle.
The method according to any one of claims 1 to 9, wherein determining the characters to be processed from each of the predicted characters according to the confidence of each of the predicted characters includes:

Use predicted characters with a confidence level higher than the confidence threshold as the characters to be processed;

or,

Use the predicted character with the highest confidence as the character to be processed;

or,

The predicted characters are sorted from large to small according to the value of the confidence level, and the predicted characters with the first target number are selected as the characters to be processed, wherein the value of the target number is the same as the value of the target number. The length of the predicted text is positively correlated.
A method for training the prediction model described in any one of claims 1-10, comprising:

Get sample text;

Mask at least one sample character in the sample text to obtain a masked sample text;

Input the masked sample text into an initial prediction model to use the prediction model to perform character prediction on the masked sample text to obtain output text;

Model parameters in the prediction model are adjusted based on the difference between the sample text and the output text.
The method of claim 11, wherein adjusting model parameters in the prediction model based on the difference between the sample text and the output text includes:

Generate a first loss value based on the difference in confidence distribution of characters between the sample text and the output text;

Determine the first position of the at least one sample character to be masked in the sample text;

generating a second loss value based on the difference between the sample character at the first position in the sample text and the output character at the first position in the output text;

Generate a target loss value according to the first loss value and the second loss value;

According to the target loss value, the model parameters in the prediction model are adjusted.
The method according to claim 11 or 12, wherein the masked sample text is input into an initial prediction model to use the prediction model to perform character prediction on the masked sample text, Get the output text, including:

Get the settings dictionary;

Determine whether each character in the masked sample text is located in the set dictionary;

If there is a first character in the masked sample text that is not in the set dictionary, replace the first character in the masked sample text with a specific character;

The replaced sample text is input into the prediction model, so that the prediction model is used to perform character prediction on the replaced sample text to obtain the output text.
The method according to claim 13, wherein generating a target loss value according to the first loss value and the second loss value includes:

Determine the second position of the first character in the sample text;

generating a third loss value based on the difference between the specific character and the output character located at the second position in the output text;

The first loss value and the second loss value are weighted according to the first weight of the first loss value and the second weight of the second loss value to obtain a fourth loss value, where The second weight is greater than the first weight;

The target loss value is generated based on the difference between the fourth loss value and the third loss value.
A text error correction device based on robotic process automation RPA and artificial intelligence AI, including:

A recognition module, configured to perform character recognition on the image to be recognized based on the optical character recognition OCR model, to obtain the predicted text and the confidence of each predicted character in the predicted text;

A determination module, configured to determine the characters to be processed from each of the predicted characters according to the confidence of each of the predicted characters;

A masking module, used to mask the characters to be processed in the predicted text;

A prediction module, configured to use a prediction model to perform character prediction on the masked predicted text to obtain at least one replacement character corresponding to the character to be processed;

a replacement module, configured to determine a target character from the at least one replacement character based on the similarity between the at least one replacement character and the to-be-processed character, and use the target character to replace the to-be-processed character in the predicted text. Process characters to get recognized text.
The device of claim 15, wherein the device is applied to an RPA robot.
A device for training the prediction model described in claim 15 or 16, comprising:

Obtain module, used to obtain sample text;

A masking module, used to mask at least one sample character in the sample text to obtain the masked sample text;

An input module for inputting the masked sample text into an initial prediction model, so as to use the prediction model to perform character prediction on the masked sample text to obtain output text;

An adjustment module, configured to adjust model parameters in the prediction model based on the difference between the sample text and the output text.
The device according to claim 17, wherein the adjustment module is used for:

Generate a first loss value based on the difference in confidence distribution of characters between the sample text and the output text;

Determine the first position of the at least one sample character to be masked in the sample text;

generating a second loss value based on the difference between the sample character at the first position in the sample text and the output character at the first position in the output text;

Generate a target loss value according to the first loss value and the second loss value;

According to the target loss value, the model parameters in the prediction model are adjusted.
An electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor. When the processor executes the computer program, the implementation as described in any one of claims 1-10 is achieved. method, or implement the method as described in any one of claims 11-14.
A non-transitory computer-readable storage medium with a computer program stored thereon, wherein the computer program implements the method as claimed in any one of claims 1-10 when executed by a processor, or implements the method as claimed in any one of claims 1-10 The method described in any one of 11-14.
A computer program product, comprising a computer program that, when executed by a processor, implements the method according to any one of claims 1-10, or implements the method according to any one of claims 11-14 Methods.