CN112257965A - Prediction method and prediction system for image text recognition confidence - Google Patents


Info

Publication number
CN112257965A
CN112257965A (application CN202011348779.8A)
Authority
CN
China
Prior art keywords
text
character
image
comparison operation
predicted
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011348779.8A
Other languages
Chinese (zh)
Inventor
夏路遥
黄贤俊
侯进
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenyuan Hengji Technology Co ltd
Original Assignee
Shenyuan Hengji Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenyuan Hengji Technology Co ltd filed Critical Shenyuan Hengji Technology Co ltd
Priority to CN202011348779.8A
Publication of CN112257965A
Legal status: Pending


Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06Q - INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q 10/00 - Administration; Management
    • G06Q 10/04 - Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/04 - Architecture, e.g. interconnection topology
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/08 - Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • Human Resources & Organizations (AREA)
  • Evolutionary Computation (AREA)
  • Data Mining & Analysis (AREA)
  • Economics (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Strategic Management (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Game Theory and Decision Science (AREA)
  • General Business, Economics & Management (AREA)
  • Development Economics (AREA)
  • Tourism & Hospitality (AREA)
  • Quality & Reliability (AREA)
  • Operations Research (AREA)
  • Marketing (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Character Discrimination (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a prediction method and a prediction system for image text recognition confidence. The prediction method comprises the following steps: training a text recognition model, and using the converged text recognition model to train a confidence model; inputting the text image into the converged confidence model; extracting picture features of the text image by using a deep convolutional neural network; segmenting the picture features into slices, inputting the slices into an LSTM bidirectional cyclic neural network, and outputting comparison operation prediction marks for the slices; and performing a merging operation with a CTC algorithm, outputting a comparison operation prediction sequence, and predicting the positions where errors may occur in the predicted text corresponding to the text image together with the type of error modification operation required. With the technical scheme of the invention, both text recognition and confidence prediction achieve extremely high accuracy, the recognition difficulty of the text is obtained, and the characters in the predicted text that may be wrong, together with the error modification operation type, are returned, so that the meaning of the confidence for the user is made clear.

Description

Prediction method and prediction system for image text recognition confidence
Technical Field
The invention relates to the technical field of image processing, in particular to a method and a system for predicting image text recognition confidence.
Background
OCR (Optical Character Recognition) refers to the process in which an electronic device (e.g., a scanner or a digital camera) examines characters printed on paper and translates their shapes into computer text using a character recognition method. Current OCR technology is divided into two modules: detection and recognition. The confidence of the recognition model represents the likelihood that the text recognized from the cropped image, obtained by quickly merging the slice outputs with a greedy algorithm, is the correct text. In actual use, this confidence has limitations and cannot truly represent the probability that the text is correct. For example, if a region contains a truncated character, the confidence may be high while the text is actually wrong; conversely, when the text is long (but clear), the confidence may be low simply because there are many targets, even though the text is actually correct, and the overall accuracy is affected as a result.
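To make this limitation concrete, the following minimal sketch shows how a greedy-decoding confidence of this kind is typically computed from the per-slice probabilities; Python with NumPy is an assumed implementation choice, and the placeholder index and the averaging formula are not taken from the patent or any particular recognizer.

```python
# Hedged illustration only: one common way a recognizer derives "confidence"
# from greedy CTC decoding; the placeholder index and the averaging formula
# are assumptions for this sketch.
import numpy as np

BLANK = 0  # assumed index of the CTC placeholder (blank) class

def greedy_ctc_decode(probs: np.ndarray):
    """probs: (T, C) per-slice softmax output of a recognition network.
    Returns the merged label sequence and a naive confidence score."""
    best_ids = probs.argmax(axis=1)        # best class for each slice
    best_probs = probs.max(axis=1)         # its probability for each slice
    labels, prev = [], BLANK
    for idx in best_ids:
        if idx != BLANK and idx != prev:   # merge repeats, drop placeholders
            labels.append(int(idx))
        prev = idx
    # naive confidence: mean of the per-slice maxima; a truncated character can
    # still score high, and a long but clear line with many slices can score low
    confidence = float(best_probs.mean())
    return labels, confidence

# toy usage: 5 slices, 3 classes (placeholder + two characters)
probs = np.array([[0.1, 0.8, 0.1],
                  [0.7, 0.2, 0.1],
                  [0.1, 0.1, 0.8],
                  [0.1, 0.1, 0.8],
                  [0.9, 0.05, 0.05]])
print(greedy_ctc_decode(probs))            # ([1, 2], 0.8)
```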
Disclosure of Invention
Aiming at the above problems, the invention provides a method and a system for predicting image text recognition confidence. A confidence model is trained on the basis of a converged text recognition model: picture features are extracted by a deep convolutional neural network, the picture features are sliced transversely, the slices are inferred by an LSTM bidirectional cyclic neural network, and the inference results are merged by a CTC algorithm to predict the errors that may occur in the predicted text that the text recognition model would produce for the text image. Based on this deep neural network approach, both text recognition and confidence prediction achieve extremely high accuracy; the predicted confidence reflects the recognition difficulty of the text and can return the characters in the predicted text that may be wrong together with the error modification operation type, so that the meaning of the confidence for the user is made clear.
In order to achieve the above object, the present invention provides a method for predicting image text recognition confidence, including: training a text recognition model by using a text image and a text label corresponding to the text image; training a confidence model by taking the predicted text output by the converged text recognition model and the corresponding text label as input, and taking the comparison operation labels for the corresponding positions of the predicted text and the label content as output; inputting a text image into the converged confidence model; extracting picture features of the text image by using a deep convolutional neural network; segmenting the picture features transversely into equal-width slices; inputting the slices into an LSTM bidirectional cyclic neural network and outputting comparison operation prediction identifiers for the slices; using a CTC algorithm to merge repeated characters and placeholders in the comparison operation prediction identifiers and outputting the comparison operation prediction sequence with the maximum probability; and, according to the comparison operation prediction sequence, predicting the positions where errors may occur in the predicted text that the text recognition model would produce for the text image, together with the type of error modification operation required.
In the above technical solution, preferably, the specific training process for training the text recognition model by using the text image and the text label corresponding to the text image includes: acquiring the text image, and making a corresponding text label aiming at the text image; extracting picture features of the text image by using a deep convolutional neural network; carrying out transverse equal-width segmentation on the picture features to form slices; inputting the slice into an LSTM bidirectional cyclic neural network, and outputting a predicted text of the slice; performing word and placeholder combination and overlapping operation on the predicted text by using a CTC algorithm, and outputting a predicted text sequence with the maximum probability; comparing the predicted text sequence with the text labels and calculating a loss function; and performing gradient descent training on the model according to the predicted text sequence and the loss function of the text label to obtain the converged text recognition model.
In the foregoing technical solution, preferably, the specific training process for obtaining the confidence model, in which the predicted text output by the converged text recognition model and the corresponding text label are used as input and the comparison operation labels for the corresponding positions of the predicted text and the label content are used as output, includes:
inputting a text image into the text recognition model which trains convergence; comparing the predicted text output by the text recognition model with the text label, and performing comparison operation labeling on the error modification operation type from the predicted text to the position corresponding to the text label; extracting picture features of the text image by using a deep convolutional neural network; carrying out transverse equal-width segmentation on the picture features to form slices; inputting the slice into an LSTM bidirectional cyclic neural network, and outputting a comparison operation prediction identifier of the slice; utilizing a CTC algorithm to carry out the operation of combining and overlapping characters and placeholders on the comparison operation prediction identification, and outputting a comparison operation prediction sequence with the maximum probability; comparing the comparison operation prediction sequence with the comparison operation label, and calculating a loss function; and performing gradient descent training on the model according to the comparison operation prediction sequence and the loss function labeled by the comparison operation so as to obtain the converged confidence coefficient model.
In the above technical solution, preferably, the comparison operation labels include "no modification", "delete 1 character", "modify 1 character" and "add 1 character"; according to the comparison between the predicted text and the text label: if the characters at corresponding positions are the same, the current position is labeled "no modification"; if the character at the corresponding position is repeated, a "delete 1 character" label is placed at the repeated character; if the character at the corresponding position is wrong, a "modify 1 character" label is placed at the wrong character; and if a character is missing at the corresponding position, an "add 1 character" label is placed at the position of the missing character.
In the foregoing technical solution, preferably, predicting, according to the comparison operation prediction sequence, the positions where errors may occur in the predicted text of the text image in the text recognition model and the type of error modification operation required specifically includes: judging whether the predicted text at each position has an error according to the label at that position in the comparison operation prediction sequence; if the current position is labeled "no modification", the character at the current position is predicted to be correct; if the current position is labeled "delete 1 character", the character at the current position is predicted to be a repeated character; if the current position is labeled "modify 1 character", the character at the current position is predicted to be a wrong character; and if the current position is labeled "add 1 character", a character is predicted to be missing at the current position.
The invention also provides a system for predicting the image text recognition confidence coefficient, which applies any one of the technical schemes to provide a method for predicting the image text recognition confidence coefficient, and comprises the following steps: the text recognition model training module is used for training a text recognition model by using a text image and a text label corresponding to the text image; the confidence model training module is used for training to obtain a confidence model by taking a predicted text output by the text recognition model converged by training and a corresponding text label as input and taking a comparison operation label of a corresponding position of the predicted text and the label content as output; the image input module is used for inputting the text image into the confidence coefficient model which is converged by training; the characteristic extraction module is used for extracting the picture characteristics of the text image by utilizing a deep convolutional neural network; the transverse segmentation module is used for carrying out transverse equal-width segmentation on the picture characteristics to form a slice; the slice prediction module is used for inputting the slice into an LSTM bidirectional circulation neural network and outputting a comparison operation prediction identifier of the slice; the merging operation module is used for performing merging and overlapping word and placeholder operations on the comparison operation prediction identification by utilizing a CTC algorithm and outputting a comparison operation prediction sequence with the maximum probability; and the confidence coefficient prediction module predicts the position of the text image possibly with errors in the predicted text in the text recognition model and the type of the required error modification operation according to the comparison operation prediction sequence.
In the above technical solution, preferably, the specific training process of the text recognition model training module includes: acquiring the text image, and making a corresponding text label aiming at the text image; extracting picture features of the text image by using a deep convolutional neural network; carrying out transverse equal-width segmentation on the picture features to form slices; inputting the slice into an LSTM bidirectional cyclic neural network, and outputting a predicted text of the slice; performing word and placeholder combination and overlapping operation on the predicted text by using a CTC algorithm, and outputting a predicted text sequence with the maximum probability; comparing the predicted text sequence with the text labels and calculating a loss function; and performing gradient descent training on the model according to the predicted text sequence and the loss function of the text label to obtain the converged text recognition model.
In the above technical solution, preferably, the specific training process of the confidence model training module includes: inputting a text image into the text recognition model which trains convergence; comparing the predicted text output by the text recognition model with the text label, and performing comparison operation labeling on the error modification operation type from the predicted text to the position corresponding to the text label; extracting picture features of the text image by using a deep convolutional neural network; carrying out transverse equal-width segmentation on the picture features to form slices; inputting the slice into an LSTM bidirectional cyclic neural network, and outputting a comparison operation prediction identifier of the slice; utilizing a CTC algorithm to carry out the operation of combining and overlapping characters and placeholders on the comparison operation prediction identification, and outputting a comparison operation prediction sequence with the maximum probability; comparing the comparison operation prediction sequence with the comparison operation label, and calculating a loss function; and performing gradient descent training on the model according to the comparison operation prediction sequence and the loss function labeled by the comparison operation so as to obtain the converged confidence coefficient model.
In the above technical solution, preferably, the comparison operation labels include "no modification", "delete 1 character", "modify 1 character" and "add 1 character"; according to the comparison between the predicted text and the text label: if the characters at corresponding positions are the same, the current position is labeled "no modification"; if the character at the corresponding position is repeated, a "delete 1 character" label is placed at the repeated character; if the character at the corresponding position is wrong, a "modify 1 character" label is placed at the wrong character; and if a character is missing at the corresponding position, an "add 1 character" label is placed at the position of the missing character.
In the foregoing technical solution, preferably, the confidence prediction module is specifically configured to: judge whether the predicted text at each position has an error according to the label at that position in the comparison operation prediction sequence; if the current position is labeled "no modification", predict that the character at the current position is correct; if the current position is labeled "delete 1 character", predict that the character at the current position is a repeated character; if the current position is labeled "modify 1 character", predict that the character at the current position is a wrong character; and if the current position is labeled "add 1 character", predict that a character is missing at the current position.
Compared with the prior art, the invention has the following beneficial effects. A confidence model is trained on the basis of a converged text recognition model: picture features are extracted by a deep convolutional neural network, the picture features are sliced transversely, the slices are inferred by an LSTM bidirectional cyclic neural network, and the inference results are merged by a CTC algorithm to predict the errors that may occur in the predicted text of the text image in the text recognition model. Based on this deep neural network approach, both text recognition and confidence prediction achieve extremely high accuracy; the predicted confidence reflects the recognition difficulty of the text, and the characters in the predicted text that may be wrong, together with the error modification operation type, can be returned, so that the meaning of the confidence for the user is made clear.
Drawings
Fig. 1 is a schematic flowchart of a method for predicting confidence in image text recognition according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a training process of a text recognition model according to an embodiment of the present invention;
FIG. 3 is a schematic diagram illustrating a training process of a confidence model according to an embodiment of the present invention;
FIG. 4 is a block diagram illustrating an image text recognition confidence prediction system according to an embodiment of the present invention.
In the drawings, the correspondence between each component and the reference numeral is:
11. text recognition model training module; 12. confidence model training module; 13. image input module; 14. feature extraction module; 15. transverse segmentation module; 16. slice prediction module; 17. merging operation module; 18. confidence prediction module.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be obtained by a person skilled in the art without any inventive step based on the embodiments of the present invention, are within the scope of the present invention.
The invention is described in further detail below with reference to the attached drawing figures:
As shown in fig. 1, the image text recognition confidence prediction method provided by the present invention includes: training a text recognition model by using the text image and a text label corresponding to the text image; training a confidence model by taking the predicted text output by the converged text recognition model and the corresponding text label as input, and taking the comparison operation labels for the corresponding positions of the predicted text and the label content as output; inputting the text image into the converged confidence model; extracting picture features of the text image by using a deep convolutional neural network; segmenting the picture features transversely into equal-width slices; inputting the slices into an LSTM bidirectional cyclic neural network and outputting comparison operation prediction marks for the slices; using a CTC algorithm to merge repeated characters and placeholders in the comparison operation prediction marks and outputting the comparison operation prediction sequence with the maximum probability; and, according to the comparison operation prediction sequence, predicting the positions where errors may occur in the predicted text of the text image in the text recognition model and the type of error modification operation required.
In this embodiment, a confidence model is trained on the basis of a converged text recognition model: picture features are extracted by a deep convolutional neural network, the picture features are sliced transversely, the slices are inferred by an LSTM bidirectional cyclic neural network, and the inference results are merged by a CTC algorithm to predict the errors that may occur in the predicted text of the text image in the text recognition model. Based on this deep neural network approach, both text recognition and confidence prediction achieve extremely high accuracy; the predicted confidence reflects the recognition difficulty of the text, and the characters in the predicted text that may be wrong, together with the error modification operation type, can be returned, so that the meaning of the confidence for the user is made clear.
Specifically, the confidence model is trained in advance on the basis of a converged text recognition model: the predicted text that the text recognition model outputs for a text image and the corresponding text label are taken as input, and, by comparing the predicted text with the text label, whether the character at each corresponding position is wrong and the type of error modification operation needed are taken as output. Once the confidence model has converged, in practical application the text image is input directly into the confidence model and a confidence prediction result is obtained after a series of operations; this result states, for the case where the text image were input into the text recognition model, which positions of the predicted text are likely to be wrong relative to the real text and which error modification operations would be needed. For the user, the confidence prediction result makes it clearer and more convenient to judge how trustworthy the recognition result is, and the confidence is refined down to the recognition difficulty of each specific character of the predicted text.
In the above embodiment, preferably, the specific training process for training the text recognition model by using the text image and the text label corresponding to the text image includes: acquiring a text image, and making a corresponding text label aiming at the text image; extracting picture characteristics of the text image by using a deep convolutional neural network; carrying out transverse equal-width segmentation on the picture characteristics to form slices; inputting the slices into an LSTM bidirectional cyclic neural network, and outputting prediction texts of the slices; performing word combination and overlapping operation on the predicted text by using a CTC algorithm, and outputting a predicted text sequence with the maximum probability; comparing the predicted text sequence with the text label, and calculating a loss function; and performing gradient descent training on the model according to the loss function of the predicted text sequence and the text label to obtain a converged text recognition model.
As shown in fig. 2, specifically, take a text image on which the numerals "130001" are written as an example; its text label is "130001". The specific steps of the training process are as follows (a minimal sketch of this pipeline, under stated assumptions, follows the numbered steps):
1) extracting text image features by using a deep convolutional neural network;
2) performing horizontal equal-width segmentation on the picture characteristics, inputting all the slices into an LSTM bidirectional cyclic neural network, and outputting prediction texts of all the slices;
3) combining and overlapping the characters and the placeholders by using a CTC algorithm, and outputting a predicted character sequence with the maximum probability;
4) comparing the predicted text with the text label content, and calculating ctc loss;
5) and performing gradient descent training on the model according to the loss to obtain a final training convergence text recognition model.
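The following is a minimal sketch of steps 1)-5), assuming PyTorch as the framework; the layer sizes, input height, digit vocabulary and optimizer settings are illustrative choices and are not specified in the patent.

```python
# Sketch of steps 1)-5) under the assumptions stated above.
import torch
import torch.nn as nn

class CRNN(nn.Module):
    def __init__(self, num_classes: int, hidden: int = 256):
        super().__init__()
        # 1) deep convolutional network extracts the picture features
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(), nn.MaxPool2d((2, 1)),
        )
        # 2) every column of the feature map is one equal-width slice fed to a
        #    bidirectional LSTM
        self.rnn = nn.LSTM(128 * 8, hidden, bidirectional=True, batch_first=True)
        self.fc = nn.Linear(hidden * 2, num_classes)   # class 0 = placeholder

    def forward(self, images):                          # images: (B, 1, 32, W)
        feats = self.cnn(images)                        # (B, 128, 8, W//2)
        b, c, h, w = feats.shape
        slices = feats.permute(0, 3, 1, 2).reshape(b, w, c * h)
        out, _ = self.rnn(slices)                       # per-slice features
        return self.fc(out).log_softmax(dim=2)          # per-slice predictions

# 3)-5) the CTC loss handles merging of repeated characters and placeholders,
# and gradient descent is run until the model converges
model = CRNN(num_classes=11)                            # placeholder + digits 0-9
ctc = nn.CTCLoss(blank=0, zero_infinity=True)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

images = torch.randn(4, 1, 32, 128)                     # dummy batch
targets = torch.randint(1, 11, (4, 6))                  # e.g. the label "130001"
log_probs = model(images).permute(1, 0, 2)              # (T, B, C) for CTCLoss
input_lens = torch.full((4,), log_probs.size(0), dtype=torch.long)
target_lens = torch.full((4,), 6, dtype=torch.long)
opt.zero_grad()
loss = ctc(log_probs, targets, input_lens, target_lens)
loss.backward()
opt.step()
```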
In the foregoing embodiment, preferably, the specific training process of training to obtain the confidence level model includes:
inputting the text image into a text recognition model for training convergence; comparing the predicted text output by the text recognition model with the text label, and carrying out comparison operation labeling on the error modification operation type from the predicted text to the corresponding position of the text label; extracting picture characteristics of the text image by using a deep convolutional neural network; carrying out transverse equal-width segmentation on the picture characteristics to form slices; inputting the slices into an LSTM bidirectional cyclic neural network, and outputting comparison operation prediction marks of the slices; performing the operation of combining and overlapping characters and placeholders on the comparison operation prediction identification by using a CTC algorithm, and outputting a comparison operation prediction sequence with the maximum probability; comparing the comparison operation prediction sequence with the comparison operation label, and calculating a loss function; and performing gradient descent training on the model according to the comparison operation prediction sequence and the loss function labeled by the comparison operation so as to obtain a converged confidence model.
As shown in fig. 3, in this embodiment, training a confidence level model using a text recognition model that is converged in training specifically includes:
Firstly, a predicted text is obtained by recognition with the text recognition model and compared with the text label. According to this comparison, besides the "no modification" label (N), the following 3 operations are allowed:
i. deleting 1 character (D)
Modifying 1 character (C)
Adding 1 character (A)
Then, labels are assigned according to the minimum set of operations needed to turn the predicted content into the real text annotation. Still taking the annotation content "130001" as an example (a label-generation sketch follows these examples):
i. if the recognized predicted text is "130001", the comparison operation label is "NNNNNN", indicating that none of the 6 characters needs to be modified;
ii. if the recognized predicted text is "1130001", the comparison operation label is "DNNNNNN", indicating that the first character needs to be deleted;
iii. if the recognized predicted text is "120001", the comparison operation label is "NCNNNN", indicating that the second character needs to be modified;
iv. if the recognized predicted text is "30001", the comparison operation label is "ANNNNN", indicating that one character needs to be added at the first position.
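The patent defines the N/D/C/A labels only through the examples above, so the following sketch shows one assumed way to generate such comparison operation labels automatically, by aligning the predicted text with the text label via edit distance.

```python
# Assumed label-generation procedure: align the predicted text with the text
# label by edit distance and read the operations off the backtrace.
def comparison_labels(pred: str, truth: str) -> str:
    """Return one label per operation aligning pred to truth:
    N = no modification, D = delete 1 character, C = modify 1 character,
    A = add 1 character."""
    m, n = len(pred), len(truth)
    # classic Levenshtein table
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dp[i][0] = i
    for j in range(n + 1):
        dp[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if pred[i - 1] == truth[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # delete from pred
                           dp[i][j - 1] + 1,         # add a missing character
                           dp[i - 1][j - 1] + cost)  # keep or modify
    # backtrace to recover the minimal operation sequence
    ops, i, j = [], m, n
    while i > 0 or j > 0:
        if i > 0 and j > 0 and dp[i][j] == dp[i - 1][j - 1] + (pred[i - 1] != truth[j - 1]):
            ops.append('N' if pred[i - 1] == truth[j - 1] else 'C')
            i, j = i - 1, j - 1
        elif i > 0 and dp[i][j] == dp[i - 1][j] + 1:
            ops.append('D')
            i -= 1
        else:
            ops.append('A')
            j -= 1
    return ''.join(reversed(ops))

print(comparison_labels("130001", "130001"))   # NNNNNN
print(comparison_labels("1130001", "130001"))  # DNNNNNN
print(comparison_labels("120001", "130001"))   # NCNNNN
print(comparison_labels("30001", "130001"))    # ANNNNN
```

The backtrace prefers the diagonal (keep or modify) move, which is what makes the delete label land on the leading repeated character in the second example.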
According to the new comparison operation labels, the training process of the confidence model is as follows:
1) extracting picture features by using a deep convolutional neural network;
2) performing horizontal equal-width segmentation on the picture characteristics, inputting all the slices into an LSTM bidirectional circulation neural network, and outputting comparison operation prediction marks of all the slices;
3) using a CTC algorithm to carry out the operation of combining and overlapping characters and placeholders on the comparison operation prediction identification, and outputting a comparison operation prediction sequence with the maximum probability;
4) comparing the comparison operation prediction sequence with the new comparison operation marking content, and calculating the ctc loss;
5) and carrying out gradient descent training on the model according to the loss to obtain a final confidence coefficient model of training convergence.
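As a rough sketch of how the two stages fit together, the snippet below assumes that the confidence model reuses the CRNN architecture sketched earlier with the comparison operations as its output vocabulary; the class indices and the predict_text helper on the converged recognizer are hypothetical names introduced only for illustration.

```python
# Glue sketch under the assumptions stated above; comparison_labels refers to
# the edit-distance sketch shown earlier.
OP_CLASSES = {"<placeholder>": 0, "N": 1, "D": 2, "C": 3, "A": 4}

def build_confidence_targets(recognizer, images, text_labels):
    """Run the converged recognizer, compare its output with the text labels,
    and return comparison-operation label sequences as training targets."""
    targets = []
    for image, truth in zip(images, text_labels):
        pred = recognizer.predict_text(image)      # hypothetical helper
        ops = comparison_labels(pred, truth)       # e.g. "DNNNNNN"
        targets.append([OP_CLASSES[op] for op in ops])
    return targets

# The confidence model itself is then trained exactly like the recognizer,
# only with 5 output classes, e.g.:
#   confidence_model = CRNN(num_classes=len(OP_CLASSES))
#   loss = nn.CTCLoss(blank=0)(log_probs, op_targets, input_lens, target_lens)
```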
In the above embodiment, preferably, the comparison operation labels include "no modification", "delete 1 character", "modify 1 character" and "add 1 character"; according to the comparison between the predicted text and the text label: if the characters at corresponding positions are the same, the current position is labeled "no modification"; if the character at the corresponding position is repeated, a "delete 1 character" label is placed at the repeated character; if the character at the corresponding position is wrong, a "modify 1 character" label is placed at the wrong character; and if a character is missing at the corresponding position, an "add 1 character" label is placed at the position of the missing character.
In the above embodiment, preferably, predicting, according to the comparison operation prediction sequence, the positions where errors may occur in the predicted text of the text image in the text recognition model and the type of error modification operation required specifically includes: judging whether the predicted text at each position has an error according to the label at that position in the comparison operation prediction sequence; if the current position is labeled "no modification", the character at the current position is predicted to be correct; if the current position is labeled "delete 1 character", the character at the current position is predicted to be a repeated character; if the current position is labeled "modify 1 character", the character at the current position is predicted to be a wrong character; and if the current position is labeled "add 1 character", a character is predicted to be missing at the current position.
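As an illustration of how this prediction can be surfaced to the user, the sketch below maps a comparison operation prediction sequence onto per-character messages; the message wording and the handling of the "add 1 character" label are assumptions, not text from the patent.

```python
# Hedged sketch: decode a comparison operation prediction sequence into
# per-position feedback; the message wording is illustrative.
def explain_prediction(pred_text: str, op_sequence: str):
    """Pair each predicted character with its comparison operation label and
    report where the text may be wrong and which correction is needed."""
    messages = []
    pos = 0                                   # index into pred_text
    for op in op_sequence:
        if op == "A":                         # a character is missing here
            messages.append(f"before position {pos + 1}: a character is missing, add 1 character")
            continue                          # 'A' does not consume a predicted character
        char = pred_text[pos]
        if op == "N":
            messages.append(f"position {pos + 1}: '{char}' is predicted to be correct")
        elif op == "D":
            messages.append(f"position {pos + 1}: '{char}' looks like a repeated character, delete 1 character")
        elif op == "C":
            messages.append(f"position {pos + 1}: '{char}' looks like a wrong character, modify 1 character")
        pos += 1
    return messages

for line in explain_prediction("30001", "ANNNNN"):
    print(line)
```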
As shown in fig. 4, the present invention further provides a system for predicting an image text recognition confidence, which applies any one of the methods for predicting an image text recognition confidence provided in the foregoing embodiments, and includes: the text recognition model training module 11 is configured to train a text recognition model by using the text image and a text label corresponding to the text image; the confidence model training module 12 is configured to train to obtain a confidence model by taking a predicted text output by the text recognition model converged in the training and a corresponding text label as input and taking a comparison operation label of a corresponding position of the predicted text and a label content as output; an image input module 13, configured to input a text image into a confidence model of training convergence; the feature extraction module 14 is configured to extract picture features of the text image by using a deep convolutional neural network; the transverse segmentation module 15 is used for transversely segmenting the image features into slices in equal width; the slice prediction module 16 is used for inputting the slices into the LSTM bidirectional circulation neural network and outputting comparison operation prediction marks of the slices; a merging operation module 17, configured to perform merging and overlapping word and placeholder operations on the comparison operation prediction identifier by using a CTC algorithm, and output a comparison operation prediction sequence with a maximum probability; and the confidence coefficient prediction module 18 predicts the position of the possible error of the predicted text of the text image in the text recognition model and the type of the error modification operation required according to the comparison operation prediction sequence.
In this embodiment, a confidence model is trained on the basis of a converged text recognition model: picture features are extracted by a deep convolutional neural network, the picture features are sliced transversely, the slices are inferred by an LSTM bidirectional cyclic neural network, and the inference results are merged by a CTC algorithm to predict the errors that may occur in the predicted text of the text image in the text recognition model. Based on this deep neural network approach, both text recognition and confidence prediction achieve extremely high accuracy; the predicted confidence reflects the recognition difficulty of the text, and the characters in the predicted text that may be wrong, together with the error modification operation type, can be returned, so that the meaning of the confidence for the user is made clear.
Specifically, the confidence model is trained in advance on the basis of a converged text recognition model: the predicted text that the text recognition model outputs for a text image and the corresponding text label are taken as input, and, by comparing the predicted text with the text label, whether the character at each corresponding position is wrong and the type of error modification operation needed are taken as output. Once the confidence model has converged, in practical application the text image is input directly into the confidence model and a confidence prediction result is obtained after a series of operations; this result states, for the case where the text image were input into the text recognition model, which positions of the predicted text are likely to be wrong relative to the real text and which error modification operations would be needed. For the user, the confidence prediction result makes it clearer and more convenient to judge how trustworthy the recognition result is, and the confidence is refined down to the recognition difficulty of each specific character of the predicted text.
In the above embodiment, preferably, the specific training process of the text recognition model training module 11 includes: acquiring a text image, and making a corresponding text label aiming at the text image; extracting picture characteristics of the text image by using a deep convolutional neural network; carrying out transverse equal-width segmentation on the picture characteristics to form slices; inputting the slices into an LSTM bidirectional cyclic neural network, and outputting prediction texts of the slices; performing word combination and overlapping operation on the predicted text by using a CTC algorithm, and outputting a predicted text sequence with the maximum probability; comparing the predicted text sequence with the text label, and calculating a loss function; and performing gradient descent training on the model according to the loss function of the predicted text sequence and the text label to obtain a converged text recognition model.
In the above embodiment, preferably, the specific training process of the confidence model training module 12 includes: inputting the text image into a text recognition model for training convergence; comparing the predicted text output by the text recognition model with the text label, and carrying out comparison operation labeling on the error modification operation type from the predicted text to the corresponding position of the text label; extracting picture characteristics of the text image by using a deep convolutional neural network; carrying out transverse equal-width segmentation on the picture characteristics to form slices; inputting the slices into an LSTM bidirectional cyclic neural network, and outputting comparison operation prediction marks of the slices; performing the operation of combining and overlapping characters and placeholders on the comparison operation prediction identification by using a CTC algorithm, and outputting a comparison operation prediction sequence with the maximum probability; comparing the comparison operation prediction sequence with the comparison operation label, and calculating a loss function; and performing gradient descent training on the model according to the comparison operation prediction sequence and the loss function labeled by the comparison operation so as to obtain a converged confidence model.
In the above embodiment, preferably, the comparison operation labels include "no modification", "delete 1 character", "modify 1 character" and "add 1 character"; according to the comparison between the predicted text and the text label: if the characters at corresponding positions are the same, the current position is labeled "no modification"; if the character at the corresponding position is repeated, a "delete 1 character" label is placed at the repeated character; if the character at the corresponding position is wrong, a "modify 1 character" label is placed at the wrong character; and if a character is missing at the corresponding position, an "add 1 character" label is placed at the position of the missing character.
In the above embodiment, preferably, the confidence prediction module 18 is specifically configured to: judge whether the predicted text at each position has an error according to the label at that position in the comparison operation prediction sequence; if the current position is labeled "no modification", predict that the character at the current position is correct; if the current position is labeled "delete 1 character", predict that the character at the current position is a repeated character; if the current position is labeled "modify 1 character", predict that the character at the current position is a wrong character; and if the current position is labeled "add 1 character", predict that a character is missing at the current position.
The above is only a preferred embodiment of the present invention, and is not intended to limit the present invention, and various modifications and changes will occur to those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (10)

1. A method for predicting image text recognition confidence is characterized by comprising the following steps:
training a text recognition model by using a text image and a text label corresponding to the text image;
taking a predicted text output by the converged text recognition model and a corresponding text label as input, taking comparison operation labels for the corresponding positions of the predicted text and the label content as output, and training to obtain a confidence model;
inputting a text image into the confidence model of training convergence;
extracting picture features of the text image by using a deep convolutional neural network;
carrying out transverse equal-width segmentation on the picture features to form slices;
inputting the slice into an LSTM bidirectional cyclic neural network, and outputting a comparison operation prediction identifier of the slice;
utilizing a CTC algorithm to carry out the operation of combining and overlapping characters and placeholders on the comparison operation prediction identification, and outputting a comparison operation prediction sequence with the maximum probability;
and predicting, according to the comparison operation prediction sequence, the positions where errors may occur in the predicted text of the text image in the text recognition model and the type of error modification operation required.
2. The method for predicting image text recognition confidence of claim 1, wherein the specific training process for training the text recognition model by using the text image and the text label corresponding to the text image comprises:
acquiring the text image, and making a corresponding text label aiming at the text image;
extracting picture features of the text image by using a deep convolutional neural network;
carrying out transverse equal-width segmentation on the picture features to form slices;
inputting the slice into an LSTM bidirectional cyclic neural network, and outputting a predicted text of the slice;
performing word and placeholder combination and overlapping operation on the predicted text by using a CTC algorithm, and outputting a predicted text sequence with the maximum probability;
comparing the predicted text sequence with the text labels and calculating a loss function;
and performing gradient descent training on the model according to the predicted text sequence and the loss function of the text label to obtain the converged text recognition model.
3. The method for predicting image text recognition confidence according to claim 1 or 2, wherein the specific training process for training the confidence model includes:
inputting a text image into the text recognition model which trains convergence;
comparing the predicted text output by the text recognition model with the text label, and performing comparison operation labeling on the error modification operation type from the predicted text to the position corresponding to the text label;
extracting picture features of the text image by using a deep convolutional neural network;
carrying out transverse equal-width segmentation on the picture features to form slices;
inputting the slice into an LSTM bidirectional cyclic neural network, and outputting a comparison operation prediction identifier of the slice;
utilizing a CTC algorithm to carry out the operation of combining and overlapping characters and placeholders on the comparison operation prediction identification, and outputting a comparison operation prediction sequence with the maximum probability;
comparing the comparison operation prediction sequence with the comparison operation label, and calculating a loss function;
and performing gradient descent training on the model according to the comparison operation prediction sequence and the loss function labeled by the comparison operation so as to obtain the converged confidence coefficient model.
4. The image text recognition confidence prediction method of claim 3, wherein the comparison operation labels include no modification, deletion of 1 character, modification of 1 character, and addition of 1 character;
according to the comparison between the predicted text and the text label:
if the characters of the corresponding positions are the same, the current position is marked without modification;
if the characters at the corresponding positions are repeated, deleting 1 character mark at the position of the repeated character;
if the corresponding position character is wrong, modifying 1 character label before the wrong character;
and if the corresponding position character is missing, adding 1 character mark at the position of the missing character.
5. The method for predicting image text recognition confidence according to claim 4, wherein the predicting, according to the comparison operation prediction sequence, a position where an error may occur in a predicted text of the text image in the text recognition model and a type of an error modification operation required specifically includes:
judging whether the predicted text at the corresponding position has errors or not according to the marks of different positions in the comparison operation prediction sequence;
if the current position is not marked for modification, the character of the current position is predicted to be correct;
if the current position is marked by deleting 1 character, predicting that the character at the current position is a repeated character;
if the current position is marked by modifying 1 character, predicting that the character at the current position is an error character;
if the current position is marked by adding 1 character, predicting that the current position character is missing.
6. An image text recognition confidence prediction system applying the image text recognition confidence prediction method according to any one of claims 1 to 5, comprising:
the text recognition model training module is used for training a text recognition model by using a text image and a text label corresponding to the text image;
the confidence model training module is used for training to obtain a confidence model by taking a predicted text output by the text recognition model converged by training and a corresponding text label as input and taking a comparison operation label of a corresponding position of the predicted text and the label content as output;
the image input module is used for inputting the text image into the confidence coefficient model which is converged by training;
the characteristic extraction module is used for extracting the picture characteristics of the text image by utilizing a deep convolutional neural network;
the transverse segmentation module is used for carrying out transverse equal-width segmentation on the picture characteristics to form a slice;
the slice prediction module is used for inputting the slice into an LSTM bidirectional circulation neural network and outputting a comparison operation prediction identifier of the slice;
the merging operation module is used for performing merging and overlapping word and placeholder operations on the comparison operation prediction identification by utilizing a CTC algorithm and outputting a comparison operation prediction sequence with the maximum probability;
and the confidence prediction module, which predicts, according to the comparison operation prediction sequence, the positions where errors may occur in the predicted text of the text image in the text recognition model and the type of error modification operation required.
7. The system for predicting image text recognition confidence according to claim 6, wherein the specific training process of the text recognition model training module comprises:
acquiring the text image, and making a corresponding text label aiming at the text image;
extracting picture features of the text image by using a deep convolutional neural network;
carrying out transverse equal-width segmentation on the picture features to form slices;
inputting the slice into an LSTM bidirectional cyclic neural network, and outputting a predicted text of the slice;
performing word and placeholder combination and overlapping operation on the predicted text by using a CTC algorithm, and outputting a predicted text sequence with the maximum probability;
comparing the predicted text sequence with the text labels and calculating a loss function;
and performing gradient descent training on the model according to the predicted text sequence and the loss function of the text label to obtain the converged text recognition model.
8. The image text recognition confidence prediction system of claim 6, wherein the specific training process of the confidence model training module comprises:
inputting a text image into the text recognition model which trains convergence;
comparing the predicted text output by the text recognition model with the text label, and performing comparison operation labeling on the error modification operation type from the predicted text to the position corresponding to the text label;
extracting picture features of the text image by using a deep convolutional neural network;
carrying out transverse equal-width segmentation on the picture features to form slices;
inputting the slice into an LSTM bidirectional cyclic neural network, and outputting a comparison operation prediction identifier of the slice;
utilizing a CTC algorithm to carry out the operation of combining and overlapping characters and placeholders on the comparison operation prediction identification, and outputting a comparison operation prediction sequence with the maximum probability;
comparing the comparison operation prediction sequence with the comparison operation label, and calculating a loss function;
and performing gradient descent training on the model according to the comparison operation prediction sequence and the loss function labeled by the comparison operation so as to obtain the converged confidence coefficient model.
9. The image text recognition confidence prediction system of claim 8, wherein the comparison operation labels include no modification, deletion of 1 character, modification of 1 character, and addition of 1 character;
according to the comparison between the predicted text and the text label:
if the characters of the corresponding positions are the same, the current position is marked without modification;
if the characters at the corresponding positions are repeated, deleting 1 character mark at the position of the repeated character;
if the corresponding position character is wrong, modifying 1 character label before the wrong character;
and if the corresponding position character is missing, adding 1 character mark at the position of the missing character.
10. The image text recognition confidence prediction system of claim 9, wherein the confidence prediction module is specifically configured to:
judging whether the predicted text at the corresponding position has errors or not according to the marks of different positions in the comparison operation prediction sequence;
if the current position is not marked for modification, the character of the current position is predicted to be correct;
if the current position is marked by deleting 1 character, predicting that the character at the current position is a repeated character;
if the current position is marked by modifying 1 character, predicting that the character at the current position is an error character;
if the current position is marked by adding 1 character, predicting that the current position character is missing.
CN202011348779.8A 2020-11-26 2020-11-26 Prediction method and prediction system for image text recognition confidence Pending CN112257965A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011348779.8A CN112257965A (en) 2020-11-26 2020-11-26 Prediction method and prediction system for image text recognition confidence

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011348779.8A CN112257965A (en) 2020-11-26 2020-11-26 Prediction method and prediction system for image text recognition confidence

Publications (1)

Publication Number Publication Date
CN112257965A 2021-01-22

Family

ID=74225519

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011348779.8A Pending CN112257965A (en) 2020-11-26 2020-11-26 Prediction method and prediction system for image text recognition confidence

Country Status (1)

Country Link
CN (1) CN112257965A (en)


Patent Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101021838A (en) * 2007-03-02 2007-08-22 华为技术有限公司 Text handling method and system
CN101295292A (en) * 2007-04-23 2008-10-29 北大方正集团有限公司 Method and device for modeling and naming entity recognition based on maximum entropy model
CN106708799A (en) * 2016-11-09 2017-05-24 上海智臻智能网络科技股份有限公司 Text error correction method and device, and terminal
CN109948122A (en) * 2017-12-21 2019-06-28 北京金山安全软件有限公司 Error correction method and device for input text and electronic equipment
CN109284399A (en) * 2018-10-11 2019-01-29 深圳前海微众银行股份有限公司 Similarity prediction model training method, equipment and computer readable storage medium
CN109670494A (en) * 2018-12-13 2019-04-23 深源恒际科技有限公司 A kind of Method for text detection and system of subsidiary recognition confidence
CN109948152A (en) * 2019-03-06 2019-06-28 北京工商大学 A kind of Chinese text grammer error correcting model method based on LSTM
CN110378338A (en) * 2019-07-11 2019-10-25 腾讯科技(深圳)有限公司 A kind of text recognition method, device, electronic equipment and storage medium
CN111061904A (en) * 2019-12-06 2020-04-24 武汉理工大学 Local picture rapid detection method based on image content identification
CN111310468A (en) * 2020-01-15 2020-06-19 同济大学 Method for realizing Chinese named entity recognition by using uncertain word segmentation information
CN111539417A (en) * 2020-04-28 2020-08-14 深源恒际科技有限公司 Text recognition training optimization method based on deep neural network
CN111783518A (en) * 2020-05-14 2020-10-16 北京三快在线科技有限公司 Training sample generation method and device, electronic equipment and readable storage medium

Similar Documents

Publication Publication Date Title
CN109670494B (en) Text detection method and system with recognition confidence
CN109308476A (en) Billing information processing method, system and computer readable storage medium
CN109102844B (en) Automatic calibration method for clinical test source data
CN112396049A (en) Text error correction method and device, computer equipment and storage medium
CN112257613B (en) Physical examination report information structured extraction method and device and computer equipment
CN111881902B (en) Training sample making method, training sample making device, computer equipment and readable storage medium
CN112508011A (en) OCR (optical character recognition) method and device based on neural network
CN113762269B (en) Chinese character OCR recognition method, system and medium based on neural network
CN113255583B (en) Data annotation method and device, computer equipment and storage medium
CN112766255A (en) Optical character recognition method, device, equipment and storage medium
CN113221632A (en) Document picture identification method and device and computer equipment
CN114238575A (en) Document parsing method, system, computer device and computer-readable storage medium
CN111539417B (en) Text recognition training optimization method based on deep neural network
CN114528394B (en) Text triple extraction method and device based on mask language model
CN112257965A (en) Prediction method and prediction system for image text recognition confidence
CN111340031A (en) Equipment almanac target information extraction and identification system based on image identification and method thereof
CN115543915A (en) Automatic database building method and system for personnel file directory
CN112560849B (en) Neural network algorithm-based grammar segmentation method and system
CN113837067A (en) Organ contour detection method and device, electronic equipment and readable storage medium
EP3757825A1 (en) Methods and systems for automatic text segmentation
CN113962196A (en) Resume processing method and device, electronic equipment and storage medium
CN111078869A (en) Method and device for classifying financial websites based on neural network
CN113553852B (en) Contract information extraction method, system and storage medium based on neural network
CN116996470B (en) Rich media information sending system
CN117037190B (en) Seal identification management system based on data analysis

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination