CN112257965A - Prediction method and prediction system for image text recognition confidence - Google Patents


Info

Publication number
CN112257965A
CN112257965A (application CN202011348779.8A)
Authority
CN
China
Prior art keywords
text
character
image
comparison operation
predicted
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011348779.8A
Other languages
Chinese (zh)
Inventor
夏路遥
黄贤俊
侯进
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenyuan Hengji Technology Co ltd
Original Assignee
Shenyuan Hengji Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenyuan Hengji Technology Co ltd filed Critical Shenyuan Hengji Technology Co ltd
Priority to CN202011348779.8A
Publication of CN112257965A
Legal status: Pending


Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06Q - INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q 10/00 - Administration; Management
    • G06Q 10/04 - Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/04 - Architecture, e.g. interconnection topology
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/08 - Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • Human Resources & Organizations (AREA)
  • Evolutionary Computation (AREA)
  • Data Mining & Analysis (AREA)
  • Economics (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Strategic Management (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Game Theory and Decision Science (AREA)
  • General Business, Economics & Management (AREA)
  • Development Economics (AREA)
  • Tourism & Hospitality (AREA)
  • Quality & Reliability (AREA)
  • Operations Research (AREA)
  • Marketing (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Character Discrimination (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a prediction method and a prediction system for image text recognition confidence. The prediction method comprises the following steps: training a text recognition model, and using the converged text recognition model to train a confidence model; inputting the text image into the converged confidence model; extracting picture features of the text image by using a deep convolutional neural network; segmenting the picture features into slices, inputting the slices into an LSTM bidirectional cyclic neural network, and outputting comparison operation prediction marks for the slices; and performing a merging operation with a CTC algorithm, outputting a comparison operation prediction sequence, and predicting the positions where errors may occur in the predicted text corresponding to the text image together with the type of error modification operation required. With the technical scheme of the invention, both text recognition and confidence prediction achieve extremely high accuracy, the recognition difficulty of the text is obtained, and the characters in the predicted text that may be wrong, together with the error modification operation type, are returned, so that the meaning of the confidence for the user is made clear.

Description

Prediction method and prediction system for image text recognition confidence
Technical Field
The invention relates to the technical field of image processing, in particular to a method and a system for predicting image text recognition confidence.
Background
OCR (Optical Character Recognition) refers to the process in which an electronic device (e.g., a scanner or a digital camera) examines characters printed on paper and translates their shapes into computer text using a character recognition method. Current OCR technology is divided into two modules: detection and recognition. The confidence of the recognition model represents the likelihood that the text recognized from the cropped image, obtained by quickly merging the slice outputs with a greedy algorithm, is the correct text. In actual use, this confidence has limitations and cannot truly represent the probability that the text is correct. For example, if a region contains a truncated character, the confidence may be high while the text is actually wrong; conversely, when the text is long (but clear), the confidence may be low simply because there are many targets, even though the text is actually correct, and the overall accuracy is affected as a result.
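To make this limitation concrete, the following minimal sketch shows how a greedy-decoding confidence of this kind is typically computed from the per-slice probabilities; Python with NumPy is an assumed implementation choice, and the placeholder index and the averaging formula are not taken from the patent or any particular recognizer.

```python
# Hedged illustration only: one common way a recognizer derives "confidence"
# from greedy CTC decoding; the placeholder index and the averaging formula
# are assumptions for this sketch.
import numpy as np

BLANK = 0  # assumed index of the CTC placeholder (blank) class

def greedy_ctc_decode(probs: np.ndarray):
    """probs: (T, C) per-slice softmax output of a recognition network.
    Returns the merged label sequence and a naive confidence score."""
    best_ids = probs.argmax(axis=1)        # best class for each slice
    best_probs = probs.max(axis=1)         # its probability for each slice
    labels, prev = [], BLANK
    for idx in best_ids:
        if idx != BLANK and idx != prev:   # merge repeats, drop placeholders
            labels.append(int(idx))
        prev = idx
    # naive confidence: mean of the per-slice maxima; a truncated character can
    # still score high, and a long but clear line with many slices can score low
    confidence = float(best_probs.mean())
    return labels, confidence

# toy usage: 5 slices, 3 classes (placeholder + two characters)
probs = np.array([[0.1, 0.8, 0.1],
                  [0.7, 0.2, 0.1],
                  [0.1, 0.1, 0.8],
                  [0.1, 0.1, 0.8],
                  [0.9, 0.05, 0.05]])
print(greedy_ctc_decode(probs))            # ([1, 2], 0.8)
```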
Disclosure of Invention
Aiming at the above problems, the invention provides a method and a system for predicting image text recognition confidence. A confidence model is trained on the basis of a converged text recognition model: picture features are extracted by a deep convolutional neural network, the picture features are sliced transversely, the slices are inferred by an LSTM bidirectional cyclic neural network, and the inference results are merged by a CTC algorithm to predict the errors that may occur in the predicted text that the text recognition model would produce for the text image. Based on this deep neural network approach, both text recognition and confidence prediction achieve extremely high accuracy; the predicted confidence reflects the recognition difficulty of the text and can return the characters in the predicted text that may be wrong together with the error modification operation type, so that the meaning of the confidence for the user is made clear.
In order to achieve the above object, the present invention provides a method for predicting image text recognition confidence, including: training a text recognition model by using a text image and a text label corresponding to the text image; training a confidence model by taking the predicted text output by the converged text recognition model and the corresponding text label as input, and taking the comparison operation labels for the corresponding positions of the predicted text and the label content as output; inputting a text image into the converged confidence model; extracting picture features of the text image by using a deep convolutional neural network; segmenting the picture features transversely into equal-width slices; inputting the slices into an LSTM bidirectional cyclic neural network and outputting comparison operation prediction identifiers for the slices; using a CTC algorithm to merge repeated characters and placeholders in the comparison operation prediction identifiers and outputting the comparison operation prediction sequence with the maximum probability; and, according to the comparison operation prediction sequence, predicting the positions where errors may occur in the predicted text that the text recognition model would produce for the text image, together with the type of error modification operation required.
In the above technical solution, preferably, the specific training process for training the text recognition model by using the text image and the text label corresponding to the text image includes: acquiring the text image, and making a corresponding text label aiming at the text image; extracting picture features of the text image by using a deep convolutional neural network; carrying out transverse equal-width segmentation on the picture features to form slices; inputting the slice into an LSTM bidirectional cyclic neural network, and outputting a predicted text of the slice; performing word and placeholder combination and overlapping operation on the predicted text by using a CTC algorithm, and outputting a predicted text sequence with the maximum probability; comparing the predicted text sequence with the text labels and calculating a loss function; and performing gradient descent training on the model according to the predicted text sequence and the loss function of the text label to obtain the converged text recognition model.
In the foregoing technical solution, preferably, the specific training process for obtaining the confidence model, in which the predicted text output by the converged text recognition model and the corresponding text label are used as input and the comparison operation labels for the corresponding positions of the predicted text and the label content are used as output, includes:
inputting a text image into the text recognition model which trains convergence; comparing the predicted text output by the text recognition model with the text label, and performing comparison operation labeling on the error modification operation type from the predicted text to the position corresponding to the text label; extracting picture features of the text image by using a deep convolutional neural network; carrying out transverse equal-width segmentation on the picture features to form slices; inputting the slice into an LSTM bidirectional cyclic neural network, and outputting a comparison operation prediction identifier of the slice; utilizing a CTC algorithm to carry out the operation of combining and overlapping characters and placeholders on the comparison operation prediction identification, and outputting a comparison operation prediction sequence with the maximum probability; comparing the comparison operation prediction sequence with the comparison operation label, and calculating a loss function; and performing gradient descent training on the model according to the comparison operation prediction sequence and the loss function labeled by the comparison operation so as to obtain the converged confidence coefficient model.
In the above technical solution, preferably, the comparison operation labels include "no modification", "delete 1 character", "modify 1 character" and "add 1 character"; according to the comparison between the predicted text and the text label: if the characters at corresponding positions are the same, the current position is labeled "no modification"; if the character at the corresponding position is repeated, a "delete 1 character" label is placed at the repeated character; if the character at the corresponding position is wrong, a "modify 1 character" label is placed at the wrong character; and if a character is missing at the corresponding position, an "add 1 character" label is placed at the position of the missing character.
In the foregoing technical solution, preferably, predicting, according to the comparison operation prediction sequence, the positions where errors may occur in the predicted text of the text image in the text recognition model and the type of error modification operation required specifically includes: judging whether the predicted text at each position has an error according to the label at that position in the comparison operation prediction sequence; if the current position is labeled "no modification", the character at the current position is predicted to be correct; if the current position is labeled "delete 1 character", the character at the current position is predicted to be a repeated character; if the current position is labeled "modify 1 character", the character at the current position is predicted to be a wrong character; and if the current position is labeled "add 1 character", a character is predicted to be missing at the current position.
The invention also provides a system for predicting the image text recognition confidence coefficient, which applies any one of the technical schemes to provide a method for predicting the image text recognition confidence coefficient, and comprises the following steps: the text recognition model training module is used for training a text recognition model by using a text image and a text label corresponding to the text image; the confidence model training module is used for training to obtain a confidence model by taking a predicted text output by the text recognition model converged by training and a corresponding text label as input and taking a comparison operation label of a corresponding position of the predicted text and the label content as output; the image input module is used for inputting the text image into the confidence coefficient model which is converged by training; the characteristic extraction module is used for extracting the picture characteristics of the text image by utilizing a deep convolutional neural network; the transverse segmentation module is used for carrying out transverse equal-width segmentation on the picture characteristics to form a slice; the slice prediction module is used for inputting the slice into an LSTM bidirectional circulation neural network and outputting a comparison operation prediction identifier of the slice; the merging operation module is used for performing merging and overlapping word and placeholder operations on the comparison operation prediction identification by utilizing a CTC algorithm and outputting a comparison operation prediction sequence with the maximum probability; and the confidence coefficient prediction module predicts the position of the text image possibly with errors in the predicted text in the text recognition model and the type of the required error modification operation according to the comparison operation prediction sequence.
In the above technical solution, preferably, the specific training process of the text recognition model training module includes: acquiring the text image, and making a corresponding text label aiming at the text image; extracting picture features of the text image by using a deep convolutional neural network; carrying out transverse equal-width segmentation on the picture features to form slices; inputting the slice into an LSTM bidirectional cyclic neural network, and outputting a predicted text of the slice; performing word and placeholder combination and overlapping operation on the predicted text by using a CTC algorithm, and outputting a predicted text sequence with the maximum probability; comparing the predicted text sequence with the text labels and calculating a loss function; and performing gradient descent training on the model according to the predicted text sequence and the loss function of the text label to obtain the converged text recognition model.
In the above technical solution, preferably, the specific training process of the confidence model training module includes: inputting a text image into the text recognition model which trains convergence; comparing the predicted text output by the text recognition model with the text label, and performing comparison operation labeling on the error modification operation type from the predicted text to the position corresponding to the text label; extracting picture features of the text image by using a deep convolutional neural network; carrying out transverse equal-width segmentation on the picture features to form slices; inputting the slice into an LSTM bidirectional cyclic neural network, and outputting a comparison operation prediction identifier of the slice; utilizing a CTC algorithm to carry out the operation of combining and overlapping characters and placeholders on the comparison operation prediction identification, and outputting a comparison operation prediction sequence with the maximum probability; comparing the comparison operation prediction sequence with the comparison operation label, and calculating a loss function; and performing gradient descent training on the model according to the comparison operation prediction sequence and the loss function labeled by the comparison operation so as to obtain the converged confidence coefficient model.
In the above technical solution, preferably, the comparison operation labels include "no modification", "delete 1 character", "modify 1 character" and "add 1 character"; according to the comparison between the predicted text and the text label: if the characters at corresponding positions are the same, the current position is labeled "no modification"; if the character at the corresponding position is repeated, a "delete 1 character" label is placed at the repeated character; if the character at the corresponding position is wrong, a "modify 1 character" label is placed at the wrong character; and if a character is missing at the corresponding position, an "add 1 character" label is placed at the position of the missing character.
In the foregoing technical solution, preferably, the confidence prediction module is specifically configured to: judge whether the predicted text at each position has an error according to the label at that position in the comparison operation prediction sequence; if the current position is labeled "no modification", predict that the character at the current position is correct; if the current position is labeled "delete 1 character", predict that the character at the current position is a repeated character; if the current position is labeled "modify 1 character", predict that the character at the current position is a wrong character; and if the current position is labeled "add 1 character", predict that a character is missing at the current position.
Compared with the prior art, the invention has the following beneficial effects. A confidence model is trained on the basis of a converged text recognition model: picture features are extracted by a deep convolutional neural network, the picture features are sliced transversely, the slices are inferred by an LSTM bidirectional cyclic neural network, and the inference results are merged by a CTC algorithm to predict the errors that may occur in the predicted text of the text image in the text recognition model. Based on this deep neural network approach, both text recognition and confidence prediction achieve extremely high accuracy; the predicted confidence reflects the recognition difficulty of the text, and the characters in the predicted text that may be wrong, together with the error modification operation type, can be returned, so that the meaning of the confidence for the user is made clear.
Drawings
Fig. 1 is a schematic flowchart of a method for predicting confidence in image text recognition according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a training process of a text recognition model according to an embodiment of the present invention;
FIG. 3 is a schematic diagram illustrating a training process of a confidence model according to an embodiment of the present invention;
FIG. 4 is a block diagram illustrating an image text recognition confidence prediction system according to an embodiment of the present invention.
In the drawings, the correspondence between each component and the reference numeral is:
11. text recognition model training module; 12. confidence model training module; 13. image input module; 14. feature extraction module; 15. transverse segmentation module; 16. slice prediction module; 17. merging operation module; 18. confidence prediction module.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be obtained by a person skilled in the art without any inventive step based on the embodiments of the present invention, are within the scope of the present invention.
The invention is described in further detail below with reference to the attached drawing figures:
As shown in fig. 1, the image text recognition confidence prediction method provided by the present invention includes: training a text recognition model by using the text image and a text label corresponding to the text image; training a confidence model by taking the predicted text output by the converged text recognition model and the corresponding text label as input, and taking the comparison operation labels for the corresponding positions of the predicted text and the label content as output; inputting the text image into the converged confidence model; extracting picture features of the text image by using a deep convolutional neural network; segmenting the picture features transversely into equal-width slices; inputting the slices into an LSTM bidirectional cyclic neural network and outputting comparison operation prediction marks for the slices; using a CTC algorithm to merge repeated characters and placeholders in the comparison operation prediction marks and outputting the comparison operation prediction sequence with the maximum probability; and, according to the comparison operation prediction sequence, predicting the positions where errors may occur in the predicted text of the text image in the text recognition model and the type of error modification operation required.
In this embodiment, a confidence model is trained on the basis of a converged text recognition model: picture features are extracted by a deep convolutional neural network, the picture features are sliced transversely, the slices are inferred by an LSTM bidirectional cyclic neural network, and the inference results are merged by a CTC algorithm to predict the errors that may occur in the predicted text of the text image in the text recognition model. Based on this deep neural network approach, both text recognition and confidence prediction achieve extremely high accuracy; the predicted confidence reflects the recognition difficulty of the text, and the characters in the predicted text that may be wrong, together with the error modification operation type, can be returned, so that the meaning of the confidence for the user is made clear.
Specifically, the confidence model is trained in advance on the basis of a converged text recognition model: the predicted text that the text recognition model outputs for a text image and the corresponding text label are taken as input, and, by comparing the predicted text with the text label, whether the character at each corresponding position is wrong and the type of error modification operation needed are taken as output. Once the confidence model has converged, in practical application the text image is input directly into the confidence model and a confidence prediction result is obtained after a series of operations; this result states, for the case where the text image were input into the text recognition model, which positions of the predicted text are likely to be wrong relative to the real text and which error modification operations would be needed. For the user, the confidence prediction result makes it clearer and more convenient to judge how trustworthy the recognition result is, and the confidence is refined down to the recognition difficulty of each specific character of the predicted text.
In the above embodiment, preferably, the specific training process for training the text recognition model by using the text image and the text label corresponding to the text image includes: acquiring a text image, and making a corresponding text label aiming at the text image; extracting picture characteristics of the text image by using a deep convolutional neural network; carrying out transverse equal-width segmentation on the picture characteristics to form slices; inputting the slices into an LSTM bidirectional cyclic neural network, and outputting prediction texts of the slices; performing word combination and overlapping operation on the predicted text by using a CTC algorithm, and outputting a predicted text sequence with the maximum probability; comparing the predicted text sequence with the text label, and calculating a loss function; and performing gradient descent training on the model according to the loss function of the predicted text sequence and the text label to obtain a converged text recognition model.
As shown in fig. 2, specifically, take a text image on which the numerals "130001" are written as an example; its text label is "130001". The specific steps of the training process are as follows (a minimal sketch of this pipeline, under stated assumptions, follows the numbered steps):
1) extracting text image features by using a deep convolutional neural network;
2) performing horizontal equal-width segmentation on the picture characteristics, inputting all the slices into an LSTM bidirectional cyclic neural network, and outputting prediction texts of all the slices;
3) combining and overlapping the characters and the placeholders by using a CTC algorithm, and outputting a predicted character sequence with the maximum probability;
4) comparing the predicted text with the text label content, and calculating ctc loss;
5) and performing gradient descent training on the model according to the loss to obtain a final training convergence text recognition model.
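The following is a minimal sketch of steps 1)-5), assuming PyTorch as the framework; the layer sizes, input height, digit vocabulary and optimizer settings are illustrative choices and are not specified in the patent.

```python
# Sketch of steps 1)-5) under the assumptions stated above.
import torch
import torch.nn as nn

class CRNN(nn.Module):
    def __init__(self, num_classes: int, hidden: int = 256):
        super().__init__()
        # 1) deep convolutional network extracts the picture features
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(), nn.MaxPool2d((2, 1)),
        )
        # 2) every column of the feature map is one equal-width slice fed to a
        #    bidirectional LSTM
        self.rnn = nn.LSTM(128 * 8, hidden, bidirectional=True, batch_first=True)
        self.fc = nn.Linear(hidden * 2, num_classes)   # class 0 = placeholder

    def forward(self, images):                          # images: (B, 1, 32, W)
        feats = self.cnn(images)                        # (B, 128, 8, W//2)
        b, c, h, w = feats.shape
        slices = feats.permute(0, 3, 1, 2).reshape(b, w, c * h)
        out, _ = self.rnn(slices)                       # per-slice features
        return self.fc(out).log_softmax(dim=2)          # per-slice predictions

# 3)-5) the CTC loss handles merging of repeated characters and placeholders,
# and gradient descent is run until the model converges
model = CRNN(num_classes=11)                            # placeholder + digits 0-9
ctc = nn.CTCLoss(blank=0, zero_infinity=True)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

images = torch.randn(4, 1, 32, 128)                     # dummy batch
targets = torch.randint(1, 11, (4, 6))                  # e.g. the label "130001"
log_probs = model(images).permute(1, 0, 2)              # (T, B, C) for CTCLoss
input_lens = torch.full((4,), log_probs.size(0), dtype=torch.long)
target_lens = torch.full((4,), 6, dtype=torch.long)
opt.zero_grad()
loss = ctc(log_probs, targets, input_lens, target_lens)
loss.backward()
opt.step()
```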
In the foregoing embodiment, preferably, the specific training process of training to obtain the confidence level model includes:
inputting the text image into a text recognition model for training convergence; comparing the predicted text output by the text recognition model with the text label, and carrying out comparison operation labeling on the error modification operation type from the predicted text to the corresponding position of the text label; extracting picture characteristics of the text image by using a deep convolutional neural network; carrying out transverse equal-width segmentation on the picture characteristics to form slices; inputting the slices into an LSTM bidirectional cyclic neural network, and outputting comparison operation prediction marks of the slices; performing the operation of combining and overlapping characters and placeholders on the comparison operation prediction identification by using a CTC algorithm, and outputting a comparison operation prediction sequence with the maximum probability; comparing the comparison operation prediction sequence with the comparison operation label, and calculating a loss function; and performing gradient descent training on the model according to the comparison operation prediction sequence and the loss function labeled by the comparison operation so as to obtain a converged confidence model.
As shown in fig. 3, in this embodiment, training a confidence level model using a text recognition model that is converged in training specifically includes:
Firstly, a predicted text is obtained by recognition with the text recognition model and compared with the text label. According to this comparison, besides the "no modification" label (N), the following 3 operations are allowed:
i. deleting 1 character (D)
Modifying 1 character (C)
Adding 1 character (A)
Then, labels are assigned according to the minimum set of operations needed to turn the predicted content into the real text annotation. Still taking the annotation content "130001" as an example (a label-generation sketch follows these examples):
i. if the recognized predicted text is "130001", the comparison operation label is "NNNNNN", indicating that none of the 6 characters needs to be modified;
ii. if the recognized predicted text is "1130001", the comparison operation label is "DNNNNNN", indicating that the first character needs to be deleted;
iii. if the recognized predicted text is "120001", the comparison operation label is "NCNNNN", indicating that the second character needs to be modified;
iv. if the recognized predicted text is "30001", the comparison operation label is "ANNNNN", indicating that one character needs to be added at the first position.
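The patent defines the N/D/C/A labels only through the examples above, so the following sketch shows one assumed way to generate such comparison operation labels automatically, by aligning the predicted text with the text label via edit distance.

```python
# Assumed label-generation procedure: align the predicted text with the text
# label by edit distance and read the operations off the backtrace.
def comparison_labels(pred: str, truth: str) -> str:
    """Return one label per operation aligning pred to truth:
    N = no modification, D = delete 1 character, C = modify 1 character,
    A = add 1 character."""
    m, n = len(pred), len(truth)
    # classic Levenshtein table
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dp[i][0] = i
    for j in range(n + 1):
        dp[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if pred[i - 1] == truth[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # delete from pred
                           dp[i][j - 1] + 1,         # add a missing character
                           dp[i - 1][j - 1] + cost)  # keep or modify
    # backtrace to recover the minimal operation sequence
    ops, i, j = [], m, n
    while i > 0 or j > 0:
        if i > 0 and j > 0 and dp[i][j] == dp[i - 1][j - 1] + (pred[i - 1] != truth[j - 1]):
            ops.append('N' if pred[i - 1] == truth[j - 1] else 'C')
            i, j = i - 1, j - 1
        elif i > 0 and dp[i][j] == dp[i - 1][j] + 1:
            ops.append('D')
            i -= 1
        else:
            ops.append('A')
            j -= 1
    return ''.join(reversed(ops))

print(comparison_labels("130001", "130001"))   # NNNNNN
print(comparison_labels("1130001", "130001"))  # DNNNNNN
print(comparison_labels("120001", "130001"))   # NCNNNN
print(comparison_labels("30001", "130001"))    # ANNNNN
```

The backtrace prefers the diagonal (keep or modify) move, which is what makes the delete label land on the leading repeated character in the second example.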
According to the new comparison operation labels, the training process of the confidence model is as follows:
1) extracting picture features by using a deep convolutional neural network;
2) performing horizontal equal-width segmentation on the picture characteristics, inputting all the slices into an LSTM bidirectional circulation neural network, and outputting comparison operation prediction marks of all the slices;
3) using a CTC algorithm to carry out the operation of combining and overlapping characters and placeholders on the comparison operation prediction identification, and outputting a comparison operation prediction sequence with the maximum probability;
4) comparing the comparison operation prediction sequence with the new comparison operation marking content, and calculating the ctc loss;
5) and carrying out gradient descent training on the model according to the loss to obtain a final confidence coefficient model of training convergence.
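As a rough sketch of how the two stages fit together, the snippet below assumes that the confidence model reuses the CRNN architecture sketched earlier with the comparison operations as its output vocabulary; the class indices and the predict_text helper on the converged recognizer are hypothetical names introduced only for illustration.

```python
# Glue sketch under the assumptions stated above; comparison_labels refers to
# the edit-distance sketch shown earlier.
OP_CLASSES = {"<placeholder>": 0, "N": 1, "D": 2, "C": 3, "A": 4}

def build_confidence_targets(recognizer, images, text_labels):
    """Run the converged recognizer, compare its output with the text labels,
    and return comparison-operation label sequences as training targets."""
    targets = []
    for image, truth in zip(images, text_labels):
        pred = recognizer.predict_text(image)      # hypothetical helper
        ops = comparison_labels(pred, truth)       # e.g. "DNNNNNN"
        targets.append([OP_CLASSES[op] for op in ops])
    return targets

# The confidence model itself is then trained exactly like the recognizer,
# only with 5 output classes, e.g.:
#   confidence_model = CRNN(num_classes=len(OP_CLASSES))
#   loss = nn.CTCLoss(blank=0)(log_probs, op_targets, input_lens, target_lens)
```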
In the above embodiment, preferably, the comparison operation labels include "no modification", "delete 1 character", "modify 1 character" and "add 1 character"; according to the comparison between the predicted text and the text label: if the characters at corresponding positions are the same, the current position is labeled "no modification"; if the character at the corresponding position is repeated, a "delete 1 character" label is placed at the repeated character; if the character at the corresponding position is wrong, a "modify 1 character" label is placed at the wrong character; and if a character is missing at the corresponding position, an "add 1 character" label is placed at the position of the missing character.
In the above embodiment, preferably, predicting, according to the comparison operation prediction sequence, the positions where errors may occur in the predicted text of the text image in the text recognition model and the type of error modification operation required specifically includes: judging whether the predicted text at each position has an error according to the label at that position in the comparison operation prediction sequence; if the current position is labeled "no modification", the character at the current position is predicted to be correct; if the current position is labeled "delete 1 character", the character at the current position is predicted to be a repeated character; if the current position is labeled "modify 1 character", the character at the current position is predicted to be a wrong character; and if the current position is labeled "add 1 character", a character is predicted to be missing at the current position.
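As an illustration of how this prediction can be surfaced to the user, the sketch below maps a comparison operation prediction sequence onto per-character messages; the message wording and the handling of the "add 1 character" label are assumptions, not text from the patent.

```python
# Hedged sketch: decode a comparison operation prediction sequence into
# per-position feedback; the message wording is illustrative.
def explain_prediction(pred_text: str, op_sequence: str):
    """Pair each predicted character with its comparison operation label and
    report where the text may be wrong and which correction is needed."""
    messages = []
    pos = 0                                   # index into pred_text
    for op in op_sequence:
        if op == "A":                         # a character is missing here
            messages.append(f"before position {pos + 1}: a character is missing, add 1 character")
            continue                          # 'A' does not consume a predicted character
        char = pred_text[pos]
        if op == "N":
            messages.append(f"position {pos + 1}: '{char}' is predicted to be correct")
        elif op == "D":
            messages.append(f"position {pos + 1}: '{char}' looks like a repeated character, delete 1 character")
        elif op == "C":
            messages.append(f"position {pos + 1}: '{char}' looks like a wrong character, modify 1 character")
        pos += 1
    return messages

for line in explain_prediction("30001", "ANNNNN"):
    print(line)
```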
As shown in fig. 4, the present invention further provides a system for predicting an image text recognition confidence, which applies any one of the methods for predicting an image text recognition confidence provided in the foregoing embodiments, and includes: the text recognition model training module 11 is configured to train a text recognition model by using the text image and a text label corresponding to the text image; the confidence model training module 12 is configured to train to obtain a confidence model by taking a predicted text output by the text recognition model converged in the training and a corresponding text label as input and taking a comparison operation label of a corresponding position of the predicted text and a label content as output; an image input module 13, configured to input a text image into a confidence model of training convergence; the feature extraction module 14 is configured to extract picture features of the text image by using a deep convolutional neural network; the transverse segmentation module 15 is used for transversely segmenting the image features into slices in equal width; the slice prediction module 16 is used for inputting the slices into the LSTM bidirectional circulation neural network and outputting comparison operation prediction marks of the slices; a merging operation module 17, configured to perform merging and overlapping word and placeholder operations on the comparison operation prediction identifier by using a CTC algorithm, and output a comparison operation prediction sequence with a maximum probability; and the confidence coefficient prediction module 18 predicts the position of the possible error of the predicted text of the text image in the text recognition model and the type of the error modification operation required according to the comparison operation prediction sequence.
In this embodiment, a confidence model is trained on the basis of a converged text recognition model: picture features are extracted by a deep convolutional neural network, the picture features are sliced transversely, the slices are inferred by an LSTM bidirectional cyclic neural network, and the inference results are merged by a CTC algorithm to predict the errors that may occur in the predicted text of the text image in the text recognition model. Based on this deep neural network approach, both text recognition and confidence prediction achieve extremely high accuracy; the predicted confidence reflects the recognition difficulty of the text, and the characters in the predicted text that may be wrong, together with the error modification operation type, can be returned, so that the meaning of the confidence for the user is made clear.
Specifically, the confidence model is trained in advance on the basis of a converged text recognition model: the predicted text that the text recognition model outputs for a text image and the corresponding text label are taken as input, and, by comparing the predicted text with the text label, whether the character at each corresponding position is wrong and the type of error modification operation needed are taken as output. Once the confidence model has converged, in practical application the text image is input directly into the confidence model and a confidence prediction result is obtained after a series of operations; this result states, for the case where the text image were input into the text recognition model, which positions of the predicted text are likely to be wrong relative to the real text and which error modification operations would be needed. For the user, the confidence prediction result makes it clearer and more convenient to judge how trustworthy the recognition result is, and the confidence is refined down to the recognition difficulty of each specific character of the predicted text.
In the above embodiment, preferably, the specific training process of the text recognition model training module 11 includes: acquiring a text image, and making a corresponding text label aiming at the text image; extracting picture characteristics of the text image by using a deep convolutional neural network; carrying out transverse equal-width segmentation on the picture characteristics to form slices; inputting the slices into an LSTM bidirectional cyclic neural network, and outputting prediction texts of the slices; performing word combination and overlapping operation on the predicted text by using a CTC algorithm, and outputting a predicted text sequence with the maximum probability; comparing the predicted text sequence with the text label, and calculating a loss function; and performing gradient descent training on the model according to the loss function of the predicted text sequence and the text label to obtain a converged text recognition model.
In the above embodiment, preferably, the specific training process of the confidence model training module 12 includes: inputting the text image into a text recognition model for training convergence; comparing the predicted text output by the text recognition model with the text label, and carrying out comparison operation labeling on the error modification operation type from the predicted text to the corresponding position of the text label; extracting picture characteristics of the text image by using a deep convolutional neural network; carrying out transverse equal-width segmentation on the picture characteristics to form slices; inputting the slices into an LSTM bidirectional cyclic neural network, and outputting comparison operation prediction marks of the slices; performing the operation of combining and overlapping characters and placeholders on the comparison operation prediction identification by using a CTC algorithm, and outputting a comparison operation prediction sequence with the maximum probability; comparing the comparison operation prediction sequence with the comparison operation label, and calculating a loss function; and performing gradient descent training on the model according to the comparison operation prediction sequence and the loss function labeled by the comparison operation so as to obtain a converged confidence model.
In the above embodiment, preferably, the comparison operation labels include "no modification", "delete 1 character", "modify 1 character" and "add 1 character"; according to the comparison between the predicted text and the text label: if the characters at corresponding positions are the same, the current position is labeled "no modification"; if the character at the corresponding position is repeated, a "delete 1 character" label is placed at the repeated character; if the character at the corresponding position is wrong, a "modify 1 character" label is placed at the wrong character; and if a character is missing at the corresponding position, an "add 1 character" label is placed at the position of the missing character.
In the above embodiment, preferably, the confidence prediction module 18 is specifically configured to: judge whether the predicted text at each position has an error according to the label at that position in the comparison operation prediction sequence; if the current position is labeled "no modification", predict that the character at the current position is correct; if the current position is labeled "delete 1 character", predict that the character at the current position is a repeated character; if the current position is labeled "modify 1 character", predict that the character at the current position is a wrong character; and if the current position is labeled "add 1 character", predict that a character is missing at the current position.
The above is only a preferred embodiment of the present invention, and is not intended to limit the present invention, and various modifications and changes will occur to those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (10)

1. A method for predicting image text recognition confidence is characterized by comprising the following steps:
training a text recognition model by using a text image and a text label corresponding to the text image;
taking a predicted text output by the converged text recognition model and a corresponding text label as input, taking comparison operation labels for the corresponding positions of the predicted text and the label content as output, and training to obtain a confidence model;
inputting a text image into the confidence model of training convergence;
extracting picture features of the text image by using a deep convolutional neural network;
carrying out transverse equal-width segmentation on the picture features to form slices;
inputting the slice into an LSTM bidirectional cyclic neural network, and outputting a comparison operation prediction identifier of the slice;
utilizing a CTC algorithm to carry out the operation of combining and overlapping characters and placeholders on the comparison operation prediction identification, and outputting a comparison operation prediction sequence with the maximum probability;
and predicting, according to the comparison operation prediction sequence, the positions where errors may occur in the predicted text of the text image in the text recognition model and the type of error modification operation required.
2. The method for predicting image text recognition confidence of claim 1, wherein the specific training process for training the text recognition model by using the text image and the text label corresponding to the text image comprises:
acquiring the text image, and making a corresponding text label aiming at the text image;
extracting picture features of the text image by using a deep convolutional neural network;
carrying out transverse equal-width segmentation on the picture features to form slices;
inputting the slice into an LSTM bidirectional cyclic neural network, and outputting a predicted text of the slice;
performing word and placeholder combination and overlapping operation on the predicted text by using a CTC algorithm, and outputting a predicted text sequence with the maximum probability;
comparing the predicted text sequence with the text labels and calculating a loss function;
and performing gradient descent training on the model according to the predicted text sequence and the loss function of the text label to obtain the converged text recognition model.
3. The method for predicting image text recognition confidence according to claim 1 or 2, wherein the specific training process for training the confidence model includes:
inputting a text image into the text recognition model which trains convergence;
comparing the predicted text output by the text recognition model with the text label, and performing comparison operation labeling on the error modification operation type from the predicted text to the position corresponding to the text label;
extracting picture features of the text image by using a deep convolutional neural network;
carrying out transverse equal-width segmentation on the picture features to form slices;
inputting the slice into an LSTM bidirectional cyclic neural network, and outputting a comparison operation prediction identifier of the slice;
utilizing a CTC algorithm to carry out the operation of combining and overlapping characters and placeholders on the comparison operation prediction identification, and outputting a comparison operation prediction sequence with the maximum probability;
comparing the comparison operation prediction sequence with the comparison operation label, and calculating a loss function;
and performing gradient descent training on the model according to the comparison operation prediction sequence and the loss function labeled by the comparison operation so as to obtain the converged confidence coefficient model.
4. The image text recognition confidence prediction method of claim 3, wherein the comparison operation labels include no modification, deletion of 1 character, modification of 1 character, and addition of 1 character;
according to the comparison between the predicted text and the text label:
if the characters of the corresponding positions are the same, the current position is marked without modification;
if the characters at the corresponding positions are repeated, deleting 1 character mark at the position of the repeated character;
if the corresponding position character is wrong, modifying 1 character label before the wrong character;
and if the corresponding position character is missing, adding 1 character mark at the position of the missing character.
5. The method for predicting image text recognition confidence according to claim 4, wherein the predicting, according to the comparison operation prediction sequence, a position where an error may occur in a predicted text of the text image in the text recognition model and a type of an error modification operation required specifically includes:
judging whether the predicted text at the corresponding position has errors or not according to the marks of different positions in the comparison operation prediction sequence;
if the current position is not marked for modification, the character of the current position is predicted to be correct;
if the current position is marked by deleting 1 character, predicting that the character at the current position is a repeated character;
if the current position is marked by modifying 1 character, predicting that the character at the current position is an error character;
if the current position is marked by adding 1 character, predicting that the current position character is missing.
6. An image text recognition confidence prediction system applying the image text recognition confidence prediction method according to any one of claims 1 to 5, comprising:
the text recognition model training module is used for training a text recognition model by using a text image and a text label corresponding to the text image;
the confidence model training module is used for training to obtain a confidence model by taking a predicted text output by the text recognition model converged by training and a corresponding text label as input and taking a comparison operation label of a corresponding position of the predicted text and the label content as output;
the image input module is used for inputting the text image into the confidence coefficient model which is converged by training;
the characteristic extraction module is used for extracting the picture characteristics of the text image by utilizing a deep convolutional neural network;
the transverse segmentation module is used for carrying out transverse equal-width segmentation on the picture characteristics to form a slice;
the slice prediction module is used for inputting the slice into an LSTM bidirectional circulation neural network and outputting a comparison operation prediction identifier of the slice;
the merging operation module is used for performing merging and overlapping word and placeholder operations on the comparison operation prediction identification by utilizing a CTC algorithm and outputting a comparison operation prediction sequence with the maximum probability;
and the confidence prediction module, which predicts, according to the comparison operation prediction sequence, the positions where errors may occur in the predicted text of the text image in the text recognition model and the type of error modification operation required.
7. The system for predicting image text recognition confidence according to claim 6, wherein the specific training process of the text recognition model training module comprises:
acquiring the text image, and making a corresponding text label aiming at the text image;
extracting picture features of the text image by using a deep convolutional neural network;
carrying out transverse equal-width segmentation on the picture features to form slices;
inputting the slice into an LSTM bidirectional cyclic neural network, and outputting a predicted text of the slice;
performing word and placeholder combination and overlapping operation on the predicted text by using a CTC algorithm, and outputting a predicted text sequence with the maximum probability;
comparing the predicted text sequence with the text labels and calculating a loss function;
and performing gradient descent training on the model according to the predicted text sequence and the loss function of the text label to obtain the converged text recognition model.
8. The image text recognition confidence prediction system of claim 6, wherein the specific training process of the confidence model training module comprises:
inputting a text image into the text recognition model which trains convergence;
comparing the predicted text output by the text recognition model with the text label, and performing comparison operation labeling on the error modification operation type from the predicted text to the position corresponding to the text label;
extracting picture features of the text image by using a deep convolutional neural network;
carrying out transverse equal-width segmentation on the picture features to form slices;
inputting the slice into an LSTM bidirectional cyclic neural network, and outputting a comparison operation prediction identifier of the slice;
utilizing a CTC algorithm to carry out the operation of combining and overlapping characters and placeholders on the comparison operation prediction identification, and outputting a comparison operation prediction sequence with the maximum probability;
comparing the comparison operation prediction sequence with the comparison operation label, and calculating a loss function;
and performing gradient descent training on the model according to the comparison operation prediction sequence and the loss function labeled by the comparison operation so as to obtain the converged confidence coefficient model.
9. The image text recognition confidence prediction system of claim 8, wherein the comparison operation labels include no modification, deletion of 1 character, modification of 1 character, and addition of 1 character;
according to the comparison between the predicted text and the text label:
if the characters of the corresponding positions are the same, the current position is marked without modification;
if the characters at the corresponding positions are repeated, deleting 1 character mark at the position of the repeated character;
if the corresponding position character is wrong, modifying 1 character label before the wrong character;
and if the corresponding position character is missing, adding 1 character mark at the position of the missing character.
10. The image text recognition confidence prediction system of claim 9, wherein the confidence prediction module is specifically configured to:
judging whether the predicted text at the corresponding position has errors or not according to the marks of different positions in the comparison operation prediction sequence;
if the current position is not marked for modification, the character of the current position is predicted to be correct;
if the current position is marked by deleting 1 character, predicting that the character at the current position is a repeated character;
if the current position is marked by modifying 1 character, predicting that the character at the current position is an error character;
if the current position is marked by adding 1 character, predicting that the current position character is missing.
CN202011348779.8A 2020-11-26 2020-11-26 Prediction method and prediction system for image text recognition confidence Pending CN112257965A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011348779.8A CN112257965A (en) 2020-11-26 2020-11-26 Prediction method and prediction system for image text recognition confidence

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011348779.8A CN112257965A (en) 2020-11-26 2020-11-26 Prediction method and prediction system for image text recognition confidence

Publications (1)

Publication Number Publication Date
CN112257965A 2021-01-22

Family

ID=74225519

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011348779.8A Pending CN112257965A (en) 2020-11-26 2020-11-26 Prediction method and prediction system for image text recognition confidence

Country Status (1)

Country Link
CN (1) CN112257965A (en)


Patent Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101021838A (en) * 2007-03-02 2007-08-22 华为技术有限公司 Text handling method and system
CN101295292A (en) * 2007-04-23 2008-10-29 北大方正集团有限公司 Method and device for modeling and naming entity recognition based on maximum entropy model
CN106708799A (en) * 2016-11-09 2017-05-24 上海智臻智能网络科技股份有限公司 Text error correction method and device, and terminal
CN109948122A (en) * 2017-12-21 2019-06-28 北京金山安全软件有限公司 Error correction method and device for input text and electronic equipment
CN109284399A (en) * 2018-10-11 2019-01-29 深圳前海微众银行股份有限公司 Similarity prediction model training method, equipment and computer readable storage medium
CN109670494A (en) * 2018-12-13 2019-04-23 深源恒际科技有限公司 A kind of Method for text detection and system of subsidiary recognition confidence
CN109948152A (en) * 2019-03-06 2019-06-28 北京工商大学 A kind of Chinese text grammer error correcting model method based on LSTM
CN110378338A (en) * 2019-07-11 2019-10-25 腾讯科技(深圳)有限公司 A kind of text recognition method, device, electronic equipment and storage medium
CN111061904A (en) * 2019-12-06 2020-04-24 武汉理工大学 Local picture rapid detection method based on image content identification
CN111310468A (en) * 2020-01-15 2020-06-19 同济大学 Method for realizing Chinese named entity recognition by using uncertain word segmentation information
CN111539417A (en) * 2020-04-28 2020-08-14 深源恒际科技有限公司 Text recognition training optimization method based on deep neural network
CN111783518A (en) * 2020-05-14 2020-10-16 北京三快在线科技有限公司 Training sample generation method and device, electronic equipment and readable storage medium

Similar Documents

Publication Publication Date Title
CN109670494B (en) Text detection method and system with recognition confidence
CN109308476A (en) Billing information processing method, system and computer readable storage medium
CN109102844B (en) Automatic calibration method for clinical test source data
CN112396049A (en) Text error correction method and device, computer equipment and storage medium
CN112257613B (en) Physical examination report information structured extraction method and device and computer equipment
CN111881902B (en) Training sample making method, training sample making device, computer equipment and readable storage medium
CN112508011A (en) OCR (optical character recognition) method and device based on neural network
CN113762269B (en) Chinese character OCR recognition method, system and medium based on neural network
CN113255583B (en) Data annotation method and device, computer equipment and storage medium
CN112766255A (en) Optical character recognition method, device, equipment and storage medium
CN113221632A (en) Document picture identification method and device and computer equipment
CN114238575A (en) Document parsing method, system, computer device and computer-readable storage medium
CN111539417B (en) Text recognition training optimization method based on deep neural network
CN114528394B (en) Text triple extraction method and device based on mask language model
CN112257965A (en) Prediction method and prediction system for image text recognition confidence
CN111340031A (en) Equipment almanac target information extraction and identification system based on image identification and method thereof
CN115543915A (en) Automatic database building method and system for personnel file directory
CN112560849B (en) Neural network algorithm-based grammar segmentation method and system
CN113837067A (en) Organ contour detection method and device, electronic equipment and readable storage medium
EP3757825A1 (en) Methods and systems for automatic text segmentation
CN113962196A (en) Resume processing method and device, electronic equipment and storage medium
CN111078869A (en) Method and device for classifying financial websites based on neural network
CN113553852B (en) Contract information extraction method, system and storage medium based on neural network
CN116996470B (en) Rich media information sending system
CN117037190B (en) Seal identification management system based on data analysis

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination