CN115147847A - Text recognition result determining method and device, storage medium and computer equipment - Google Patents


Info

Publication number
CN115147847A
CN115147847A (application CN202210885785.XA)
Authority
CN
China
Prior art keywords
character
time step
characters
probability value
recognition result
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210885785.XA
Other languages
Chinese (zh)
Inventor
顾善中
惠慧
田晓明
Current Assignee
Seuic Technologies Co Ltd
Original Assignee
Seuic Technologies Co Ltd
Priority date
Filing date
Publication date
Application filed by Seuic Technologies Co Ltd
Priority to CN202210885785.XA
Publication of CN115147847A
Legal status: Pending

Classifications

    • G06V 30/10 Character recognition
    • G06V 30/18 Extraction of features or characteristics of the image
    • G06N 3/08 Neural networks; learning methods
    • G06V 10/82 Image or video recognition using neural networks
    • G06V 20/62 Text, e.g. of license plates, overlay texts or captions on TV images
    • G06V 20/625 License plates


Abstract

When OCR recognition is performed on a text image to be recognized, the character corresponding to the maximum probability value in the probability distribution of each time step is determined. For each time step, it is then judged whether that character appears in a pre-configured error-prone character grouping list. If not, the character corresponding to the maximum probability value in the probability distribution of the time step is taken directly as the output character of the time step. If so, the maximum probability value is updated based on the probability value, in the probability distribution of the time step, of the error-prone character paired with it in the error-prone character grouping list, and the character with the updated maximum probability value is taken as the output character of the time step. In this way, the situation in which the ambiguity of error-prone characters drives the overall score too low is avoided, and the accuracy of the recognition result is effectively improved.

Description

Text recognition result determining method and device, storage medium and computer equipment
Technical Field
The present application relates to the field of text recognition technologies, and in particular, to a method and an apparatus for determining a text recognition result, a storage medium, and a computer device.
Background
OCR (Optical Character Recognition) refers to a technique for recognizing the text contained in an image. OCR is widely applied in fields such as logistics, medical care, finance, and insurance, and can be carried on a PDA, extending its application to still more fields. PDAs include consumer-grade PDAs, such as smartphones, tablet computers, and handheld game consoles, and industrial-grade PDAs, which are mainly applied in factory manufacturing, logistics and warehousing, outdoor material inspection, and similar fields; the barcode scanners (also called bar guns), RFID readers, POS machines, and similar devices commonly used in those fields may all be called PDAs. An industrial-grade PDA can be used in many places with harsh environments, incorporates many optimizations for industrial use, supports RFID reading/writing and barcode scanning, and offers IP54-rated or higher protection, which consumer handheld terminals do not possess.
At present, when OCR recognition technology is applied to an industrial-grade PDA, a confidence score is generally assigned to each OCR recognition result, and whether the result is reliable is determined from that score. The usual deep-learning CTC algorithm scores the best value over all time steps, but some characters in the dictionary are confusable, such as 1 and I, or 0 and O. The ambiguity of these confusable characters can drive the overall score too low, so that some correct recognition results are easily filtered out, reducing the accuracy of the recognition results.
Disclosure of Invention
The purpose of the present application is to solve at least one of the above technical defects, in particular the defect that, when recognition results are given confidence scores in the prior art, the ambiguity of confusable characters may drive the overall score too low, so that some correct recognition results are easily filtered out and the accuracy of the recognition result is reduced.
The application provides a text recognition result determination method, which comprises the following steps:
performing OCR recognition on a text image to be recognized to obtain probability distribution of all character categories output by the text image at each time step;
traversing the probability distributions of all character categories output at each time step, and acquiring the character corresponding to the maximum probability value in the probability distribution of each time step;
determining whether the character corresponding to the maximum probability value in the probability distribution of each time step is a character in a pre-configured error-prone character grouping list or not;
if yes, updating the maximum probability value based on the probability value, in the probability distribution of the time step, of the error-prone character paired in the error-prone character grouping list with the character corresponding to the maximum probability value, and taking the character with the updated maximum probability value as the output character of the time step; wherein the updated maximum probability value is greater than the maximum probability value before updating;
if not, directly taking the character corresponding to the maximum probability value in the probability distribution of the time step as the output character of the time step;
and splicing the output characters of all time steps to obtain a character sequence, and decoding the character sequence to obtain a final recognition result.
Optionally, the method further comprises:
and correcting the recognition result according to the recognition scene corresponding to the text image to obtain a corrected result.
Optionally, the modifying the recognition result according to the recognition scene corresponding to the text image to obtain a modified result includes:
determining the recognition scene corresponding to the text image and the easy-to-appear character list corresponding to the recognition scene;
comparing each output character in the recognition result with the easy-to-appear characters in the easy-to-appear character list, and determining whether the recognition result contains output characters that are not included in the easy-to-appear character list;
if yes, replacing each output character that is not contained in the easy-to-appear character list with the error-prone character paired with it in the error-prone character grouping list;
and taking the recognition result after replacing the output character as a correction result.
Optionally, the performing OCR recognition on the text image to be recognized to obtain probability distributions of all character categories output at each time step includes:
inputting the text image to be recognized into a pre-configured CRNN for OCR recognition, and extracting features of the text image through the convolutional layer of the CRNN to obtain a plurality of feature maps;
and after converting each feature map into a feature vector, sequentially inputting each feature vector into the recurrent layer of the CRNN, and predicting the character corresponding to each feature vector through the recurrent layer to obtain the probability distributions of all character categories output by the recurrent layer at each time step.
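The two steps above can be sketched as follows. This is a minimal illustration, not the application's actual network: the shapes, the random projection standing in for the recurrent layer, and all helper names are assumptions. Each of the W columns of a C x H x W feature map is flattened into one feature vector, one vector per time step, and each vector is mapped to a probability distribution over the character categories.

```python
import math
import random

def feature_map_to_vectors(fmap):
    """fmap: C channels, each an H x W grid -> W feature vectors of length C*H."""
    C, H, W = len(fmap), len(fmap[0]), len(fmap[0][0])
    return [[fmap[c][h][w] for c in range(C) for h in range(H)] for w in range(W)]

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def mock_recurrent_layer(vectors, num_classes, seed=0):
    """Stand-in for the CRNN recurrent layer: a random linear projection
    followed by softmax, giving one distribution per time step."""
    rng = random.Random(seed)
    weights = [[rng.gauss(0, 1) for _ in range(len(vectors[0]))]
               for _ in range(num_classes)]
    return [softmax([sum(w_i * v_i for w_i, v_i in zip(row, vec)) for row in weights])
            for vec in vectors]

# A toy 3-channel, 2 x 4 feature map -> 4 time steps, vectors of length 6.
fmap = [[[0.1 * (c + h + w) for w in range(4)] for h in range(2)] for c in range(3)]
vectors = feature_map_to_vectors(fmap)
dists = mock_recurrent_layer(vectors, num_classes=5)
```

The point of the sketch is only the data flow: one time step per feature-map column, and one normalized distribution over character categories per time step.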
Optionally, the updating the maximum probability value based on the probability value, in the probability distribution of the time step, of the error-prone character paired with the character corresponding to the maximum probability value includes:
determining the error-prone character paired, in the error-prone character grouping list, with the character corresponding to the maximum probability value, and the probability value of that error-prone character in the probability distribution of the time step;
and adding the probability value of the error-prone character in the probability distribution of the time step to the maximum probability value, and updating the maximum probability value according to the sum.
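A minimal sketch of this addition (the grouping list contents and the function names are illustrative): the probability mass of the paired error-prone character is added to the current maximum, so the ambiguity between, say, 0 and O no longer depresses the score of the time step.

```python
ERROR_PRONE_GROUPS = [("1", "I", "l"), ("0", "O"), ("Z", "2")]

def partners(char):
    """Characters grouped with `char` in the error-prone character grouping list."""
    for group in ERROR_PRONE_GROUPS:
        if char in group:
            return [c for c in group if c != char]
    return []

def updated_max_probability(dist, char):
    """dist: {character: probability} for one time step; `char` is the argmax."""
    return dist[char] + sum(dist.get(p, 0.0) for p in partners(char))

dist = {"0": 0.55, "O": 0.35, "A": 0.10}
print(updated_max_probability(dist, "0"))  # approximately 0.9
```

For a character with no partners the function returns the original maximum unchanged, which matches the "if not" branch of the method.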
Optionally, the decoding the character sequence to obtain a final recognition result includes:
traversing the character sequence and determining whether the character sequence contains a placeholder;
if not, merging repeated output characters at consecutive time steps, and taking the merged character sequence as the final recognition result;
and if so, merging, according to the placeholders, the repeated output characters at consecutive time steps in the character sequence, and removing the placeholders from the merged character sequence to obtain the final recognition result.
Optionally, the method further comprises:
multiplying the maximum probability values corresponding to the output characters in the recognition result to obtain a product result;
and taking the product result as a confidence score of the recognition result.
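This confidence score can be sketched as a simple product of the per-time-step maximum probabilities (function name illustrative):

```python
from functools import reduce
from operator import mul

def confidence_score(max_probs):
    """Multiply the per-time-step maximum probabilities of the output characters."""
    return reduce(mul, max_probs, 1.0)

score = confidence_score([0.9, 0.95, 0.88])  # about 0.752
```

Because the error-prone update above raises the per-step maxima, this product is what stops being dragged down by confusable characters.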
The present application further provides a device for determining a text recognition result, including:
the text recognition module is used for performing OCR recognition on a text image to be recognized to obtain probability distribution of all character categories output by the text image at each time step;
the character acquisition module is used for traversing the probability distribution of all character types output at each time step and acquiring characters corresponding to the maximum probability value in the probability distribution at each time step;
the error-prone character judgment module is used for determining whether the characters corresponding to the maximum probability values in the probability distribution of each time step are characters in a pre-configured error-prone character grouping list or not;
a first character determining module, configured to update the maximum probability value based on the probability value, in the probability distribution of the time step, of the error-prone character paired in the error-prone character grouping list with the character corresponding to the maximum probability value, and to take the character with the updated maximum probability value as the output character of the time step; wherein the updated maximum probability value is greater than the maximum probability value before updating;
a second character determining module, configured to, if not, directly take the character corresponding to the maximum probability value in the probability distribution of the time step as the output character of the time step;
and the recognition result determining module is used for splicing the output characters of all time steps to obtain a character sequence, and decoding the character sequence to obtain a final recognition result.
The present application further provides a storage medium having stored therein computer-readable instructions, which, when executed by one or more processors, cause the one or more processors to perform the steps of the method of determining a text recognition result as described in any of the above embodiments.
The present application further provides a computer device, comprising: one or more processors, and a memory;
the memory has stored therein computer readable instructions which, when executed by the one or more processors, perform the steps of the method of determining text recognition results as described in any of the above embodiments.
According to the technical scheme, the embodiment of the application has the following advantages:
according to the method, the device, the storage medium and the computer equipment for determining the text recognition result, when OCR recognition is performed on a text image to be recognized, probability distribution of all character types output by the text image at each time step can be obtained firstly, then characters corresponding to the maximum probability value in the probability distribution of each time step can be determined, then whether the characters corresponding to the maximum probability value in the probability distribution of the time step are characters in a pre-configured error-prone character grouping list or not is judged aiming at each time step, if not, the characters corresponding to the maximum probability value in the probability distribution of the time step are directly used as output characters of the time step, and therefore the accuracy of the recognition result can be improved to a certain extent; if so, updating the maximum probability value based on the probability value of the error-prone characters in the probability distribution of the time step corresponding to the character pair with the maximum probability value in the error-prone character grouping list, and using the characters with the updated maximum probability value as the output characters of the time step.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings needed to be used in the description of the embodiments or the prior art will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and it is obvious for those skilled in the art that other drawings can be obtained according to the drawings without inventive exercise.
Fig. 1 is a schematic flowchart of a method for determining a text recognition result according to an embodiment of the present disclosure;
FIG. 2 is a schematic diagram of a text image provided in an embodiment of the present application;
fig. 3 is a display diagram of a recognition result of a text image according to an embodiment of the present disclosure;
fig. 4 is a diagram illustrating a correction result of a text image according to an embodiment of the present disclosure;
fig. 5 is a schematic structural diagram of a device for determining a text recognition result according to an embodiment of the present disclosure;
fig. 6 is a schematic internal structural diagram of a computer device according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
At present, when the OCR recognition technology is applied to an industrial-grade PDA, a confidence score is generally given to a recognition result of the OCR recognition, and whether the recognition result is reliable is detected according to the score result. The prediction of the normal deep learning CTC algorithm is to score the best value of all time sequences, but some characters are confusing characters in a dictionary, such as 1 and I,0 and O, and the ambiguity of the confusing characters can cause the overall score to be too low, so that some correct recognition results are easily filtered out, and the accuracy of the recognition results is reduced.
Based on this, the present application proposes the following technical scheme, specifically described below:
in an embodiment, as shown in fig. 1, fig. 1 is a schematic flowchart of a method for determining a text recognition result provided in an embodiment of the present application, where the present application provides a method for determining a text recognition result, and the method may include:
s110: and performing OCR recognition on the text image to be recognized to obtain the probability distribution of all character categories output by the text image at each time step.
In this step, after the text image to be recognized is obtained, OCR recognition may be performed on the text image, and probability distributions of all character categories output by the text image at each time step are obtained.
The text image to be recognized in the present application includes, but is not limited to, text images of bank card numbers, identity documents, weight information, embossed stamps, shipping labels, and the like. When OCR recognition is performed on the text image, the text block to be recognized in the text image can first be obtained, for example by cropping the text block in a designated area of the text image. OCR recognition is then performed on that text block, so that the influence of complex backgrounds in the text image on the recognition result is filtered out. This can also improve the efficiency of OCR recognition to a certain extent, saving the user waiting time and improving the user experience.
It will be appreciated that the present application provides an OCR recognition component that uses the aiming point as a reference and captures the several text blocks of the sample image that appear within the field of view (FOV). The text block closest to the center point of the text image, together with the text blocks on the same line as it, can therefore be selected as the text blocks to be recognized.
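This selection can be sketched as follows. The box format, the same-line tolerance, and the helper names are assumptions for illustration, not details from the application: pick the block whose center is nearest the image center, then keep the blocks whose vertical centers fall on the same line.

```python
def nearest_block(blocks, image_w, image_h):
    """blocks: list of (x, y, w, h) boxes; returns the box nearest the image center."""
    cx, cy = image_w / 2, image_h / 2

    def dist2(box):
        x, y, w, h = box
        return (x + w / 2 - cx) ** 2 + (y + h / 2 - cy) ** 2

    return min(blocks, key=dist2)

def same_line(anchor, blocks, tol=0.5):
    """Blocks whose vertical center lies within tol * anchor-height of the anchor's."""
    _, ay, _, ah = anchor
    acy = ay + ah / 2
    return [b for b in blocks if abs(b[1] + b[3] / 2 - acy) <= tol * ah]

blocks = [(10, 10, 50, 20), (90, 95, 60, 20), (200, 100, 40, 18)]
anchor = nearest_block(blocks, 320, 240)   # nearest to image center (160, 120)
line = same_line(anchor, blocks)           # the anchor plus its line-mates
```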
Further, when OCR recognition is performed on a text image, different recognition models can be selected for text recognition. Because the text blocks in a text image have a sequential character, whichever recognition model is chosen must attend to the text sequence in the image and predict the text at each position in turn, yielding the probability distributions of all character categories output at each time step; the text content of the image is then predicted from these distributions.
In a specific implementation, a CRNN (Convolutional Recurrent Neural Network) can be selected for OCR recognition of the text image. A CRNN recognizes text sequences of indefinite length end to end: text recognition is converted into a sequence-learning problem with temporal dependence, without cutting out individual characters, so accurate prediction can be achieved by combining the context information in the text image.
S120: and traversing the probability distribution of all character categories output at each time step, and acquiring characters corresponding to the maximum probability value in the probability distribution at each time step.
In this step, after OCR recognition is performed on the text image to be recognized through S110 to obtain probability distributions of all character categories output by the text image at each time step, the application may also traverse the probability distributions of all character categories output at each time step, and obtain a character corresponding to the maximum probability value in the probability distribution at each time step.
It can be understood that, after the probability distributions of all character categories output at each time step are obtained, in order to further improve the accuracy of OCR recognition, the method and the device can traverse the probability distributions of all character categories output at each time step, and screen out characters corresponding to the maximum probability value in the probability distributions at each time step, so that a final recognition result is determined according to the characters corresponding to the maximum probability value, and the accuracy of OCR recognition can be effectively improved.
S130: determining, for each time step, whether the character corresponding to the maximum probability value in the probability distribution is a character in the pre-configured error-prone character grouping list; if yes, go to S140; if not, go to S150.
In this step, after the character corresponding to the maximum probability value in the probability distribution of each time step is obtained through S120, it can be determined, for each time step, whether that character is a character in the pre-configured error-prone character grouping list. The maximum probability values at one or more time steps can then be adjusted according to the judgment result, avoiding the situation in which a correct recognition result is filtered out because the ambiguity of confusable characters makes the overall score too low.
Specifically, when determining whether the character corresponding to the maximum probability value in the probability distribution of the time step is a character in the error-prone character grouping list, that character can be compared with each character in the error-prone character grouping list. If it matches a character in the list, it is a character in the error-prone character grouping list; otherwise it is not.
The error-prone character grouping list is a grouping list of characters that are prone to recognition errors when text images in digit-and-letter scenes are recognized. For example, the grouping list may include (1, I, l), (0, O), (Z, 2), (C, c), (W, w), (V, v), and (U, u). When the character corresponding to the maximum probability value appears in the grouping list, it is a character in the error-prone character grouping list.
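Written as a lookup table (a sketch only; the application does not prescribe a data structure), the grouping list and the membership test of S130 might look like:

```python
GROUPS = [("1", "I", "l"), ("0", "O"), ("Z", "2"),
          ("C", "c"), ("W", "w"), ("V", "v"), ("U", "u")]
# Map each character to the group it belongs to, for O(1) membership checks.
GROUP_OF = {ch: group for group in GROUPS for ch in group}

def is_error_prone(char):
    """S130: is the argmax character in the error-prone character grouping list?"""
    return char in GROUP_OF

print(is_error_prone("O"), is_error_prone("A"))  # True False
```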
S140: and updating the maximum probability value based on the probability value of the error-prone characters in the probability distribution of the time step, corresponding to the character pair with the maximum probability value, in the error-prone character grouping list, and taking the characters after updating the maximum probability value as output characters of the time step.
In this step, after determining whether the character corresponding to the maximum probability value in the probability distribution of each time step is a character in a pre-configured error-prone character grouping list through S130, if the character corresponding to the maximum probability value in the probability distribution of one or more time steps is a character in the error-prone character grouping list, the maximum probability value may be updated according to the probability value of the error-prone character paired with the character corresponding to the maximum probability value in the error-prone character grouping list in the probability distribution of the time step, and the character with the updated maximum probability value is used as the output character of the time step.
It can be understood that, when OCR recognition is performed on the text image, the probability distributions of all character categories output at each time step are obtained. Each distribution therefore contains not only the character corresponding to the maximum probability value, but also the error-prone characters paired with it in the error-prone character grouping list and their probability values.
When the character corresponding to the maximum probability value in the probability distribution of one or more time steps is a character in the error-prone character grouping list, the maximum probability value can be updated according to the probability value, in the probability distribution of the time step, of the error-prone character paired with it in the grouping list: for example, by adding the maximum probability value and the probability value of the error-prone character, or by taking their weighted sum. If both values are greater than 0, the maximum probability value can also be updated through a power function. Of course, the maximum probability value can also be updated in other ways; the only requirement is that the updated maximum probability value be greater than the maximum probability value before updating, and no limitation is made here.
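The per-time-step branch of S130 to S150, with two of the update rules mentioned above, can be sketched as follows. The names, the grouping table, and the weight `alpha` are illustrative assumptions; any rule that leaves the updated maximum larger than the old one qualifies.

```python
GROUP_OF = {"0": ("O",), "O": ("0",),
            "1": ("I", "l"), "I": ("1", "l"), "l": ("1", "I")}

def output_char(dist):
    """dist: {char: prob} for one time step -> (output char, its score)."""
    char = max(dist, key=dist.get)
    if char not in GROUP_OF:                 # S150: not error-prone, keep as-is
        return char, dist[char]
    extra = sum(dist.get(p, 0.0) for p in GROUP_OF[char])
    return char, dist[char] + extra          # S140: update by simple addition

def weighted_update(dist, char, alpha=0.5):
    """Alternative update rule: weighted sum, capped so it stays a probability."""
    extra = sum(dist.get(p, 0.0) for p in GROUP_OF.get(char, ()))
    return min(1.0, dist[char] + alpha * extra)

char, score = output_char({"0": 0.5, "O": 0.4, "8": 0.1})
```

The capped weighted sum is one way to keep the updated value interpretable as a probability; simple addition can exceed 1.0 when a group has several high-probability members.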
S150: and directly taking the character corresponding to the maximum probability value in the probability distribution of the time step as the output character of the time step.
In this step, after it is determined through S130 whether the character corresponding to the maximum probability value in the probability distribution of each time step is a character in the pre-configured error-prone character grouping list, for any time step at which that character is not in the list, the character corresponding to the maximum probability value can be taken directly as the output character of the time step.
S160: and splicing the output characters of all time steps to obtain a character sequence, and decoding the character sequence to obtain a final recognition result.
In this step, after the output character of each time step is obtained through S140 and S150, the output characters of all time steps can be spliced to obtain a character sequence, which is then decoded to obtain the final recognition result.
It can be understood that, in order to improve the OCR recognition precision, the method and the device can splice the output characters of all time steps to obtain a character sequence, so that the character sequence is a sequence formed by combining the characters with the maximum probability value at each time step, and the character sequence is decoded to obtain a recognition result with higher accuracy.
The decoding process translates the character sequence into the final recognition result, and may include removing redundant information. When time-series classification is performed, redundant information inevitably appears; for example, a letter may be recognized at two consecutive time steps. A redundancy-removal mechanism is therefore required to remove the redundant information from the character sequence and obtain the final recognition result.
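This redundancy removal is the usual CTC-style collapse; a minimal sketch, with '-' standing in for the placeholder:

```python
def ctc_collapse(chars, blank="-"):
    """Merge repeats at consecutive time steps, then drop the placeholder."""
    out = []
    prev = None
    for c in chars:
        if c != prev and c != blank:
            out.append(c)
        prev = c
    return "".join(out)

print(ctc_collapse(list("hh-e-ll-lo")))  # hello
```

Note the role of the placeholder: it separates genuine double letters ("ll") from a single letter recognized at two consecutive time steps, which is why the two branches of the optional decoding step above are distinguished.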
In the above embodiment, when OCR recognition is performed on a text image to be recognized, the probability distributions of all character categories output at each time step are obtained first, and the character corresponding to the maximum probability value in the probability distribution of each time step is determined. For each time step, it is then judged whether that character is a character in the pre-configured error-prone character grouping list. If not, the character corresponding to the maximum probability value is directly taken as the output character of the time step, which can improve the accuracy of the recognition result to a certain extent. If so, the maximum probability value is updated based on the probability value, in the probability distribution of the time step, of the error-prone character paired with it in the grouping list, and the character with the updated maximum probability value is taken as the output character of the time step, so that the overall score is no longer dragged down by the ambiguity of error-prone characters.
In one embodiment, the method may further comprise:
s170: and correcting the recognition result according to the recognition scene corresponding to the text image to obtain a corrected result.
In this embodiment, after the recognition result is obtained, the recognition result can be corrected according to the recognition scene corresponding to the text image, so that the obtained correction result is more accurate than the original recognition result, further improving the accuracy of OCR recognition.
In an embodiment, correcting the recognition result according to the recognition scene corresponding to the text image in S170 to obtain a corrected result may include:
s171: and determining a recognition scene corresponding to the text image and a character list easy to appear corresponding to the recognition scene.
S172: comparing each output character in the recognition result with the easy-to-appear characters in the easy-to-appear character list respectively, and determining whether the recognition result has output characters which are not included in the easy-to-appear character list; if yes, go to S173; and if the identification result does not exist, directly taking the identification result as a correction result.
S173: and replacing the output characters which are not contained in the character list easy to appear with the error-prone characters which are paired with the output characters in the error-prone character grouping list.
S174: and taking the recognition result after replacing the output character as a correction result.
In this embodiment, when the recognition result is corrected, the recognition scene corresponding to the text image and the easy-to-appear character list corresponding to the recognition scene may be determined first. Each output character in the recognition result is then compared with the easy-to-appear characters in the easy-to-appear character list to determine whether the recognition result contains any output character not included in the list. If so, that output character is very likely a recognition error; it may be replaced with the error-prone character paired with it in the error-prone character grouping list, and the recognition result after the replacement is taken as the correction result. If not, the output characters in the recognition result essentially contain no recognition errors, and the recognition result can be used directly as the correction result.
For example, when the text image is acquired in a license plate scene, since neither the letter O nor the letter o appears on license plates, any O or o in the recognition result may be replaced with 0. Further, as shown in fig. 2, fig. 3 and fig. 4, fig. 2 is a schematic view of a text image provided in an embodiment of the present application, fig. 3 is a display view of the recognition result of that text image, and fig. 4 is a display view of its correction result. As can be seen from fig. 2, the second line of text in fig. 2 is a sequence of numeric digits. The recognition result obtained after the first recognition is shown in fig. 3; after this result is corrected according to the correction method of the present application, with the characters at the fourth through tenth positions replaced by 0, the correction result is as shown in fig. 4. After this processing, the accuracy of the recognition result is greatly improved, and the recognition result is no longer filtered out because of the low score that confusable characters would otherwise cause.
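The correction flow of S171 to S174 can be sketched as follows; the scene table, its allowed character set, and the pairing dictionary are illustrative assumptions rather than values prescribed by the present application:

```python
# Illustrative sketch of scene-based correction (S171-S174).
# The scene table and the confusable-character pairs are assumed
# examples, not values prescribed by this application.
SCENE_CHARS = {
    # License plates use digits and letters, but never O, o or I,
    # which are too easily confused with 0 and 1.
    "license_plate": set("0123456789ABCDEFGHJKLMNPQRSTUVWXYZ"),
}
ERROR_PRONE_PAIRS = {"O": "0", "o": "0", "I": "1", "l": "1"}

def correct_result(recognition_result, scene):
    """Replace characters that cannot occur in the given scene with
    the error-prone character paired with them; return the corrected
    string."""
    allowed = SCENE_CHARS[scene]
    corrected = []
    for ch in recognition_result:
        if ch in allowed:
            corrected.append(ch)
        else:
            # Fall back to the character itself if no pair is configured.
            corrected.append(ERROR_PRONE_PAIRS.get(ch, ch))
    return "".join(corrected)
```

For the license plate example above, `correct_result("AO123", "license_plate")` yields `"A0123"`: the letter O cannot occur in this scene, so it is replaced by its paired digit 0.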
It should be noted that the present application can construct the easy-to-appear character list in advance according to the characters that commonly occur in text images under different scenes. After the recognition result is obtained, whether the current recognition result is accurate can be judged against this list, and an adjustment can be made when it is inaccurate, which helps to further improve the accuracy of text recognition.
In one embodiment, in S110, performing OCR recognition on the text image to be recognized to obtain probability distributions of all character categories output at each time step, may include:
s111: inputting a text image to be recognized into a pre-configured CRNN for OCR recognition, and performing feature extraction on the text image through a convolution layer of the CRNN to obtain a plurality of feature maps.
S112: and after converting each characteristic diagram into a characteristic vector, sequentially inputting each characteristic vector into a circulation layer of the CRNN, and predicting characters corresponding to each characteristic vector through the circulation layer to obtain the probability distribution of all character categories output by the circulation layer at each time step.
In this embodiment, a CRNN network may be used when performing OCR recognition on the text image. The CRNN network may include a convolution layer and a circulation layer. Features of the text image are extracted through the convolution layer to obtain a plurality of feature maps; the feature maps are converted into feature vectors, and the feature vectors are passed through the circulation layer for character prediction, so as to obtain the probability distributions of all character categories output at each time step.
For example, the height of the text image input to the CRNN network is 32 and its width may be 160, so the input size of the convolutional layer may be (channel, height, width) = (1, 32, 160), and the output size may be (512, 1, 40); that is, 512 feature maps, each with a height of 1 and a width of 40, are obtained after feature extraction by the convolutional layer. Since the circulation layer takes a feature sequence as input, once the feature maps are obtained, a feature vector sequence can be extracted from them. Each feature vector is generated column by column from left to right, and each column contains 512-dimensional features; that is, the i-th feature vector is the concatenation of the pixels in the i-th column across all feature maps, and these feature vectors form a sequence. Each column of the feature maps (i.e., one feature vector) corresponds to a rectangular region of the text image (called the receptive field), and these rectangular regions are in the same left-to-right order as the corresponding columns of the feature maps. Thus, each vector in the feature vector sequence is associated with a receptive field.
Next, the present application may take each feature vector in the feature vector sequence as the input of the circulation layer at one time step. It can be understood that 40 feature vectors, each of length 512, are obtained through the above steps, and only one feature vector is fed into the circulation layer per time step for classification, so the circulation layer has 40 time steps. At each time step, the circulation layer predicts which character the currently input feature vector corresponds to and outputs the probability distribution over all character categories as a vector whose length equals the number of character categories. After 40 such vectors are output, they form a posterior probability matrix, which can be transcribed into the recognition result of the text image by the text recognition result determining method of the present application.
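The shape bookkeeping described above can be illustrated with the following sketch; the convolutional backbone is omitted, its output is simulated with random data, and the character-category count of 37 (26 letters, 10 digits and one placeholder) is an assumed example:

```python
import random

# Simulate the convolutional layer's output described above:
# 512 feature maps, each of height 1 and width 40 (the height-1
# dimension is flattened away).
num_maps, width = 512, 40
conv_out = [[random.random() for _ in range(width)] for _ in range(num_maps)]

# Map-to-sequence: column i across all feature maps becomes the
# i-th feature vector, giving 40 vectors of length 512 for the
# circulation (recurrent) layer.
feature_seq = [[conv_out[m][i] for m in range(num_maps)] for i in range(width)]
assert len(feature_seq) == 40 and len(feature_seq[0]) == 512

# Per time step, the circulation layer outputs a probability
# distribution over all character categories; stacking the 40
# per-step vectors yields the posterior probability matrix.
num_classes = 37  # assumed: 26 letters + 10 digits + 1 placeholder
raw = [[random.random() for _ in range(num_classes)] for _ in range(width)]
posterior = [[v / sum(row) for v in row] for row in raw]
assert len(posterior) == 40 and len(posterior[0]) == 37
```

Each row of `posterior` sums to 1 and corresponds to one time step's probability distribution, which is the input consumed by the selection and decoding steps below.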
In one embodiment, updating the maximum probability value in S140 based on the probability value, in the probability distribution of the time step, of the error-prone character paired in the error-prone character grouping list with the character corresponding to the maximum probability value may include:
s141: and determining the error-prone characters of the character pairs corresponding to the maximum probability value in the error-prone character grouping list and the probability values of the error-prone characters in the probability distribution of the time step.
S142: and adding the probability value of the error-prone character in the probability distribution of the time step with the maximum probability value, and updating the maximum probability value according to the addition result.
In this embodiment, when the character corresponding to the maximum probability value in the probability distribution of a time step is a character in the error-prone character grouping list, the maximum probability value may be updated according to the probability value, in the probability distribution of the time step, of the error-prone character paired with that character in the error-prone character grouping list.
Specifically, when the maximum probability value is updated, the error-prone character paired in the error-prone character grouping list with the character corresponding to the maximum probability value, together with the probability value of that error-prone character in the probability distribution of the time step, may be determined first. Then, this probability value is added to the maximum probability value, and the sum is used as the new maximum probability value in place of the original one. Since the updated maximum probability value is larger than the original maximum probability value, the situation in which the overall score becomes too low because of the ambiguity of error-prone characters, so that a correct recognition result is filtered out, can be avoided.
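The update of S141 and S142 amounts to the following sketch; the pairing table is an assumed example:

```python
# Sketch of S141-S142: merge the probability mass of the paired
# error-prone character into the maximum probability value.
# The pairing table is an assumed example.
ERROR_PRONE_PAIRS = {"O": "0", "0": "O", "I": "1", "1": "I"}

def updated_max_probability(prob_dist):
    """prob_dist maps each character category to its probability at
    one time step. Returns (output_char, updated_max_probability)."""
    char = max(prob_dist, key=prob_dist.get)
    score = prob_dist[char]
    paired = ERROR_PRONE_PAIRS.get(char)
    if paired is not None:
        # S142: add the paired character's probability to the maximum.
        score += prob_dist.get(paired, 0.0)
    return char, score
```

With `prob_dist = {"O": 0.45, "0": 0.40, "A": 0.15}`, the output character remains "O", but its score rises from 0.45 to 0.85, so the ambiguity between O and 0 no longer depresses the overall confidence.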
In an embodiment, the decoding the character sequence in S160 to obtain a final recognition result may include:
s161: traversing the character sequence and determining whether the character sequence contains placeholders; if not, executing S162; if so, then S163 is executed.
S162: and combining the repeated output characters under the continuous time steps, and taking the combined character sequence as a final recognition result.
S163: and combining repeated output characters in the character sequence under the continuous time step according to the placeholders, and removing the placeholders in the combined character sequence to obtain a final recognition result.
In this embodiment, after the character sequence formed by splicing the characters with the maximum probability value at each time step is obtained, the present application may first traverse the character sequence to determine whether it contains a placeholder. If so, repeated output characters at consecutive time steps in the character sequence are merged according to the placeholders, and the placeholders in the merged character sequence are removed to obtain the final recognition result. If the character sequence does not contain a placeholder, repeated output characters at consecutive time steps can be merged directly, and the merged character sequence is used as the final recognition result.
It can be understood that, in the process of decoding the character sequence to obtain the final recognition result, redundant information may occur because the same text may be recognized repeatedly during time-series classification. For example, when the text "ab" is to be recognized and is divided into 5 time steps, it may happen that times t0, t1 and t2 are mapped to "a" and times t3 and t4 are mapped to "b"; connecting these characters gives "aaabb", in which the letter a is repeated three times and the letter b twice. Therefore, during subsequent decoding, repeated characters need to be merged to obtain the final recognition result "ab".
Further, since the text in some text images itself contains repeated characters, words such as book and hello would become bok and helo if consecutive identical characters were merged in the above manner, leading to inaccurate recognition results. Therefore, a "-" symbol may be used as a placeholder and inserted between repeated characters in the text label when the character sequence is output. For example, the output sequence "bbooo-ook" is finally mapped to "book": identical consecutive output characters separated by a placeholder are not merged, only identical consecutive output characters without a placeholder between them are merged, and the placeholders in the merged character sequence are then deleted to obtain the final recognition result.
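A minimal sketch of this decoding step (S161 to S163), assuming "-" as the placeholder symbol:

```python
def decode(char_sequence, placeholder="-"):
    """Merge identical consecutive output characters, then remove
    placeholders (S161-S163). Repeats separated by a placeholder
    survive the merge, so "bbooo-ook" maps to "book"."""
    merged = []
    prev = None
    for ch in char_sequence:
        if ch != prev:  # merge runs of identical characters
            merged.append(ch)
        prev = ch
    # strip the placeholders left after merging
    return "".join(c for c in merged if c != placeholder)
```

`decode("aaabb")` returns `"ab"`, and `decode("bbooo-ook")` returns `"book"`, matching the two examples above; when the sequence contains no placeholder the function simply merges consecutive repeats, which corresponds to S162.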
In one embodiment, the method may further comprise:
s181: and multiplying the maximum probability values corresponding to the output characters in the recognition result to obtain a product result.
S182: and taking the product result as a confidence score of the recognition result.
In this embodiment, after the final recognition result is obtained, a confidence score may be computed for the recognition result in order to verify its reliability, and the user may judge the reliability of the current recognition result according to the confidence score.
Specifically, when the recognition result is scored, since each output character in the recognition result is the character corresponding to a maximum probability value, the maximum probability values corresponding to the output characters can be multiplied directly, and the product is used as the confidence score of the recognition result.
The computational cost of this scoring method is the same as that of a greedy algorithm, but because the correlation between different time steps is taken into account, it is more robust to error-prone characters.
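The scoring of S181 and S182 can be sketched as follows, using the (possibly updated) maximum probability value of each output character:

```python
import math

def confidence_score(max_prob_values):
    """S181-S182: multiply the maximum probability values of the
    output characters and use the product as the confidence score."""
    return math.prod(max_prob_values)
```

For three output characters with maximum probability values 0.9, 0.8 and 0.95, the confidence score is about 0.684. Because merged error-prone probabilities raise the per-character maxima, a correct result containing confusable characters keeps a higher product than it would under a plain greedy score.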
The following describes a text recognition result determining apparatus provided in the embodiment of the present application, and the text recognition result determining apparatus described below and the text recognition result determining method described above may be referred to in correspondence with each other.
In an embodiment, as shown in fig. 5, fig. 5 is a schematic structural diagram of a device for determining a text recognition result according to an embodiment of the present application; the present application further provides a device for determining a text recognition result, which may include a text recognition module 210, a character acquisition module 220, an error-prone character determination module 230, a first character determination module 240, a second character determination module 250, and a recognition result determination module 260, and specifically includes the following:
the text recognition module 210 is configured to perform OCR recognition on a text image to be recognized, and obtain probability distributions of all character categories output by the text image at each time step.
The character obtaining module 220 is configured to traverse probability distributions of all character categories output at each time step, and obtain a character corresponding to the maximum probability value in the probability distribution at each time step.
And an error-prone character determining module 230, configured to determine, for each time step, whether a character corresponding to the maximum probability value in the probability distribution of the time step is a character in a pre-configured error-prone character grouping list.
A first character determining module 240, configured to, if so, update the maximum probability value based on the probability value, in the probability distribution of the time step, of the error-prone character paired in the error-prone character grouping list with the character corresponding to the maximum probability value, and use the character with the updated maximum probability value as the output character of the time step; wherein the updated maximum probability value is greater than the maximum probability value before updating.
And a second character determining module 250, configured to, if not, directly use a character corresponding to the maximum probability value in the probability distribution of the time step as the output character of the time step.
And the recognition result determining module 260 is configured to splice the output characters at all time steps to obtain a character sequence, and decode the character sequence to obtain a final recognition result.
In the above embodiment, when performing OCR recognition on a text image to be recognized, the probability distributions of all character categories output at each time step may be obtained first, and the character corresponding to the maximum probability value in the probability distribution at each time step may be determined. Then, for each time step, it is determined whether the character corresponding to the maximum probability value in the probability distribution of the time step is a character in a pre-configured error-prone character grouping list. If not, the character corresponding to the maximum probability value in the probability distribution of the time step is directly used as the output character of the time step, which improves the accuracy of the recognition result to a certain extent. If so, the maximum probability value is updated based on the probability value, in the probability distribution of the time step, of the error-prone character paired in the error-prone character grouping list with the character corresponding to the maximum probability value, and the character with the updated maximum probability value is used as the output character of the time step.
In one embodiment, the apparatus may further include:
and the result correction module is used for correcting the recognition result according to the recognition scene corresponding to the text image to obtain a correction result.
In one embodiment, the result correction module may include:
and the scene and list determining module is used for determining the identification scene corresponding to the text image and the character list which is easy to appear and corresponds to the identification scene.
And the character comparison module is used for comparing each output character in the recognition result with the easy-to-appear characters in the easy-to-appear character list respectively and determining whether the output characters which are not contained in the easy-to-appear character list exist in the recognition result.
And the character replacing module is used for replacing the output characters which are not contained in the easily-appearing character list with the error-prone characters which are matched with the output characters in the error-prone character grouping list if the output characters exist.
And the correction result determining module is used for taking the recognition result after the output character is replaced as the correction result.
In one embodiment, the text recognition module 210 may include:
the feature extraction module is used for inputting a text image to be identified into a pre-configured CRNN (CrNN) network for OCR (optical character recognition), and extracting features of the text image through a convolution layer of the CRNN network to obtain a plurality of feature maps.
And the character prediction module is used for converting each characteristic diagram into a characteristic vector, sequentially inputting each characteristic vector into a circulation layer of the CRNN, and predicting characters corresponding to each characteristic vector through the circulation layer to obtain the probability distribution of all character categories output by the circulation layer at each time step.
In one embodiment, the first character determination module 240 may include:
and the character and probability value determining module is used for determining the error-prone characters of the character pairs corresponding to the maximum probability value in the error-prone character grouping list and the probability values of the error-prone characters in the probability distribution of the time step.
And the probability adding module is used for adding the probability value of the error-prone character in the probability distribution of the time step to the maximum probability value and updating the maximum probability value according to the addition result.
In one embodiment, the recognition result determining module 260 may include:
and the traversing module is used for traversing the character sequence and determining whether the character sequence contains a placeholder.
And the first merging module is used for, if the character sequence does not contain a placeholder, merging repeated output characters at consecutive time steps and taking the merged character sequence as the final recognition result.
And the second merging module is used for, if the character sequence contains a placeholder, merging repeated output characters at consecutive time steps in the character sequence according to the placeholder, and removing the placeholders in the merged character sequence to obtain the final recognition result.
In one embodiment, the apparatus may further include:
and the product module is used for multiplying the maximum probability values corresponding to the output characters in the recognition result to obtain a product result.
And the confidence score module is used for taking the product result as the confidence score of the recognition result.
In one embodiment, the present application further provides a storage medium having computer-readable instructions stored therein, which when executed by one or more processors, cause the one or more processors to perform the steps of the method for determining a text recognition result as described in any one of the above embodiments.
In one embodiment, the present application further provides a computer device comprising: one or more processors, and a memory.
The memory has stored therein computer readable instructions which, when executed by the one or more processors, perform the steps of the method of determining text recognition results as described in any one of the above embodiments.
Fig. 6 is a schematic diagram illustrating the internal structure of a computer device according to an embodiment of the present application; the computer device 300 may be provided as a server. Referring to fig. 6, the computer device 300 includes a processing component 302, which further includes one or more processors, and memory resources, represented by memory 301, for storing instructions executable by the processing component 302, such as application programs. The application programs stored in memory 301 may include one or more modules, each corresponding to a set of instructions. Further, the processing component 302 is configured to execute instructions to perform the method of determining a text recognition result of any of the embodiments described above.
The computer device 300 may also include a power component 303 configured to perform power management of the computer device 300, a wired or wireless network interface 304 configured to connect the computer device 300 to a network, and an input/output (I/O) interface 305. The computer device 300 may operate based on an operating system stored in memory 301, such as Windows Server, Mac OS X, Unix, Linux, FreeBSD, or the like.
Those skilled in the art will appreciate that the architecture shown in fig. 6 is merely a block diagram of some of the structures associated with the disclosed aspects and is not intended to limit the computer devices to which the disclosed aspects apply; a particular computer device may include more or fewer components than shown, combine certain components, or have a different arrangement of components.
Finally, it should also be noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a … …" does not exclude the presence of another identical element in a process, method, article, or apparatus that comprises the element.
The embodiments in the present description are described in a progressive manner, each embodiment focuses on differences from other embodiments, the embodiments may be combined as needed, and the same and similar parts may be referred to each other.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the application. Thus, the present application is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (10)

1. A method for determining a text recognition result, the method comprising:
performing OCR recognition on a text image to be recognized to obtain probability distribution of all character categories output by the text image at each time step;
traversing the probability distribution of all character types output at each time step, and acquiring characters corresponding to the maximum probability value in the probability distribution at each time step;
determining whether the character corresponding to the maximum probability value in the probability distribution of each time step is a character in a pre-configured error-prone character grouping list or not;
if yes, updating the maximum probability value based on the probability value, in the probability distribution of the time step, of the error-prone character paired in the error-prone character grouping list with the character corresponding to the maximum probability value, and taking the character with the updated maximum probability value as the output character of the time step; wherein the updated maximum probability value is greater than the maximum probability value before updating;
if not, directly taking the character corresponding to the maximum probability value in the probability distribution of the time step as the output character of the time step;
and splicing the output characters of all time steps to obtain a character sequence, and decoding the character sequence to obtain a final recognition result.
2. The method for determining a text recognition result according to claim 1, further comprising:
and correcting the recognition result according to the recognition scene corresponding to the text image to obtain a corrected result.
3. The method for determining the text recognition result according to claim 2, wherein the modifying the recognition result according to the recognition scene corresponding to the text image to obtain a modified result comprises:
determining an identification scene corresponding to the text image and an easy-to-appear character list corresponding to the identification scene;
comparing each output character in the recognition result with the easy-to-appear characters in the easy-to-appear character list respectively, and determining whether the recognition result has output characters which are not included in the easy-to-appear character list;
if yes, replacing the output characters which are not contained in the character list easy to appear with error-prone characters which are paired with the output characters in the error-prone character grouping list;
and taking the recognition result after replacing the output character as a correction result.
4. The method for determining the text recognition result according to claim 1, wherein performing OCR recognition on the text image to be recognized to obtain probability distributions of all character categories output by the text image at each time step includes:
inputting a text image to be recognized into a pre-configured CRNN for OCR recognition, and extracting the characteristics of the text image through a convolution layer of the CRNN to obtain a plurality of characteristic graphs;
and after converting each characteristic diagram into a characteristic vector, sequentially inputting each characteristic vector into a circulation layer of the CRNN, and predicting characters corresponding to each characteristic vector through the circulation layer to obtain the probability distribution of all character categories output by the circulation layer at each time step.
5. The method for determining the text recognition result according to claim 1, wherein the updating the maximum probability value based on the probability value, in the probability distribution of the time step, of the error-prone character paired in the error-prone character grouping list with the character corresponding to the maximum probability value comprises:
determining the error-prone character paired, in the error-prone character grouping list, with the character corresponding to the maximum probability value, and the probability value of that error-prone character in the probability distribution of the time step;
and adding the probability value of the error-prone character in the probability distribution of the time step to the maximum probability value, and updating the maximum probability value according to the result of the addition.
6. The method for determining the text recognition result according to claim 1, wherein the decoding the character sequence to obtain the final recognition result comprises:
traversing the character sequence and determining whether the character sequence contains a placeholder;
if not, combining the repeated output characters in the continuous time steps, and taking the combined character sequence as a final recognition result;
and if so, combining the repeated output characters in the character sequence under the continuous time step according to the placeholders, and removing the placeholders in the combined character sequence to obtain a final recognition result.
7. The method for determining the text recognition result according to any one of claims 1 to 6, wherein the method further comprises:
multiplying the maximum probability values corresponding to the output characters in the recognition result to obtain a product result;
and taking the product result as a confidence score of the recognition result.
8. An apparatus for determining a text recognition result, comprising:
the text recognition module is used for performing OCR recognition on a text image to be recognized to obtain the probability distribution over all character categories output at each time step;
the character acquisition module is used for traversing the probability distribution over all character categories output at each time step and acquiring the character corresponding to the maximum probability value in the probability distribution of each time step;
the error-prone character judgment module is used for determining whether the character corresponding to the maximum probability value in the probability distribution of each time step is a character in a pre-configured error-prone character grouping list;
a first character determining module, configured to, if so, update the maximum probability value based on the probability value, in the probability distribution of the time step, of the error-prone character paired in the error-prone character grouping list with the character corresponding to the maximum probability value, and to take the character with the updated maximum probability value as the output character of the time step; wherein the updated maximum probability value is greater than the maximum probability value before updating;
a second character determining module, configured to, if the character corresponding to the maximum probability value in the probability distribution of the time step is not a character in the error-prone character grouping list, directly take that character as the output character of the time step;
and the recognition result determining module is used for splicing the output characters of all time steps to obtain a character sequence, and decoding the character sequence to obtain a final recognition result.
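The modules of claim 8 compose into one pipeline: per-time-step argmax, the error-prone probability update, splicing, and decoding. An end-to-end sketch under the same assumptions as above (hypothetical pairing table, `-` as placeholder, toy distributions standing in for the OCR model's outputs):

```python
from itertools import groupby

PLACEHOLDER = "-"
ERROR_PRONE_PAIRS = {"0": "O", "O": "0"}  # assumed grouping list

def recognize(distributions):
    outputs = []
    for dist in distributions:                    # one distribution per time step
        char = max(dist, key=dist.get)            # character with max probability
        p = dist[char]
        partner = ERROR_PRONE_PAIRS.get(char)
        if partner is not None:                   # error-prone character: update
            p += dist.get(partner, 0.0)
        outputs.append((char, p))
    sequence = "".join(ch for ch, _ in outputs)   # splice output characters
    merged = "".join(ch for ch, _ in groupby(sequence))
    text = merged.replace(PLACEHOLDER, "")        # decode (claim 6)
    confidence = 1.0
    for _, p in outputs:
        confidence *= p                           # confidence score (claim 7)
    return text, confidence

steps = [
    {"0": 0.6, "O": 0.3, "-": 0.1},
    {"-": 0.8, "0": 0.1, "O": 0.1},
    {"7": 0.9, "1": 0.05, "-": 0.05},
]
text, conf = recognize(steps)
# text == "07"; confidence == (0.6 + 0.3) * 0.8 * 0.9 = 0.648
```

Note how the first step illustrates the point of the grouping list: "0" and "O" split the probability mass, and pooling them both raises the confidence and keeps one canonical character in the output.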
9. A storage medium, characterized by: the storage medium has stored therein computer-readable instructions which, when executed by one or more processors, cause the one or more processors to perform the steps of the method of determining a text recognition result according to any one of claims 1 to 7.
10. A computer device, comprising: one or more processors, and a memory;
the memory has stored therein computer-readable instructions which, when executed by the one or more processors, cause the one or more processors to perform the steps of the method of determining a text recognition result according to any one of claims 1 to 7.
CN202210885785.XA 2022-07-26 2022-07-26 Text recognition result determining method and device, storage medium and computer equipment Pending CN115147847A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210885785.XA CN115147847A (en) 2022-07-26 2022-07-26 Text recognition result determining method and device, storage medium and computer equipment

Publications (1)

Publication Number Publication Date
CN115147847A true CN115147847A (en) 2022-10-04

Family

ID=83414521

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210885785.XA Pending CN115147847A (en) 2022-07-26 2022-07-26 Text recognition result determining method and device, storage medium and computer equipment

Country Status (1)

Country Link
CN (1) CN115147847A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115471840A (en) * 2022-11-03 2022-12-13 北京百度网讯科技有限公司 Generation method, model training method, model recognition method, device, electronic equipment and medium

Similar Documents

Publication Publication Date Title
CN111062376A (en) Text recognition method based on optical character recognition and error correction tight coupling processing
CN113254654B (en) Model training method, text recognition method, device, equipment and medium
CN111062397A (en) Intelligent bill processing system
CN110738262B (en) Text recognition method and related product
US20210264189A1 (en) Text Recognition Method and Apparatus, Electronic Device, and Storage Medium
CN112733568B (en) One-dimensional bar code recognition method, device, equipment and storage medium
CN113569833A (en) Text document-based character recognition method, device, equipment and storage medium
US20050226516A1 (en) Image dictionary creating apparatus and method
CN115147847A (en) Text recognition result determining method and device, storage medium and computer equipment
CN112183542A (en) Text image-based recognition method, device, equipment and medium
CN113255566B (en) Form image recognition method and device
CN113469005A (en) Recognition method of bank receipt, related device and storage medium
CN112149678A (en) Character recognition method and device for special language and recognition model training method and device
CN115565179A (en) Method, system and device for correcting errors after character recognition
CN110727743A (en) Data identification method and device, computer equipment and storage medium
CN115909381A (en) Text image recognition method, system and related device
CN115512340A (en) Intention detection method and device based on picture
CN110858307A (en) Character recognition model training method and device and character recognition method and device
CN111461109B (en) Method for identifying documents based on environment multi-class word stock
CN113762160A (en) Date extraction method and device, computer equipment and storage medium
CN113408536A (en) Bill amount identification method and device, computer equipment and storage medium
CN112287763A (en) Image processing method, apparatus, device and medium
CN112287723A (en) In-vivo detection method and device based on deep learning and storage medium
JP2903779B2 (en) Character string recognition method and apparatus
CN115546810B (en) Image element category identification method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination