CN112990035A - Text recognition method, device, equipment and storage medium

Info

Publication number
CN112990035A
CN112990035A (application CN202110310267.0A)
Authority
CN
China
Prior art keywords
confidence
recognition result
recognized
features
confidence coefficient
Prior art date
Legal status (an assumption, not a legal conclusion)
Granted
Application number
CN202110310267.0A
Other languages
Chinese (zh)
Other versions
CN112990035B (en)
Inventor
陈禹燊
韩光耀
姜泽青
Current Assignee (the listed assignees may be inaccurate)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (an assumption, not a legal conclusion)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd filed Critical Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202110310267.0A
Publication of CN112990035A
Application granted
Publication of CN112990035B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/40 Document-oriented image-based pattern recognition
    • G06V30/41 Analysis of document content
    • G06V30/413 Classification of content, e.g. text, photographs or tables
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/216 Parsing using statistical methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/237 Lexical tools
    • G06F40/242 Dictionaries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition

Abstract

The disclosure provides a text recognition method, device, equipment and storage medium, and relates to artificial intelligence fields such as image recognition, natural language processing, deep learning and cloud computing. The specific implementation scheme is as follows: obtaining a machine recognition result of an object to be recognized; obtaining the confidence of the machine recognition result through a confidence generation model, based on the machine recognition result and the semantic features of the object to be recognized; and comparing the confidence of the machine recognition result with a confidence threshold to determine the final recognition result of the object to be recognized, where the confidence threshold is determined in advance according to the confidence generation model. The disclosed technology can reduce the workload of manual review in the text auditing process and improve recognition efficiency.

Description

Text recognition method, device, equipment and storage medium
Technical Field
The present disclosure relates to the field of computer technology, and more particularly to the field of artificial intelligence such as image recognition, natural language processing, deep learning, and cloud computing.
Background
In text recognition technology, in the bill auditing scenario, bills can be recognized and information such as the purpose of the bill can then be classified. In the related art, machine recognition of a bill is usually performed by combining OCR (Optical Character Recognition) and NLP (Natural Language Processing); however, because the machine recognition results contain errors, auditors are required to manually audit all of them, so the related art suffers from defects such as high labor cost and low recognition efficiency.
Disclosure of Invention
The disclosure provides a text recognition method, device, equipment and storage medium.
According to an aspect of the present disclosure, there is provided a text recognition method, including:
obtaining a machine identification result of an object to be identified;
obtaining the confidence of the machine recognition result through a confidence generation model, based on the machine recognition result and the semantic features of the object to be recognized;
and comparing the confidence of the machine recognition result with a confidence threshold to determine the final recognition result of the object to be recognized, where the confidence threshold is determined in advance according to the confidence generation model.
According to another aspect of the present disclosure, there is provided a training method of a confidence level generation model, including:
determining an initialized target confidence by using a machine recognition result sample of an object to be recognized;
inputting the machine recognition result sample of the object to be recognized and the semantic features of the machine recognition result sample into a confidence generation model to be trained, to obtain the difference between a prediction confidence and the target confidence;
and training the confidence generation model to be trained according to the difference, until the difference is within an allowable range.
According to another aspect of the present disclosure, there is provided an apparatus for text recognition, including:
the machine recognition result acquisition module is used for acquiring the machine recognition result of an object to be recognized;
the confidence generation module is used for obtaining the confidence of the machine recognition result through a confidence generation model, based on the machine recognition result and the semantic features of the object to be recognized;
and the final recognition result determining module is used for comparing the confidence of the machine recognition result with a confidence threshold to determine the final recognition result of the object to be recognized, where the confidence threshold is determined in advance according to the confidence generation model.
According to another aspect of the present disclosure, there is provided a training apparatus for a confidence level generation model, including:
the target confidence determining module is used for determining an initialized target confidence by using a machine recognition result sample of an object to be recognized;
the difference generation module is used for inputting the machine recognition result sample of the object to be recognized and the semantic features of the machine recognition result sample into a confidence generation model to be trained, to obtain the difference between a prediction confidence and the target confidence;
and the training module is used for training the confidence generation model to be trained according to the difference, until the difference is within an allowable range.
According to another aspect of the present disclosure, there is provided an electronic device including:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform a method according to any one of the embodiments of the present disclosure.
According to another aspect of the present disclosure, there is provided a non-transitory computer readable storage medium having stored thereon computer instructions for causing a computer to perform a method in any of the embodiments of the present disclosure.
According to another aspect of the present disclosure, a computer program product is provided, comprising a computer program which, when executed by a processor, implements the method in any of the embodiments of the present disclosure.
According to the disclosed technology, the workload of manual review in the text auditing process can be reduced, and recognition efficiency improved.
It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present disclosure, nor do they limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.
Drawings
The drawings are included to provide a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
FIG. 1 shows a flow diagram of a method of text recognition in accordance with an embodiment of the present disclosure;
FIG. 2 illustrates a detailed flow diagram for determining a confidence threshold in accordance with an embodiment of the present disclosure;
FIG. 3 illustrates a detailed flow chart for determining confidence thresholds based on accuracy and recall in accordance with an embodiment of the present disclosure;
FIG. 4 illustrates a detailed flow chart for obtaining a machine recognition result of an object to be recognized according to an embodiment of the present disclosure;
FIG. 5 illustrates a detailed flow chart for obtaining machine identification results according to an embodiment of the present disclosure;
FIG. 6 illustrates a detailed flow diagram of building usage features according to an embodiment of the present disclosure;
FIG. 7 illustrates a detailed flow chart for deriving confidence for a machine recognition result according to an embodiment of the present disclosure;
FIG. 8 illustrates a detailed flow chart for determining a final recognition result of an object to be recognized according to an embodiment of the present disclosure;
FIG. 9 illustrates a detailed flow chart for determining a final recognition result of an object to be recognized according to an embodiment of the present disclosure;
FIG. 10 illustrates a graph of recall versus candidate threshold and accuracy versus candidate threshold;
FIG. 11 illustrates a flow diagram of a method of training a confidence generation model in accordance with an embodiment of the present disclosure;
FIG. 12 shows a schematic diagram of an apparatus for text recognition according to an embodiment of the present disclosure;
FIG. 13 shows a schematic diagram of a training apparatus for a confidence generation model, in accordance with an embodiment of the present disclosure;
FIG. 14 is a block diagram of an electronic device for implementing a method of text recognition of an embodiment of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings, in which various details of the embodiments of the disclosure are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
Human-machine collaboration, as a core artificial intelligence technology that greatly improves production efficiency and quality and enriches the creativity of human society, receives high attention as a key factor in realizing industrial innovation across society. Human-machine collaboration aims at handing automatable work to the machine.
In the related art, in the post-loan bill auditing application scenario, human-machine collaboration generally adopts the OCR + NLP technology and obtains the relevant fields in a structured manner as the machine recognition result. The auditor then no longer needs to enter every field of every bill one by one, but must manually compare the machine recognition result against the bill image and correct the occasional wrong fields against the reference bill. This traditional human-machine collaboration mode does achieve a degree of process automation and saves most of the auditors' data-entry time. However, because machine recognition results contain errors and cannot reach 100% accuracy, the auditor cannot tell which machine recognition results are accurate and which are not, and therefore still has to spend a lot of time auditing the machine recognition results one by one.
Therefore, in the post-loan bill auditing application scenario, the human-machine collaboration technology of the related art suffers from defects such as high labor cost and low verification efficiency.
In view of the above technical problems in the related art, embodiments of the present disclosure provide a text recognition method. According to the method of the embodiments, a confidence is generated for the machine recognition result and used to judge whether that result is accurate, so that manual verification can be targeted at only part of the machine recognition results, reducing the workload and time of manual comparison.
Fig. 1 shows a flow diagram of a method of text recognition according to an embodiment of the present disclosure.
As shown in fig. 1, the method includes:
step S101: obtaining a machine identification result of an object to be identified;
step S102: obtaining the confidence of the machine recognition result through a confidence generation model, based on the machine recognition result and the semantic features of the object to be recognized;
step S103: and comparing the confidence of the machine recognition result with a confidence threshold to determine the final recognition result of the object to be recognized, where the confidence threshold is determined in advance according to the confidence generation model.
In the embodiment of the present disclosure, the object to be recognized may be an image or the like containing characters to be recognized. For example, in a bill auditing scenario, the object to be recognized may be an image of a bill to be recognized.
For example, in step S101, the object to be recognized may be subjected to character recognition processing by various character recognition techniques to obtain its machine recognition result.
For example, the object to be recognized may be recognized by OCR technology to obtain the plurality of characters it contains; key characters are then extracted from these characters and structured to obtain the key fields as the machine recognition result. It will be appreciated that OCR uses a classification model and therefore produces a classification probability value for each character, which can reflect the accuracy of the recognition result. Moreover, extracting a machine recognition result by OCR involves not only simple recognition but also detection processing and the related structuring processing.
The machine recognition result may include the key fields contained in the object to be recognized. For example, in a bill auditing scenario, the key fields may be amount, date, name, usage, and the like, where fields such as amount, date, and name may be generated directly from the corresponding characters obtained by OCR and structured, while the usage key field may be obtained by NLP techniques and the like.
Exemplarily, in step S102, the semantic features of the object to be recognized are used to reflect the real semantic information of the key fields in the object to be recognized. For example, for the usage field in the object to be recognized, extracting its semantic features better reflects the real usage of the object to be recognized.
The confidence generation model compares the machine recognition result with the real semantic information of the key fields in the object to be recognized, so as to obtain the confidence of the machine recognition result. The confidence reflects the similarity between the machine recognition result and the real semantic information of the key fields in the object to be recognized. That is, the higher the confidence, the closer the machine recognition result is to the real semantic information of the key fields, i.e. the higher the accuracy of the machine recognition result; the lower the confidence, the larger the difference between the machine recognition result and the real semantic information of the key fields, i.e. the lower the accuracy of the machine recognition result.
Exemplarily, in step S103, the confidence of the machine recognition result is compared with a confidence threshold. If the confidence is greater than or equal to the confidence threshold, indicating that the machine recognition result is highly accurate, the machine recognition result is used as the final recognition result of the object to be recognized; if the confidence is below the confidence threshold, indicating low accuracy, the machine recognition result is sent to a manual auditing channel for manual auditing, and the manually audited recognition result is taken as the final recognition result.
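As a minimal sketch of this routing logic (the function names and the review callback are illustrative assumptions, not part of the disclosure), the comparison in step S103 might look like:

    def determine_final_result(machine_result, confidence, threshold, manual_review):
        """Sketch of step S103: accept the machine result when its confidence
        meets the threshold, otherwise defer to manual review."""
        if confidence >= threshold:
            # High confidence: the machine result becomes the final result.
            return machine_result
        # Low confidence: send to the manual auditing channel; the reviewer's
        # output is taken as the final recognition result.
        return manual_review(machine_result)

    # Hypothetical usage: the callback stands in for the manual auditing channel.
    final = determine_final_result({"usage": "daily necessities"}, 0.95, 0.92,
                                   manual_review=lambda result: result)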
The confidence threshold can be determined using the trained confidence generation model.
For example, a certain number of samples are input into the confidence generation model to obtain the confidence corresponding to each sample. From these confidences, the accuracy and recall of the samples under different candidate thresholds are calculated, and the candidate threshold that meets the expected accuracy with a high recall is selected as the confidence threshold.
The candidate thresholds may be a plurality of reference values obtained by applying a preset step size over a set threshold range. For example, with a threshold range of 0.50-1.00 and a preset step size of 0.01, a total of 51 candidate thresholds 0.50, 0.51, 0.52, …, 0.99, 1.00 are obtained. The threshold range and step size can be set according to the actual situation.
It can be understood that a confidence threshold determined through the confidence generation model helps the final recognition result, obtained by comparing the confidence of a machine recognition result against that threshold, to raise recall as much as possible on the premise of high accuracy, thereby reducing the workload of manual review and improving recognition efficiency.
According to the method of the embodiments of the disclosure, the confidence generation model produces a confidence for each machine recognition result as the basis for judging whether that result is accurate, so that in determining the final recognition result, whether the machine recognition result can serve as the final result is decided by comparing its confidence against the confidence threshold. For a large batch of objects to be recognized, there is thus no need to manually review every machine recognition result; manual review is needed only for machine recognition results whose confidence falls below the threshold, reducing the workload of manual review and improving recognition efficiency.
Moreover, because the confidence threshold is determined through the confidence generation model, recall is raised as much as possible while the accuracy of the machine recognition results accepted as final results meets the expected accuracy, further reducing the manual review workload and improving recognition efficiency.
As shown in fig. 2, in one embodiment, determining the confidence threshold in advance according to the confidence generation model includes:
step S201: inputting a sample set into the confidence generation model to obtain a confidence set;
step S202: calculating, according to the comparison of the confidence set against different candidate thresholds, the accuracy and recall of the confidence set under each candidate threshold;
step S203: determining the confidence threshold from the different candidate thresholds based on the accuracy and recall.
For example, a sample may be the machine recognition result of an object to be recognized, and the sample set is constructed by obtaining a certain number of such machine recognition results; the objects to be recognized may be, for example, shopping receipts. The machine recognition result of each object to be recognized is then audited manually to judge whether it is accurate, and labeled accordingly.
Each sample in the sample set is input into the confidence generation model to obtain the confidence corresponding to that sample, and the resulting confidences form the confidence set.
The confidences in the confidence set are compared with a candidate threshold to obtain the number of samples whose confidence is greater than or equal to that threshold; then, using the manual labels, the number of those samples whose machine recognition result is actually correct is counted.
The accuracy is calculated as: accuracy = (number of samples whose confidence is greater than or equal to the candidate threshold and whose machine recognition result is actually correct) / (number of samples whose confidence is greater than or equal to the candidate threshold). In other words, the accuracy of the confidence set under a candidate threshold is the proportion of actually correct samples among all samples whose confidence meets that threshold.
The recall is calculated as: recall = (number of samples whose confidence is greater than or equal to the candidate threshold and whose machine recognition result is actually correct) / (total number of samples). In other words, the recall of the confidence set under a candidate threshold is the ratio of the actually correct samples meeting that threshold to the total number of samples.
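A minimal sketch of these two formulas over the 0.50-1.00 candidate grid described above (the function and variable names are assumptions for illustration):

    import numpy as np

    def metrics_per_threshold(confidences, is_correct, thresholds):
        """Compute the accuracy and recall defined above for each candidate
        threshold; both numerators count samples whose confidence meets the
        threshold AND whose machine result was manually labeled correct."""
        confidences = np.asarray(confidences)
        is_correct = np.asarray(is_correct, dtype=bool)
        accuracy, recall = [], []
        for t in thresholds:
            accepted = confidences >= t
            correct_accepted = np.sum(accepted & is_correct)
            accuracy.append(correct_accepted / max(np.sum(accepted), 1))
            recall.append(correct_accepted / len(confidences))
        return np.array(accuracy), np.array(recall)

    # Candidate thresholds 0.50, 0.51, ..., 1.00, as in the example above.
    thresholds = np.round(np.arange(0.50, 1.001, 0.01), 2)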
In the above manner, the accuracy and recall of the confidence set under the different candidate thresholds are obtained, and a curve of accuracy versus candidate threshold and a curve of recall versus candidate threshold are constructed, as shown in fig. 10, where the abscissa is the candidate threshold and the ordinate is the accuracy or the recall.
Using these two curves, the candidate threshold satisfying the expected accuracy and the expected recall is selected as the confidence threshold.
As shown in fig. 3, in one embodiment, step S203 may include:
step S301: selecting, according to the accuracy of the confidence set under the different candidate thresholds, a candidate threshold satisfying a predetermined condition as a reference threshold;
step S302: selecting, according to the recall of the confidence set under the reference thresholds, the reference threshold with the highest recall as the confidence threshold.
Illustratively, in step S301, satisfying the predetermined condition means that the accuracy under the candidate threshold is greater than or equal to the expected accuracy; that is, candidate thresholds whose corresponding accuracy meets or exceeds the expected accuracy are selected from the different candidate thresholds as reference thresholds. The expected accuracy may be set according to actual needs, for example with reference to the accuracy of manual recognition.
It is to be understood that the number of candidate thresholds satisfying the predetermined condition may be one or more. Where there is a single reference threshold, it may be directly determined as the confidence threshold in step S302; where there are multiple reference thresholds, the one with the highest recall, according to the recall of the confidence set under each reference threshold, is selected and determined as the confidence threshold.
For example, as shown in fig. 10, assuming the expected accuracy is 0.96, the candidate thresholds whose accuracy is greater than or equal to 0.96 are selected as reference thresholds according to the accuracy of the confidence set under the different candidate thresholds, namely the candidate thresholds 0.92, 0.93, 0.94, 0.95, 0.96, 0.97, 0.98 and 0.99 in the figure. Then, according to the recall of the confidence set under these reference thresholds, the reference threshold with the highest recall is selected as the confidence threshold; as can be seen from fig. 10, the reference threshold 0.92 has the highest recall, with a value of 0.37.
It is understood that, at a confidence threshold of 0.92, the machine recognition results finally accepted as final recognition results reach an accuracy of 96%, with a recall of 37%. That is, of the machine recognition results for a certain number of objects to be recognized, 37% can be used directly as final recognition results without manual review, while the remaining 63% must be sent to the manual auditing terminal for manual auditing.
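Steps S301/S302 can be sketched as follows, reusing the metrics computed above (a sketch under the assumption that accuracy and recall are arrays aligned with the candidate thresholds):

    def select_confidence_threshold(thresholds, accuracy, recall, expected_accuracy=0.96):
        """Step S301: keep candidate thresholds whose accuracy meets the expected
        accuracy; step S302: among them, pick the one with the highest recall."""
        reference = [(t, r) for t, a, r in zip(thresholds, accuracy, recall)
                     if a >= expected_accuracy]
        if not reference:
            raise ValueError("no candidate threshold reaches the expected accuracy")
        # e.g. thresholds 0.92-0.99 qualify and 0.92 wins with recall 0.37.
        return max(reference, key=lambda pair: pair[1])[0]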
Through this embodiment, an appropriate confidence threshold can be selected such that the machine recognition results accepted as final results have high accuracy while their recall is raised as much as possible, minimizing the number of machine recognition results that cannot serve as final results, reducing manual review work, and improving recognition efficiency.
As shown in fig. 4, in one embodiment, step S101 includes:
step S401: processing an object to be recognized by utilizing an optical character recognition technology to obtain a plurality of character features of the object to be recognized;
step S402: filtering the character features to obtain key character features of the object to be recognized;
step S403: and performing feature extraction processing on the key character features to obtain a machine identification result of the object to be identified.
Illustratively, in step S401, the optical character recognition technique refers to the electronic device examining the characters printed on the object to be recognized, determining their shapes by detecting patterns of dark and light, and then translating the shapes into computer text by a character recognition method, so as to obtain the plurality of character features of the object to be recognized.
For example, in step S402, the character features may be filtered by various text filtering methods. For example, the commonness of the word in each character feature may be measured by calculating its Inverse Document Frequency (IDF), thereby filtering high-frequency words out of the plurality of character features. It can be understood that the lower the IDF value, the more common the character feature; the higher the IDF value, the rarer it is. By filtering the character features, the key character features of the object to be recognized are obtained.
For example, in step S402, the plurality of character features may also be filtered using a preset stop-word dictionary. For example, for a shopping receipt, words such as "ticket", "welcome", "approaching" and "amount" may be filtered out.
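A minimal sketch of this filtering step, assuming word-level character features (the corpus layout, stop-word list, and IDF cutoff are illustrative assumptions):

    import math

    def filter_key_features(doc_words, corpus, stop_words, idf_floor=1.0):
        """Sketch of step S402: drop stop words and overly common words.
        IDF(w) = log(N / (1 + number of documents containing w)), so common
        words get a low IDF and are filtered out."""
        n_docs = len(corpus)

        def idf(word):
            df = sum(1 for doc in corpus if word in doc)
            return math.log(n_docs / (1 + df))

        return [w for w in doc_words
                if w not in stop_words and idf(w) >= idf_floor]

    # Hypothetical stop words for the shopping-receipt scenario above.
    stop_words = {"ticket", "welcome", "amount"}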
For example, in step S403, the extracted features may include multi-dimensional features characterizing the actual information of the object to be recognized, for example, features characterizing the folding, flipping, or clarity of the object to be recognized, key-field features characterizing its actual usage, amount features characterizing its amount, and coding features characterizing the category to which it belongs.
Through the embodiment, the accuracy of the machine recognition result is improved, so that the actual information of the object to be recognized is reflected more accurately by the machine recognition result, and the accuracy and the recall rate of the final recognition result are improved.
As shown in fig. 5, in an embodiment, the machine recognition result includes at least one of a clarity feature, an amount feature and a usage feature, and step S403 may include at least one of the following:
step S501: for the key character features, extracting those located at predetermined positions of the object to be recognized, calculating their average recognition probability, and constructing the clarity feature;
step S502: for the key character features, extracting those characterizing the amount and constructing the amount feature;
step S503: for the key character features, extracting those characterizing the usage to generate a usage field, and constructing the usage feature based on the usage field.
For example, in step S501, in a bill auditing scenario, the predetermined positions of the object to be recognized may be the first and last lines of text of the bill; the clarity feature is obtained by extracting the key character features located in those lines and calculating their average recognition probability. The average recognition probability may be obtained by averaging the recognition probability returned for each character during optical character recognition. It should be noted that the clarity feature can reflect whether the object to be recognized is folded or flipped.
Illustratively, in step S502, the amount feature can reflect, to some extent, the probability of the actual usage of the object to be recognized, through preset distribution probabilities relating amounts to different usages. For example, where the object to be recognized is a shopping receipt, a receipt whose amount falls in the range of 1 to 3000 dollars has a high probability of being for daily commodities.
Illustratively, in step S503, the extracted key character features characterizing the usage are spliced to obtain the usage field; the average recognition probability of the usage field is obtained by averaging the recognition probability returned for each of its characters during optical character recognition, and this average recognition probability is taken as the usage feature.
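The three constructions in steps S501-S503 can be sketched as follows, assuming the OCR engine returns per-character (character, recognition probability) pairs and index lists marking each region (all names are illustrative assumptions):

    def build_features(ocr_chars, first_line_idx, last_line_idx, usage_idx, amount_value):
        """Sketch of steps S501-S503 over OCR output given as
        (character, recognition_probability) pairs."""
        def avg_prob(indices):
            return sum(ocr_chars[i][1] for i in indices) / len(indices)

        clarity_feature = avg_prob(first_line_idx + last_line_idx)   # S501
        amount_feature = amount_value                                # S502
        usage_field = "".join(ocr_chars[i][0] for i in usage_idx)    # S503: splice
        usage_feature = avg_prob(usage_idx)                          # ...and average
        return clarity_feature, amount_feature, usage_field, usage_feature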
According to this embodiment, key information characterizing multiple dimensions of the object to be recognized can be obtained, improving the accuracy of the machine recognition result; in particular, the usage feature can better fit the actual usage of the object to be recognized.
As shown in fig. 6, in an embodiment, the usage feature includes an index position sub-feature and a classification coding sub-feature, and step S403 may include:
step S601: constructing the index position sub-feature according to the index position of the usage field in a keyword dictionary;
step S602: constructing the classification coding sub-feature according to the code of the usage field in a classification library.
Illustratively, in step S601, whether the usage field is hit is determined against keyword dictionaries for different usages. For example, if the keyword dictionary for the usage category "medical & beauty" is ["gauze", "bandage", "mask", …] and the usage field contains a keyword matching "mask", the index position sub-feature is constructed from the index position of "mask" in that keyword dictionary. The keyword dictionaries can be preset from common words for the different usage categories.
Illustratively, in step S602, the classification code of the usage field in the classification library is obtained and used as the classification coding sub-feature. For example, if the classification library is ["daily necessities", "home appliances", "medical & beauty", …], the category corresponding to the usage field is "daily necessities", and the code corresponding to "daily necessities" is index 0, then index 0 is used as the classification coding sub-feature. The classification library can preset a code for each category, so that the classification of a usage field can be known from its code.
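A sketch of steps S601/S602 under the dictionary layout described above (the dictionaries, default values, and lookup order are assumptions):

    def usage_sub_features(usage_field, keyword_dicts, classification_library):
        """S601: index position of the first keyword hit in the usage field;
        S602: classification code of the matching usage category."""
        for category, keywords in keyword_dicts.items():
            for i, keyword in enumerate(keywords):
                if keyword in usage_field:
                    return i, classification_library[category]
        return -1, -1  # assumed defaults when nothing hits

    # Hypothetical dictionaries mirroring the "medical & beauty" example.
    keyword_dicts = {"medical & beauty": ["gauze", "bandage", "mask"]}
    classification_library = {"daily necessities": 0, "home appliances": 1,
                              "medical & beauty": 2}
    print(usage_sub_features("cotton mask x2", keyword_dicts, classification_library))
    # -> (2, 2): "mask" sits at index 2 and its category code is 2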
According to this embodiment, the usage feature of the object to be recognized is refined: the index position sub-feature and the classification coding sub-feature characterize the usage of the object more accurately, which helps improve the accuracy of the subsequently generated confidence.
As shown in fig. 7, in one embodiment, step S102 includes:
step S701: extracting semantic features of an object to be recognized;
step S702: obtaining the confidence of the machine recognition result through the trained confidence generation model, based on the semantic features and the machine recognition result.
Exemplarily, in step S701, a semantic feature is a feature characterizing the semantics of the key character features of the object to be recognized.
For example, an embedding vector can be obtained as the semantic feature of the object to be recognized by preprocessing its key character features. It can be understood that the embedding vector is a low-dimensional vector that can be used to characterize the real usage of the object to be recognized.
For example, in step S702, the confidence generation model may employ various correlation analysis models. For example, the confidence generation model may employ an XGBoost model.
It can be understood that XGBoost, short for Extreme Gradient Boosting, is an optimized distributed gradient boosting library with the advantages of efficiency, flexibility, and portability. XGBoost is a tool for large-scale parallel boosting trees and is currently among the fastest and best open-source boosting tree toolkits. Adopting the XGBoost model as the confidence generation model and computing a relevance score between the machine recognition result and the semantic features helps improve the accuracy of the computed confidence. The relevance score may be a similarity value between the machine recognition result and the semantic features, and this similarity value is the confidence of the machine recognition result.
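As a sketch of step S702 (the feature layout is an assumption; any regressor exposing a predict method, such as the trained model from the training sketch later in this description, would do):

    import numpy as np

    def predict_confidence(model, machine_result_features, semantic_embedding):
        """Feed the machine-recognition features concatenated with the semantic
        embedding to the trained confidence generation model; its relevance
        score is taken as the confidence of the machine recognition result."""
        x = np.concatenate([machine_result_features, semantic_embedding])
        return float(model.predict(x.reshape(1, -1))[0])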
According to the above embodiment, the confidence generation model compares the machine recognition result of the object to be recognized with the semantic features characterizing its real usage, and the relevance score between the two serves as the confidence of the machine recognition result.
As shown in fig. 8, in one embodiment, step S103 includes:
step S801: determining the machine recognition result as the final recognition result of the object to be recognized when the confidence of the machine recognition result is greater than or equal to the confidence threshold.
It is understood that, when the confidence of the machine recognition result is greater than or equal to the confidence threshold, the machine recognition result is highly accurate and can be used as the final recognition result without manual review or verification.
In this way, machine recognition results meeting the confidence threshold need no manual review, reducing the manual review workload and saving labor cost.
As shown in fig. 9, in one embodiment, step S103 includes:
step S901: sending the object to be recognized to a recognition terminal when the confidence of the machine recognition result is less than the confidence threshold;
step S902: and determining the recognition result of the recognition terminal as the final recognition result of the object to be recognized.
It is understood that, in the case that the confidence of the machine recognition result is less than the confidence threshold, the machine recognition result is less accurate, and therefore, the machine recognition result needs to be checked or verified manually.
Exemplarily, the recognition terminal is used to display the object to be recognized and/or its machine recognition result to an auditor, so that the auditor can manually audit them; the manually audited recognition result is then taken as the final recognition result.
According to an embodiment of the present disclosure, the present disclosure further provides a training method of the confidence generation model.
As shown in fig. 11, the method includes:
step S1101: determining an initialized target confidence by using a machine recognition result sample of an object to be recognized;
step S1102: inputting the machine recognition result sample of the object to be recognized and the semantic features of the machine recognition result sample into a confidence generation model to be trained, to obtain the difference between a prediction confidence and the target confidence;
step S1103: and training the confidence generation model to be trained according to the difference, until the difference is within an allowable range.
In a specific example, taking the recognition scenario of the shopping bill as an example, the training method of the confidence generation model may include the following specific steps:
exemplarily, step S1101 may include the following specific steps:
(1) Constructing the data set: 1850 shopping receipts are selected to construct the data set, which is split into a training set, a validation set, and a test set with data amounts in the ratio 1450:200:200.
(2) Preprocessing the data set: for each bill in the data set, character features characterizing the text information of the shopping receipt are obtained by OCR, and the character features are filtered to obtain the key character features.
(3) Performing feature extraction on the training set and taking the extracted multi-dimensional features as machine recognition result samples, where the extracted multi-dimensional features include:
a folding, flipping and clarity feature of the bill: the average recognition probability of the first and last lines of text in the shopping receipt;
recognition and structured-information features of the actual usage: the average recognition probability of the key fields characterizing the actual usage, together with the hit status of those key-field features in the keyword dictionary and their index positions in it;
an amount feature: the key-field feature reflecting the amount in the bill;
a usage classification coding feature: the classification code corresponding to the key-field features.
Illustratively, in step S1102, the difference between the prediction confidence and the target confidence is obtained from the prediction confidence output by the confidence generation model during training.
For example, in step S1103, the confidence generation model is trained continuously according to the difference between the prediction confidence and the target confidence until the difference falls within the allowable range, yielding the trained confidence generation model.
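A training sketch under the 1450:200:200 split above, using the XGBoost model mentioned earlier (the placeholder data, hyperparameters, and early-stopping criterion are assumptions; early stopping stands in for "until the difference is within an allowable range"):

    import numpy as np
    from xgboost import XGBRegressor  # one plausible choice of model

    rng = np.random.default_rng(0)
    X = rng.random((1850, 72))                  # placeholder multi-dimensional features
    y = rng.integers(0, 2, 1850).astype(float)  # 1 = sample labeled correct

    # 1450 / 200 / 200 split into training / validation / test sets.
    X_tr, y_tr = X[:1450], y[:1450]
    X_val, y_val = X[1450:1650], y[1450:1650]

    model = XGBRegressor(n_estimators=500, learning_rate=0.1,
                         eval_metric="rmse", early_stopping_rounds=20)
    # Training stops once the validation error (the "difference") no longer
    # improves, i.e. it has settled within an allowable range.
    model.fit(X_tr, y_tr, eval_set=[(X_val, y_val)])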
The confidence generation model trained according to the training method of the embodiments of the disclosure can generate a confidence for a machine recognition result as the basis for judging whether that result is accurate, so that in determining the final recognition result, whether the machine recognition result can serve as the final result is decided by comparing its confidence against the confidence threshold. For a large batch of objects to be recognized, there is thus no need to manually review every machine recognition result; manual review is needed only for those whose confidence falls below the threshold, reducing the workload of manual review and improving recognition efficiency.
According to an embodiment of the present disclosure, the present disclosure also provides a text recognition apparatus.
As shown in fig. 12, the apparatus includes:
a machine recognition result obtaining module 1201, configured to obtain a machine recognition result of the object to be recognized;
a confidence generating module 1202, configured to obtain the confidence of the machine recognition result through a confidence generation model, based on the machine recognition result and the semantic features of the object to be recognized;
and a final recognition result determining module 1203, configured to compare the confidence of the machine recognition result with a confidence threshold to determine the final recognition result of the object to be recognized, where the confidence threshold is determined in advance according to the confidence generation model.
In one embodiment, the apparatus further comprises:
the confidence set generating module is used for inputting a sample set into the confidence generation model to obtain a confidence set;
the accuracy and recall calculating module is used for calculating the accuracy and recall of the confidence set under different candidate thresholds according to the comparison of the confidence set against those thresholds;
and the confidence threshold determining module is used for determining the confidence threshold from the different candidate thresholds based on the accuracy and recall.
In one embodiment, the confidence threshold determination module comprises:
the reference threshold determining submodule is used for selecting, according to the accuracy of the confidence set under different candidate thresholds, a candidate threshold satisfying a predetermined condition as a reference threshold;
and the confidence threshold determining submodule is used for selecting, according to the recall of the confidence set under the reference thresholds, the reference threshold with the highest recall as the confidence threshold.
In one embodiment, the machine recognition result obtaining module 1201 includes:
the character feature generation submodule is used for processing the object to be recognized by utilizing an optical character recognition technology to obtain a plurality of character features of the object to be recognized;
the filtering submodule is used for filtering the character features to obtain the key character features of the object to be recognized;
and the feature extraction submodule is used for carrying out feature extraction processing on the key character features to obtain a machine identification result of the object to be identified.
In one embodiment, the machine recognition result includes at least one of a clarity feature, an amount feature and a usage feature, and the feature extraction submodule includes at least one of:
a clarity feature construction unit, used for extracting, from the key character features, those located at predetermined positions of the object to be recognized, calculating their average recognition probability and constructing the clarity feature;
an amount feature construction unit, used for extracting, from the key character features, those characterizing the amount and constructing the amount feature;
and a usage feature construction unit, used for extracting, from the key character features, those characterizing the usage to generate a usage field, and constructing the usage feature based on the usage field.
In one embodiment, the usage feature includes an index position sub-feature and a classification coding sub-feature, and the usage feature constructing unit is further configured to:
constructing the index position sub-feature according to the index position of the usage field in the keyword dictionary;
and constructing the classification coding sub-feature according to the code of the usage field in the classification library.
In one embodiment, the confidence generation module 1202 includes:
the semantic feature extraction submodule is used for extracting semantic features of the object to be identified;
and the confidence generation submodule is used for obtaining the confidence of the machine recognition result through the trained confidence generation model, based on the semantic features and the machine recognition result.
In one embodiment, the final recognition result determining module 1203 is further configured to:
and determining the machine recognition result as the final recognition result of the object to be recognized when the confidence of the machine recognition result is greater than or equal to the confidence threshold.
In one embodiment, the final recognition result determining module 1203 is further configured to:
sending the object to be recognized to the recognition terminal when the confidence of the machine recognition result is less than the confidence threshold;
and determining the recognition result of the recognition terminal as the final recognition result of the object to be recognized.
According to the embodiment of the disclosure, a device for training a confidence coefficient generation model is also provided.
As shown in fig. 13, the apparatus includes:
a target confidence determining module 1301, configured to determine an initialized target confidence by using a machine recognition result sample of an object to be recognized;
a difference generation module 1302, configured to input the machine recognition result sample of the object to be recognized and the semantic features of the machine recognition result sample into a confidence generation model to be trained, to obtain the difference between a prediction confidence and the target confidence;
and a training module 1303, configured to train the confidence generation model to be trained according to the difference, until the difference is within the allowable range.
The functions of each unit, module or sub-module in each apparatus in the embodiments of the present disclosure may refer to the corresponding description in the above method embodiments, and are not described herein again.
The present disclosure also provides an electronic device, a readable storage medium, and a computer program product according to embodiments of the present disclosure.
FIG. 14 shows a schematic block diagram of an example electronic device 1400 that can be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital processing, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be examples only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 14, the electronic device 1400 includes a computing unit 1401 that can perform various appropriate actions and processes according to a computer program stored in a Read Only Memory (ROM) 1402 or a computer program loaded from a storage unit 1408 into a Random Access Memory (RAM) 1403. The RAM 1403 can also store the various programs and data required for the operation of the electronic device 1400. The computing unit 1401, the ROM 1402, and the RAM 1403 are connected to each other via a bus 1404. An input/output (I/O) interface 1405 is also connected to the bus 1404.
A number of components in the electronic device 1400 are connected to the I/O interface 1405, including: an input unit 1406 such as a keyboard, a mouse, or the like; an output unit 1407 such as various types of displays, speakers, and the like; a storage unit 1408 such as a magnetic disk, optical disk, or the like; and a communication unit 1409 such as a network card, a modem, a wireless communication transceiver, and the like. The communication unit 1409 allows the electronic device 1400 to exchange information/data with other devices via a computer network such as the internet and/or various telecommunication networks.
The computing unit 1401 may be a variety of general purpose and/or special purpose processing components having processing and computing capabilities. Some examples of the computing unit 1401 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, and the like. The computing unit 1401 performs the respective methods and processes described above, such as a method of text recognition and/or a training method of a confidence generation model. For example, in some embodiments, the method of text recognition and/or the method of training the confidence generation model may be implemented as a computer software program tangibly embodied on a machine-readable medium, such as the storage unit 1408. In some embodiments, part or all of the computer program can be loaded and/or installed onto the electronic device 1400 via the ROM 1402 and/or the communication unit 1409. When the computer program is loaded into the RAM 1403 and executed by the computing unit 1401, one or more steps of the method of text recognition and/or the method of training the confidence generation model described above may be performed. Alternatively, in other embodiments, the computing unit 1401 may be configured by any other suitable means (e.g. by means of firmware) to perform a method of text recognition and/or a training method of a confidence generation model.
Various implementations of the systems and techniques described above may be realized in digital electronic circuitry, integrated circuitry, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), systems on a chip (SOCs), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include being implemented in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, and which receives data and instructions from, and transmits data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. The program code may be provided to a processor or controller of a general purpose computer, a special purpose computer, or another programmable data processing apparatus, such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowcharts and/or block diagrams to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine, or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.
The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.
It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present disclosure may be executed in parallel or sequentially or in different orders, and are not limited herein as long as the desired results of the technical solutions disclosed in the present disclosure can be achieved.
The above detailed description should not be construed as limiting the scope of the disclosure. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present disclosure should be included in the scope of protection of the present disclosure.

Claims (23)

1. A method of text recognition, comprising:
obtaining a machine recognition result of an object to be recognized;
obtaining a confidence of the machine recognition result through a confidence generation model, based on the machine recognition result and semantic features of the object to be recognized; and
comparing the confidence of the machine recognition result with a confidence threshold to determine a final recognition result of the object to be recognized, wherein the confidence threshold is determined in advance according to the confidence generation model.
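By way of illustration only, the following minimal Python sketch walks through the three steps of claim 1; every name, the stand-in OCR call, and the stubbed confidence model are hypothetical assumptions introduced for this example, not the patented implementation.

from dataclasses import dataclass
from typing import List, Optional

@dataclass
class MachineResult:
    text: str
    features: List[float]  # e.g., clarity/amount/usage features (claims 4-6)

def machine_recognize(obj: str) -> MachineResult:
    # Stand-in for an OCR-based recognizer; a real system would run OCR here.
    return MachineResult(text=obj.strip(), features=[0.9, 120.5])

def semantic_features(obj: str) -> List[float]:
    # Placeholder semantics; a real system might use a pretrained encoder.
    return [float(len(obj))]

class ConfidenceModel:
    def predict(self, result: MachineResult, semantics: List[float]) -> float:
        # Placeholder score in (0, 1); a trained model would go here (claim 10).
        return 0.9

def recognize(obj: str, model: ConfidenceModel, threshold: float) -> Optional[str]:
    result = machine_recognize(obj)                              # step 1
    confidence = model.predict(result, semantic_features(obj))   # step 2
    if confidence >= threshold:                                  # step 3 (claim 8)
        return result.text
    return None  # below threshold: defer to a recognition terminal (claim 9)

print(recognize(" invoice text ", ConfidenceModel(), threshold=0.8))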
2. The method of claim 1, wherein determining the confidence threshold in advance according to the confidence generation model comprises:
inputting a sample set into the confidence generation model to obtain a confidence set;
calculating, according to comparison results between the confidence set and different candidate thresholds, an accuracy and a recall rate of the confidence set under the different candidate thresholds; and
determining the confidence threshold from the different candidate thresholds based on the accuracy and the recall rate.
3. The method of claim 2, wherein determining the confidence threshold from the different candidate thresholds based on the accuracy and the recall rate comprises:
selecting, according to the accuracy of the confidence set under the different candidate thresholds, a candidate threshold meeting a preset condition from the different candidate thresholds as a reference threshold; and
selecting, according to the recall rate of the confidence set under the reference thresholds, the reference threshold with the maximum recall rate as the confidence threshold.
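As a concrete illustration of the threshold sweep in claims 2 and 3, the sketch below scores a labeled sample set at several candidate thresholds, keeps those whose accuracy meets a preset condition as reference thresholds, and returns the reference threshold with the maximum recall rate. The sample data and the 0.95 accuracy floor are assumptions made for this example only.

from typing import List, Tuple

def accuracy_and_recall(confs: List[float], labels: List[int], t: float) -> Tuple[float, float]:
    accepted = [(c, y) for c, y in zip(confs, labels) if c >= t]
    hits = sum(y for _, y in accepted)                # correct results accepted
    accuracy = hits / len(accepted) if accepted else 0.0
    recall = hits / sum(labels) if sum(labels) else 0.0
    return accuracy, recall

def pick_threshold(confs, labels, candidates, min_accuracy=0.95):
    # Reference thresholds: candidates whose accuracy meets the preset condition.
    refs = [t for t in candidates
            if accuracy_and_recall(confs, labels, t)[0] >= min_accuracy]
    # Confidence threshold: the reference threshold with the maximum recall rate.
    return max(refs, key=lambda t: accuracy_and_recall(confs, labels, t)[1])

confs = [0.95, 0.90, 0.72, 0.60, 0.88]   # confidence set from the model
labels = [1, 1, 0, 0, 1]                 # 1 = the machine result was correct
print(pick_threshold(confs, labels, candidates=[0.5, 0.7, 0.85]))  # -> 0.85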
4. The method of claim 1, wherein obtaining a machine recognition result of an object to be recognized comprises:
processing the object to be recognized by using an optical character recognition technology to obtain a plurality of character features of the object to be recognized;
filtering the character features to obtain key character features of the object to be recognized; and
performing feature extraction processing on the key character features to obtain the machine recognition result of the object to be recognized.
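A hypothetical sketch of the claim-4 pipeline follows: a stand-in OCR step emits per-character features, a filter keeps key characters, and a reduction step forms the machine recognition result. The regex used to define "key characters" is an assumption for illustration only.

import re
from typing import List, Tuple

def ocr(image_text: str) -> List[Tuple[str, float]]:
    # Stand-in for an OCR engine: (character, recognition probability) pairs.
    return [(ch, 0.9) for ch in image_text]

def filter_key_characters(chars: List[Tuple[str, float]]) -> List[Tuple[str, float]]:
    # Keep characters that plausibly carry content (letters, digits, ¥, '.').
    return [(c, p) for c, p in chars if re.match(r"[\w¥.]", c)]

def extract_result(key_chars: List[Tuple[str, float]]) -> dict:
    text = "".join(c for c, _ in key_chars)
    mean_prob = sum(p for _, p in key_chars) / max(len(key_chars), 1)
    return {"text": text, "mean_probability": mean_prob}

print(extract_result(filter_key_characters(ocr("Total: ¥120.50"))))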
5. The method of claim 4, wherein the machine recognition result comprises at least one of a clarity feature, an amount feature, and a usage feature, and wherein performing the feature extraction processing on the key character features to obtain the machine recognition result comprises at least one of the following:
extracting, from the key character features, the key character features located at preset positions of the object to be recognized, calculating an average recognition probability, and constructing the clarity feature;
extracting, from the key character features, the key character features representing an amount, and constructing the amount feature; and
extracting, from the key character features, the key character features characterizing a usage, generating a usage field, and constructing the usage feature based on the usage field.
6. The method of claim 5, wherein the usage feature comprises an index position sub-feature and a classification encoding sub-feature, and wherein constructing the usage feature comprises:
constructing the index position sub-feature according to an index position of the usage field in a keyword dictionary; and
constructing the classification encoding sub-feature according to a code corresponding to the usage field in a classification library.
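To make claims 5 and 6 concrete, the following sketch builds the three features; the keyword dictionary, classification library, preset positions, and amount regex are all invented for the example and carry no weight as to the actual implementation.

import re

KEYWORD_DICTIONARY = ["meal", "travel", "office", "lodging"]            # assumed
CLASSIFICATION_LIBRARY = {"meal": 10, "travel": 11, "office": 12}       # assumed codes

def clarity_feature(key_chars, preset_positions):
    # Average recognition probability of characters at preset positions.
    probs = [key_chars[i][1] for i in preset_positions if i < len(key_chars)]
    return sum(probs) / max(len(probs), 1)

def amount_feature(text: str) -> float:
    match = re.search(r"\d+(?:\.\d+)?", text)   # characters representing money
    return float(match.group()) if match else 0.0

def usage_feature(usage_field: str):
    index_position = KEYWORD_DICTIONARY.index(usage_field)      # sub-feature 1
    classification_code = CLASSIFICATION_LIBRARY[usage_field]   # sub-feature 2
    return index_position, classification_code

key_chars = [("T", 0.95), ("o", 0.90), ("t", 0.85)]
print(clarity_feature(key_chars, [0, 1, 2]),
      amount_feature("¥120.50"),
      usage_feature("travel"))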
7. The method of claim 1, wherein obtaining the confidence of the machine recognition result through a confidence generation model based on the machine recognition result and semantic features of the object to be recognized comprises:
extracting the semantic features of the object to be recognized; and
obtaining the confidence of the machine recognition result through the trained confidence generation model, based on the semantic features and the machine recognition result.
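A toy semantic encoder for claim 7 might look like the hashed-bigram sketch below; it is purely illustrative, and a real system would more likely use a pretrained language model to produce the semantic features fed to the trained confidence generation model.

def semantic_features(text: str, dim: int = 8):
    # Hash character bigrams into a fixed-size, normalized vector as a
    # crude stand-in for semantic features of the object to be recognized.
    vec = [0.0] * dim
    for i in range(len(text) - 1):
        vec[hash(text[i:i + 2]) % dim] += 1.0
    total = sum(vec) or 1.0
    return [v / total for v in vec]

print(semantic_features("text recognition"))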
8. The method according to any one of claims 1-7, wherein comparing the confidence of the machine recognition result with a confidence threshold to determine a final recognition result of the object to be recognized comprises:
determining the machine recognition result as the final recognition result of the object to be recognized in a case where the confidence of the machine recognition result is greater than or equal to the confidence threshold.
9. The method according to any one of claims 1-7, wherein comparing the confidence of the machine recognition result with a confidence threshold to determine a final recognition result of the object to be recognized comprises:
sending the object to be recognized to a recognition terminal in a case where the confidence of the machine recognition result is less than the confidence threshold; and
determining a recognition result of the recognition terminal as the final recognition result of the object to be recognized.
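Claims 8 and 9 together define the routing rule, sketched below with a plain list standing in for the recognition terminal's review queue (an assumption made for illustration).

review_queue = []  # stand-in for the recognition terminal (e.g., human review)

def finalize(obj, machine_result, confidence, threshold):
    if confidence >= threshold:
        return machine_result        # claim 8: the machine result becomes final
    review_queue.append(obj)         # claim 9: defer to the recognition terminal
    return None                      # the final result will come from the terminal

print(finalize("receipt.png", "¥120.50 travel", 0.62, threshold=0.8), review_queue)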
10. A method of training a confidence generation model, comprising:
determining an initialized target confidence by using a machine recognition result sample of an object to be recognized;
inputting the machine recognition result sample of the object to be recognized and semantic features of the machine recognition result sample into a confidence generation model to be trained, to obtain a difference between a prediction confidence and the target confidence; and
training the confidence generation model to be trained according to the difference until the difference is within an allowable range.
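For claim 10, a minimal training loop might look like the following; the logistic model, squared-error difference, and tolerance are assumptions chosen to keep the sketch self-contained, not the patent's training procedure.

import math
import random

def train(samples, lr=0.5, tolerance=1e-3, max_epochs=2000):
    # samples: (feature_vector, target_confidence) pairs; targets are
    # initialized from machine recognition result samples (claim 10).
    dim = len(samples[0][0])
    weights = [random.uniform(-0.1, 0.1) for _ in range(dim)]

    def predict(x):
        z = sum(w * xi for w, xi in zip(weights, x))
        return 1.0 / (1.0 + math.exp(-z))        # prediction confidence in (0, 1)

    for _ in range(max_epochs):
        mean_sq_diff = 0.0
        for x, target in samples:
            pred = predict(x)
            diff = pred - target                 # prediction vs. target confidence
            mean_sq_diff += diff * diff / len(samples)
            for i in range(dim):                 # gradient step on squared error
                weights[i] -= lr * diff * pred * (1.0 - pred) * x[i]
        if mean_sq_diff < tolerance:             # difference within allowable range
            break
    return predict

model = train([([1.0, 0.2], 0.9), ([0.3, 1.0], 0.2)])
print(round(model([1.0, 0.2]), 2))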
11. An apparatus for text recognition, comprising:
a machine recognition result obtaining module, used for obtaining a machine recognition result of an object to be recognized;
a confidence generation module, used for obtaining a confidence of the machine recognition result through a confidence generation model, based on the machine recognition result and semantic features of the object to be recognized; and
a final recognition result determining module, used for comparing the confidence of the machine recognition result with a confidence threshold to determine a final recognition result of the object to be recognized, wherein the confidence threshold is determined in advance according to the confidence generation model.
12. The apparatus of claim 11, further comprising:
a confidence set generating module, used for inputting a sample set into the confidence generation model to obtain a confidence set;
an accuracy and recall rate calculating module, used for calculating, according to comparison results between the confidence set and different candidate thresholds, an accuracy and a recall rate of the confidence set under the different candidate thresholds; and
a confidence threshold determining module, used for determining the confidence threshold from the different candidate thresholds based on the accuracy and the recall rate.
13. The apparatus of claim 12, wherein the confidence threshold determination module comprises:
a reference threshold determining submodule, used for selecting, according to the accuracy of the confidence set under the different candidate thresholds, a candidate threshold meeting a predetermined condition from the different candidate thresholds as a reference threshold; and
a confidence threshold determining submodule, used for selecting, according to the recall rate of the confidence set under the reference thresholds, the reference threshold with the maximum recall rate as the confidence threshold.
14. The apparatus of claim 11, wherein the machine recognition result obtaining module comprises:
a character feature generating submodule, used for processing the object to be recognized by using an optical character recognition technology to obtain a plurality of character features of the object to be recognized;
a filtering submodule, used for filtering the character features to obtain key character features of the object to be recognized; and
a feature extraction submodule, used for performing feature extraction processing on the key character features to obtain the machine recognition result of the object to be recognized.
15. The apparatus of claim 14, wherein the machine recognition result comprises at least one of a clarity feature, an amount feature, and a usage feature, and the feature extraction submodule comprises at least one of:
a clarity feature construction unit, used for extracting, from the key character features, the key character features located at preset positions of the object to be recognized, calculating an average recognition probability, and constructing the clarity feature;
an amount feature construction unit, used for extracting, from the key character features, the key character features representing an amount, and constructing the amount feature; and
a usage feature construction unit, used for extracting, from the key character features, the key character features characterizing a usage, generating a usage field, and constructing the usage feature based on the usage field.
16. The apparatus of claim 15, wherein the usage feature comprises an index position sub-feature and a classification encoding sub-feature, and the usage feature construction unit is further used for:
constructing the index position sub-feature according to an index position of the usage field in a keyword dictionary; and
constructing the classification encoding sub-feature according to a code corresponding to the usage field in a classification library.
17. The apparatus of claim 11, wherein the confidence generation module comprises:
a semantic feature extraction submodule, used for extracting the semantic features of the object to be recognized; and
a confidence generation submodule, used for obtaining the confidence of the machine recognition result through the trained confidence generation model, based on the semantic features and the machine recognition result.
18. The apparatus of any one of claims 11-17, wherein the final recognition result determining module is further used for:
determining the machine recognition result as the final recognition result of the object to be recognized in a case where the confidence of the machine recognition result is greater than or equal to the confidence threshold.
19. The apparatus of any one of claims 11-17, wherein the final recognition result determining module is further used for:
sending the object to be recognized to a recognition terminal in a case where the confidence of the machine recognition result is less than the confidence threshold; and
determining a recognition result of the recognition terminal as the final recognition result of the object to be recognized.
20. A training apparatus for a confidence generation model, comprising:
a target confidence determining module, used for determining an initialized target confidence by using a machine recognition result sample of an object to be recognized;
a difference generating module, used for inputting the machine recognition result sample of the object to be recognized and semantic features of the machine recognition result sample into a confidence generation model to be trained, to obtain a difference between a prediction confidence and the target confidence; and
a training module, used for training the confidence generation model to be trained according to the difference until the difference is within an allowable range.
21. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-10.
22. A non-transitory computer readable storage medium having stored thereon computer instructions for causing a computer to perform the method of any one of claims 1-10.
23. A computer program product comprising a computer program which, when executed by a processor, implements the method according to any one of claims 1-10.
CN202110310267.0A 2021-03-23 2021-03-23 Text recognition method, device, equipment and storage medium Active CN112990035B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110310267.0A CN112990035B (en) 2021-03-23 2021-03-23 Text recognition method, device, equipment and storage medium

Publications (2)

Publication Number Publication Date
CN112990035A true CN112990035A (en) 2021-06-18
CN112990035B CN112990035B (en) 2023-10-31

Family

ID=76333151

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110310267.0A Active CN112990035B (en) 2021-03-23 2021-03-23 Text recognition method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN112990035B (en)

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2003196442A (en) * 2001-12-28 2003-07-11 Nec Corp Business processing management system, method for it, server device and program
CN102968638A (en) * 2011-08-31 2013-03-13 上海夏尔软件有限公司 Image sharpness judgment method based on keyword optical character recognition
JP2015118488A (en) * 2013-12-17 2015-06-25 株式会社日本デジタル研究所 System, method and program for inputting account data
US20150278937A1 (en) * 2014-03-31 2015-10-01 Markus Kahn Systems and methods of providing key figure information
WO2019071662A1 (en) * 2017-10-09 2019-04-18 平安科技(深圳)有限公司 Electronic device, bill information identification method, and computer readable storage medium
CN110175235A (en) * 2019-04-23 2019-08-27 苏宁易购集团股份有限公司 Intelligence commodity tax sorting code number method and system neural network based
CN110599353A (en) * 2018-06-13 2019-12-20 百度在线网络技术(北京)有限公司 Vehicle insurance and claims rate prediction method, device, equipment and medium
CN112329708A (en) * 2020-11-24 2021-02-05 北京百度网讯科技有限公司 Bill identification method and device
CN112434681A (en) * 2021-01-27 2021-03-02 武汉星巡智能科技有限公司 Intelligent camera self-training confidence threshold selection method, device and equipment

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113537192A (en) * 2021-06-30 2021-10-22 北京百度网讯科技有限公司 Image detection method, image detection device, electronic equipment and storage medium
CN113537192B (en) * 2021-06-30 2024-03-26 北京百度网讯科技有限公司 Image detection method, device, electronic equipment and storage medium
CN113643260A (en) * 2021-08-13 2021-11-12 北京百度网讯科技有限公司 Method, apparatus, device, medium and product for detecting image quality
CN114118049A (en) * 2021-10-28 2022-03-01 北京百度网讯科技有限公司 Information acquisition method and device, electronic equipment and storage medium
CN114118049B (en) * 2021-10-28 2023-09-22 北京百度网讯科技有限公司 Information acquisition method, device, electronic equipment and storage medium
CN114495103A (en) * 2022-01-28 2022-05-13 北京百度网讯科技有限公司 Text recognition method, text recognition device, electronic equipment and medium
CN114495103B (en) * 2022-01-28 2023-04-04 北京百度网讯科技有限公司 Text recognition method and device, electronic equipment and medium

Also Published As

Publication number Publication date
CN112990035B (en) 2023-10-31

Similar Documents

Publication Publication Date Title
CN112990035B (en) Text recognition method, device, equipment and storage medium
CN114821622B (en) Text extraction method, text extraction model training method, device and equipment
CN110728313B (en) Classification model training method and device for intention classification recognition
CN112507118A (en) Information classification and extraction method and device and electronic equipment
CN114861637B (en) Spelling error correction model generation method and device, and spelling error correction method and device
CN113657395A (en) Text recognition method, and training method and device of visual feature extraction model
CN112784589A (en) Training sample generation method and device and electronic equipment
CN114092948B (en) Bill identification method, device, equipment and storage medium
CN113963197A (en) Image recognition method and device, electronic equipment and readable storage medium
CN116662484A (en) Text regularization method, device, equipment and storage medium
US20230081015A1 (en) Method and apparatus for acquiring information, electronic device and storage medium
CN114461665B (en) Method, apparatus and computer program product for generating a statement transformation model
CN112541557B (en) Training method and device for generating countermeasure network and electronic equipment
CN114647727A (en) Model training method, device and equipment applied to entity information recognition
CN114841172A (en) Knowledge distillation method, apparatus and program product for text matching double tower model
CN115048505A (en) Corpus screening method and device, electronic equipment and computer readable medium
CN114067805A (en) Method and device for training voiceprint recognition model and voiceprint recognition
CN114090885A (en) Product title core word extraction method, related device and computer program product
CN113408280A (en) Negative example construction method, device, equipment and storage medium
CN113191261A (en) Image category identification method and device and electronic equipment
CN114330345B (en) Named entity recognition method, training method, device, electronic equipment and medium
CN113836314B (en) Knowledge graph construction method, device, equipment and storage medium
CN114662469B (en) Emotion analysis method and device, electronic equipment and storage medium
CN116244432B (en) Pre-training method and device for language model and electronic equipment
CN114492409B (en) Method and device for evaluating file content, electronic equipment and program product

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant