CN112381086A - Method and device for outputting image character recognition result in structured mode - Google Patents


Info

Publication number
CN112381086A
CN112381086A (application CN202011229081.4A)
Authority
CN
China
Prior art keywords
character
classifier
anchoring
position information
detection
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011229081.4A
Other languages
Chinese (zh)
Inventor
汪泰伸
吴婷婷
吴志鹏
陈德意
刘彩玲
高志鹏
赵建强
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xiamen Meiya Pico Information Co Ltd
Original Assignee
Xiamen Meiya Pico Information Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xiamen Meiya Pico Information Co Ltd filed Critical Xiamen Meiya Pico Information Co Ltd
Priority to CN202011229081.4A
Publication of CN112381086A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/60 Type of objects
    • G06V20/62 Text, e.g. of license plates, overlay texts or captions on TV images
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2411 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/243 Classification techniques relating to the number of classes
    • G06F18/24323 Tree-organised classifiers
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Software Systems (AREA)
  • Multimedia (AREA)
  • Medical Informatics (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Character Discrimination (AREA)

Abstract

The invention relates to a method and a device for outputting image character recognition results in a structured manner, wherein the method comprises the following steps: S1, acquiring the position information of detection boxes by using an optical character recognition (OCR) algorithm; S2, constructing a key field data set with labels, wherein a key field is an information category to be acquired; S3, setting an anchoring field; S4, constructing feature vectors, wherein relative position information and relative aspect ratios are used to generate the feature vectors; S5, training and optimizing a machine learning classifier using the generated feature vectors; S6, classifying the detection boxes of the character areas of the image to be recognized using the classifier after training optimization; and S7, recognizing the characters in the detection boxes, matching key information in the recognized characters, correcting character fields with similar formats, and finally outputting structured result data.

Description

Method and device for outputting image character recognition result in structured mode
Technical Field
The invention relates to the technical field of character recognition, in particular to a method and a device for structured output of image character recognition results for card and certificate information.
Background
Image character recognition mainly uses optical character recognition (OCR) technology to recognize and extract the characters in an image as character strings, which are then edited in a post-processing step. The result produced by OCR technology is simply an editable character string that contains no structured information. To use such a result, one must either establish a series of rules to screen each item for entry, or enter every item manually; both approaches have obvious disadvantages in robustness and efficiency, since no complete set of rules can be established to screen every item of information, and manual entry causes a great waste of labor cost.
In recent years, with the accelerating spread of the Internet of Things and mobile internet terminals, common certificates, bank cards, business cards and the like are usually photographed and stored as pictures, so the need to manage card and certificate information has become increasingly prominent. At present, there are two main methods for structuring card and certificate text: one is based on grammatical patterns, which classifies the field attribution of character regions by matching the recognized text against a knowledge base; the other is based on layout analysis, which uses statistical regularities of the layout to analyse the adjacency relations between character areas and predict the field attribution of each character area. The grammar-based classification of field attribution depends on the coverage of the knowledge base, and its effect on diverse personal and enterprise names cannot be guaranteed; the layout-analysis-based method makes certain errors on cards with personalized typesetting. Therefore, a more robust method for outputting image character recognition results in a structured manner is needed to improve card and certificate information management.
Disclosure of Invention
The invention aims to provide a method for outputting an image character recognition result in a structured mode, so as to solve the problems. Therefore, the invention adopts the following specific technical scheme:
according to an aspect of the present invention, there is provided a method for outputting a text recognition result of an image in a structured manner, comprising the following steps:
s1, acquiring position information of a detection frame by using an optical character recognition algorithm (OCR), specifically, performing character detection and recognition on an input image by using the OCR, and acquiring a position information set and a character set of a character area, wherein the position information set is a set formed by the vertex coordinates of the upper left corner and the vertex coordinates of the lower right corner of the detection frame of the character area;
s2, constructing a key field data set with labels, wherein the key field is an information category to be acquired;
s3, setting an anchoring field, specifically, setting the anchoring field according to different input data, constructing a data list of information corresponding to the anchoring field, retrieving anchoring information of a character recognition result, taking a detection box of a corresponding character area as an anchoring box, and calculating relative position information and a relative aspect ratio of the detection box and the anchoring box of other character areas, wherein the relative position information refers to a coordinate difference value of a vertex of a lower right corner of the detection box of the other character areas and a vertex of an upper left corner of the anchoring box;
s4, constructing a feature vector, wherein the relative position information and the relative aspect ratio are used for generating the feature vector;
s5, training and optimizing a classifier, and training and optimizing a machine learning classifier by using the generated feature vector;
s6, classifying the detection frames, namely classifying the detection frames of the character areas of the image to be recognized by using a machine learning classifier after training optimization;
and S7, identifying and outputting a structured result, specifically, identifying characters in the detection box, performing key information matching on the identified characters, correcting and outputting character fields with similar formats, and finally outputting structured result data.
Further, the machine learning classifier adopts a support vector machine classifier and a random forest classifier.
Further, the number of decision trees of the random forest classifier is 10, and the out-of-bag data test is set to True; the kernel function of the support vector machine classifier is set as a Gaussian radial basis kernel function, and the penalty factor is set as 90.
According to another aspect of the present invention, there is provided an apparatus for structured output of image text recognition results, comprising:
the detection frame position information acquisition module is used for acquiring detection frame position information by using an optical character recognition algorithm (OCR), specifically, performing character detection and recognition on an input image by using the OCR to acquire a position information set and a character set of a character area, wherein the position information set is a set formed by a top left corner vertex coordinate and a bottom right corner vertex coordinate of a detection frame of the character area;
the key field data set construction module is used for constructing a key field data set with labels, wherein a key field is an information category to be acquired;
the anchoring field setting module is used for setting anchoring fields, specifically, setting anchoring fields according to different input data, constructing a data list of information corresponding to the anchoring fields, retrieving anchoring information for character recognition results, taking a detection box of a corresponding character area as an anchoring box, and calculating relative position information and relative width-height ratio of the detection box and the anchoring box of other character areas, wherein the relative position information refers to a coordinate difference value of a vertex of a lower right corner of the detection box of the other character areas and a vertex of an upper left corner of the anchoring box;
the characteristic vector construction module is used for using the relative position information and the relative aspect ratio for generating the characteristic vector;
the classifier training optimization module is used for training and optimizing a classifier, and specifically, training and optimizing a machine learning classifier by using the generated feature vector;
the detection frame classification module is used for classifying the detection frames of the character areas of the images to be recognized by using the machine learning classifier after training optimization;
and the recognition output module is used for recognizing and outputting the structured result, specifically recognizing characters in the detection box, performing key information matching on the recognized characters, correcting and outputting character fields with similar formats, and finally outputting structured result data.
Further, the machine learning classifier adopts a support vector machine classifier and a random forest classifier.
Further, the number of decision trees of the random forest classifier is 10, and the out-of-bag data test is set to True; the kernel function of the support vector machine classifier is set as a Gaussian radial basis kernel function, and the penalty factor is set as 90.
By adopting the technical scheme, the invention has the beneficial effects that: the machine learning classifier used by the invention can process high-dimensional data, and has strong generalization capability and good robustness. For the card and certificate pictures with complex and various layout information, the OCR recognition result is structurally processed by using a machine learning classifier and combining a detection frame of a character area, so that the character information after the image character recognition can be accurately output, and an effective solution is provided for card and certificate information management.
Drawings
To further illustrate the various embodiments, the invention provides the accompanying drawings. The accompanying drawings, which are incorporated in and constitute a part of this disclosure, illustrate embodiments of the invention and, together with the description, serve to explain the principles of the embodiments. Those skilled in the art will appreciate still other possible embodiments and advantages of the present invention with reference to these figures. Elements in the figures are not drawn to scale and like reference numerals are generally used to indicate like elements.
FIG. 1 is a flow chart of a method of the present invention;
FIG. 2 illustrates an image to be recognized;
FIG. 3 is a feature vector obtained by identifying the image shown in FIG. 2 using the method of the present invention;
fig. 4 is an output of the recognition of the image shown in fig. 2 using the method of the present invention.
Fig. 5 is a block diagram of the apparatus of the present invention.
Detailed Description
The invention will now be further described with reference to the accompanying drawings and detailed description.
As shown in fig. 1, a method for outputting a text recognition result of an image in a structured manner may include the following steps:
s1 obtains position information of the detection box by using an Optical Character Recognition (OCR) algorithm, specifically, performs character detection and recognition on the input image by using the OCR algorithm, and obtains a position information set and a character set of the character area, where the position information set is a set formed by top left corner vertex coordinates and bottom right corner vertex coordinates of the detection box of the character area.
And S2, constructing a key field data set with labels, wherein the key field is the type of the information to be acquired.
S3, setting an anchoring field, specifically, setting the anchoring field according to different input data, constructing a data list of information corresponding to the anchoring field, retrieving anchoring information of the character recognition result, taking a detection box of a corresponding character area as an anchoring box, and calculating relative position information and relative aspect ratio of the detection box and the anchoring box of other character areas, wherein the relative position information refers to the coordinate difference value of the vertex of the lower right corner of the detection box of the other character areas and the vertex of the upper left corner of the anchoring box.
And S4, constructing a feature vector, and using the relative position information and the relative aspect ratio for feature vector generation.
S5, training and optimizing a classifier, and training and optimizing a machine learning classifier by using the generated feature vector; the machine learning classifier can adopt a support vector machine classifier and a random forest classifier.
And S6, classifying the detection boxes, namely classifying the detection boxes of the character areas of the image to be recognized by using the machine learning classifier after training optimization. The key field classification performed by the machine learning classifier proceeds as follows: 1) compute feature vectors from the key field detection boxes and the anchoring box; 2) manually label the category of each feature vector and input them into the machine learning classifier for training, so that the classifier can distinguish which key field a region belongs to; 3) compute the feature vector of the box to be classified relative to the anchoring box, and input it into the trained classifier for classification.
And S7, identifying and outputting a structured result, specifically, identifying characters in the detection box, performing key information matching on the identified characters, correcting and outputting character fields with similar formats, and finally outputting structured result data.
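The feature construction of steps S3 and S4 can be sketched as follows. This is an illustrative sketch only: the (x1, y1, x2, y2) box representation and the helper name are assumptions, while the geometry (bottom-right vertex of the detection box minus top-left vertex of the anchoring box, plus relative width and height ratios) follows the description above.

```python
def feature_vector(det_box, anchor_box):
    """Build the [x, y, r_w, r_h] feature for one detection box.

    Boxes are (x1, y1, x2, y2): the top-left and bottom-right vertex
    coordinates produced by the OCR detection step (S1).
    """
    dx1, dy1, dx2, dy2 = det_box
    ax1, ay1, ax2, ay2 = anchor_box

    # Relative position (S3): bottom-right vertex of the detection box
    # minus top-left vertex of the anchoring box.
    x = dx2 - ax1
    y = dy2 - ay1

    # Relative aspect ratio: width and height of the detection box
    # divided by those of the anchoring box.
    r_w = (dx2 - dx1) / (ax2 - ax1)
    r_h = (dy2 - dy1) / (ay2 - ay1)
    return [x, y, r_w, r_h]


# Example: an anchoring box near the top of a card and a detection box below it.
print(feature_vector((100, 90, 300, 120), (100, 40, 220, 70)))
# → [200, 80, 1.6666666666666667, 1.0]
```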
The method of the present invention is further explained below by taking a business card as an example. Fig. 2 shows a business card image to be recognized, and the specific processes of recognizing and outputting are as follows:
(1) carrying out character detection and recognition on an input business card image by utilizing an optical character recognition algorithm (OCR) to obtain a position information (coordinate) set and a character set of a character area, wherein the position information refers to the top left corner vertex coordinate and the bottom right corner vertex coordinate of a detection frame of the character area;
(2) constructing a key field data set with labels, wherein the key fields of the business card data are as follows: company name, job title, mobile phone, telephone, mailbox, fax, address and web address, each category being labeled with a number, for example, 0, 1, 2, 3, 4, 5, 6, 7, 8;
(3) searching, using a pre-constructed role list, for the detection box in the business card recognition result that matches the role field, taking this detection box as the anchor box, and then calculating the coordinate differences x and y between the bottom-right vertex of a detection box to be classified and the top-left vertex of the anchor box, together with the relative width and height ratios r_w, r_h of the detection box to be classified and the anchor box;
(4) using the calculated relative position information and relative aspect ratios to generate the feature vector [x, y, r_w, r_h];
(5) using the generated feature vectors to train and optimize the machine learning classifiers, with the parameters of the random forest classifier set as follows: the number of decision trees created is 10 and the out-of-bag data test is set to True; and the parameters of the support vector machine set as follows: the kernel function is a Gaussian radial basis kernel function and the penalty factor is 90;
(6) classifying the character region detection boxes in the business card image by using the machine learning classifier after training optimization, wherein the classification result is shown in fig. 3;
(7) recognizing the characters in the detection box area, matching key information of the recognized characters, correcting and outputting character fields with similar formats, and finally outputting structured result data, as shown in fig. 4.
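Steps (5) and (6) can be sketched with scikit-learn as follows. The stated parameters (10 decision trees with the out-of-bag test enabled; Gaussian RBF kernel with penalty factor 90) come from the text; the toy feature vectors, the labels, the fixed random seed, and the use of scikit-learn itself are illustrative assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC

# Feature vectors [x, y, r_w, r_h] with manually labelled key-field classes
# (two toy classes here; a real data set covers all key fields).
X = np.array([
    [200.0, 80.0, 1.7, 1.0],   # class 0
    [210.0, 85.0, 1.6, 1.1],   # class 0
    [205.0, 78.0, 1.8, 0.9],   # class 0
    [-50.0, 160.0, 0.8, 0.9],  # class 2
    [-45.0, 150.0, 0.7, 1.0],  # class 2
    [-55.0, 155.0, 0.9, 1.1],  # class 2
])
y = np.array([0, 0, 0, 2, 2, 2])

# Random forest: 10 decision trees, out-of-bag data test set to True.
rf = RandomForestClassifier(n_estimators=10, oob_score=True,
                            random_state=0).fit(X, y)

# Support vector machine: Gaussian radial basis kernel, penalty factor 90.
svm = SVC(kernel="rbf", C=90).fit(X, y)

# Classify the feature vector of a new detection box.
new_box = [[204.0, 82.0, 1.65, 1.05]]
print(rf.predict(new_box)[0], svm.predict(new_box)[0])  # both predict class 0
```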
As shown in fig. 5, an apparatus for structured output of image text recognition results includes:
a detection frame position information obtaining module 100, configured to obtain detection frame position information by using an optical character recognition algorithm (OCR), specifically, perform character detection and recognition on an input image by using the OCR, and obtain a position information set and a character set of a character region, where the position information set is a set formed by an upper left corner vertex coordinate and a lower right corner vertex coordinate of a detection frame of the character region;
a key field data set constructing module 200, configured to construct a key field data set with labels, where a key field is an information category to be acquired;
an anchoring field setting module 300, configured to set an anchoring field, specifically, to set the anchoring field according to the input data, construct a data list of information corresponding to the anchoring field, retrieve the anchoring information in the character recognition result, take the detection box of the corresponding character area as the anchoring box, and calculate the relative position information and relative aspect ratio between the detection boxes of other character areas and the anchoring box, where the relative position information is the coordinate difference between the bottom-right vertex of the detection box of another character area and the top-left vertex of the anchoring box;
a feature vector construction module 400 for using the relative position information and the relative aspect ratio for feature vector generation;
a classifier training optimization module 500, configured to train and optimize a classifier, specifically, train and optimize a machine learning classifier using the generated feature vectors;
a detection frame classification module 600, configured to classify the detection boxes of the character areas of the image to be recognized by using the machine learning classifier after training optimization; the machine learning classifier adopts a support vector machine classifier and a random forest classifier; specific parameters of the support vector machine classifier and the random forest classifier can be set according to different recognition objects; for example, for the business card image recognition shown in fig. 2, the number of decision trees of the random forest classifier is 10 and the out-of-bag data test is set to True, while the kernel function of the support vector machine classifier is set as a Gaussian radial basis kernel function and the penalty factor as 90;
and an identification output module 700, configured to identify and output a structured result, specifically, identify characters in the detection box, perform key information matching on the identified characters, correct and output character fields with similar formats, and finally output structured result data.
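The correction of "character fields with similar formats" performed by the recognition output module can be illustrated with a small sketch. The confusion table, the field names, and the regular expression below are illustrative assumptions; the patent does not specify the concrete correction rules.

```python
import re

# Common OCR look-alike confusions in digit-only fields such as
# telephone, mobile phone, and fax numbers (illustrative table).
DIGIT_FIXES = str.maketrans({"O": "0", "o": "0", "l": "1", "I": "1", "B": "8"})

def correct_digit_field(text):
    """Map look-alike letters to digits and strip separators."""
    return re.sub(r"[^0-9]", "", text.translate(DIGIT_FIXES))

def to_structured(classified):
    """Assemble {field label: corrected text} from (label, text) pairs."""
    digit_fields = {"mobile phone", "telephone", "fax"}
    result = {}
    for label, text in classified:
        result[label] = (correct_digit_field(text)
                         if label in digit_fields else text.strip())
    return result

print(to_structured([("telephone", "O592-l234567"), ("mailbox", " a@b.com ")]))
# → {'telephone': '05921234567', 'mailbox': 'a@b.com'}
```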
The invention applies image character recognition and machine learning to the structured output of card and certificate information. Image character recognition technology can efficiently and accurately extract and recognize characters in complex scenes, and printed character recognition in particular is well developed and relatively mature, with many commercial applications. Meanwhile, machine learning, by automatically analyzing rules from large accumulations of data, forms decision methods with a certain generalization capability that can make inferences on unknown data. The machine learning classifiers used by the invention can process high-dimensional data and have strong generalization capability and good robustness. For card and certificate pictures with complex and varied layouts, the OCR recognition result is processed into structured form by a machine learning classifier combined with the detection boxes of the character areas, so that the character information recognized from the image can be output accurately, providing an effective solution for card and certificate information management.
While the invention has been particularly shown and described with reference to a preferred embodiment, it will be understood by those skilled in the art that various changes in form and detail may be made therein without departing from the spirit and scope of the invention as defined by the appended claims.

Claims (6)

1. A method for outputting image character recognition results in a structured manner is characterized by comprising the following steps:
s1, acquiring position information of a detection frame by using an optical character recognition algorithm (OCR), specifically, performing character detection and recognition on an input image by using the OCR to acquire a position information set and a character set of a character area, wherein the position information set is a set formed by a vertex coordinate at the upper left corner and a vertex coordinate at the lower right corner of the detection frame of the character area;
s2, constructing a key field data set with labels, wherein the key field is an information category to be acquired;
s3, setting an anchoring field, specifically, setting the anchoring field according to different input data, constructing a data list of information corresponding to the anchoring field, retrieving anchoring information of a character recognition result, taking a detection box of a corresponding character area as an anchoring box, and calculating relative position information and a relative aspect ratio of the detection box and the anchoring box of other character areas, wherein the relative position information refers to a coordinate difference value of a vertex of a lower right corner of the detection box of the other character areas and a vertex of an upper left corner of the anchoring box;
s4, constructing a feature vector, wherein the relative position information and the relative aspect ratio are used for generating the feature vector;
s5, training and optimizing a classifier, and training and optimizing a machine learning classifier by using the generated feature vector;
s6, classifying the detection frames, namely classifying the detection frames of the character areas of the image to be recognized by using a machine learning classifier after training optimization;
and S7, identifying and outputting a structured result, specifically, identifying characters in the detection box, performing key information matching on the identified characters, correcting and outputting character fields with similar formats, and finally outputting structured result data.
2. The method of claim 1, wherein the machine learning classifier employs a support vector machine classifier and a random forest classifier.
3. A method as claimed in claim 2, wherein the number of decision trees of the random forest classifier is 10, and the out-of-bag data test is set to True; the kernel function of the support vector machine classifier is set as a Gaussian radial basis kernel function, and the penalty factor is set as 90.
4. An apparatus for structured output of image text recognition results, comprising:
the detection frame position information acquisition module is used for acquiring detection frame position information by using an optical character recognition algorithm (OCR), specifically, performing character detection and recognition on an input image by using the OCR to acquire a position information set and a character set of a character area, wherein the position information set is a set formed by a top left corner vertex coordinate and a bottom right corner vertex coordinate of a detection frame of the character area;
the key field data set construction module is used for constructing a key field data set with labels, wherein a key field is an information category to be acquired;
the anchoring field setting module is used for setting anchoring fields, specifically, setting anchoring fields according to different input data, constructing a data list of information corresponding to the anchoring fields, retrieving anchoring information for character recognition results, taking a detection box of a corresponding character area as an anchoring box, and calculating relative position information and relative width-height ratio of the detection box and the anchoring box of other character areas, wherein the relative position information refers to a coordinate difference value of a vertex of a lower right corner of the detection box of the other character areas and a vertex of an upper left corner of the anchoring box;
the characteristic vector construction module is used for using the relative position information and the relative aspect ratio for generating the characteristic vector;
the classifier training optimization module is used for training and optimizing a classifier, and specifically, training and optimizing a machine learning classifier by using the generated feature vector;
the detection frame classification module is used for classifying the detection frames of the character areas of the images to be recognized by using the machine learning classifier after training optimization;
and the recognition output module is used for recognizing and outputting the structured result, specifically recognizing characters in the detection box, performing key information matching on the recognized characters, correcting and outputting character fields with similar formats, and finally outputting structured result data.
5. The apparatus of claim 4, in which a machine learning classifier employs a support vector machine classifier and a random forest classifier.
6. Apparatus as claimed in claim 5, wherein the number of decision trees of the random forest classifier is 10, and the out-of-bag data test is set to True; the kernel function of the support vector machine classifier is set as a Gaussian radial basis kernel function, and the penalty factor is set as 90.
CN202011229081.4A 2020-11-06 2020-11-06 Method and device for outputting image character recognition result in structured mode Pending CN112381086A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011229081.4A CN112381086A (en) 2020-11-06 2020-11-06 Method and device for outputting image character recognition result in structured mode

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011229081.4A CN112381086A (en) 2020-11-06 2020-11-06 Method and device for outputting image character recognition result in structured mode

Publications (1)

Publication Number Publication Date
CN112381086A 2021-02-19

Family

ID=74579804

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011229081.4A Pending CN112381086A (en) 2020-11-06 2020-11-06 Method and device for outputting image character recognition result in structured mode

Country Status (1)

Country Link
CN (1) CN112381086A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105260727A (en) * 2015-11-12 2016-01-20 武汉大学 Academic-literature semantic restructuring method based on image processing and sequence labeling
US20170351913A1 (en) * 2016-06-07 2017-12-07 The Neat Company, Inc. d/b/a Neatreceipts, Inc. Document Field Detection And Parsing
CN109086756A (en) * 2018-06-15 2018-12-25 众安信息技术服务有限公司 A kind of text detection analysis method, device and equipment based on deep neural network
CN109800761A (en) * 2019-01-25 2019-05-24 厦门商集网络科技有限责任公司 Method and terminal based on deep learning model creation paper document structural data
CN110689447A (en) * 2019-08-30 2020-01-14 中国科学院自动化研究所南京人工智能芯片创新研究院 Real-time detection method for social software user published content based on deep learning

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113269126A (en) * 2021-06-10 2021-08-17 上海云扩信息科技有限公司 Key information extraction method based on coordinate transformation
CN113378710A (en) * 2021-06-10 2021-09-10 平安科技(深圳)有限公司 Layout analysis method and device for image file, computer equipment and storage medium
CN113378710B (en) * 2021-06-10 2024-03-08 平安科技(深圳)有限公司 Layout analysis method and device for image file, computer equipment and storage medium
CN113591657A (en) * 2021-07-23 2021-11-02 京东科技控股股份有限公司 OCR (optical character recognition) layout recognition method and device, electronic equipment and medium
CN113591657B (en) * 2021-07-23 2024-04-09 京东科技控股股份有限公司 OCR layout recognition method and device, electronic equipment and medium
CN113610098A (en) * 2021-08-19 2021-11-05 创优数字科技(广东)有限公司 Tax payment number identification method and device, storage medium and computer equipment

Similar Documents

Publication Publication Date Title
CN110766014B (en) Bill information positioning method, system and computer readable storage medium
US10943105B2 (en) Document field detection and parsing
CN112381086A (en) Method and device for outputting image character recognition result in structured mode
CN108960223B (en) Method for automatically generating voucher based on intelligent bill identification
US7120318B2 (en) Automatic document reading system for technical drawings
US8744196B2 (en) Automatic recognition of images
US7970213B1 (en) Method and system for improving the recognition of text in an image
CN110717366A (en) Text information identification method, device, equipment and storage medium
CN113963147B (en) Key information extraction method and system based on semantic segmentation
CN110413825B (en) Street-clapping recommendation system oriented to fashion electronic commerce
CN111915635A (en) Test question analysis information generation method and system supporting self-examination paper marking
CN111222585A (en) Data processing method, device, equipment and medium
CN113673528B (en) Text processing method, text processing device, electronic equipment and readable storage medium
CN113762257A (en) Identification method and device for marks in makeup brand images
CN110766001B (en) Bank card number positioning and end-to-end identification method based on CNN and RNN
CN114359912B (en) Software page key information extraction method and system based on graph neural network
Sun et al. TemplateFree: product detection on retail store shelves
CN116092100A (en) Text content extraction method and device
Zuo et al. An intelligent knowledge extraction framework for recognizing identification information from real-world ID card images
CN111241955B (en) Bill information extraction method and system
CN115376149A (en) Reimbursement invoice identification method
CN113591657A (en) OCR (optical character recognition) layout recognition method and device, electronic equipment and medium
Xue et al. Location and interpretation of destination addresses on handwritten Chinese envelopes
CN112287763A (en) Image processing method, apparatus, device and medium
CN114202761B (en) Information batch extraction method based on picture information clustering

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20210219