CN111414917B - Identification method of low-pixel-density text

Info

Publication number
CN111414917B
CN111414917B (application CN202010190222.XA)
Authority
CN
China
Prior art keywords
text
special symbol
image information
recognition
result
Prior art date
2020-03-18
Legal status
Active
Application number
CN202010190222.XA
Other languages
Chinese (zh)
Other versions
CN111414917A
Inventor
李振
鲁宾宾
刘挺
陈伟强
陈远琴
孟天祥
翟昶
Current Assignee
Minsheng Science And Technology Co ltd
Original Assignee
Minsheng Science And Technology Co ltd
Priority date
2020-03-18
Filing date
2020-03-18
Publication date
2023-05-12
Application filed by Minsheng Science And Technology Co., Ltd.
Priority to CN202010190222.XA
Publication of CN111414917A (2020-07-14)
Application granted
Publication of CN111414917B (2023-05-12)
Legal status: Active
Anticipated expiration

Classifications

    • G06V 30/153 Segmentation of character regions using recognition of characters or words
    • G06N 3/045 Neural networks; combinations of networks
    • G06N 3/08 Neural networks; learning methods
    • G06V 10/22 Image preprocessing by selection of a specific region containing or referencing a pattern; locating or processing of specific regions to guide the detection or recognition
    • G06V 10/462 Salient features, e.g. scale invariant feature transforms [SIFT]
    • Y02D 10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention belongs to the technical field of image recognition and relates to a method for recognizing low-pixel-density text. The method comprises the following specific steps: acquiring image information of the low-pixel-density text to be recognized, inputting and preprocessing the acquired image information, and performing preliminary positioning; performing accurate positioning on the preliminary result; recognizing the image regions according to the accurate positioning result, classifying the recognized regions, and inputting the classified data into a special-symbol recognition model and an ordinary-text recognition model, respectively, for recognition; and summarizing and structuring the recognition results. The beneficial effects of the invention are as follows: the character regions containing special symbols in the low-pixel-density text are identified, and the special symbols are recognized with a dedicated trained model, which improves special-symbol recognition accuracy; although recognition proceeds in multiple steps, the detection and re-recognition networks are relatively small, so recognition efficiency is effectively improved.

Description

Identification method of low-pixel-density text
Technical Field
The invention belongs to the technical field of image recognition, and particularly relates to a recognition method for low-pixel-density text.
Background
Currently, large numbers of paper documents, such as credit cards, drafts, invoices, contracts, and shipping documents, are generated in transactions, trade, and inter-enterprise business. These paper documents often need to be entered into the corresponding business systems, which requires a great deal of repetitive manual labor, is inefficient, and is prone to error.
OCR (Optical Character Recognition) is a technique that converts the text in an image into machine-readable text through image recognition. OCR technology can therefore effectively eliminate the problems of manual entry across a wide variety of bill and document scenarios.
Common documents and bills contain a large number of special symbols, such as radio boxes, check boxes, underlines, and clause-constraint footnotes, which the OCR systems currently on the market cannot process and recognize effectively. Because the pixel density of these regions is low, common deep-learning detection methods have a low recall rate and easily miss them; and because special symbols occur infrequently yet are structurally similar to Chinese characters, transfer learning by enlarging the training data set can hardly achieve a good recognition effect.
Disclosure of Invention
The invention discloses a method for recognizing low-pixel-density text, aiming to solve any of the above and other potential problems in the prior art.
In order to achieve the above purpose, the technical scheme of the invention is as follows. A method for recognizing low-pixel-density text specifically comprises the following steps:
S1) acquiring image information of the low-pixel-density text to be recognized, and inputting the acquired image information;
S2) preprocessing the input image information, and processing the image regions with SIFT (Scale-Invariant Feature Transform) to perform preliminary positioning;
S3) performing accurate positioning according to the preliminary positioning result obtained in S2);
S4) recognizing the image regions in the accurate positioning result obtained in S3), and classifying the recognized regions into ordinary-text regions and special-symbol regions;
S5) inputting the special-symbol regions and the ordinary-text regions obtained in S4) into a special-symbol recognition model and an ordinary-text recognition model, respectively, for recognition;
S6) summarizing and structuring the recognition results.
Low-pixel-density text contains a large number of special symbols, such as radio boxes, check boxes, underlines, and footnotes, so the local pixel density of the text to be recognized is very low.
Further, step S5) also includes a correction step: post-correcting the recognition results through an error-correction mechanism.
Further, the specific steps of S2) are as follows:
S2.1) converting the input image information into a matrix using a feature-point matching algorithm,
S2.2) constructing a scale space and a difference-of-Gaussians pyramid, and calculating the extreme points and their corresponding coordinates,
S2.3) determining and locating the rough coordinates of the special symbols in the image information from all the extreme points obtained in S2.2), extracting the corresponding image areas, and completing the preliminary positioning.
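The following is a minimal sketch of steps S2.1)-S2.3) using OpenCV's SIFT implementation; the box construction around each extreme point and the padding value are illustrative assumptions rather than details specified by the invention.

```python
# Preliminary positioning (S2): SIFT extreme points mark candidate
# special-symbol areas. Padding and box sizes are assumptions.
import cv2

def preliminary_positioning(image_path, pad=8):
    gray = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)   # S2.1: image -> matrix
    keypoints = cv2.SIFT_create().detect(gray, None)      # S2.2: DoG extreme points
    boxes = []
    for kp in keypoints:                                  # S2.3: rough coordinates
        x, y = int(kp.pt[0]), int(kp.pt[1])
        r = int(kp.size) + pad
        boxes.append((max(x - r, 0), max(y - r, 0),
                      min(x + r, gray.shape[1]), min(y + r, gray.shape[0])))
    return gray, boxes  # candidate areas to be re-detected in S3
```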
Further, the specific steps of S3) are as follows:
S3.1) inputting the preliminary positioning result of S2.3) into a CRAFT deep-learning detection model,
S3.2) using the CRAFT deep-learning detection model to calculate the coordinates of the regions containing ordinary text and special symbols in the image information, extracting the image information within those region coordinates, and extracting the histogram-of-oriented-gradients (HOG) features of the corresponding images,
S3.3) performing binary classification on the HOG features obtained in S3.2) with an SVM (support vector machine), dividing the regions into ordinary-text regions and special-symbol regions.
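Below is a minimal sketch of steps S3.2)-S3.3), using scikit-image HOG features and a linear SVM; the 128x32 patch size, the HOG parameters, and the training data are illustrative assumptions.

```python
# Region classification (S3.2-S3.3): HOG features plus a binary SVM
# separating ordinary-text regions (label 0) from special-symbol regions (label 1).
import cv2
import numpy as np
from skimage.feature import hog
from sklearn.svm import LinearSVC

def hog_vector(patch):
    patch = cv2.resize(patch, (128, 32))          # fixed size so vectors align
    return hog(patch, orientations=9,
               pixels_per_cell=(8, 8), cells_per_block=(2, 2))

def train_region_classifier(patches, labels):
    X = np.stack([hog_vector(p) for p in patches])
    return LinearSVC().fit(X, labels)             # two-class SVM
```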
Further, the specific steps of S4) are as follows:
S4.1) first inputting the ordinary-text regions obtained in S3.3) into the trained ordinary-text recognition model for recognition, and outputting the recognition results;
S4.2) inputting the special-symbol regions obtained in S3.3) into the trained special-symbol recognition model for recognition, and outputting the recognition results.
Further, the specific steps of training the ordinary-text recognition model and the special-symbol recognition model are as follows:
Step 1: taking the text content in millions of pictures as the training set of the network model;
Step 2: uniformly processing all pictures to be trained in the training set, converting them into matrices with a value range of [-1, 1];
Step 3: randomly dividing all pictures to be trained in the training set into several groups, and combining the grouped pictures with the matrices obtained in step 2 to form batches;
Step 4: inputting the batches obtained in step 3 into the input layer of the corresponding recognition model, and computing the error between the output result and the actual value of the sample label at the output layer;
Step 5: processing the output of the output layer with the CTC algorithm (Connectionist Temporal Classification, used mainly to align input and output label sequences in sequence-labeling problems); if the difference between the output result and the actual label is small enough, executing step 7, otherwise executing step 6;
Step 6: updating the network weights and thresholds of every neuron in the hidden layers of the corresponding convolutional recognition model, so that the network error function descends along the negative gradient direction and the output approaches the expected output, then returning to step 4;
Step 7: obtaining the trained ordinary-text recognition model and special-symbol recognition model.
Further, the unified processing in step 2 includes graying or binarization; a sketch of the preprocessing and batching of steps 1-3 is given below.
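This minimal sketch grays the pictures, rescales them to [-1, 1], and groups them randomly into batches; the picture size and batch size are illustrative assumptions.

```python
# Training preprocessing (steps 1-3): unify pictures, map them to [-1, 1],
# and group them randomly into batches. Sizes are assumptions.
import cv2
import numpy as np

def to_unit_matrix(img_bgr, size=(128, 32)):
    gray = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2GRAY)   # unified graying
    gray = cv2.resize(gray, size).astype(np.float32)
    return gray / 127.5 - 1.0                          # value range [-1, 1]

def make_batches(images, labels, batch_size=32, seed=0):
    order = np.random.default_rng(seed).permutation(len(images))  # random grouping
    for start in range(0, len(order), batch_size):
        sel = order[start:start + batch_size]
        yield (np.stack([to_unit_matrix(images[i]) for i in sel]),
               [labels[i] for i in sel])
```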
A computer program implementing the above method for recognizing low-pixel-density text.
An information processing terminal implementing the above method for recognizing low-pixel-density text.
A computer-readable storage medium comprising instructions which, when run on a computer, cause the computer to perform the above method for recognizing low-pixel-density text.
The invention has the beneficial effects that, by adopting the above technical scheme, it provides a multi-stage recognition scheme that mainly comprises screening, recognizing, and positioning the special-symbol regions, and then re-recognizing and correcting them; this is a step-by-step refinement process. The character regions containing special symbols are identified and the special symbols are recognized with a dedicated trained model, which improves special-symbol recognition accuracy; although recognition proceeds in multiple steps, the detection and re-recognition networks are relatively small, so recognition efficiency is effectively improved.
Drawings
FIG. 1 is a block flow diagram of the method for recognizing low-pixel-density text according to the present invention.
FIG. 2 is a schematic diagram of a picture to be recognized.
FIG. 3 is a schematic diagram of the regions detected by the method of the present invention.
In the figures:
1. ordinary text region; 2. special symbol region.
Detailed Description
The technical scheme of the invention is further described below with reference to the accompanying drawings and specific embodiments.
As shown in FIG. 1, a method for recognizing low-pixel-density text according to the present invention specifically comprises the following steps:
S1) acquiring image information of the low-pixel-density text to be recognized, and inputting the acquired image information;
S2) preprocessing the input image information, and processing the image regions with the SIFT method to perform preliminary positioning;
S3) performing accurate positioning according to the preliminary positioning result obtained in S2);
S4) recognizing the image regions in the accurate positioning result obtained in S3), and classifying the recognized regions into ordinary-text regions and special-symbol regions;
S5) inputting the special-symbol regions and the ordinary-text regions obtained in S4) into a special-symbol recognition model and an ordinary-text recognition model, respectively, for recognition;
S6) summarizing and structuring the recognition results.
Step S5) also comprises a correction step: post-correcting the recognition results through an error-correction mechanism.
The specific steps of S2) are as follows:
S2.1) converting the input image information into a matrix using a feature-point matching algorithm,
S2.2) constructing a scale space and a difference-of-Gaussians pyramid, and calculating the extreme points and their corresponding coordinates,
S2.3) determining and locating the rough coordinates of the special symbols in the image information from all the extreme points obtained in S2.2), extracting the corresponding image areas, and completing the preliminary positioning.
The specific steps of S3) are as follows:
S3.1) inputting the preliminary positioning result of S2.3) into a CRAFT deep-learning detection model,
S3.2) using the CRAFT deep-learning detection model to calculate the coordinates of the regions containing ordinary text and special symbols in the image information, extracting the image information within those region coordinates, and extracting the HOG features of the corresponding images,
S3.3) performing binary classification on the HOG features obtained in S3.2) with an SVM, dividing the regions into ordinary-text regions and special-symbol regions.
The specific steps of S4) are as follows:
S4.1) first inputting the ordinary-text regions obtained in S3.3) into the trained ordinary-text recognition model for recognition, and outputting the recognition results;
S4.2) inputting the special-symbol regions obtained in S3.3) into the trained special-symbol recognition model for recognition, and outputting the recognition results.
Training the text recognition model and the special-symbol recognition model specifically comprises the following steps:
Step 1: taking the text content in millions of pictures as the training set of the network model;
Step 2: uniformly processing all pictures to be trained in the training set, converting them into matrices with a value range of [-1, 1];
Step 3: randomly dividing all pictures to be trained in the training set into several groups, and combining the grouped pictures with the matrices obtained in step 2 to form batches;
Step 4: inputting the batches obtained in step 3 into the input layer of the corresponding recognition model, and computing the error between the output result and the actual value of the sample label at the output layer;
Step 5: if the difference between the output result of the output layer, after being processed by the CTC algorithm, and the actual label is small enough, executing step 7, otherwise executing step 6;
Step 6: updating the network weights and thresholds of every neuron in the hidden layers of the corresponding convolutional recognition model, so that the network error function descends along the negative gradient direction and the output approaches the expected output, then returning to step 4;
Step 7: obtaining the trained ordinary-text recognition model and special-symbol recognition model.
The unified processing in step 2 includes graying or binarization; a sketch of the training loop described in steps 4-7 is given below.
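The following minimal sketch assumes a PyTorch recognizer (e.g. a CRNN-style network) whose output is scored with CTC; the model interface, learning rate, and stopping threshold are illustrative assumptions.

```python
# Training loop (steps 4-7): feed batches through the recognition model,
# score the output layer with CTC, and backpropagate until the error is
# small. Model shape and threshold are assumptions.
import torch
import torch.nn as nn

def train(model, batches, threshold=0.05, lr=1e-3):
    ctc = nn.CTCLoss(blank=0, zero_infinity=True)
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for images, targets, target_lengths in batches:
        logits = model(images)                       # step 4: (T, N, classes)
        log_probs = logits.log_softmax(2)
        input_lengths = torch.full((images.size(0),), logits.size(0),
                                   dtype=torch.long)
        loss = ctc(log_probs, targets, input_lengths, target_lengths)
        opt.zero_grad()
        loss.backward()   # step 6: error descends along the negative gradient
        opt.step()        # update hidden-layer weights and thresholds
        if loss.item() < threshold:                  # step 5: difference small
            break                                    # step 7: model trained
    return model
```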
A computer program implementing the above method for recognizing low-pixel-density text.
An information processing terminal implementing the above method for recognizing low-pixel-density text.
A computer-readable storage medium comprising instructions which, when run on a computer, cause the computer to perform the above method for recognizing low-pixel-density text.
Examples:
A method for recognizing low-pixel-density text specifically comprises the following steps.
First, images of the low-pixel-density text (a document bill) to be recognized are collected, and the collected images are input.
Second, a feature-matching algorithm is adopted: the input image is converted into a matrix and blurred with a two-dimensional Gaussian (formula 1, where σ is the standard deviation and r is the radius); a scale space (formula 2, where I is the original image and * denotes convolution) and a difference-of-Gaussians pyramid (formula 3, where the scale factor k is determined by the reciprocal of the number of pyramid layers) are constructed; the extreme points and their corresponding coordinates are calculated from the difference pyramid; and finally the approximate coordinates of the special symbols on the image are determined from all the extreme points, and the corresponding image areas are extracted (the detected areas are illustrated in FIG. 3).
G(r) = 1 / (2πσ²) · e^(−r² / (2σ²)) (1)
L(x, y, σ) = G(x, y, σ) * I(x, y) (2)
D(x, y, σ) = (G(x, y, kσ) − G(x, y, σ)) * I(x, y) = L(x, y, kσ) − L(x, y, σ) (3)
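A minimal sketch of formulas (1)-(3) is given below: the image is blurred at successive scales to build the scale space L of formula (2), and adjacent levels are subtracted to obtain the difference-of-Gaussians pyramid of formula (3); the base σ and the number of levels are illustrative assumptions.

```python
# Scale space and difference-of-Gaussians pyramid (formulas 1-3).
# Base sigma and level count are assumptions.
import cv2
import numpy as np

def dog_pyramid(image, sigma=1.6, levels=5):
    k = 2 ** (1.0 / levels)                   # scale step between layers
    L = [cv2.GaussianBlur(image, (0, 0), sigma * k ** i)  # formulas (1)-(2)
         for i in range(levels + 1)]
    return [L[i + 1].astype(np.float32) - L[i]            # formula (3)
            for i in range(levels)]           # extrema give rough coordinates
```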
Third, the picture to be detected (as shown in FIG. 2) is input into the CRAFT deep-learning detection model; the model calculates the coordinates of the regions containing text, the images within those coordinates are extracted, and their histogram-of-oriented-gradients (HOG) features are computed; an SVM then classifies each feature vector to decide whether the region is an ordinary-text region or a special-symbol region (in FIG. 3, detected region 1 is an ordinary-text region and detected region 2 is a special-symbol region).
Fourth, the special-symbol regions extracted and classified above are input into the trained special-symbol recognition model (the core of the model design is shown in formula 4) for recognition, and the corresponding recognition results are output.
H_l = ReLU(b_l * f_l(H_{l-1}) + id(H_{l-1})) (4)
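Below is a minimal PyTorch sketch of formula (4): a residual unit that adds an identity shortcut id(H_{l-1}) to a transformed branch f_l(H_{l-1}) and applies ReLU; the channel count is an illustrative assumption, and the scale b_l is folded into the convolution weights.

```python
# Residual unit of formula (4): H_l = ReLU(f_l(H_{l-1}) + id(H_{l-1})).
# Channel count is an assumption; b_l is absorbed into the conv weights.
import torch.nn as nn

class ResidualUnit(nn.Module):
    def __init__(self, channels=64):
        super().__init__()
        self.f = nn.Sequential(                       # residual branch f_l
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels),
        )
        self.relu = nn.ReLU(inplace=True)

    def forward(self, h):
        return self.relu(self.f(h) + h)               # add identity shortcut
```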
Fifth, the recognition results are post-corrected according to the characteristics of the corresponding images to be recognized.
Sixth, the images without special symbols are input into the trained ordinary-text recognition model (the core of the model design is shown in formula 5) for text recognition and post-correction.
x_i = H_i([x_0, x_1, ..., x_{i-1}]) (5)
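A minimal PyTorch sketch of formula (5) follows: each layer H_i consumes the concatenation [x_0, x_1, ..., x_{i-1}] of all earlier feature maps, the dense-connectivity pattern indicated for the ordinary-text model; the growth rate and depth are illustrative assumptions.

```python
# Dense connectivity of formula (5): x_i = H_i([x_0, x_1, ..., x_{i-1}]).
# Growth rate and depth are assumptions.
import torch
import torch.nn as nn

class DenseBlock(nn.Module):
    def __init__(self, in_channels=64, growth=32, num_layers=4):
        super().__init__()
        self.layers = nn.ModuleList(
            nn.Sequential(                            # H_i
                nn.BatchNorm2d(in_channels + i * growth),
                nn.ReLU(inplace=True),
                nn.Conv2d(in_channels + i * growth, growth, 3, padding=1),
            )
            for i in range(num_layers))

    def forward(self, x):
        feats = [x]
        for layer in self.layers:
            feats.append(layer(torch.cat(feats, dim=1)))  # [x_0 .. x_{i-1}]
        return torch.cat(feats, dim=1)
```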
Finally, the recognition results are merged, ordered, and output in a structured form, as sketched below.
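The sketch below illustrates the final merging and structuring step; the reading-order sort and the output record format are illustrative assumptions.

```python
# Final step: merge per-region results, sort them into reading order by
# box coordinates, and emit a structured record. Format is an assumption.
def structure_results(results):
    # results: list of (box, text, kind), box = (x0, y0, x1, y1),
    # kind in {"text", "symbol"}
    ordered = sorted(results, key=lambda r: (r[0][1], r[0][0]))
    return {
        "content": " ".join(text for _, text, _ in ordered),
        "fields": [{"box": list(box), "text": text, "type": kind}
                   for box, text, kind in ordered],
    }
```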
The method for recognizing low-pixel-density text provided in the embodiments of the present application has been described in detail above. The above description of the embodiments is intended only to aid in understanding the method of the present application and its core ideas; meanwhile, since those skilled in the art may make changes to the specific embodiments and the scope of application in accordance with the ideas of the present application, this description should not be construed as limiting the application.
Certain terms are used throughout the description and claims to refer to particular components. Those skilled in the art will appreciate that hardware manufacturers may refer to the same component by different names. The description and claims distinguish components not by name but by function. The terms "comprising" and "including" used throughout the specification and claims are open-ended and should be interpreted as "including, but not limited to". "Substantially" means that, within an acceptable error range, a person skilled in the art can solve the technical problem and substantially achieve the technical effect. The following description sets forth preferred embodiments for the purpose of illustrating the general principles of the application and is not intended to limit its scope; the scope of the application is defined by the appended claims.
It should also be noted that the terms "comprises", "comprising", or any other variation thereof are intended to cover a non-exclusive inclusion, so that a product or system that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such a product or system. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other identical elements in a product or system that comprises that element.
It should be understood that the term "and/or" used herein merely describes an association between associated objects, indicating that three relationships may exist; for example, "A and/or B" may mean: A exists alone, A and B both exist, or B exists alone. In addition, the character "/" herein generally indicates an "or" relationship between the associated objects.
While the foregoing illustrates and describes preferred embodiments of the present application, it is to be understood that the application is not limited to the forms disclosed herein; these embodiments do not exclude other embodiments, and the application may be used in various other combinations, modifications, and environments and may be altered within the scope of the inventive concept described herein through the above teachings or the knowledge and skill of the relevant art. Modifications and variations that do not depart from the spirit and scope of the application are intended to fall within the scope of the appended claims.

Claims (4)

1. A method for recognizing low-pixel-density text, specifically comprising the following steps:
S1) acquiring image information of the low-pixel-density text to be recognized, and inputting the acquired image information;
S2) preprocessing the input image information, and processing the image regions with the SIFT method to perform preliminary positioning;
the specific steps being as follows:
S2.1) converting the input image information into a matrix using a feature-point matching algorithm,
S2.2) constructing a scale space and a difference-of-Gaussians pyramid, and calculating the extreme points and their corresponding coordinates,
S2.3) determining and locating the rough coordinates of the special symbols in the image information from all the extreme points obtained in S2.2), extracting the corresponding image areas, and completing the preliminary positioning;
S3) performing accurate positioning according to the preliminary positioning result obtained in S2);
the steps being as follows: S3.1) inputting the preliminary positioning result of S2.3) into a CRAFT deep-learning detection model,
S3.2) using the CRAFT deep-learning detection model to calculate the coordinates of the regions containing ordinary text and special symbols in the image information, extracting the image information within those region coordinates, and extracting the HOG features of the corresponding images,
S3.3) performing binary classification on the HOG features obtained in S3.2) with an SVM, dividing the regions into ordinary-text regions and special-symbol regions;
S4) recognizing the image regions in the accurate positioning result obtained in S3), and classifying the recognized regions into ordinary-text regions and special-symbol regions;
S5) inputting the special-symbol regions and the ordinary-text regions obtained in S4) into a special-symbol recognition model and an ordinary-text recognition model, respectively, for recognition;
S6) summarizing and structuring the recognition results and outputting them; the method being characterized in that the specific steps of S4) are as follows:
S4.1) first inputting the ordinary-text regions obtained in S3.3) into the trained ordinary-text recognition model for recognition, and outputting the recognition results;
S4.2) inputting the special-symbol regions obtained in S3.3) into the trained special-symbol recognition model for recognition, and outputting the recognition results;
training the text recognition model and the special-symbol recognition model specifically comprising the following steps:
Step 1: taking the text content in millions of pictures as the training set of the network model;
Step 2: uniformly processing all pictures to be trained in the training set, converting them into matrices with a value range of [-1, 1];
Step 3: randomly dividing all pictures to be trained in the training set into several groups, and combining the grouped pictures with the matrices obtained in step 2 to form batches;
Step 4: inputting the batches obtained in step 3 into the input layer of the corresponding recognition model, and computing the error between the output result and the actual value of the sample label at the output layer;
Step 5: if the difference between the output result of the output layer, after being processed by the CTC algorithm, and the actual label is small enough, executing step 7, otherwise executing step 6;
Step 6: updating the network weights and thresholds of every neuron in the hidden layers of the corresponding convolutional recognition model, so that the network error function descends along the negative gradient direction and the output approaches the expected output, then returning to step 4;
Step 7: obtaining the trained ordinary-text recognition model and special-symbol recognition model.
2. The method according to claim 1, wherein the unified processing in step 2 includes graying or binarization.
3. An information processing terminal implementing the method for recognizing low-pixel-density text according to any one of claims 1 to 2.
4. A computer-readable storage medium comprising instructions which, when run on a computer, cause the computer to perform the method for recognizing low-pixel-density text according to any one of claims 1 to 2.
CN202010190222.XA 2020-03-18 2020-03-18 Identification method of low-pixel-density text Active CN111414917B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010190222.XA CN111414917B (en) 2020-03-18 2020-03-18 Identification method of low-pixel-density text

Publications (2)

Publication Number Publication Date
CN111414917A CN111414917A (en) 2020-07-14
CN111414917B 2023-05-12

Family

ID=71491107

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010190222.XA 2020-03-18 2020-03-18 Identification method of low-pixel-density text (Active)

Country Status (1)

Country Link
CN (1) CN111414917B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112514970B (en) * 2020-12-08 2021-11-05 烟台海裕食品有限公司 Self-adaptive fish scale removing platform and method
CN113642556A (en) * 2021-08-04 2021-11-12 五八有限公司 Image processing method and device, electronic equipment and storage medium

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019232874A1 (en) * 2018-06-04 2019-12-12 平安科技(深圳)有限公司 Chinese character model training method, chinese character recognition method, apparatus, device, and medium
WO2019232872A1 (en) * 2018-06-04 2019-12-12 平安科技(深圳)有限公司 Handwritten character model training method, chinese character recognition method, apparatus, device, and medium

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9984471B2 (en) * 2016-07-26 2018-05-29 Intuit Inc. Label and field identification without optical character recognition (OCR)
US10013643B2 (en) * 2016-07-26 2018-07-03 Intuit Inc. Performing optical character recognition using spatial information of regions within a structured document
US10776903B2 (en) * 2017-07-17 2020-09-15 Open Text Corporation Systems and methods for image modification and image based content capture and extraction in neural networks
CN108038476B (en) * 2018-01-03 2019-10-11 东北大学 A kind of facial expression recognition feature extracting method based on edge detection and SIFT
CN108446621A (en) * 2018-03-14 2018-08-24 平安科技(深圳)有限公司 Bank slip recognition method, server and computer readable storage medium
CN108764226B (en) * 2018-04-13 2022-05-03 顺丰科技有限公司 Image text recognition method, device, equipment and storage medium thereof
CN109376658B (en) * 2018-10-26 2022-03-08 信雅达科技股份有限公司 OCR method based on deep learning
CN109919014B (en) * 2019-01-28 2023-11-03 平安科技(深圳)有限公司 OCR (optical character recognition) method and electronic equipment thereof
CN110263694A (en) * 2019-06-13 2019-09-20 泰康保险集团股份有限公司 A kind of bank slip recognition method and device

Also Published As

Publication number Publication date
CN111414917A (en) 2020-07-14


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant