CN112308057A - OCR (optical character recognition) optimization method and system based on character position information - Google Patents
OCR (optical character recognition) optimization method and system based on character position information
- Publication number
- CN112308057A (application CN202011090602.2A)
- Authority
- CN
- China
- Prior art keywords
- character
- position information
- line
- ocr
- spacing
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/22—Image preprocessing by selection of a specific region containing or referencing a pattern; Locating or processing of specific regions to guide the detection or recognition
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/10—Character recognition
Abstract
The invention discloses an OCR optimization method and system based on character position information. The method comprises the following steps: setting the constraint information items to be extracted and the extraction range; preprocessing the image to be recognized to obtain the characters in the image and their position information; calculating the average line spacing over all the character information, determining the spacing between the current line of characters and the next line, and judging whether the two lines need to be merged based on the size of that spacing relative to the average line spacing; and formatting and outputting the extracted information item character data. While guaranteeing the accuracy of the recognition result, the method achieves high-precision extraction of various characters with only a single scan of the picture, and significantly reduces the computing-hardware cost and time cost of OCR recognition.
Description
Technical Field
The invention relates to the technical field of image character recognition, in particular to an OCR optimization method and system based on character position information.
Background
The statements in this section merely provide background information related to the present disclosure and may not necessarily constitute prior art.
OCR (Optical Character Recognition) refers to the process in which an electronic device (e.g., a scanner or digital camera) examines characters printed on paper, determines their shapes by detecting patterns of dark and light, and then translates those shapes into computer text by a character recognition method.
With the growing informatization of various industries and the development of computer image processing, machine learning, and other artificial-intelligence technologies, OCR character recognition has advanced rapidly: recognition has become more efficient and the supported scenes richer, developing from the early OCR of unformatted simple scenes such as electronic books to the OCR of formatted complex scenes, ubiquitous across industries, such as business licenses, identity cards, driving licenses, and birth certificates. However, current OCR solutions for complex scenes all place high requirements on the shooting definition, shooting angle, and shooting range of the photo to be recognized; some solutions help the user shoot a high-standard picture by providing a shooting-range auxiliary frame in order to improve recognition accuracy.
Most existing OCR recognition solutions for complex scenes first obtain, through certificate-template training, the classification of the certificate photo to be recognized and the area information of each information item; they then cut the picture according to that area information, apply a series of image preprocessing steps such as graying, binarization, noise removal, and tilt correction to the cut-out small-area pictures, and finally perform character recognition. The cut-out pictures contain little information, so the recognition success rate is high, but such solutions have the following technical defects:
(1) Obtaining accurate information item coordinate areas from the certificate template requires a large number of training samples, so the technical cost is high.
(2) The accuracy of the information item coordinate areas obtained by training directly determines the accuracy of the final recognition result: once the coordinate area information is inaccurate, the information in the cut-out picture is incomplete and the recognition result is wrong. Therefore, to ensure model-training accuracy, solutions of this type impose high technical requirements on the definition, shooting angle, and so on of the training samples.
(3) Such solutions not only place high technical requirements on training samples, but also require the photo provided by the user in a production environment to meet equally high technical indexes; they therefore generally need to intervene when the user submits a picture, or give a shooting reference prompt, and the user experience is poor.
(4) Such solutions need to scan (cut) the original picture multiple times, so the time cost is high.
Disclosure of Invention
In order to solve the problems, the invention provides an OCR optimization method and system based on character position information, which can obviously reduce the time and technical cost for realizing the OCR recognition technology on the premise of ensuring the accuracy of the recognition result, and obviously improve the user experience.
In some embodiments, the following technical scheme is adopted:
an OCR optimization method based on character position information comprises the following steps:
setting constraint information items and extraction ranges to be extracted;
preprocessing an image to be recognized to acquire characters and position information in the image;
calculating the average line spacing over all the character information, determining the spacing between the current line of characters and the next line, and judging whether the two lines need to be merged based on the size of that spacing relative to the average line spacing;
and formatting and outputting the extracted information item character data.
In some embodiments, the following technical scheme is adopted:
an OCR optimization system based on text position information, comprising:
the template design module is used for setting constraint information items to be extracted and an extraction range;
the character position information identification module is used for preprocessing an image to be identified and acquiring characters and position information in the image;
the optimization processing module is used for calculating the average line spacing of all the character information, determining the spacing between the current character and the next line of characters, and judging whether the current character and the next line of characters need to be combined or not based on the size of the spacing and the average line spacing;
and the structured output module is used for formatting and outputting the extracted information item character data.
In some embodiments, the following technical scheme is adopted:
a terminal device comprising a processor and a computer-readable storage medium, the processor being configured to implement instructions; the computer readable storage medium is used for storing a plurality of instructions, and the instructions are suitable for being loaded by a processor and executing the OCR optimization method based on the character position information.
In some embodiments, the following technical scheme is adopted:
a computer-readable storage medium, wherein a plurality of instructions are stored, and the instructions are suitable for being loaded by a processor of a terminal device and executing the above OCR optimization method based on the character position information.
Compared with the prior art, the invention has the beneficial effects that:
(1) on the premise of ensuring the accuracy of the recognition result, the method can realize high-precision extraction of various characters only by scanning the picture once, and can obviously reduce the computational hardware cost and time cost realized by the OCR technology.
(2) The method of the invention does not require coordinate position area information of information items in the certificate template, so the technical requirements on the user's photo to be recognized and on the photographing process are significantly reduced, and the user experience is markedly improved. By adjusting the OCR template, efficient OCR extraction can be achieved for various certificate pictures such as business licenses, identity cards, business cards, driving licenses, and property ownership certificates, giving the method very flexible extensibility.
Drawings
FIG. 1 is a flowchart of an OCR optimization method based on text position information according to an embodiment of the present invention;
FIG. 2 is a comparison of the extraction results of the present invention embodiment and the recognition results of the prior art method.
Detailed Description
It should be noted that the following detailed description is exemplary and is intended to provide further explanation of the disclosure. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs.
It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of example embodiments according to the present application. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, and it should be understood that when the terms "comprises" and/or "comprising" are used in this specification, they specify the presence of stated features, steps, operations, devices, components, and/or combinations thereof, unless the context clearly indicates otherwise.
Example one
In one or more embodiments, an OCR optimization method based on text location information is disclosed, which is described by taking extracting text information in a license as an example, and with reference to fig. 1, the method specifically includes the following processes:
(1) setting constraint information items and extraction ranges to be extracted;
Specifically, the constraint information items and extraction range specify which text information in the license needs to be extracted; the position of that text does not need to be specified in advance, which lowers the technical requirements on the photo to be recognized and on the user's photographing process. For example, analysis determines that the following information items need to be extracted from a company's business license:
√ unified social credit code;
√ business scope;
√ legal representative;
√ date of establishment;
√ registered capital;
√ certificate number;
√ address;
√ company name;
√ type;
√ validity period.
(2) Preprocessing an image to be recognized to acquire characters and position information in the image;
specifically, the process of preprocessing the image includes: graying, binaryzation, noise removal, inclination correction and the like;
after preprocessing, characters appearing in the picture are searched line by line by an adjacent-connected-region search method and their coordinate positions are recorded; the region where each character is located is cropped and the character is recognized, yielding all the characters in the picture together with their coordinate position information. A single scan of the picture thus extracts all the character information, organized by line.
Those skilled in the art can implement this step using the prior art, for example the general OCR recognition API of Baidu AI Cloud, so it is not described in detail here.
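As a minimal sketch of the graying and binarization steps above, using only NumPy: the fixed threshold of 127 and the luminance weights are assumptions on my part, since the patent does not specify which binarization method its embodiment uses, and noise removal and tilt correction are omitted.

```python
import numpy as np

def preprocess(rgb):
    """Grayscale then binarize an H x W x 3 uint8 image.

    A fixed threshold of 127 stands in for whatever binarization the
    embodiment actually uses (it is not specified in the patent)."""
    # Standard luminance weights for RGB -> gray conversion.
    gray = 0.299 * rgb[..., 0] + 0.587 * rgb[..., 1] + 0.114 * rgb[..., 2]
    # Binarize: light pixels become paper (255), dark pixels become ink (0).
    return np.where(gray > 127, 255, 0).astype(np.uint8)

# Tiny synthetic image: top half light "paper", bottom half dark "ink".
img = np.zeros((4, 4, 3), dtype=np.uint8)
img[:2, :, :] = 200
binary = preprocess(img)
print(binary[0, 0], binary[3, 3])  # 255 0
```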
(3) Calculating the average line spacing over all the character information, determining the spacing between the current line of characters and the next line, and judging whether the two lines need to be merged based on the size of that spacing relative to the average line spacing;
specifically, since the information items such as the registration address and the business scope are distributed in multiple lines when the content of the information items is large in the license, and the font and the line spacing are inconsistent with those of other text contents when the multiple lines are distributed, for example, the line spacing of the lines is generally significantly smaller than the text line spacing.
Therefore, in this embodiment, the average line spacing of the whole picture is calculated first, and the specific method is as follows:
firstly, traverse the coordinates of each line of characters extracted from the whole picture;
secondly, calculate the line spacing between the current line and the next line from the coordinate values, and record the number of line gaps;
thirdly, accumulate and sum all the line spacings;
and fourthly, divide the accumulated line-spacing sum by the recorded number of line gaps to obtain the average line spacing.
A maximum multi-line spacing maxDiff is then derived from the average line spacing and used to decide whether the character information of the current line should be merged and extracted: if the actual spacing between the current line and the next line is smaller than maxDiff, the text recognized on the next line is merged into the content of the current information item; otherwise it is treated as the content of a new information item. In this example the correction coefficient is set to 0.5, i.e., half the average text line spacing avDiff: maxDiff = avDiff × 0.5.
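The average-line-spacing steps and the maxDiff threshold above can be sketched as follows. The y-coordinates are hypothetical, and measuring spacing as the gap between the top edges of consecutive lines is an assumption: the patent does not fix which coordinate is used.

```python
def average_line_spacing(line_tops):
    """avDiff: mean vertical gap between consecutive recognized lines.

    line_tops holds the top y-coordinate of each line, in reading order;
    using the top edge (rather than, say, the baseline) is an assumption."""
    gaps = [b - a for a, b in zip(line_tops, line_tops[1:])]
    return sum(gaps) / len(gaps)

# Hypothetical y-coordinates for six lines; lines 2-4 belong to one
# wrapped multi-line item, hence their smaller gaps.
tops = [10, 40, 55, 70, 110, 140]
av_diff = average_line_spacing(tops)  # (30+15+15+40+30)/5 = 26.0
max_diff = 0.5 * av_diff              # merge threshold from the embodiment
print(av_diff, max_diff)              # 26.0 13.0
```

With these values, the two 15-pixel gaps inside the wrapped item fall below maxDiff = 13.0 only if the coefficient is raised; in practice the coefficient would be tuned to the document class.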
Because some content in the business license may be displayed across multiple lines, the final information item text may need to merge several line records. The stopping criterion during merging is the line spacing: when the spacing between the current line and the next line is clearly smaller than the threshold maxDiff derived from the average line spacing, the item is being displayed across multiple lines and the next line must be merged into the current information item's content; otherwise the next line is the content of a new information item, and merging stops. The specific implementation, using the business scope item as an example, is as follows:
firstly, obtain the text information rawText of the business scope on the current line;
secondly, if there is no next line, directly return the extracted text rawText;
thirdly, if a next line exists, calculate the line spacing nextLineDiff between the current line and the next line;
fourthly, if nextLineDiff < maxDiff, merge the next line's text information into rawText and jump back to the second step;
finally, if nextLineDiff >= maxDiff, return the merged and extracted rawText.
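The merge loop just described can be sketched in a few lines of Python. The line representation (a top y-coordinate plus the recognized text) and the sample values are assumptions made for illustration only.

```python
def extract_item(lines, start, max_diff):
    """Merge the wrapped continuation lines of one information item.

    lines: list of (top_y, text) tuples in reading order.
    Returns (merged rawText, index of the first line of the next item)."""
    raw_text = lines[start][1]
    i = start
    while i + 1 < len(lines):
        next_line_diff = lines[i + 1][0] - lines[i][0]
        if next_line_diff < max_diff:
            # Tight spacing: the next line is a wrapped continuation.
            raw_text += lines[i + 1][1]
            i += 1
        else:
            # Normal spacing: a new information item starts here.
            break
    return raw_text, i + 1

# Hypothetical recognized lines: a two-line business scope, then an address.
lines = [(10, "Scope: retail,"), (22, " wholesale"), (60, "Address: ...")]
text, nxt = extract_item(lines, 0, max_diff=20)
print(text, nxt)  # Scope: retail, wholesale 2
```

Running extract_item repeatedly from each returned index walks through every information item in a single pass, matching the one-scan claim of the method.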
Following this process for each information item to be recognized yields the recognition data of all information items in the whole picture. The process involves no reprocessing of the original picture and is therefore very efficient.
(4) And formatting and outputting the extracted information item character data.
The information item data for the whole picture obtained in the previous step is formatted and output in the desired output format; in this example, JSON output is used, as shown in fig. 2.
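The structured output step might look like the following; the field names and values are purely illustrative placeholders, not the patent's actual schema.

```python
import json

# Hypothetical extracted information items (illustrative names and values).
items = {
    "company_name": "Example Trading Co., Ltd.",
    "unified_social_credit_code": "91370100XXXXXXXXXX",
    "business_scope": "retail, wholesale",
}
# ensure_ascii=False keeps any non-ASCII (e.g. Chinese) text readable.
out = json.dumps(items, ensure_ascii=False, indent=2)
print(out)
```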
Fig. 2 shows an example comparing the recognition results of this embodiment with those of the Baidu AI Cloud recognition method. As can be seen from fig. 2, the method of this embodiment accurately recognizes multi-line text information and merges it in the output; the recognition result is more accurate and is not affected by the quality of the photographed picture or the photographing technique.
Example two
In one or more implementations, a system for OCR optimization based on text position information is disclosed, comprising:
the template design module is used for setting constraint information items to be extracted and an extraction range;
the character position information identification module is used for preprocessing an image to be identified and acquiring characters and position information in the image;
the optimization processing module is used for calculating the average line spacing of all the character information, determining the spacing between the current character and the next line of characters, and judging whether the current character and the next line of characters need to be combined or not based on the size of the spacing and the average line spacing;
and the structured output module is used for formatting and outputting the extracted information item character data.
It should be noted that the specific implementation manner of the modules adopts the method disclosed in the first embodiment, and details are not described here.
EXAMPLE III
In one or more embodiments, a terminal device is disclosed, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor; when executing the program, the processor implements the OCR optimization method based on character position information of the first embodiment. For brevity, no further description is provided herein.
It should be understood that in this embodiment, the processor may be a central processing unit CPU, and the processor may also be other general purpose processors, digital signal processors DSP, application specific integrated circuits ASIC, off-the-shelf programmable gate arrays FPGA or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, and so on. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
The memory may include both read-only memory and random access memory, and may provide instructions and data to the processor, and a portion of the memory may also include non-volatile random access memory. For example, the memory may also store device type information.
In implementation, the steps of the above method may be performed by integrated logic circuits of hardware in a processor or instructions in the form of software.
The OCR optimization method based on the text position information in the first embodiment may be directly implemented by a hardware processor, or implemented by a combination of hardware and software modules in the processor. The software modules may be located in ram, flash, rom, prom, or eprom, registers, among other storage media as is well known in the art. The storage medium is located in a memory, and a processor reads information in the memory and completes the steps of the method in combination with hardware of the processor. To avoid repetition, it is not described in detail here.
Those of ordinary skill in the art will appreciate that the various illustrative elements, i.e., algorithm steps, described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
Although the embodiments of the present invention have been described with reference to the accompanying drawings, it is not intended to limit the scope of the present invention, and it should be understood by those skilled in the art that various modifications and variations can be made without inventive efforts by those skilled in the art based on the technical solution of the present invention.
Claims (10)
1. An OCR optimization method based on character position information is characterized by comprising the following steps:
setting constraint information items and extraction ranges to be extracted;
preprocessing an image to be recognized to acquire characters and position information in the image;
calculating the average line spacing over all the character information, determining the spacing between the current line of characters and the next line, and judging whether the two lines need to be merged based on the size of that spacing relative to the average line spacing;
and formatting and outputting the extracted information item character data.
2. An OCR optimization method based on character position information as claimed in claim 1, wherein the preprocessing of the image to be recognized includes: performing graying, binarization, noise removal, and tilt correction preprocessing operations on the image.
3. An OCR optimization method based on character position information as claimed in claim 1, wherein setting the constraint information items to be extracted and the extraction range specifically includes: specifying which text information in the image needs to be extracted.
4. An OCR optimization method based on character position information as claimed in claim 1, characterized in that by the adjacent connected region search method, the characters appearing in the picture are searched line by line and the coordinate position is recorded, the region where the characters are located is intercepted, the characters are recognized, and all characters and the coordinate position information thereof are obtained.
5. An OCR optimization method based on text position information as claimed in claim 1, wherein calculating the average line spacing of all text information specifically includes:
traversing the coordinates of each line of characters in the extracted image, and calculating the line spacing between the current line and the next line based on the coordinate values;
accumulating and summing all the line intervals; an average line spacing is determined based on a ratio of the accumulated value to the number of lines.
6. The OCR optimization method based on the character position information as claimed in claim 1, wherein the step of judging whether the current character and the next character need to be merged based on the space and the average line space includes:
when the line spacing between the current line and the next line is smaller than the average line spacing, combining and extracting characters of the current line and the next line;
and stopping merging when the line spacing of the current line and the next line is larger than the average line spacing.
7. An OCR optimization method based on character position information as claimed in claim 1, characterized in that each constraint information item to be recognized is extracted, and the extracted information item character data is formatted and output.
8. An OCR optimization system based on text position information, comprising:
the template design module is used for setting constraint information items to be extracted and an extraction range;
the character position information identification module is used for preprocessing an image to be identified and acquiring characters and position information in the image;
the optimization processing module is used for calculating the average line spacing of all the character information, determining the spacing between the current character and the next line of characters, and judging whether the current character and the next line of characters need to be combined or not based on the size of the spacing and the average line spacing;
and the structured output module is used for formatting and outputting the extracted information item character data.
9. A terminal device comprising a processor and a computer-readable storage medium, the processor being configured to implement instructions; a computer readable storage medium for storing a plurality of instructions adapted to be loaded by a processor and to perform the method for OCR optimization based on text position information according to any one of claims 1-7.
10. A computer-readable storage medium having stored thereon a plurality of instructions, wherein the instructions are adapted to be loaded by a processor of a terminal device and to execute the OCR optimization method based on text position information according to any one of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011090602.2A | 2020-10-13 | 2020-10-13 | OCR (optical character recognition) optimization method and system based on character position information |
Publications (1)
Publication Number | Publication Date |
---|---|
CN112308057A true CN112308057A (en) | 2021-02-02 |
Family
ID=74488033
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011090602.2A Pending CN112308057A (en) | 2020-10-13 | 2020-10-13 | OCR (optical character recognition) optimization method and system based on character position information |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112308057A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114490697A (en) * | 2022-03-28 | 2022-05-13 | 山东国赢大数据产业有限公司 | Data cooperative processing method and device based on block chain |
CN117079282A (en) * | 2023-08-16 | 2023-11-17 | 读书郎教育科技有限公司 | Intelligent dictionary pen based on image processing |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103034856A (en) * | 2012-12-18 | 2013-04-10 | 深圳深讯和科技有限公司 | Method and device for locating text area in image |
CN108364009A (en) * | 2018-02-12 | 2018-08-03 | 掌阅科技股份有限公司 | Recognition methods, computing device and the computer storage media of two-dimensional structure formula |
CN110941972A (en) * | 2018-09-21 | 2020-03-31 | 广州金山移动科技有限公司 | Method and device for segmenting characters in PDF document and electronic equipment |
CN111368562A (en) * | 2020-02-28 | 2020-07-03 | 北京字节跳动网络技术有限公司 | Method and device for translating characters in picture, electronic equipment and storage medium |
Non-Patent Citations (1)
Title |
---|
Zeng Fanfeng et al.: "A fast correction method for distorted Chinese text images based on connected regions", Computer Engineering and Design * |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| RJ01 | Rejection of invention patent application after publication | Application publication date: 20210202 |