CN112308057A - OCR (optical character recognition) optimization method and system based on character position information


Info

Publication number
CN112308057A
CN112308057A
Authority
CN
China
Prior art keywords
character
position information
line
ocr
spacing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011090602.2A
Other languages
Chinese (zh)
Inventor
张丽丽
刘宏亮
刘伟珊
王菲
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shandong Guoying Big Data Industry Co ltd
Original Assignee
Shandong Guoying Big Data Industry Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shandong Guoying Big Data Industry Co., Ltd.
Priority to CN202011090602.2A
Publication of CN112308057A
Legal status: Pending


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/20 Image preprocessing
    • G06V 10/22 Image preprocessing by selection of a specific region containing or referencing a pattern; Locating or processing of specific regions to guide the detection or recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V 30/10 Character recognition

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Character Input (AREA)

Abstract

The invention discloses an OCR optimization method and system based on character position information, comprising the following steps: setting the constraint information items to be extracted and the extraction range; preprocessing the image to be recognized to acquire the characters and their position information in the image; calculating the average line spacing of all the character information, determining the spacing between the current line of characters and the next line, and judging whether the two lines need to be merged by comparing that spacing with the average line spacing; and formatting and outputting the extracted information item character data. On the premise of ensuring the accuracy of the recognition result, the method can realize high-precision extraction of various kinds of text with only a single scan of the picture, and can significantly reduce the computing hardware cost and time cost of implementing OCR recognition.

Description

OCR (optical character recognition) optimization method and system based on character position information
Technical Field
The invention relates to the technical field of image character recognition, in particular to an OCR optimization method and system based on character position information.
Background
The statements in this section merely provide background information related to the present disclosure and may not necessarily constitute prior art.
OCR (Optical Character Recognition) refers to a process in which an electronic device (e.g., a scanner or a digital camera) examines characters printed on paper, determines their shapes by detecting dark and light patterns, and then translates the shapes into computer text using character recognition methods.
With the improvement of informatization across industries and the development of computer image processing, machine learning and other artificial intelligence technologies, OCR character recognition has advanced rapidly: recognition has become more efficient and the applicable scenes richer, developing from the early OCR of unformatted, simple scenes such as electronic books to the OCR of formatted, complex scenes now ubiquitous across industries, such as business licenses, identity cards, driving licenses and birth certificates. However, current OCR solutions for complex scenes all place high requirements on the shooting definition, shooting angle and shooting range of the photo to be recognized, and some solutions help the user take a high-standard picture by providing a shooting-range auxiliary frame so as to improve recognition accuracy.
Most existing OCR solutions for complex scenes first obtain, through certificate-template training, the classification of the certificate photo to be recognized and the coordinate region of each information item; they then crop the picture according to that region information, perform a series of image preprocessing operations (graying, binarization, noise removal, skew correction and the like) on the cropped small-region pictures, and finally perform character recognition. The cropped pictures contain little information, so the recognition success rate is high, but such solutions have the following technical defects:
(1) Obtaining accurate information-item coordinate regions from the certificate template requires a large amount of sample training, so the technical cost is high.
(2) The accuracy of the information-item coordinate regions obtained by training directly determines the accuracy of the final recognition result: once the coordinate-region information is inaccurate, the cropped picture contains incomplete information and the recognition result is wrong. Therefore, to ensure the accuracy of model training, this type of solution imposes high technical requirements on the definition, shooting angle and so on of the training samples.
(3) These solutions not only place high technical requirements on the training samples, but also require the photo submitted by the user in the production environment to meet equally high technical standards. They therefore generally need to intervene when the user submits a picture, or give shooting reference prompts, resulting in a poor user experience.
(4) This kind of solution needs to scan (crop) the original picture multiple times, so the time cost is high.
Disclosure of Invention
In order to solve the above problems, the invention provides an OCR optimization method and system based on character position information, which can significantly reduce the time and technical cost of implementing OCR recognition while ensuring the accuracy of the recognition result, and can markedly improve the user experience.
In some embodiments, the following technical scheme is adopted:
an OCR optimization method based on character position information comprises the following steps:
setting constraint information items and extraction ranges to be extracted;
preprocessing an image to be recognized to acquire characters and position information in the image;
calculating the average line spacing of all the character information, determining the spacing between the current line of characters and the next line, and judging whether the two lines need to be merged by comparing this spacing with the average line spacing;
and formatting and outputting the extracted information item character data.
In some embodiments, the following technical scheme is adopted:
an OCR optimization system based on text position information, comprising:
the template design module is used for setting constraint information items to be extracted and an extraction range;
the character position information identification module is used for preprocessing an image to be identified and acquiring characters and position information in the image;
the optimization processing module is used for calculating the average line spacing of all the character information, determining the spacing between the current character and the next line of characters, and judging whether the current character and the next line of characters need to be combined or not based on the size of the spacing and the average line spacing;
and the structured output module is used for formatting and outputting the extracted information item character data.
In some embodiments, the following technical scheme is adopted:
a terminal device comprising a processor and a computer-readable storage medium, the processor being configured to implement instructions; the computer readable storage medium is used for storing a plurality of instructions, and the instructions are suitable for being loaded by a processor and executing the OCR optimization method based on the character position information.
In some embodiments, the following technical scheme is adopted:
a computer-readable storage medium, wherein a plurality of instructions are stored, and the instructions are suitable for being loaded by a processor of a terminal device and executing the above OCR optimization method based on the character position information.
Compared with the prior art, the invention has the beneficial effects that:
(1) On the premise of ensuring the accuracy of the recognition result, the method can realize high-precision extraction of various kinds of text with only a single scan of the picture, and can significantly reduce the computing hardware cost and time cost of implementing OCR recognition.
(2) The method of the invention does not require information-item coordinate regions in the certificate template, so the technical requirements on the user's photo or the photographing process are significantly reduced and the user experience is markedly improved. By adjusting the OCR template, the method can efficiently extract text from various certificate pictures such as business licenses, identity cards, business cards, driving licenses and real-estate certificates, and therefore has very flexible extensibility.
Drawings
FIG. 1 is a flowchart of an OCR optimization method based on text position information according to an embodiment of the present invention;
FIG. 2 compares the extraction results of an embodiment of the present invention with the recognition results of a prior-art method.
Detailed Description
It should be noted that the following detailed description is exemplary and is intended to provide further explanation of the disclosure. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs.
It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of example embodiments according to the present application. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, and it should be understood that when the terms "comprises" and/or "comprising" are used in this specification, they specify the presence of stated features, steps, operations, devices, components, and/or combinations thereof, unless the context clearly indicates otherwise.
Example one
In one or more embodiments, an OCR optimization method based on text location information is disclosed. The method is described by taking the extraction of text information from a business license as an example; with reference to fig. 1, it specifically includes the following processes:
(1) setting constraint information items and extraction ranges to be extracted;
Specifically, the constraint information items and the extraction range specify which text information in the license needs to be extracted; the position of that text does not need to be specified, which reduces the technical requirements on the photo to be recognized and on the user's photographing process. For example, analysis determines that the following information items (marked with √) need to be extracted from a company's business license, as illustrated in the template sketch after this list:
√ Unified social credit code;
√ Business scope;
√ Legal representative;
√ Date of establishment;
√ Registered capital;
√ Certificate number;
√ Address;
√ Company name;
√ Company type;
√ Validity period.
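For illustration only, the template of step (1) can be thought of as a plain list of the item keys to extract. A minimal sketch is given below; the key names are assumed English translations of the license fields, not identifiers prescribed by this disclosure.

```python
# Illustrative template for step (1): the information items to be extracted from
# a business license. The key names are assumed translations, not fields defined
# by the patent; the extraction range is the whole image because the method does
# not require per-item coordinate regions.
LICENSE_TEMPLATE = {
    "info_items": [
        "unified_social_credit_code",
        "business_scope",
        "legal_representative",
        "date_of_establishment",
        "registered_capital",
        "certificate_number",
        "address",
        "company_name",
        "company_type",
        "validity_period",
    ],
    "extraction_range": "full_image",
}
```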
(2) Preprocessing an image to be recognized to acquire characters and position information in the image;
Specifically, preprocessing the image includes graying, binarization, noise removal, skew correction and the like.
After preprocessing, the characters appearing in the picture are searched line by line using an adjacent connected-region search method and their coordinate positions are recorded; the region where each group of characters is located is cropped and the characters are recognized, yielding all the characters in the picture together with their coordinate position information. In this way all character information in the picture is extracted line by line with a single scan of the picture.
The specific implementation of this step can be obtained by a person skilled in the art from the prior art, for example the general OCR recognition API of Baidu Cloud, and is therefore not described in detail here.
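As a rough, non-authoritative sketch of this step, the fragment below uses OpenCV for the preprocessing operations and Tesseract (via pytesseract) as a stand-in for the general OCR API; the library choices, the chi_sim language pack and the parameter values are assumptions, not part of the disclosed method. Each returned record corresponds to one text line and carries the vertical coordinates used by the line-spacing logic in step (3).

```python
# Sketch of step (2): preprocessing plus per-line text/coordinate extraction.
# OpenCV and pytesseract stand in for the connected-region search / general OCR
# API mentioned above; parameter values are illustrative assumptions.
import cv2
import pytesseract


def extract_lines_with_boxes(image_path):
    img = cv2.imread(image_path)
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)                 # graying
    gray = cv2.fastNlMeansDenoising(gray, None, 10)              # noise removal
    _, binary = cv2.threshold(gray, 0, 255,
                              cv2.THRESH_BINARY + cv2.THRESH_OTSU)  # binarization
    # (skew correction omitted for brevity)

    # One pass over the image: words with bounding boxes, grouped into lines.
    data = pytesseract.image_to_data(binary, lang="chi_sim",
                                     output_type=pytesseract.Output.DICT)
    lines = {}
    for i, word in enumerate(data["text"]):
        if not word.strip():
            continue
        key = (data["block_num"][i], data["par_num"][i], data["line_num"][i])
        top, height = data["top"][i], data["height"][i]
        entry = lines.setdefault(key, {"text": "", "top": top, "bottom": top + height})
        entry["text"] += word
        entry["top"] = min(entry["top"], top)
        entry["bottom"] = max(entry["bottom"], top + height)
    # Lines in reading order, each with its text and vertical coordinates.
    return [lines[k] for k in sorted(lines)]
```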
(3) Calculating the average line spacing of all the character information, determining the spacing between the current line of characters and the next line, and judging whether the two lines need to be merged by comparing this spacing with the average line spacing;
Specifically, information items such as the registered address and the business scope are displayed over multiple lines in the license when their content is long, and the font and line spacing of such multi-line items differ from those of the other text: in particular, the spacing between the wrapped lines is generally significantly smaller than the average text line spacing.
Therefore, in this embodiment the average line spacing of the whole picture is calculated first. The specific method is as follows (a code sketch follows the list):
① traverse the coordinates of each line of characters extracted from the whole picture;
② calculate the line spacing between the current line and the next line from the coordinate values, and record the number of lines;
③ accumulate and sum all the line spacings;
④ divide the accumulated line-spacing value by the recorded number of lines to obtain the average line spacing.
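Continuing the sketch above, and assuming each line record carries top and bottom pixel coordinates, the average line spacing of steps ①-④ might be computed as follows. The patent does not fix which coordinates define a line gap, so the bottom-to-top gap used here is an assumption.

```python
# Sketch of the average-line-spacing computation (steps 1-4 above).
# Assumes each line record has "top" and "bottom" pixel coordinates.
def average_line_spacing(lines):
    spacings = [nxt["top"] - cur["bottom"]          # gap between consecutive lines
                for cur, nxt in zip(lines, lines[1:])]
    if not spacings:
        return 0.0
    return sum(spacings) / len(spacings)            # accumulated spacing / number of gaps
```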
The maximum multi-line spacing maxDiff is then derived from the average line spacing and is used to decide whether the character information of the current line should be merged and extracted: if the actual line spacing between the current line and the next line is smaller than maxDiff, the text recognized in the next line is merged into the content of the current information item; otherwise it is treated as the content of a new information item. In this example the correction coefficient is set to 0.5, i.e. half of the average text line spacing avDiff: maxDiff = avDiff * 0.5.
Because some content in the business license may be displayed over multiple lines, the final text of an information item may need to merge several line records. In this scheme, the criterion for stopping the merge is the line spacing: when the spacing between the current line and the next line is clearly smaller than the average line spacing (i.e. smaller than the maximum multi-line spacing maxDiff), the content is being displayed over multiple lines and the next line must be merged into the content of the current information item; otherwise the next line indicates the content of a new information item, merging stops, and the current text is treated as a complete item. The specific implementation process is as follows (see the code sketch after this list):
① obtain the text information rawText of the line currently being processed (e.g. the business-scope line);
② if there is no next line, directly return the extracted text rawText;
③ if a next line exists, calculate the line spacing nextLineDiff between the current line and the next line;
④ if nextLineDiff < maxDiff, merge the text of the next line into rawText and jump back to step ②;
⑤ if nextLineDiff >= maxDiff, return the merged and extracted rawText.
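A minimal sketch of steps ①-⑤, reusing average_line_spacing from the sketch above; the data layout and the exposure of the 0.5 correction coefficient as a parameter are illustrative choices, not a definitive implementation of the disclosure.

```python
# Sketch of the multi-line merge rule: while the spacing to the next line is
# below maxDiff (= avDiff * 0.5), the next line is merged into the current
# information item (rawText); otherwise a new information item starts.
def merge_multiline_items(lines, correction=0.5):
    av_diff = average_line_spacing(lines)
    max_diff = av_diff * correction                 # maxDiff = avDiff * 0.5
    items, i = [], 0
    while i < len(lines):
        raw_text = lines[i]["text"]
        while (i + 1 < len(lines)
               and lines[i + 1]["top"] - lines[i]["bottom"] < max_diff):
            raw_text += lines[i + 1]["text"]        # merge the next line into rawText
            i += 1
        items.append(raw_text)                      # spacing >= maxDiff: item ends here
        i += 1
    return items
```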
Following the above process, each information item to be recognized is extracted, yielding the recognition data of all information items in the whole picture; the process does not involve reprocessing the original picture and is therefore very efficient.
(4) And formatting and outputting the extracted information item character data.
The information item data of the whole picture obtained in the previous step are formatted and output in the desired output format; as shown in fig. 2, JSON output is used in this example.
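A hedged sketch of this formatting step is given below; pairing the merged items with template keys by position is a simplification of my own, since the patent only states that the extracted information-item data are formatted (here as JSON). It could take the LICENSE_TEMPLATE dictionary and merge_multiline_items output from the earlier sketches.

```python
import json

# Sketch of step (4): emit the merged information items as JSON. Matching items
# to template keys by order is an illustrative simplification; a real
# implementation would match on the label text inside each extracted item.
def format_output(items, template):
    result = dict(zip(template["info_items"], items))
    return json.dumps(result, ensure_ascii=False, indent=2)
```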
Fig. 2 shows an example comparing the recognition results of the method of this embodiment with those of the Baidu Smart Cloud recognition method. As can be seen from fig. 2, the method of this embodiment accurately recognizes multi-line text information and merges it in the output; the recognition result is more accurate and is not affected by the quality of the photographed picture or the photographing technique.
Example two
In one or more implementations, a system for OCR optimization based on text position information is disclosed, comprising:
the template design module is used for setting constraint information items to be extracted and an extraction range;
the character position information identification module is used for preprocessing an image to be identified and acquiring characters and position information in the image;
the optimization processing module is used for calculating the average line spacing of all the character information, determining the spacing between the current character and the next line of characters, and judging whether the current character and the next line of characters need to be combined or not based on the size of the spacing and the average line spacing;
and the structured output module is used for formatting and outputting the extracted information item character data.
It should be noted that the specific implementation manner of the modules adopts the method disclosed in the first embodiment, and details are not described here.
EXAMPLE III
In one or more embodiments, a terminal device is disclosed, which includes a server including a memory, a processor, and a computer program stored in the memory and executable on the processor, and the processor implements the OCR optimization method based on text position information in the first embodiment when executing the program. For brevity, no further description is provided herein.
It should be understood that in this embodiment, the processor may be a central processing unit CPU, and the processor may also be other general purpose processors, digital signal processors DSP, application specific integrated circuits ASIC, off-the-shelf programmable gate arrays FPGA or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, and so on. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
The memory may include both read-only memory and random access memory, and may provide instructions and data to the processor, and a portion of the memory may also include non-volatile random access memory. For example, the memory may also store device type information.
In implementation, the steps of the above method may be performed by integrated logic circuits of hardware in a processor or instructions in the form of software.
The OCR optimization method based on text position information in the first embodiment may be implemented directly by a hardware processor, or by a combination of hardware and software modules in the processor. The software modules may be located in RAM, flash memory, ROM, PROM or EPROM, registers, or other storage media well known in the art. The storage medium is located in the memory, and the processor reads the information in the memory and completes the steps of the above method in combination with its hardware. To avoid repetition, this is not described in detail here.
Those of ordinary skill in the art will appreciate that the various illustrative units and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or as a combination of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and the design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
Although the embodiments of the present invention have been described with reference to the accompanying drawings, they do not limit the scope of the present invention; those skilled in the art should understand that various modifications and variations can be made on the basis of the technical solution of the invention without inventive effort.

Claims (10)

1. An OCR optimization method based on character position information is characterized by comprising the following steps:
setting constraint information items and extraction ranges to be extracted;
preprocessing an image to be recognized to acquire characters and position information in the image;
calculating the average line spacing of all the character information, determining the spacing between the current line of characters and the next line, and judging whether the two lines need to be merged by comparing this spacing with the average line spacing;
and formatting and outputting the extracted information item character data.
2. An OCR optimization method based on character position information as claimed in claim 1, wherein the preprocessing of the image to be recognized includes: performing graying, binarization, noise removal and skew correction preprocessing operations on the image.
3. An OCR optimization method based on character position information as claimed in claim 1, wherein setting the constraint information items to be extracted and the extraction range specifically includes: specifying which text information in the image needs to be extracted.
4. An OCR optimization method based on character position information as claimed in claim 1, wherein the characters appearing in the picture are searched line by line using an adjacent connected-region search method and their coordinate positions are recorded; the region where the characters are located is cropped and the characters are recognized, obtaining all characters and their coordinate position information.
5. An OCR optimization method based on text position information as claimed in claim 1, wherein calculating the average line spacing of all text information specifically includes:
traversing the coordinates of each line of characters in the extracted image, and calculating the line spacing between the current line and the next line based on the coordinate values;
accumulating and summing all the line spacings; and determining the average line spacing as the ratio of the accumulated value to the number of lines.
6. The OCR optimization method based on character position information as claimed in claim 1, wherein the step of judging whether the current line of characters and the next line need to be merged based on the spacing and the average line spacing includes:
when the line spacing between the current line and the next line is smaller than the average line spacing, combining and extracting characters of the current line and the next line;
and stopping merging when the line spacing of the current line and the next line is larger than the average line spacing.
7. An OCR optimization method based on character position information as claimed in claim 1, characterized in that each constraint information item to be recognized is extracted, and the extracted information item character data is formatted and output.
8. An OCR optimization system based on text position information, comprising:
the template design module is used for setting constraint information items to be extracted and an extraction range;
the character position information identification module is used for preprocessing an image to be identified and acquiring characters and position information in the image;
the optimization processing module is used for calculating the average line spacing of all the character information, determining the spacing between the current character and the next line of characters, and judging whether the current character and the next line of characters need to be combined or not based on the size of the spacing and the average line spacing;
and the structured output module is used for formatting and outputting the extracted information item character data.
9. A terminal device comprising a processor and a computer-readable storage medium, the processor being configured to implement instructions; a computer readable storage medium for storing a plurality of instructions adapted to be loaded by a processor and to perform the method for OCR optimization based on text position information according to any one of claims 1-7.
10. A computer-readable storage medium having stored thereon a plurality of instructions, wherein the instructions are adapted to be loaded by a processor of a terminal device and to execute the OCR optimization method based on text position information according to any one of claims 1 to 7.
CN202011090602.2A 2020-10-13 2020-10-13 OCR (optical character recognition) optimization method and system based on character position information Pending CN112308057A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011090602.2A CN112308057A (en) 2020-10-13 2020-10-13 OCR (optical character recognition) optimization method and system based on character position information

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011090602.2A CN112308057A (en) 2020-10-13 2020-10-13 OCR (optical character recognition) optimization method and system based on character position information

Publications (1)

Publication Number Publication Date
CN112308057A true CN112308057A (en) 2021-02-02

Family

ID=74488033

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011090602.2A Pending CN112308057A (en) 2020-10-13 2020-10-13 OCR (optical character recognition) optimization method and system based on character position information

Country Status (1)

Country Link
CN (1) CN112308057A (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103034856A (en) * 2012-12-18 2013-04-10 深圳深讯和科技有限公司 Method and device for locating text area in image
CN108364009A (en) * 2018-02-12 2018-08-03 掌阅科技股份有限公司 Recognition methods, computing device and the computer storage media of two-dimensional structure formula
CN110941972A (en) * 2018-09-21 2020-03-31 广州金山移动科技有限公司 Method and device for segmenting characters in PDF document and electronic equipment
CN111368562A (en) * 2020-02-28 2020-07-03 北京字节跳动网络技术有限公司 Method and device for translating characters in picture, electronic equipment and storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
ZENG Fanfeng et al.: "Fast correction method for distorted Chinese text images based on connected domains", Computer Engineering and Design *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114490697A (en) * 2022-03-28 2022-05-13 山东国赢大数据产业有限公司 Data cooperative processing method and device based on block chain
CN117079282A (en) * 2023-08-16 2023-11-17 读书郎教育科技有限公司 Intelligent dictionary pen based on image processing

Similar Documents

Publication Publication Date Title
CN109492643B (en) Certificate identification method and device based on OCR, computer equipment and storage medium
US20210192202A1 (en) Recognizing text in image data
CN110046529B (en) Two-dimensional code identification method, device and equipment
US11164027B2 (en) Deep learning based license plate identification method, device, equipment, and storage medium
WO2019238063A1 (en) Text detection and analysis method and apparatus, and device
CN111476227B (en) Target field identification method and device based on OCR and storage medium
WO2019169532A1 (en) License plate recognition method and cloud system
CN108717543B (en) Invoice identification method and device and computer storage medium
CN110263694A (en) A kind of bank slip recognition method and device
WO2018233055A1 (en) Method and apparatus for entering policy information, computer device and storage medium
US20190362193A1 (en) Eyeglass positioning method, apparatus and storage medium
CN108846385B (en) Image identification and correction method and device based on convolution-deconvolution neural network
US20210182547A1 (en) Automated systems and methods for identifying fields and regions of interest within a document image
CN110598566A (en) Image processing method, device, terminal and computer readable storage medium
CN112308057A (en) OCR (optical character recognition) optimization method and system based on character position information
CN114092938B (en) Image recognition processing method and device, electronic equipment and storage medium
CN112434690A (en) Method, system and storage medium for automatically capturing and understanding elements of dynamically analyzing text image characteristic phenomena
US11574492B2 (en) Efficient location and identification of documents in images
CN110781890A (en) Identification card identification method and device, electronic equipment and readable storage medium
CN111160169A (en) Face detection method, device, equipment and computer readable storage medium
CN111160395A (en) Image recognition method and device, electronic equipment and storage medium
CN113158895A (en) Bill identification method and device, electronic equipment and storage medium
CN112686257A (en) Storefront character recognition method and system based on OCR
US11210507B2 (en) Automated systems and methods for identifying fields and regions of interest within a document image
CN113557520A (en) Character processing and character recognition method, storage medium and terminal device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20210202