CN112308057A - OCR (optical character recognition) optimization method and system based on character position information - Google Patents
OCR (optical character recognition) optimization method and system based on character position information
- Publication number
- CN112308057A (application CN202011090602.2A)
- Authority
- CN
- China
- Prior art keywords
- character
- position information
- line
- ocr
- spacing
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/22—Image preprocessing by selection of a specific region containing or referencing a pattern; Locating or processing of specific regions to guide the detection or recognition
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/10—Character recognition
Abstract
The invention discloses an OCR optimization method and system based on character position information. The method comprises the following steps: setting the constraint information items to be extracted and the extraction range; preprocessing the image to be recognized to obtain the characters in the image and their position information; calculating the average line spacing over all the character information, determining the spacing between the current line of characters and the next line, and judging whether the two lines need to be merged based on the size of that spacing relative to the average line spacing; and formatting and outputting the extracted information item character data. While guaranteeing the accuracy of the recognition result, the method achieves high-precision extraction of various characters with only a single scan of the picture, and significantly reduces the computing-hardware cost and time cost of OCR recognition.
Description
Technical Field
The invention relates to the technical field of image character recognition, in particular to an OCR optimization method and system based on character position information.
Background
The statements in this section merely provide background information related to the present disclosure and may not necessarily constitute prior art.
OCR (Optical Character Recognition) refers to the process in which an electronic device (e.g., a scanner or digital camera) examines characters printed on paper, determines their shapes by detecting patterns of dark and light, and then translates those shapes into computer text by a character recognition method.
With the growing informatization of various industries and the development of computer image processing, machine learning, and other artificial-intelligence technologies, OCR character recognition has advanced rapidly: recognition has become more efficient and the supported scenes richer, developing from the early OCR of unformatted simple scenes such as electronic books to the OCR of formatted complex scenes, ubiquitous across industries, such as business licenses, identity cards, driving licenses, and birth certificates. However, current OCR solutions for complex scenes all place high requirements on the shooting definition, shooting angle, and shooting range of the photo to be recognized; some solutions help the user shoot a high-standard picture by providing a shooting-range auxiliary frame in order to improve recognition accuracy.
Most existing OCR recognition solutions for complex scenes first obtain, through certificate-template training, the classification of the certificate photo to be recognized and the area information of each information item; they then cut the picture according to that area information, apply a series of image preprocessing steps such as graying, binarization, noise removal, and tilt correction to the cut-out small-area pictures, and finally perform character recognition. The cut-out pictures contain little information, so the recognition success rate is high, but such solutions have the following technical defects:
(1) Obtaining accurate information item coordinate areas from the certificate template requires a large number of training samples, so the technical cost is high.
(2) The accuracy of the information item coordinate areas obtained by training directly determines the accuracy of the final recognition result: once the coordinate area information is inaccurate, the information in the cut-out picture is incomplete and the recognition result is wrong. Therefore, to ensure model-training accuracy, solutions of this type impose high technical requirements on the definition, shooting angle, and so on of the training samples.
(3) Such solutions not only place high technical requirements on training samples, but also require the photo provided by the user in a production environment to meet equally high technical indexes; they therefore generally need to intervene when the user submits a picture, or give a shooting reference prompt, and the user experience is poor.
(4) Such solutions need to scan (cut) the original picture multiple times, so the time cost is high.
Disclosure of Invention
In order to solve the problems, the invention provides an OCR optimization method and system based on character position information, which can obviously reduce the time and technical cost for realizing the OCR recognition technology on the premise of ensuring the accuracy of the recognition result, and obviously improve the user experience.
In some embodiments, the following technical scheme is adopted:
an OCR optimization method based on character position information comprises the following steps:
setting constraint information items and extraction ranges to be extracted;
preprocessing an image to be recognized to acquire characters and position information in the image;
calculating the average line spacing over all the character information, determining the spacing between the current line of characters and the next line, and judging whether the two lines need to be merged based on the size of that spacing relative to the average line spacing;
and formatting and outputting the extracted information item character data.
In some embodiments, the following technical scheme is adopted:
an OCR optimization system based on text position information, comprising:
the template design module is used for setting constraint information items to be extracted and an extraction range;
the character position information identification module is used for preprocessing an image to be identified and acquiring characters and position information in the image;
the optimization processing module is used for calculating the average line spacing of all the character information, determining the spacing between the current character and the next line of characters, and judging whether the current character and the next line of characters need to be combined or not based on the size of the spacing and the average line spacing;
and the structured output module is used for formatting and outputting the extracted information item character data.
In some embodiments, the following technical scheme is adopted:
a terminal device comprising a processor and a computer-readable storage medium, the processor being configured to implement instructions; the computer readable storage medium is used for storing a plurality of instructions, and the instructions are suitable for being loaded by a processor and executing the OCR optimization method based on the character position information.
In some embodiments, the following technical scheme is adopted:
a computer-readable storage medium, wherein a plurality of instructions are stored, and the instructions are suitable for being loaded by a processor of a terminal device and executing the above OCR optimization method based on the character position information.
Compared with the prior art, the invention has the beneficial effects that:
(1) on the premise of ensuring the accuracy of the recognition result, the method can realize high-precision extraction of various characters only by scanning the picture once, and can obviously reduce the computational hardware cost and time cost realized by the OCR technology.
(2) The method of the invention does not require coordinate position area information of information items in the certificate template, so the technical requirements on the user's photo to be recognized and on the photographing process are significantly reduced, and the user experience is markedly improved. By adjusting the OCR template, efficient OCR extraction can be achieved for various certificate pictures such as business licenses, identity cards, business cards, driving licenses, and property ownership certificates, giving the method very flexible extensibility.
Drawings
FIG. 1 is a flowchart of an OCR optimization method based on text position information according to an embodiment of the present invention;
FIG. 2 is a comparison of the extraction results of the present invention embodiment and the recognition results of the prior art method.
Detailed Description
It should be noted that the following detailed description is exemplary and is intended to provide further explanation of the disclosure. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs.
It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of example embodiments according to the present application. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, and it should be understood that when the terms "comprises" and/or "comprising" are used in this specification, they specify the presence of stated features, steps, operations, devices, components, and/or combinations thereof, unless the context clearly indicates otherwise.
Example one
In one or more embodiments, an OCR optimization method based on text location information is disclosed, which is described by taking extracting text information in a license as an example, and with reference to fig. 1, the method specifically includes the following processes:
(1) setting constraint information items and extraction ranges to be extracted;
Specifically, the constraint information items and extraction range specify which text information in the license needs to be extracted; the position of that text does not need to be specified in advance, which lowers the technical requirements on the photo to be recognized and on the user's photographing process. For example, analysis determines that the following information items need to be extracted from a company's business license:
√ unified social credit code;
√ business scope;
√ legal representative;
√ date of establishment;
√ registered capital;
√ certificate number;
√ address;
√ company name;
√ type;
√ validity period.
(2) Preprocessing an image to be recognized to acquire characters and position information in the image;
specifically, the process of preprocessing the image includes: graying, binaryzation, noise removal, inclination correction and the like;
after preprocessing, characters appearing in the picture are searched line by line by an adjacent-connected-region search method and their coordinate positions are recorded; the region where each character is located is cropped and the character is recognized, yielding all the characters in the picture together with their coordinate position information. A single scan of the picture thus extracts all the character information, organized by line.
Those skilled in the art can implement this step using the prior art, for example the general OCR recognition API of Baidu AI Cloud, so it is not described in detail here.
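As a minimal sketch of the graying and binarization steps above, using only NumPy: the fixed threshold of 127 and the luminance weights are assumptions on my part, since the patent does not specify which binarization method its embodiment uses, and noise removal and tilt correction are omitted.

```python
import numpy as np

def preprocess(rgb):
    """Grayscale then binarize an H x W x 3 uint8 image.

    A fixed threshold of 127 stands in for whatever binarization the
    embodiment actually uses (it is not specified in the patent)."""
    # Standard luminance weights for RGB -> gray conversion.
    gray = 0.299 * rgb[..., 0] + 0.587 * rgb[..., 1] + 0.114 * rgb[..., 2]
    # Binarize: light pixels become paper (255), dark pixels become ink (0).
    return np.where(gray > 127, 255, 0).astype(np.uint8)

# Tiny synthetic image: top half light "paper", bottom half dark "ink".
img = np.zeros((4, 4, 3), dtype=np.uint8)
img[:2, :, :] = 200
binary = preprocess(img)
print(binary[0, 0], binary[3, 3])  # 255 0
```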
(3) Calculating the average line spacing over all the character information, determining the spacing between the current line of characters and the next line, and judging whether the two lines need to be merged based on the size of that spacing relative to the average line spacing;
specifically, since the information items such as the registration address and the business scope are distributed in multiple lines when the content of the information items is large in the license, and the font and the line spacing are inconsistent with those of other text contents when the multiple lines are distributed, for example, the line spacing of the lines is generally significantly smaller than the text line spacing.
Therefore, in this embodiment, the average line spacing of the whole picture is calculated first, and the specific method is as follows:
firstly, traverse the coordinates of each line of characters extracted from the whole picture;
secondly, calculate the line spacing between the current line and the next line from the coordinate values, and record the number of line gaps;
thirdly, accumulate and sum all the line spacings;
and fourthly, divide the accumulated line-spacing sum by the recorded number of line gaps to obtain the average line spacing.
A maximum multi-line spacing maxDiff is then derived from the average line spacing and used to decide whether the character information of the current line should be merged and extracted: if the actual spacing between the current line and the next line is smaller than maxDiff, the text recognized on the next line is merged into the content of the current information item; otherwise it is treated as the content of a new information item. In this example the correction coefficient is set to 0.5, i.e., half the average text line spacing avDiff: maxDiff = avDiff × 0.5.
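The average-line-spacing steps and the maxDiff threshold above can be sketched as follows. The y-coordinates are hypothetical, and measuring spacing as the gap between the top edges of consecutive lines is an assumption: the patent does not fix which coordinate is used.

```python
def average_line_spacing(line_tops):
    """avDiff: mean vertical gap between consecutive recognized lines.

    line_tops holds the top y-coordinate of each line, in reading order;
    using the top edge (rather than, say, the baseline) is an assumption."""
    gaps = [b - a for a, b in zip(line_tops, line_tops[1:])]
    return sum(gaps) / len(gaps)

# Hypothetical y-coordinates for six lines; lines 2-4 belong to one
# wrapped multi-line item, hence their smaller gaps.
tops = [10, 40, 55, 70, 110, 140]
av_diff = average_line_spacing(tops)  # (30+15+15+40+30)/5 = 26.0
max_diff = 0.5 * av_diff              # merge threshold from the embodiment
print(av_diff, max_diff)              # 26.0 13.0
```

With these values, the two 15-pixel gaps inside the wrapped item fall below maxDiff = 13.0 only if the coefficient is raised; in practice the coefficient would be tuned to the document class.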
Because some content in the business license may be displayed across multiple lines, the final information item text may need to merge several line records. The stopping criterion during merging is the line spacing: when the spacing between the current line and the next line is clearly smaller than the threshold maxDiff derived from the average line spacing, the item is being displayed across multiple lines and the next line must be merged into the current information item's content; otherwise the next line is the content of a new information item, and merging stops. The specific implementation, using the business scope item as an example, is as follows:
firstly, obtain the text information rawText of the business scope on the current line;
secondly, if there is no next line, directly return the extracted text rawText;
thirdly, if a next line exists, calculate the line spacing nextLineDiff between the current line and the next line;
fourthly, if nextLineDiff < maxDiff, merge the next line's text information into rawText and jump back to the second step;
finally, if nextLineDiff >= maxDiff, return the merged and extracted rawText.
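The merge loop just described can be sketched in a few lines of Python. The line representation (a top y-coordinate plus the recognized text) and the sample values are assumptions made for illustration only.

```python
def extract_item(lines, start, max_diff):
    """Merge the wrapped continuation lines of one information item.

    lines: list of (top_y, text) tuples in reading order.
    Returns (merged rawText, index of the first line of the next item)."""
    raw_text = lines[start][1]
    i = start
    while i + 1 < len(lines):
        next_line_diff = lines[i + 1][0] - lines[i][0]
        if next_line_diff < max_diff:
            # Tight spacing: the next line is a wrapped continuation.
            raw_text += lines[i + 1][1]
            i += 1
        else:
            # Normal spacing: a new information item starts here.
            break
    return raw_text, i + 1

# Hypothetical recognized lines: a two-line business scope, then an address.
lines = [(10, "Scope: retail,"), (22, " wholesale"), (60, "Address: ...")]
text, nxt = extract_item(lines, 0, max_diff=20)
print(text, nxt)  # Scope: retail, wholesale 2
```

Running extract_item repeatedly from each returned index walks through every information item in a single pass, matching the one-scan claim of the method.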
Following this process for each information item to be recognized yields the recognition data of all information items in the whole picture. The process involves no reprocessing of the original picture and is therefore very efficient.
(4) And formatting and outputting the extracted information item character data.
The information item data for the whole picture obtained in the previous step is formatted and output in the desired output format; in this example, JSON output is used, as shown in fig. 2.
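The structured output step might look like the following; the field names and values are purely illustrative placeholders, not the patent's actual schema.

```python
import json

# Hypothetical extracted information items (illustrative names and values).
items = {
    "company_name": "Example Trading Co., Ltd.",
    "unified_social_credit_code": "91370100XXXXXXXXXX",
    "business_scope": "retail, wholesale",
}
# ensure_ascii=False keeps any non-ASCII (e.g. Chinese) text readable.
out = json.dumps(items, ensure_ascii=False, indent=2)
print(out)
```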
Fig. 2 shows an example comparing the recognition results of this embodiment with those of the Baidu AI Cloud recognition method. As can be seen from fig. 2, the method of this embodiment accurately recognizes multi-line text information and merges it in the output; the recognition result is more accurate and is not affected by the quality of the photographed picture or the photographing technique.
Example two
In one or more implementations, a system for OCR optimization based on text position information is disclosed, comprising:
the template design module is used for setting constraint information items to be extracted and an extraction range;
the character position information identification module is used for preprocessing an image to be identified and acquiring characters and position information in the image;
the optimization processing module is used for calculating the average line spacing of all the character information, determining the spacing between the current character and the next line of characters, and judging whether the current character and the next line of characters need to be combined or not based on the size of the spacing and the average line spacing;
and the structured output module is used for formatting and outputting the extracted information item character data.
It should be noted that the specific implementation manner of the modules adopts the method disclosed in the first embodiment, and details are not described here.
EXAMPLE III
In one or more embodiments, a terminal device is disclosed, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor; when executing the program, the processor implements the OCR optimization method based on character position information of the first embodiment. For brevity, no further description is provided herein.
It should be understood that in this embodiment, the processor may be a central processing unit CPU, and the processor may also be other general purpose processors, digital signal processors DSP, application specific integrated circuits ASIC, off-the-shelf programmable gate arrays FPGA or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, and so on. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
The memory may include both read-only memory and random access memory, and may provide instructions and data to the processor, and a portion of the memory may also include non-volatile random access memory. For example, the memory may also store device type information.
In implementation, the steps of the above method may be performed by integrated logic circuits of hardware in a processor or instructions in the form of software.
The OCR optimization method based on the text position information in the first embodiment may be directly implemented by a hardware processor, or implemented by a combination of hardware and software modules in the processor. The software modules may be located in ram, flash, rom, prom, or eprom, registers, among other storage media as is well known in the art. The storage medium is located in a memory, and a processor reads information in the memory and completes the steps of the method in combination with hardware of the processor. To avoid repetition, it is not described in detail here.
Those of ordinary skill in the art will appreciate that the various illustrative elements, i.e., algorithm steps, described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
Although the embodiments of the present invention have been described with reference to the accompanying drawings, it is not intended to limit the scope of the present invention, and it should be understood by those skilled in the art that various modifications and variations can be made without inventive efforts by those skilled in the art based on the technical solution of the present invention.
Claims (10)
1. An OCR optimization method based on character position information is characterized by comprising the following steps:
setting constraint information items and extraction ranges to be extracted;
preprocessing an image to be recognized to acquire characters and position information in the image;
calculating the average line spacing over all the character information, determining the spacing between the current line of characters and the next line, and judging whether the two lines need to be merged based on the size of that spacing relative to the average line spacing;
and formatting and outputting the extracted information item character data.
2. An OCR optimization method based on character position information as claimed in claim 1, wherein the preprocessing of the image to be recognized includes: performing graying, binarization, noise removal, and tilt correction preprocessing operations on the image.
3. An OCR optimization method based on character position information as claimed in claim 1, wherein setting the constraint information items to be extracted and the extraction range specifically includes: specifying which text information in the image needs to be extracted.
4. An OCR optimization method based on character position information as claimed in claim 1, characterized in that by the adjacent connected region search method, the characters appearing in the picture are searched line by line and the coordinate position is recorded, the region where the characters are located is intercepted, the characters are recognized, and all characters and the coordinate position information thereof are obtained.
5. An OCR optimization method based on text position information as claimed in claim 1, wherein calculating the average line spacing of all text information specifically includes:
traversing the coordinates of each line of characters in the extracted image, and calculating the line spacing between the current line and the next line based on the coordinate values;
accumulating and summing all the line intervals; an average line spacing is determined based on a ratio of the accumulated value to the number of lines.
6. The OCR optimization method based on the character position information as claimed in claim 1, wherein the step of judging whether the current character and the next character need to be merged based on the space and the average line space includes:
when the line spacing between the current line and the next line is smaller than the average line spacing, combining and extracting characters of the current line and the next line;
and stopping merging when the line spacing of the current line and the next line is larger than the average line spacing.
7. An OCR optimization method based on character position information as claimed in claim 1, characterized in that each constraint information item to be recognized is extracted, and the extracted information item character data is formatted and output.
8. An OCR optimization system based on text position information, comprising:
the template design module is used for setting constraint information items to be extracted and an extraction range;
the character position information identification module is used for preprocessing an image to be identified and acquiring characters and position information in the image;
the optimization processing module is used for calculating the average line spacing of all the character information, determining the spacing between the current character and the next line of characters, and judging whether the current character and the next line of characters need to be combined or not based on the size of the spacing and the average line spacing;
and the structured output module is used for formatting and outputting the extracted information item character data.
9. A terminal device comprising a processor and a computer-readable storage medium, the processor being configured to implement instructions; a computer readable storage medium for storing a plurality of instructions adapted to be loaded by a processor and to perform the method for OCR optimization based on text position information according to any one of claims 1-7.
10. A computer-readable storage medium having stored thereon a plurality of instructions, wherein the instructions are adapted to be loaded by a processor of a terminal device and to execute the OCR optimization method based on text position information according to any one of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011090602.2A | 2020-10-13 | 2020-10-13 | OCR (optical character recognition) optimization method and system based on character position information |
Publications (1)
Publication Number | Publication Date |
---|---|
CN112308057A true CN112308057A (en) | 2021-02-02 |
Family
ID=74488033
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011090602.2A Pending CN112308057A (en) | 2020-10-13 | 2020-10-13 | OCR (optical character recognition) optimization method and system based on character position information |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112308057A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114490697A (en) * | 2022-03-28 | 2022-05-13 | 山东国赢大数据产业有限公司 | Data cooperative processing method and device based on block chain |
CN117079282A (en) * | 2023-08-16 | 2023-11-17 | 读书郎教育科技有限公司 | Intelligent dictionary pen based on image processing |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103034856A (en) * | 2012-12-18 | 2013-04-10 | 深圳深讯和科技有限公司 | Method and device for locating text area in image |
CN108364009A (en) * | 2018-02-12 | 2018-08-03 | 掌阅科技股份有限公司 | Recognition methods, computing device and the computer storage media of two-dimensional structure formula |
CN110941972A (en) * | 2018-09-21 | 2020-03-31 | 广州金山移动科技有限公司 | Method and device for segmenting characters in PDF document and electronic equipment |
CN111368562A (en) * | 2020-02-28 | 2020-07-03 | 北京字节跳动网络技术有限公司 | Method and device for translating characters in picture, electronic equipment and storage medium |
Non-Patent Citations (1)
Title |
---|
Zeng Fanfeng et al.: "A fast correction method for distorted Chinese text images based on connected regions", Computer Engineering and Design * |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| RJ01 | Rejection of invention patent application after publication | Application publication date: 20210202 |