CN111210455A - Method and device for extracting preprinted information in image, medium and electronic equipment - Google Patents

Method and device for extracting preprinted information in image, medium and electronic equipment

Info

Publication number
CN111210455A
Authority
CN
China
Prior art keywords
image
information
processed
preset
binary image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201911268302.6A
Other languages
Chinese (zh)
Other versions
CN111210455B (en)
Inventor
马文伟
王亚领
刘设伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Taikang Insurance Group Co Ltd
Taikang Online Property Insurance Co Ltd
Original Assignee
Taikang Insurance Group Co Ltd
Taikang Online Property Insurance Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Taikang Insurance Group Co Ltd, Taikang Online Property Insurance Co Ltd filed Critical Taikang Insurance Group Co Ltd
Priority to CN201911268302.6A priority Critical patent/CN111210455B/en
Publication of CN111210455A publication Critical patent/CN111210455A/en
Application granted granted Critical
Publication of CN111210455B publication Critical patent/CN111210455B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/194Segmentation; Edge detection involving foreground-background segmentation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/136Segmentation; Edge detection involving thresholding
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/90Determination of colour characteristics
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10004Still image; Photographic image

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

The present disclosure relates to the field of image processing technologies, and in particular to a method and an apparatus for extracting preprinted information from an image, a medium, and an electronic device. The method for extracting preprinted information includes: converting an image to be processed into a grayscale image and segmenting the grayscale image according to a preset two-stage segmentation threshold to obtain a corresponding binary image G; converting the image to be processed into the Lab color space and segmenting it according to the sum S of its a and b components and a preset S threshold to obtain a binary image Sy; determining, from the binary image G0 and the binary image Sy, the number of information types corresponding to the image to be processed; and selecting a corresponding preset method according to that number to extract the preprinted information from the image to be processed. The technical scheme of the embodiments of the disclosure can accurately determine the number of information categories in the image to be processed, select different extraction methods according to that number, and accurately extract the preprinted information, avoiding over-extraction or under-extraction of the preprinted information.

Description

Method and device for extracting preprinted information in image, medium and electronic equipment
Technical Field
The present disclosure relates to the field of image processing technologies, and in particular, to a method and an apparatus for extracting pre-print information in an image, a computer-readable storage medium, and an electronic device.
Background
An Optical Character Recognition (OCR) system converts an image file into a text format and is widely used for tasks such as certificate information acquisition and certificate information entry.
When certificate information is entered, in order to distinguish the preprinted information from the printed information in a certificate picture, a conventional OCR system usually relies on a manually set empirical color value for the preprinted information of the certificate and then extracts the preprinted information according to that empirical color value. However, because preprinted information is printed in different ways, the empirical color value often deviates from the true color on the certificate, so extraction based on it frequently causes over-extraction or under-extraction.
It is to be noted that the information disclosed in the above background section is only for enhancement of understanding of the background of the present disclosure, and thus may include information that does not constitute prior art known to those of ordinary skill in the art.
Disclosure of Invention
The present disclosure is directed to a method for extracting preprinted information from an image, an apparatus for extracting preprinted information from an image, a computer-readable storage medium, and an electronic device, so as to overcome, at least to some extent, the problem of over-extraction or under-extraction of preprinted information.
Additional features and advantages of the disclosure will be set forth in the detailed description which follows, or in part will be obvious from the description, or may be learned by practice of the disclosure.
According to a first aspect of the present disclosure, there is provided a method for extracting pre-print information in an image, including:
converting an image to be processed into a grayscale image, and segmenting the grayscale image according to a preset two-stage segmentation threshold to obtain a corresponding binary image G, where the preset two-stage segmentation threshold is a grayscale threshold and includes a preset first segmentation threshold, and the binary image G includes a binary image G0 corresponding to the preset first segmentation threshold;
converting the image to be processed into the Lab color space, and segmenting the image to be processed according to the sum S of its a and b components and a preset S threshold to obtain a binary image Sy;
determining, according to the binary image G0 and the binary image Sy, the number of information types corresponding to the image to be processed; and
selecting a corresponding preset method according to the number to extract the preprinted information from the image to be processed.
In an exemplary embodiment of the disclosure, based on the foregoing scheme, determining, according to the binary image G0 and the binary image Sy, the number of information types corresponding to the image to be processed includes:
calculating the coincidence ratio of the overlapping part of the binary image G0 and the binary image Sy to the binary image G0; and
determining the number of information types corresponding to the image to be processed according to the coincidence ratio.
In an exemplary embodiment of the present disclosure, based on the foregoing scheme, the determining, according to the coincidence ratio, the number of information types corresponding to the image to be processed includes:
if the coincidence rate is greater than a preset judgment threshold value, judging that the number of the information types corresponding to the image to be processed is 2;
and if the coincidence rate is less than or equal to a preset judgment threshold value, judging that the number of the information types corresponding to the image to be processed is 3.
In an exemplary embodiment of the present disclosure, based on the foregoing scheme, the selecting, according to the number, a corresponding preset method to extract pre-print information in the image to be processed includes:
if the number of the information types corresponding to the image to be processed is 2, selecting a first preset method to extract preprinted information in the image to be processed;
and if the number of the information types corresponding to the image to be processed is 3, selecting a second preset method to extract preprinted information in the image to be processed.
In an exemplary embodiment of the present disclosure, based on the foregoing scheme, the first preset method includes:
segmenting the grayscale image according to a preset one-level segmentation threshold to extract the preprinted information in the image to be processed, where the preset one-level segmentation threshold is a grayscale threshold.
In an exemplary embodiment of the present disclosure, based on the foregoing scheme, the preset two-stage segmentation threshold includes a preset second segmentation threshold;
the binary image G includes a binary image G1 corresponding to the preset second segmentation threshold;
the second preset method includes:
calculating the overlap area of the binary image Sy and the binary image G1 to extract the preprinted information in the image to be processed.
In an exemplary embodiment of the present disclosure, based on the foregoing scheme, after the step of calculating the overlap area A of the binary image Sy and the binary image G1, the method further includes:
acquiring the position of a preset information mark M0 in the binary image G0 and the position of the corresponding preset information mark MA in the overlap area A; and
translating the binary image G0 according to the positions of the preset information mark M0 and the preset information mark MA, so that the preset information mark M0 and the preset information mark MA match each other.
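The translation step above can be sketched as follows. This is a minimal illustration, assuming the marker positions M0 (in G0) and MA (in the overlap area A) are given as (row, col) coordinates; the function name is hypothetical, and `np.roll` merely stands in for the translation (a padded shift would be used in practice so pixels do not wrap around).

```python
import numpy as np

def align_by_marker(g0, pos_m0, pos_ma):
    # Shift G0 so its marker position pos_m0 lands on pos_ma,
    # the corresponding marker position in the overlap area A.
    dr = pos_ma[0] - pos_m0[0]
    dc = pos_ma[1] - pos_m0[1]
    return np.roll(np.roll(g0, dr, axis=0), dc, axis=1)

g0 = np.zeros((3, 3), dtype=np.uint8)
g0[0, 0] = 1                        # marker M0 at (0, 0)
shifted = align_by_marker(g0, (0, 0), (1, 1))
```

After the shift, the marker pixel sits at (1, 1), so the two binary images can be compared pixel-by-pixel.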
According to a second aspect of the present disclosure, there is provided an apparatus for extracting pre-print information in an image, comprising:
the first segmentation module is used for converting an image to be processed into a grayscale image and segmenting the grayscale image according to a preset two-stage segmentation threshold to obtain a corresponding binary image G, where the preset two-stage segmentation threshold includes a preset first segmentation threshold and the binary image G includes a binary image G0 corresponding to the preset first segmentation threshold;
a second segmentation module, configured to convert the image to be processed into the Lab color space and segment the image to be processed according to the sum S of its a and b components and a preset S threshold to obtain a binary image Sy;
a quantity determination module, configured to determine, according to the binary image G0 and the binary image Sy, the number of information types corresponding to the image to be processed; and
an information extraction module, configured to select a corresponding preset method according to the number to extract the preprinted information in the image to be processed.
According to a third aspect of the present disclosure, there is provided a computer readable storage medium, on which a computer program is stored, which when executed by a processor, implements the method of extracting pre-printed information in an image as described in the first aspect of the embodiments above.
According to a fourth aspect of the embodiments of the present disclosure, there is provided an electronic apparatus including:
a processor; and
a storage device for storing one or more programs which, when executed by the processor, cause the processor to implement the method of extracting preprinted information in an image as described in the first aspect of the embodiments above.
The technical scheme provided by the embodiment of the disclosure can have the following beneficial effects:
In the method for extracting preprinted information in an image provided by an embodiment of this disclosure, the image to be processed is converted into a grayscale image, and the grayscale image is segmented according to a preset two-stage segmentation threshold to obtain a corresponding binary image G0; meanwhile, the image to be processed is converted into the Lab color space and segmented according to the sum S of its a and b components and a preset S threshold to obtain a binary image Sy; the number of information types corresponding to the image to be processed is then determined from the binary image G0 and the binary image Sy, and a corresponding preset method is selected according to that number to extract the preprinted information. By combining the gray value of the image to be processed with the sum S of the a and b components, this technical scheme can accurately determine the number of information categories in the image to be processed, select different extraction methods accordingly, accurately extract the preprinted information, and avoid over-extraction or under-extraction of the preprinted information.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and together with the description, serve to explain the principles of the disclosure. It is to be understood that the drawings in the following description are merely exemplary of the disclosure, and that other drawings may be derived from those drawings by one of ordinary skill in the art without the exercise of inventive faculty. In the drawings:
FIG. 1 schematically illustrates a flow chart of a method for extracting pre-print information in an image in an exemplary embodiment of the disclosure;
FIG. 2 schematically illustrates a flow chart of a method for determining, according to the binary image G0 and the binary image Sy, the number of information types corresponding to the image to be processed in an exemplary embodiment of the disclosure;
fig. 3 is a flowchart schematically illustrating a method for determining the number of information types corresponding to the to-be-processed image according to the coincidence ratio in the exemplary embodiment of the disclosure;
fig. 4 is a flowchart schematically illustrating a method for selecting a corresponding preset method according to the number to extract preprinted information in the image to be processed in an exemplary embodiment of the disclosure;
FIG. 5 schematically illustrates a flow chart of a method of processing an image in an exemplary embodiment of the disclosure;
fig. 6 schematically shows a result diagram obtained when pre-print information extraction is performed on a to-be-processed image with 2 kinds of information in an exemplary embodiment of the present disclosure;
FIG. 7 is a diagram schematically illustrating the result of pre-print information extraction on a to-be-processed image with 3 kinds of information according to an exemplary embodiment of the present disclosure;
FIG. 8 is a schematic diagram illustrating a composition of an apparatus for extracting pre-printed information from an image according to an exemplary embodiment of the disclosure;
FIG. 9 is a schematic diagram illustrating an exemplary configuration of an apparatus for extracting pre-printed information from an image according to another exemplary embodiment of the present disclosure;
FIG. 10 schematically illustrates a structural diagram of a computer system suitable for use with an electronic device that implements an exemplary embodiment of the present disclosure;
fig. 11 schematically illustrates a schematic diagram of a computer-readable storage medium, according to some embodiments of the present disclosure.
Detailed Description
Example embodiments will now be described more fully with reference to the accompanying drawings. Example embodiments may, however, be embodied in many different forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of example embodiments to those skilled in the art. The described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments.
Furthermore, the drawings are merely schematic illustrations of the present disclosure and are not necessarily drawn to scale. The same reference numerals in the drawings denote the same or similar parts, and thus their repetitive description will be omitted. Some of the block diagrams shown in the figures are functional entities and do not necessarily correspond to physically or logically separate entities. These functional entities may be implemented in the form of software, or in one or more hardware modules or integrated circuits, or in different networks and/or processor devices and/or microcontroller devices.
In the present exemplary embodiment, a method for extracting preprinted information in an image is first provided. It can be applied to the process of extracting specific information from an image to be processed; for example, in an OCR system it can be used to extract the preprinted information of a bill from a bill image. Referring to fig. 1, the method for extracting the preprinted information in the image may include the following steps:
s110, converting an image to be processed into a gray image, and segmenting the gray image according to a preset two-stage segmentation threshold value to obtain a corresponding binary image G; the preset two-stage segmentation threshold is a gray level threshold, and comprises a preset first segmentation threshold, and the binary image G comprises a binary image G corresponding to the preset first segmentation threshold0
S120, converting the image to be processed into a Lab color space, and segmenting the image to be processed according to the sum S of the components a and b of the image to be processed and a preset S threshold value to obtain a binary image Sy
S130, according to the binary image G0And the binary image SyDetermining the number of the information types corresponding to the image to be processed;
and S140, selecting a corresponding preset method according to the number to extract preprinting information in the image to be processed.
According to the method for extracting preprinted information provided in this exemplary embodiment, determining the number of information categories in the image to be processed by combining its gray values with the sum S of the a and b components makes it possible to judge that number accurately, select different extraction methods accordingly, accurately extract the preprinted information, and avoid over-extraction or under-extraction of the preprinted information.
Hereinafter, the steps of the extraction method of the preprinted information in the image in the present exemplary embodiment will be described in more detail with reference to the drawings and the embodiments.
An image to be processed, such as a voucher-type image of a bill, falls into one of two situations: either the image includes only background information and preprinted information, or background information, preprinted information, and printed information coexist in the image. Because the extraction method differs with the kinds of information included, the kinds of information in the image to be processed can be judged first, and the preprinted information can then be extracted accordingly.
Step S110, converting the image to be processed into a gray image, and segmenting the gray image according to a preset two-stage segmentation threshold value to obtain a corresponding binary image G.
In an example embodiment of the present disclosure, the preset two-stage segmentation threshold refers to a grayscale threshold and specifically includes a preset first segmentation threshold. The preset first segmentation threshold can be customized according to the differences between the colors corresponding to the different kinds of information in different images to be processed; it corresponds to the gray value range of the class of information with the lowest gray value in the image to be processed, and this disclosure does not limit the specific range. After the image to be processed is converted into a grayscale image according to the image conversion rule, the gray value of each pixel can be obtained, and the pixels corresponding to the class of information with the lowest gray value can be extracted with the preset first segmentation threshold to form the corresponding binary image G0.
For example, in the image to be processed corresponding to each bill, the colors of the background, the preprinted information, and the printed information all differ, so after the image is converted into a grayscale image, the gray values of the background information, the preprinted information, and the printed information differ to a certain extent. Since in a common bill the preprinted or printed information has the lowest gray value, the class of pixels with the lowest gray value can be extracted by setting the preset first segmentation threshold, and a binary image G0 is obtained by tentative segmentation; whether G0 contains preprinted or printed information is then confirmed. By grouping the pixels according to their gray values in the grayscale image corresponding to the image to be processed, the pixels within the range of the preset two-stage segmentation threshold can be extracted, realizing a tentative extraction of the preprinted information in the image to be processed.
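Step S110 can be sketched as follows. This is a minimal illustration, assuming a standard luminance-weighted RGB-to-gray conversion (the patent does not specify the conversion rule) and an illustrative threshold value; the function names are hypothetical.

```python
import numpy as np

def to_gray(rgb):
    # rgb: H x W x 3 array with values in 0-255; common luminance weights
    return 0.299 * rgb[..., 0] + 0.587 * rgb[..., 1] + 0.114 * rgb[..., 2]

def segment_lowest_gray(gray, first_threshold):
    # Binary image G0: the class of pixels with the lowest gray values,
    # i.e. those at or below the preset first segmentation threshold.
    return (gray <= first_threshold).astype(np.uint8)

# Toy 2x2 image: a dark column (candidate preprint/print pixels)
# next to a light background column.
img = np.array([[[30, 30, 30], [200, 200, 200]],
                [[40, 40, 40], [210, 210, 210]]], dtype=float)
g0 = segment_lowest_gray(to_gray(img), first_threshold=100)
```

Here the dark column is kept in G0 while the light background is discarded, matching the tentative segmentation described above.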
Step S120, converting the image to be processed into Lab color space, and segmenting the image to be processed according to the sum S of the components a and b of the image to be processed and a preset S threshold value to obtain a binary image Sy
In an example embodiment of the present disclosure, the preset S threshold is customized according to the difference between the S values of the information and of the background in the image to be processed, so that the pixels carrying information can be extracted by the range their S values fall into, yielding the binary image Sy. After the image to be processed is converted into the Lab color space, the Lab value of each pixel can be obtained. According to the characteristics of the Lab color space, the sum S of the a and b components is taken as the basis for grouping the pixels: the sums are compared against the preset S threshold, and the binary image formed by the pixels whose sum of a and b components lies within the range of the preset S threshold is the binary image Sy.
For example, in the image to be processed of a bill, the background is generally the gray of the paper and the printed information is generally black ink, so the preprinted information can be distinguished from the background information and the printed information according to the preset S threshold on the sum S of the a and b components in the Lab color space, yielding the binary image Sy corresponding to the preprinted information. Through this characteristic of the Lab color space, the near-gray pixels in the image to be processed can be separated from the pixels of other colors, thus separating the preprinted information from the background information and the printed information.
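Step S120 can be sketched as follows. This is a minimal illustration in which the full RGB-to-Lab conversion is omitted: the a and b channels are assumed to be available already, and the comparison direction (|S| above the preset S threshold marks colored preprint pixels) is an assumption for illustration, since near-gray paper and black ink both have a and b close to 0.

```python
import numpy as np

def segment_by_ab_sum(a_chan, b_chan, s_threshold):
    # S = a + b stays near 0 for gray paper and black ink,
    # while colored preprint strokes push it away from 0.
    s = a_chan + b_chan
    return (np.abs(s) > s_threshold).astype(np.uint8)

# Toy channels: a red preprint column (large positive a and b)
# next to a near-gray background/ink column.
a = np.array([[50.0, 0.0], [45.0, 1.0]])
b = np.array([[30.0, 2.0], [25.0, -1.0]])
sy = segment_by_ab_sum(a, b, s_threshold=20)
```

The colored column survives into Sy while the near-gray column does not, which is exactly the separation the paragraph above describes.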
Step S130, determining, according to the binary image G0 and the binary image Sy, the number of information types corresponding to the image to be processed.
In an example embodiment of the present disclosure, determining, according to the binary image G0 and the binary image Sy, the number of information types corresponding to the image to be processed includes, as shown in fig. 2, the following steps S210 to S220:
step S210, calculating the binary image G0And the binary image SyThe overlapped part and the binary image G0The overlapping ratio of (3).
Step S220, determining the number of the information types corresponding to the image to be processed according to the coincidence rate.
In an exemplary embodiment of the present disclosure, the binary image G0 obtained in step S110 is composed of the pixels with the lowest gray value and may correspond either to the preprinted information or to the printed information in the image to be processed. The coincidence ratio of the overlapping part of the binary image Sy and the binary image G0 to the binary image G0 can therefore be used to judge how many kinds of information the image contains. Specifically, when the image to be processed contains 2 kinds of information, namely background information and preprinted information, the information corresponding to G0 is the preprinted information and the information corresponding to Sy is also the preprinted information, so the coincidence ratio of their overlap to G0 is large. When the image to be processed contains 3 kinds of information, namely background information, preprinted information, and printed information, the information corresponding to G0 is the printed information, so the coincidence ratio of the overlap with Sy to G0 is small. From this analysis, the number of kinds of information in the image to be processed can be determined from the size of the coincidence ratio of the overlap of Sy and G0 to G0.
By analyzing the binary images obtained by segmenting the image according to the gray value and according to the sum of the a and b components in the Lab color space, the kinds of information contained in the image to be processed can be judged accurately, and an accurate segmentation can then be performed to obtain accurate preprinted information. This avoids the over-extraction or under-extraction caused by other information interfering with the preprinted information when extraction relies on empirical color characteristics.
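The coincidence ratio of steps S210 to S220 can be sketched as follows: the area of the overlap of Sy and G0 divided by the area of G0. The function name and toy images are illustrative.

```python
import numpy as np

def coincidence_ratio(g0, sy):
    # Pixels set in both binary images, relative to the area of G0.
    overlap = np.logical_and(g0 == 1, sy == 1).sum()
    area_g0 = (g0 == 1).sum()
    return overlap / area_g0 if area_g0 else 0.0

g0 = np.array([[1, 1], [1, 0]])
sy = np.array([[1, 0], [1, 0]])
ratio = coincidence_ratio(g0, sy)   # 2 of G0's 3 pixels overlap with Sy
```

A ratio near 1 means G0 and Sy mark essentially the same pixels (both captured the preprinted information); a small ratio means G0 captured something else (the printed information).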
Preferably, different preset discrimination thresholds can be customized for different images to be processed. In this case, the number of information types corresponding to the image to be processed is determined according to the coincidence ratio through the following steps S310 to S320, as shown in fig. 3:
step S310, if the coincidence rate is greater than a preset determination threshold, determining that the number of the information types corresponding to the to-be-processed image is 2.
Step S320, if the coincidence rate is less than or equal to a preset determination threshold, determining that the number of the information types corresponding to the image to be processed is 3.
In an exemplary embodiment of the disclosure, different preset discrimination thresholds are respectively customized for different types of images to be processed, and the number of information types corresponding to the images to be processed can be determined according to the size relationship between the coincidence rate calculated in step S210 and the preset discrimination threshold. When the coincidence rate is greater than a preset judging threshold value, the number of the information types corresponding to the image to be processed can be judged to be 2, namely preprinting information and background information; when the coincidence rate is less than or equal to the preset discrimination threshold, the number of the information types corresponding to the image to be processed can be judged to be 3, namely preprinting information, printing information and background information.
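The decision of steps S310 and S320 reduces to a single comparison. A minimal sketch, with an illustrative default threshold:

```python
def count_information_types(ratio, judge_threshold=0.5):
    # High coincidence ratio: G0 and Sy both captured the preprinted
    # information, so the image holds 2 information types (background
    # + preprint). Otherwise G0 captured printed information instead,
    # so the image holds 3 types (background + preprint + print).
    return 2 if ratio > judge_threshold else 3
```

Note that equality with the threshold yields 3 types, matching the "less than or equal to" branch above.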
And step S140, selecting a corresponding preset method according to the number to extract preprinting information in the image to be processed.
In an example embodiment of the present disclosure, when the kinds of information corresponding to the images to be processed differ, there are different ways to extract the preprinted information. Therefore, selecting a corresponding preset method according to the number to extract the preprinted information in the image to be processed, as shown in fig. 4, includes the following steps S410 to S420:
step S410, if the number of the information types corresponding to the image to be processed is 2, selecting a first preset method to extract the pre-printed information in the image to be processed.
In an example embodiment of the present disclosure, when the number of information types corresponding to an image to be processed is 2, that is, when only pre-print information and background information are included in the image to be processed, a first preset method may be selected to extract the pre-print information in the image to be processed, and specifically, the first preset method may include: and segmenting the gray level image according to a preset first-level segmentation threshold value so as to extract preprinting information in the image to be processed.
In an example embodiment of the present disclosure, the preset one-level segmentation threshold is a grayscale threshold. When the image to be processed contains only pre-printed information and background information, the corresponding grayscale image contains only two clusters of gray values, so the pixels can be grouped according to the preset one-level segmentation threshold into the binary images corresponding to the pre-printed information and the background information respectively, and the pre-printed information is then extracted from its binary image. Once the image to be processed is determined to contain only pre-printed information and background information, the pixels belonging to each can be accurately distinguished by gray value according to the preset one-level segmentation threshold, thereby achieving the purpose of extracting the pre-printed information. In addition, because the pixels belonging to the background information are distinguished at the same time, the background information in the image to be processed can also be extracted.
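A minimal sketch of this first preset method, assuming the common darker-foreground convention (pre-printed pixels darker than background) and a hypothetical threshold value of 128:

```python
import numpy as np

def first_preset_method(gray, one_level_threshold=128):
    """Split a grayscale image that holds only pre-printed and
    background information using a single grey-level threshold.

    Pixels below the threshold are taken as pre-printed information,
    the rest as background. The threshold value and the darker-
    foreground assumption are illustrative, not from the disclosure.
    """
    gray = np.asarray(gray)
    preprint_mask = gray < one_level_threshold
    background_mask = ~preprint_mask
    return preprint_mask, background_mask
```

Both masks are returned, matching the observation that distinguishing the pre-printed pixels simultaneously yields the background pixels.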
In step S420, if the number of the information types corresponding to the image to be processed is 3, selecting a second preset method to extract the pre-printed information in the image to be processed.
In an example embodiment of the present disclosure, when the preset two-level segmentation threshold includes a preset second segmentation threshold, the binary image G includes a binary image G1 corresponding to the preset second segmentation threshold. In this case, when the number of information types corresponding to the image to be processed is 3, that is, when the image to be processed contains pre-printed information, printed information and background information, the second preset method includes: calculating the overlap area A of the binary image Sy and the binary image G1 to extract the pre-printed information in the image to be processed.
In an example embodiment of the present disclosure, the grayscale threshold corresponding to the preset second segmentation threshold is greater than that corresponding to the preset first segmentation threshold. When the image to be processed contains three kinds of information, their gray values run from low to high in the order printed information, pre-printed information, background information, so the overlap area A of the binary image G1 corresponding to the preset second segmentation threshold and the binary image Sy extracted in the Lab color space can be calculated to extract the pre-printed information in the image to be processed. Extracting the pre-printed information by computing an overlap area avoids the under-extraction, or the over-extraction of printed information into the result, that can occur when the pre-printed information is extracted by a single segmentation alone. In addition, since the overlap area A is determined to be the image area composed of the pixels containing the pre-printed information, the remaining area of the image to be processed contains only the 2 types of printed information and background information, which can therefore be distinguished according to the first preset method and extracted separately.
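The overlap computation at the heart of the second preset method reduces to a pixelwise logical AND of the two binary images (function name hypothetical):

```python
import numpy as np

def second_preset_method(g1, sy):
    """Extract pre-printed information when three information types
    are present: the overlap area A of the binary image G1 (from the
    second, higher grayscale threshold) and the binary image Sy (from
    the Lab colour-space segmentation) is taken as the pre-printed
    information.
    """
    overlap_a = np.logical_and(np.asarray(g1, dtype=bool),
                               np.asarray(sy, dtype=bool))
    return overlap_a
```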
The following sets forth the implementation details of the technical solution of the embodiments of the present disclosure, taking as examples images to be processed corresponding to bills that contain 2 kinds and 3 kinds of information respectively:
1. The image to be processed contains 2 kinds of information: pre-printed information and background information.
Referring to fig. 6, 6(a) is the image to be processed. 6(a) is converted into a grayscale image and the pixels are segmented by gray value; referring to 6(b), the region formed by the pixels of the same color as 610 is the binary image G0. The image to be processed is segmented in the Lab color space; referring to 6(c), the region formed by the pixels of the same color as 620 is the binary image Sy. The coincidence rate of the overlapping part of the binary image G0 and the binary image Sy with respect to the binary image G0 is calculated and found to be greater than the preset discrimination threshold of 0.6, so the image to be processed is judged to contain only pre-printed information and background information. The pre-printed information can then be extracted by segmenting the grayscale image converted from 6(a); refer to the region formed by the white pixels at position 630 in 6(d).
2. The image to be processed contains 3 kinds of information: pre-printed information, printed information and background information.
Referring to fig. 7, 7(a) is the image to be processed. 7(a) is converted into a grayscale image and the pixels are segmented by gray value; referring to 7(b), the region formed by the pixels of the same color as 710 is the binary image G0, and the region formed by the pixels of the same color as 720 is the binary image G1. The image to be processed is segmented in the Lab color space; referring to 7(c), the region formed by the pixels of the same color as 730 is the binary image Sy. The coincidence rate of the overlapping part of the binary image G0 and the binary image Sy with respect to the binary image G0 is calculated and found to be less than the preset discrimination threshold of 0.6, so the image to be processed is judged to contain pre-printed information, printed information and background information. The pre-printed information can be extracted by obtaining the overlap area A of the binary image G1 and the binary image Sy; refer to the region formed by the white pixels at position 740 in fig. 7(d).
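The Lab colour-space segmentation used to obtain Sy in both examples can be sketched as below. The a and b channels are assumed already extracted (e.g. with OpenCV's `cv2.cvtColor(img, cv2.COLOR_BGR2LAB)` in its 0-255 encoding), and keeping the pixels whose sum S exceeds the preset S threshold is an assumption here — the disclosure only states that S is compared against a preset S threshold:

```python
import numpy as np

def segment_lab_s(a_channel, b_channel, s_threshold):
    """Segment in Lab space using the sum S = a + b of the colour
    components; pixels whose S exceeds s_threshold form the binary
    image Sy (the comparison direction is an illustrative assumption).
    """
    # Widen to int32 so uint8 channels do not overflow when summed.
    s = a_channel.astype(np.int32) + b_channel.astype(np.int32)
    return s > s_threshold
```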
Further, after the overlap area A of the binary image Sy and the binary image G1 is calculated, the printed information and the pre-printed information, once extracted separately, can be further processed so that the positions of the items with corresponding contents in the printed and pre-printed information match each other, which facilitates structured output of the data on the image to be processed. Referring to fig. 5, this includes the following steps S510 to S520:
Step S510: respectively acquiring the position of a preset information representative M0 in the binary image G0 and the position of the corresponding preset information representative MA in the overlap area A.
Step S520: translating the binary image G0 according to the positions of the preset information representative M0 and the preset information representative MA, so that the positions of the preset information representative M0 and the preset information representative MA match each other.
In an example embodiment of the present disclosure, the binary image G0 and the overlap area A are those obtained in the preceding steps. When the information in the image to be processed is extracted by the second preset method, the binary image G0 and the overlap area A contain the printed information and the pre-printed information respectively. In this case, any preset information representative M0 is selected in the binary image G0 and its position in the image to be processed is acquired; at the same time, the position in the image to be processed of the preset information representative MA matching M0 is acquired. Because the position of each item in the printed and pre-printed information of a bill is relatively fixed and the contents of the items correspond to each other, translating the binary image G0 so that the position of the preset information representative M0 matches that of its matching preset information representative MA brings the printed information and the pre-printed information into positional alignment.
On the basis of extracting the printed information and the pre-printed information separately, translating the binary image G0 according to the positional relationship of the mutually matched preset information representatives in the printed and pre-printed information corrects their misalignment in the image to be processed, so that the positions of the items with corresponding contents match each other, which facilitates structured output of the information in the image to be processed.
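The translation step can be sketched as follows. `np.roll` stands in for a simple integer translation — a hypothetical simplification, since a real implementation would pad rather than wrap pixels around the image border:

```python
import numpy as np

def align_g0_to_overlap(g0, pos_m0, pos_ma):
    """Translate the binary image G0 so that the position of the
    preset information representative M0 (taken inside G0) coincides
    with that of its matching representative MA (taken inside the
    overlap area A). Positions are (row, col) coordinates.
    """
    dy = pos_ma[0] - pos_m0[0]  # vertical offset between MA and M0
    dx = pos_ma[1] - pos_m0[1]  # horizontal offset between MA and M0
    return np.roll(g0, shift=(dy, dx), axis=(0, 1))
```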
It is noted that the above-mentioned figures are merely schematic illustrations of processes involved in methods according to exemplary embodiments of the present disclosure, and are not intended to be limiting. It will be readily understood that the processes shown in the above figures are not intended to indicate or limit the chronological order of the processes. In addition, it is also readily understood that these processes may be performed synchronously or asynchronously, e.g., in multiple modules.
In addition, in an exemplary embodiment of the disclosure, an extraction device of preprinted information in an image is also provided. Referring to fig. 8, the apparatus 800 for extracting pre-printed information from an image includes: a first segmentation module 810, a second segmentation module 820, a number determination module 830 and an information extraction module 840.
The first segmentation module 810 may be configured to convert an image to be processed into a grayscale image, and segment the grayscale image according to a preset two-level segmentation threshold to obtain a corresponding binary image G; the preset two-level segmentation threshold includes a preset first segmentation threshold, and the binary image G includes a binary image G0 corresponding to the preset first segmentation threshold.
The second segmentation module 820 may be configured to convert the image to be processed into a Lab color space, and segment the image to be processed according to a sum S of a and b components of the image to be processed and a preset S threshold to obtain a binary image Sy
The number determination module 830 may be configured to determine the number of information types corresponding to the image to be processed according to the binary image G0 and the binary image Sy;
the information extraction module 840 may be configured to select a corresponding preset method according to the number to extract the pre-print information in the image to be processed.
In an exemplary embodiment of the disclosure, based on the foregoing scheme, the number determination module 830 may be configured to calculate the coincidence rate of the overlapping part of the binary image G0 and the binary image Sy with respect to the binary image G0, and to determine the number of information types corresponding to the image to be processed according to the coincidence rate.
In an exemplary embodiment of the present disclosure, based on the foregoing solution, the number determination module 830 may be configured to determine that the number of information types corresponding to the image to be processed is 2 when the coincidence rate is greater than a preset discrimination threshold, and to determine that the number of information types corresponding to the image to be processed is 3 when the coincidence rate is less than or equal to the preset discrimination threshold.
In an exemplary embodiment of the present disclosure, based on the foregoing solution, the information extraction module 840 may be configured to select a first preset method to extract pre-printed information in the image to be processed when the number of information types corresponding to the image to be processed is 2; and when the number of the information types corresponding to the image to be processed is 3, selecting a second preset method to extract preprinting information in the image to be processed.
In an exemplary embodiment of the present disclosure, based on the foregoing solution, the information extraction module 840 may be configured to segment the grayscale image according to a preset one-level segmentation threshold to extract the pre-printed information in the image to be processed; wherein the preset one-level segmentation threshold is a grayscale threshold.
In an exemplary embodiment of the disclosure, based on the foregoing scheme, the information extraction module 840 may be configured to calculate the overlap area A of the binary image Sy and the binary image G1 to extract the pre-printed information in the image to be processed.
In an exemplary embodiment of the present disclosure, based on the foregoing solution, referring to fig. 9, the apparatus 800 for extracting pre-printed information in an image further includes an image processing module 850, which may be configured to respectively acquire the position of a preset information representative M0 in the binary image G0 and the position of the corresponding preset information representative MA in the overlap area A, and to translate the binary image G0 according to the positions of the preset information representative M0 and the preset information representative MA, so that the positions of the preset information representative M0 and the preset information representative MA match each other.
For details not disclosed in the apparatus embodiment of the present disclosure, please refer to the above-described embodiment of the method for extracting pre-printed information in an image.
It should be noted that although in the above detailed description several modules or units of the device for action execution are mentioned, such a division is not mandatory. Indeed, the features and functionality of two or more modules or units described above may be embodied in one module or unit, according to embodiments of the present disclosure. Conversely, the features and functions of one module or unit described above may be further divided so as to be embodied by a plurality of modules or units.
In addition, in an exemplary embodiment of the present disclosure, an electronic device capable of implementing the extraction method of the pre-printed information in the image is also provided.
As will be appreciated by one skilled in the art, aspects of the present disclosure may be embodied as a system, method or program product. Accordingly, various aspects of the present disclosure may be embodied in the form of: an entirely hardware embodiment, an entirely software embodiment (including firmware, microcode, etc.) or an embodiment combining hardware and software aspects, which may all generally be referred to herein as a "circuit," "module," or "system."
An electronic device 1000 according to such an embodiment of the present disclosure is described below with reference to fig. 10. The electronic device 1000 shown in fig. 10 is only an example and should not bring any limitation to the functions and the scope of use of the embodiments of the present disclosure.
As shown in fig. 10, the electronic device 1000 is embodied in the form of a general purpose computing device. The components of the electronic device 1000 may include, but are not limited to: the at least one processing unit 1010, the at least one memory unit 1020, a bus 1030 connecting different system components (including the memory unit 1020 and the processing unit 1010), and a display unit 1040.
The storage unit stores program code that is executable by the processing unit 1010 to cause the processing unit 1010 to perform the steps according to various exemplary embodiments of the present disclosure described in the "exemplary methods" section above in this specification. For example, the processing unit 1010 may execute step S110 as shown in fig. 1: converting an image to be processed into a grayscale image, and segmenting the grayscale image according to a preset two-level segmentation threshold to obtain a corresponding binary image G, where the preset two-level segmentation threshold is a grayscale threshold and includes a preset first segmentation threshold, and the binary image G includes a binary image G0 corresponding to the preset first segmentation threshold; step S120: converting the image to be processed into the Lab color space, and segmenting the image to be processed according to the sum S of its a and b components and a preset S threshold to obtain a binary image Sy; step S130: determining the number of information types corresponding to the image to be processed according to the binary image G0 and the binary image Sy; and step S140: selecting a corresponding preset method according to the number to extract the pre-printed information in the image to be processed.
As another example, the electronic device may implement the steps shown in fig. 2 to 5.
The memory unit 1020 may include readable media in the form of volatile memory units, such as a random access memory unit (RAM) 1021 and/or a cache memory unit 1022, and may further include a read-only memory unit (ROM) 1023.
Storage unit 1020 may also include a program/utility 1024 having a set (at least one) of program modules 1025, such program modules 1025 including, but not limited to: an operating system, one or more application programs, other program modules, and program data, each of which, or some combination thereof, may comprise an implementation of a network environment.
Bus 1030 may be one or more of several types of bus structures, including a memory unit bus or memory unit controller, a peripheral bus, an accelerated graphics port, and a processor or local bus using any of a variety of bus architectures.
The electronic device 1000 may also communicate with one or more external devices 1070 (e.g., keyboard, pointing device, bluetooth device, etc.), with one or more devices that enable a user to interact with the electronic device 1000, and/or with any devices (e.g., router, modem, etc.) that enable the electronic device 1000 to communicate with one or more other computing devices. Such communication may occur through input/output (I/O) interfaces 1050. Also, the electronic device 1000 may communicate with one or more networks (e.g., a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network such as the internet) via the network adapter 1060. As shown, the network adapter 1060 communicates with the other modules of the electronic device 1000 over the bus 1030. It should be appreciated that although not shown, other hardware and/or software modules may be used in conjunction with the electronic device 1000, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems, among others.
Through the above description of the embodiments, those skilled in the art will readily understand that the exemplary embodiments described herein may be implemented by software, or by software in combination with necessary hardware. Therefore, the technical solution according to the embodiments of the present disclosure may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (which may be a CD-ROM, a usb disk, a removable hard disk, etc.) or on a network, and includes several instructions to enable a computing device (which may be a personal computer, a server, a terminal device, or a network device, etc.) to execute the method according to the embodiments of the present disclosure.
In an exemplary embodiment of the present disclosure, there is also provided a computer-readable storage medium having stored thereon a program product capable of implementing the above-described method of the present specification. In some possible embodiments, aspects of the present disclosure may also be implemented in the form of a program product comprising program code for causing a terminal device to perform the steps according to various exemplary embodiments of the present disclosure described in the "exemplary methods" section above of this specification, when the program product is run on the terminal device.
Referring to fig. 11, a program product 1100 for implementing the above method according to an embodiment of the present disclosure is described, which may employ a portable compact disc read only memory (CD-ROM) and include program code, and may be run on a terminal device, such as a personal computer. However, the program product of the present disclosure is not limited thereto, and in this document, a readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
The program product may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
A computer readable signal medium may include a propagated data signal with readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A readable signal medium may also be any readable medium that is not a readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Program code for carrying out operations for the present disclosure may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, C++, or the like, and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server. In the case of a remote computing device, the remote computing device may be connected to the user computing device through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computing device (e.g., through the internet using an internet service provider).
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This application is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.
It will be understood that the present disclosure is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the present disclosure is to be limited only by the terms of the appended claims.

Claims (10)

1. A method for extracting preprinted information in an image is characterized by comprising the following steps:
converting an image to be processed into a grayscale image, and segmenting the grayscale image according to a preset two-level segmentation threshold to obtain a corresponding binary image G; wherein the preset two-level segmentation threshold is a grayscale threshold and comprises a preset first segmentation threshold, and the binary image G comprises a binary image G0 corresponding to the preset first segmentation threshold; and
converting the image to be processed into Lab color space, and segmenting the image to be processed according to the sum S of the components a and b of the image to be processed and a preset S threshold value to obtain a binary image Sy
determining the number of information types corresponding to the image to be processed according to the binary image G0 and the binary image Sy; and
selecting a corresponding preset method according to the number to extract the pre-printed information in the image to be processed.
2. The method according to claim 1, wherein the determining the number of information types corresponding to the image to be processed according to the binary image G0 and the binary image Sy comprises:
calculating the coincidence rate of the overlapping part of the binary image G0 and the binary image Sy with respect to the binary image G0;
and determining the number of information types corresponding to the image to be processed according to the coincidence rate.
3. The method according to claim 2, wherein the determining the number of information types corresponding to the image to be processed according to the coincidence rate comprises:
if the coincidence rate is greater than a preset discrimination threshold, determining that the number of information types corresponding to the image to be processed is 2; and
if the coincidence rate is less than or equal to the preset discrimination threshold, determining that the number of information types corresponding to the image to be processed is 3.
4. The method according to claim 3, wherein the selecting the corresponding preset method according to the number to extract the pre-print information in the image to be processed comprises:
if the number of the information types corresponding to the image to be processed is 2, selecting a first preset method to extract preprinted information in the image to be processed;
and if the number of the information types corresponding to the image to be processed is 3, selecting a second preset method to extract preprinted information in the image to be processed.
5. The method according to claim 4, wherein the first preset method comprises:
segmenting the grayscale image according to a preset one-level segmentation threshold to extract the pre-printed information in the image to be processed; wherein the preset one-level segmentation threshold is a grayscale threshold.
6. The method of claim 4, wherein the preset two-level segmentation threshold comprises a preset second segmentation threshold;
the binary image G comprises a binary image G corresponding to a preset second segmentation threshold value1
The second preset method comprises the following steps:
calculating the overlap area A of the binary image Sy and the binary image G1 to extract the pre-printed information in the image to be processed.
7. The method according to claim 6, wherein after the calculating the overlap area A of the binary image Sy and the binary image G1, the method further comprises:
respectively acquiring the position of a preset information representative M0 in the binary image G0 and the position of the corresponding preset information representative MA in the overlap area A; and
translating the binary image G0 according to the positions of the preset information representative M0 and the preset information representative MA, so that the positions of the preset information representative M0 and the preset information representative MA match each other.
8. An apparatus for extracting pre-printed information from an image, comprising:
a first segmentation module, configured to convert an image to be processed into a grayscale image and segment the grayscale image according to a preset two-level segmentation threshold to obtain a corresponding binary image G; wherein the preset two-level segmentation threshold comprises a preset first segmentation threshold, and the binary image G comprises a binary image G0 corresponding to the preset first segmentation threshold;
A second segmentation module, configured to convert the image to be processed into a Lab color space, and segment the image to be processed according to a sum S of components a and b of the image to be processed and a preset S threshold to obtain a binary image Sy
a number determination module, configured to determine the number of information types corresponding to the image to be processed according to the binary image G0 and the binary image Sy;
and the information extraction module is used for selecting a corresponding preset method according to the number to extract the preprinting information in the image to be processed.
9. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, implements a method of extracting pre-printed information in an image according to any one of claims 1 to 7.
10. An electronic device, comprising:
a processor; and
memory for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement a method of extracting pre-printed information in an image as claimed in any one of claims 1 to 7.
CN201911268302.6A 2019-12-11 2019-12-11 Method and device for extracting preprinted information in image, medium and electronic equipment Active CN111210455B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911268302.6A CN111210455B (en) 2019-12-11 2019-12-11 Method and device for extracting preprinted information in image, medium and electronic equipment

Publications (2)

Publication Number Publication Date
CN111210455A true CN111210455A (en) 2020-05-29
CN111210455B CN111210455B (en) 2023-08-01

Family

ID=70789259

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911268302.6A Active CN111210455B (en) 2019-12-11 2019-12-11 Method and device for extracting preprinted information in image, medium and electronic equipment

Country Status (1)

Country Link
CN (1) CN111210455B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030231785A1 (en) * 1993-11-18 2003-12-18 Rhoads Geoffrey B. Watermark embedder and reader
US20140119593A1 (en) * 2012-10-29 2014-05-01 Digimarc Corporation Determining pose for use with digital watermarking, fingerprinting and augmented reality
CN104574405A (en) * 2015-01-15 2015-04-29 北京天航华创科技股份有限公司 Color image threshold segmentation method based on Lab space
CN105120167A (en) * 2015-08-31 2015-12-02 广州市幸福网络技术有限公司 Certificate picture camera and certificate picture photographing method
CN108596916A (en) * 2018-04-16 2018-09-28 深圳市联软科技股份有限公司 Watermark recognition methods, system, terminal and medium similar in a kind of color
CN109255355A (en) * 2018-05-28 2019-01-22 北京京东尚科信息技术有限公司 Image processing method, device, terminal, electronic equipment and computer-readable medium

Also Published As

Publication number Publication date
CN111210455B (en) 2023-08-01

Similar Documents

Publication Publication Date Title
US8732570B2 (en) Non-symbolic data system for the automated completion of forms
CN110942074B (en) Character segmentation recognition method and device, electronic equipment and storage medium
CN111046784A (en) Document layout analysis and identification method and device, electronic equipment and storage medium
CN109670494B (en) Text detection method and system with recognition confidence
US20210064859A1 (en) Image processing system, image processing method, and storage medium
CN107045632A (en) Method and apparatus for extracting text from imaging files
CN110135225B (en) Sample labeling method and computer storage medium
CN111340037A (en) Text layout analysis method and device, computer equipment and storage medium
CN111724396B (en) Image segmentation method and device, computer readable storage medium and electronic equipment
CN112749649A (en) Method and system for intelligently identifying and generating electronic contract
CN116740723A (en) PDF document identification method based on open source Paddle framework
JP4626777B2 (en) Information processing apparatus and information processing program
CN112613367A (en) Bill information text box acquisition method, system, equipment and storage medium
CN111198664B (en) Document printing method and device, computer storage medium and terminal
CN111210455B (en) Method and device for extracting preprinted information in image, medium and electronic equipment
CN116110066A (en) Information extraction method, device and equipment of bill text and storage medium
CN115130437A (en) Intelligent document filling method and device and storage medium
KR20150021846A (en) System and method for restoring digital documents
CN114120305A (en) Training method of text classification model, and recognition method and device of text content
CN111476090A (en) Watermark identification method and device
CN112801960A (en) Image processing method and device, storage medium and electronic equipment
CN112101356A (en) Method and device for positioning specific text in picture and storage medium
CN111753836A (en) Character recognition method and device, computer readable medium and electronic equipment
CN111104936A (en) Text image recognition method, device, equipment and storage medium
CN115273113B (en) Table text semantic recognition method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant