WO2017071062A1 - Region extraction method and device - Google Patents


Publication number
WO2017071062A1
Authority
WO
WIPO (PCT)
Prior art keywords
area
information area
character
information
region
Prior art date
Application number
PCT/CN2015/099298
Other languages
English (en)
French (fr)
Inventor
龙飞
张涛
陈志军
Original Assignee
小米科技有限责任公司
Priority date
Filing date
Publication date
Application filed by 小米科技有限责任公司
Priority to MX2016003769A (MX364147B)
Priority to JP2017547045A (JP6396605B2)
Priority to KR1020167005538A (KR101760109B1)
Priority to RU2016110818A (RU2642404C2)
Publication of WO2017071062A1

Classifications

    • G06V30/40 Document-oriented image-based pattern recognition
    • G06V30/414 Extracting the geometrical structure, e.g. layout tree; block segmentation, e.g. bounding boxes for graphics or text
    • G06V30/412 Layout analysis of documents structured with printed lines or input boxes, e.g. business forms or tables
    • G06V10/10 Image acquisition
    • G06T7/11 Region-based segmentation
    • G06V10/22 Image preprocessing by selection of a specific region containing or referencing a pattern; locating or processing of specific regions to guide the detection or recognition
    • G06V10/26 Segmentation of patterns in the image field; cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; detection of occlusion
    • G06V10/44 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; connectivity analysis, e.g. of connected components
    • G06V10/507 Summing image-intensity values; histogram projection analysis
    • G06V10/758 Matching involving statistics of pixels or of feature values, e.g. histogram matching
    • G06V30/1478 Inclination or skew detection or correction of characters or of character lines
    • G06V30/153 Segmentation of character regions using recognition of characters or words
    • G06V30/158 Segmentation of character regions using character size, text spacings or pitch estimation

Definitions

  • the present disclosure relates to the field of image processing, and in particular, to a region extraction method and apparatus.
  • Automatic ID card identification is a technology for recognizing the text information on an ID card through image processing.
  • The related art provides an automatic identification method for an ID card: an ID card scanning device scans the ID card at a fixed relative position to obtain a scanned image of the ID card, and character recognition is performed on n predetermined regions of the scanned image to obtain at least one of name information, gender information, ethnicity information, date of birth information, address information, and citizen identification number information.
  • For ID images captured directly by a camera, however, recognition remains difficult.
  • the present disclosure provides a region extraction method and apparatus.
  • the technical solution is as follows:
  • a region extraction method, comprising: obtaining the area location of a first information area in a document image; determining a second information area according to the area location of the first information area; and performing area cutting on the second information area to obtain at least one character area.
  • the location of the region is represented by vertex coordinates
  • Determining the second information area according to the location of the area of the first information area including:
  • the first information area is the citizen identification number area in the second-generation ID card;
  • the at least two vertex coordinates are two vertex coordinates of the citizen identification number area;
  • the second information area is the address information area in the second-generation ID card.
  • Determining the second information area according to the at least two vertex coordinates of the first information area and the predetermined relative positional relationship including:
  • the address information area is cropped according to the lower edge, the upper edge, the left edge, and the right edge.
  • performing area cutting on the second information area to obtain at least one character area including:
  • the first histogram includes the vertical coordinate of each row of pixels and the accumulated value of foreground pixels in each row;
  • n lines of text regions are identified according to the continuous row sets consisting of rows in which the accumulated value of foreground pixels in the first histogram is greater than a first threshold, where n is a positive integer;
  • for the i-th line of text regions, a second histogram is calculated in the vertical direction; the second histogram includes the abscissa of each column of pixels and the accumulated value of foreground pixels in each column, where n ≥ i ≥ 1 and i is a positive integer;
  • n_i character regions are identified according to the continuous column sets consisting of columns in which the accumulated value of foreground pixels in the second histogram is greater than a second threshold.
  • Optionally, the method further includes: identifying the line spacing between two adjacent lines of text regions; and, when the line spacing is greater than a third threshold, discarding the line of text closer to the edge of the second information area, the edge being the upper edge or the lower edge.
  • Optionally, the method further includes: identifying the word spacing between two adjacent character regions; when the two adjacent character regions are located on the left side of the second information area and the word spacing is greater than a fourth threshold, the character region on the right of the two is identified as the first character region in the current line's text region;
  • when the two adjacent character regions are located on the right side of the second information area and the word spacing is greater than a fifth threshold, the character region on the left of the two is identified as the last character region in the current line's text region.
  • an area extracting apparatus comprising:
  • An obtaining module configured to obtain an area location of a first information area in the document image
  • a determining module configured to determine a second information area according to an area position of the first information area
  • the identification module is configured to perform area cutting on the second information area to obtain at least one character area.
  • the location of the region is represented by vertex coordinates
  • the determining module is configured to determine the second information region according to the at least two vertex coordinates of the first information region and the predetermined relative positional relationship, where the relative positional relationship is a relative positional relationship between the vertex coordinates and the second information region.
  • the first information area is the citizen identification number area in the second-generation ID card;
  • the at least two vertex coordinates are two vertex coordinates of the citizen identification number area;
  • the second information area is the address information area in the second-generation ID card.
  • In an exemplary embodiment, the determining module includes:
  • a first determining submodule configured to determine the lower edge of the address information area according to the vertical coordinate of whichever of the two vertex coordinates is closest to the address information area;
  • a second determining submodule configured to determine the upper edge of the address information area according to that closest vertex's vertical coordinate and a predetermined height;
  • a third determining submodule configured to determine the left edge of the address information area according to the abscissa of either of the two vertex coordinates and a first predetermined width;
  • a fourth determining submodule configured to determine the right edge of the address information area according to the abscissa of either of the two vertex coordinates and a second predetermined width;
  • a cropping submodule configured to crop the address information area according to the lower edge, the upper edge, the left edge, and the right edge.
  • the identification module comprises:
  • a binarization submodule configured to binarize the second information area to obtain a binarized second information area;
  • a first calculation submodule configured to calculate, for the binarized second information area, a first histogram in the horizontal direction, the first histogram including the vertical coordinate of each row of pixels and the accumulated value of foreground pixels in each row;
  • a row identification submodule configured to identify n lines of text regions according to the continuous row sets of rows in which the accumulated value of foreground pixels in the first histogram is greater than the first threshold, where n is a positive integer;
  • a second calculation submodule configured to calculate, for the i-th line of text regions, a second histogram in the vertical direction, the second histogram including the abscissa of each column of pixels and the accumulated value of foreground pixels in each column, where n ≥ i ≥ 1 and i is a positive integer;
  • a character identification submodule configured to identify n_i character regions according to the continuous column sets consisting of columns in which the accumulated value of foreground pixels in the second histogram is greater than the second threshold.
  • the apparatus further includes:
  • a line spacing identification module configured to identify the line spacing between two adjacent lines of text regions according to the continuous row sets consisting of rows in which the accumulated value of foreground pixels in the first histogram is greater than the first threshold;
  • a discarding module configured to discard, when the line spacing is greater than a third threshold, the line of text closer to an edge of the second information area, the edge being the upper edge or the lower edge.
  • In an exemplary embodiment, the apparatus further includes:
  • a word spacing identification module configured to identify the word spacing between two adjacent character regions according to the continuous column sets consisting of columns in which the accumulated value of foreground pixels in the second histogram is greater than the second threshold;
  • a character recognition module configured to identify, when the two adjacent character regions are located on the left side of the second information area and the word spacing is greater than a fourth threshold, the character region on the right of the two as the first character region in the current line's text region;
  • a single-character identification module configured to identify, when the two adjacent character regions are located on the right side of the second information area and the word spacing is greater than a fifth threshold, the character region on the left of the two as the last character region in the current line's text region.
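  • The spacing rules described by these modules can be sketched as follows, operating on the (start, end) column intervals of one line's character regions as produced by projection segmentation. The function name, the half-width test for "left/right side of the second information area", and all threshold values are illustrative assumptions, not values from the disclosure.

```python
# Sketch of the word-spacing rules: a wide gap on the left side of the area
# marks the region to its right as the first character of the line; a wide gap
# on the right side marks the region to its left as the last character.
# Names and thresholds are illustrative assumptions.

def trim_line(char_cols, area_width, fourth_threshold, fifth_threshold):
    """char_cols: list of (col_start, col_end) character intervals, end
    exclusive. Returns the slice from the identified first character region
    to the identified last one."""
    first, last = 0, len(char_cols) - 1
    for i in range(len(char_cols) - 1):
        gap = char_cols[i + 1][0] - char_cols[i][1]    # word spacing
        on_left = char_cols[i + 1][1] < area_width / 2   # assumed side test
        on_right = char_cols[i][0] > area_width / 2
        if on_left and gap > fourth_threshold:
            first = i + 1        # region on the right of the wide gap
        if on_right and gap > fifth_threshold:
            last = i             # region on the left of the wide gap
            break
    return char_cols[first:last + 1]

regions = [(0, 8), (30, 38), (40, 48), (52, 60), (90, 98)]
print(trim_line(regions, area_width=100, fourth_threshold=10,
                fifth_threshold=10))   # [(30, 38), (40, 48), (52, 60)]
```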
  • an area extracting apparatus comprising:
  • a memory for storing processor executable instructions
  • the processor is configured to: obtain the area location of the first information area in the document image; determine the second information area according to the area location of the first information area; and perform area cutting on the second information area to obtain at least one character area.
  • By determining the second information area from the area location of the first information area in the document image and cutting the second information area, the second information area is accurately located and the character areas within it are accurately identified.
  • FIG. 1 is a flowchart of a region extraction method according to an exemplary embodiment.
  • FIG. 2A is a flowchart of a region extraction method according to another exemplary embodiment.
  • FIG. 2B is a flowchart of a region extraction method according to another exemplary embodiment.
  • FIG. 2C is a schematic diagram of determining a lower edge of an address information area according to an exemplary embodiment.
  • FIG. 2D is a schematic diagram of determining an upper edge of an address information area according to an exemplary embodiment.
  • FIG. 2E is a schematic diagram of determining a left edge of an address information area according to an exemplary embodiment.
  • FIG. 2F is a schematic diagram of determining a right edge of an address information area according to an exemplary embodiment.
  • FIG. 2G is a schematic diagram of determining an address information area according to an exemplary embodiment.
  • FIG. 3A is a flowchart of a region extraction method according to another exemplary embodiment.
  • FIG. 3B is a schematic diagram of binarization of a second information area according to an exemplary embodiment.
  • FIG. 3C is a schematic diagram of calculating a first histogram in the horizontal direction according to an exemplary embodiment.
  • FIG. 3D is a schematic diagram of a continuous row set according to an exemplary embodiment.
  • FIG. 3E is a schematic diagram of calculating a second histogram in the vertical direction according to an exemplary embodiment.
  • FIG. 3F is a schematic diagram of a continuous column set according to an exemplary embodiment.
  • FIG. 4A is a flowchart of a region extraction method according to another exemplary embodiment.
  • FIG. 4B is a schematic diagram of the line spacing between two adjacent lines of text according to an exemplary embodiment.
  • FIG. 5A is a flowchart of a region extraction method according to another exemplary embodiment.
  • FIG. 5B is a schematic diagram of the character spacing between two adjacent character regions according to an exemplary embodiment.
  • FIG. 6 is a block diagram of an area extracting apparatus according to an exemplary embodiment.
  • FIG. 7 is a block diagram of an area extracting apparatus according to another exemplary embodiment.
  • FIG. 8 is a block diagram of an area extracting apparatus according to another exemplary embodiment.
  • FIG. 9 is a block diagram of an area extracting apparatus according to still another exemplary embodiment.
  • FIG. 10 is a block diagram of an area extracting apparatus according to an exemplary embodiment.
  • FIG. 1 is a flowchart of a region extraction method according to an exemplary embodiment. As shown in FIG. 1, the region extraction method includes the following steps.
  • In step 101, the area location of the first information area in the document image is obtained;
  • the ID image is an image obtained directly from the document, such as an ID card image, a social security card image, and the like.
  • the first information area refers to an area in the document image carrying the text information, such as: a name information area, a date of birth information area, a gender area, an address information area, a citizenship number information area, a number information area, an information area for issuing a certificate authority, At least one of an effective date information area and the like information area.
  • In step 102, the second information area is determined according to the area location of the first information area;
  • the positioning difficulty of the first information area is lower than the positioning difficulty of the second information area.
  • In step 103, area cutting is performed on the second information area to obtain at least one character area.
  • In summary, the region extraction method provided in this embodiment obtains the area location of the first information area in the document image, determines the second information area according to that area location, and performs area cutting on the second information area to obtain at least one character area. This solves the problem in the related art that certain information areas in a directly photographed document image are difficult to identify and inaccurately located: by determining the second information area from the area location of the first information area and cutting the second information area, the second information area is accurately located and the character areas within it are accurately identified.
  • FIG. 2A is a flowchart of a region extraction method according to another exemplary embodiment. As shown in FIG. 2A, the region extraction method includes the following steps.
  • In step 201, the area location of the first information area in the document image is obtained; the area location is represented by vertex coordinates;
  • the ID image is an image obtained directly from the document, such as an ID card image, a social security card image, and the like.
  • a rectangular area for guiding shooting is set in the shooting interface, and when the user aligns the rectangular area with the document, the document image is captured.
  • The terminal acquires the area location of the first information area in the ID image and, according to that area location, obtains the vertex coordinates of each vertex of the first information area; that is, the area location is represented by vertex coordinates.
  • Illustratively, a rectangular coordinate system is established with the upper-left corner of the document image as the origin, the upper edge as the positive half-axis of the abscissa x, and the left edge as the positive half-axis of the ordinate y. The position of each vertex of the first information area in this coordinate system gives the corresponding vertex coordinates, and the area location of the first information area is represented by those vertex coordinates.
  • In step 202, a second information area is determined according to at least two vertex coordinates of the first information area and a predetermined relative positional relationship; the relative positional relationship is the relative positional relationship between the vertex coordinates and the second information area;
  • the predetermined relative positional relationship refers to the relative position between the vertex coordinates of the first information region and the upper edge, the lower edge, the left edge, and the right edge of the second information region.
  • the terminal may determine the location of the area of the second information area according to the at least two vertex coordinates acquired in the first information area and the predetermined relative positional relationship.
  • The first information area includes four vertices; which two of the four vertex coordinates are used is not specifically limited in this embodiment.
  • In step 203, area cutting is performed on the second information area; after the cutting, the second information area is divided into at least one character area.
  • the character area is an image area including a single character.
  • In summary, the region extraction method provided in this embodiment obtains the area location of the first information area in the document image, determines the second information area according to at least two vertex coordinates of the first information area and the predetermined relative positional relationship (the relative positional relationship between the vertex coordinates and the second information area), and performs area cutting on the second information area to obtain at least one character area. This solves the problem that the automatic identification method has difficulty identifying and accurately locating the ID card information in a directly photographed ID image: by determining the second information area from the area location of the first information area in the ID image and cutting the second information area, the second information area is accurately located and the character areas within it are accurately identified.
  • In this embodiment, the first information area is the citizen identification number area in the second-generation ID card, and the at least two vertex coordinates are the upper-left and upper-right vertex coordinates of the citizen identification number area.
  • the second information area is the address information area in the second generation ID card.
  • Step 202 can be replaced with the following steps 202a to 202e, as shown in FIG. 2B:
  • In step 202a, the lower edge of the address information area is determined according to the vertical coordinate of whichever of the two vertex coordinates is closest to the address information area;
  • The address information area is located above the citizen identification number area. In the established Cartesian coordinate system, the higher of the two vertices therefore has the smaller vertical coordinate and is the closer to the address information area, so the horizontal line through the vertical coordinate of the higher vertex is taken as the lower edge of the address information area. As shown in FIG. 2C, the horizontal line through the vertical coordinate of the first digit "3" of the citizen identification number area is taken as the lower edge m1 of the address information area.
  • In step 202b, the upper edge of the address information area is determined according to the vertical coordinate of that closest vertex coordinate and a predetermined height;
  • Taking that vertical coordinate as the starting position and translating upward by the predetermined height, the horizontal line through the translated vertical coordinate is taken as the upper edge of the address information area.
  • The predetermined height is a relatively loose height; it only needs to ensure that the region swept by the translation covers the address information area. As shown in FIG. 2D, the horizontal line through the vertical coordinate obtained by translating the vertical coordinate of the first digit "3" of the citizen identification number area upward by height h is taken as the upper edge m2 of the address information area.
  • In step 202c, the left edge of the address information area is determined according to the abscissa of either of the two vertex coordinates and a first predetermined width;
  • Taking the abscissa of the first digit "3" of the citizen identification number area as the starting position and translating left by a width of r*w, where r is a percentage and w is the length of the citizen identification number area, the vertical line through the translated abscissa is taken as the left edge m3 of the address information area.
  • The first predetermined width corresponding to the abscissas of different vertex coordinates differs; that is, the first predetermined width by which the abscissa is translated to the left differs from vertex to vertex.
  • The first predetermined width is a percentage of the length of the citizen identification number area.
  • In step 202d, the right edge of the address information area is determined according to the abscissa of either of the two vertex coordinates and a second predetermined width;
  • The second predetermined width corresponding to the abscissas of different vertex coordinates differs; that is, the second predetermined width by which the abscissa is translated differs from vertex to vertex. Moreover, when determining the right edge of the address information area, the abscissas of some vertex coordinates need to be translated to the left while others need to be translated to the right; the translation direction for each vertex's abscissa is defined by the relative positional relationship described above.
  • The second predetermined width is a percentage of the length of the citizen identification number area.
  • In step 202e, the address information area is cropped according to the lower edge, the upper edge, the left edge, and the right edge.
  • the address information area is cropped according to the lower edge, the upper edge, the left edge, and the right edge of the address information area determined in steps 202a to 202d, as shown in FIG. 2G.
  • In summary, the region extraction method provided in this embodiment determines the upper, lower, left, and right edges of the second information area according to two vertex coordinates of the first information area and the predetermined relative positional relationship, thereby cropping out the approximate location of the second information area and facilitating accurate positioning of the characters within it.
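  • The geometry of steps 202a to 202e can be sketched as follows. The function name, parameter values, and the choice of shifting left from and right from the upper-left vertex are illustrative assumptions; the disclosure only fixes the relative positional relationship, not concrete values.

```python
# Sketch of steps 202a-202e: derive the address-area crop box from the two
# upper vertex coordinates of the citizen ID number area, in an image
# coordinate system whose y axis points downward (origin at top-left).

def address_area_box(top_left, top_right, h, r_left, r_right):
    """top_left / top_right: (x, y) upper vertex coordinates of the
    ID-number area. h: predetermined (loose) height. r_left / r_right:
    first/second predetermined widths as fractions of the ID-number area
    length w. Returns (left, upper, right, lower)."""
    w = top_right[0] - top_left[0]       # length of the ID-number area
    # Step 202a: the vertex closest to the address area (which lies above
    # the number area) is the one with the smaller vertical coordinate.
    y_near = min(top_left[1], top_right[1])
    lower = y_near                        # lower edge m1
    upper = y_near - h                    # step 202b: translate up by h (m2)
    left = top_left[0] - r_left * w       # step 202c: shift left by r*w (m3)
    right = top_left[0] + r_right * w     # step 202d: shift right (assumed direction)
    return left, upper, right, lower

box = address_area_box((100, 400), (500, 398), h=120, r_left=0.05, r_right=0.9)
```

A crop of the document image with this box then yields the address information area of step 202e.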
  • It should be noted that this embodiment does not detail how the area location of the citizen identification number area in the second-generation ID image is obtained in step 201: the format of the citizen identification number area in a second-generation ID image is relatively fixed, and acquisition methods in the related art are relatively mature.
  • For example, a trained model may be used to identify the citizen identification number area in the second-generation ID image to be recognized, thereby determining the area location of the citizen identification number area.
  • step 203 may be replaced by the following steps 203a through 203e, as shown in FIG. 3A:
  • In step 203a, the second information area is binarized to obtain a binarized second information area;
  • The second information area determined in step 202 is first pre-processed; the pre-processing may include denoising, filtering, edge extraction, and the like. The pre-processed second information area is then binarized.
  • Binarization means comparing the gray value of each pixel in the second information area with a preset gray threshold, dividing the pixels into two groups (those whose gray value is greater than the threshold and those whose gray value is smaller), and rendering the two groups in two different colors in the second information area, thereby obtaining the binarized second information area, as shown in FIG. 3B.
  • A pixel of the color in the foreground is referred to as a foreground pixel, i.e., a white pixel in FIG. 3B; a pixel of the color in the background is referred to as a background pixel, i.e., a black pixel in FIG. 3B.
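  • A minimal sketch of step 203a, assuming a fixed gray threshold (the disclosure only requires comparison with a preset threshold, and the foreground/background polarity depends on the image):

```python
import numpy as np

def binarize(gray, threshold=128):
    """Binarize a grayscale crop of the second information area.
    Returns a 0/1 array: 1 = foreground pixel (the white pixels of FIG. 3B),
    0 = background pixel. The threshold value is an illustrative assumption."""
    gray = np.asarray(gray)
    return (gray > threshold).astype(np.uint8)

patch = np.array([[10, 200], [130, 90]])
mask = binarize(patch)   # [[0, 1], [1, 0]]
```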
  • In step 203b, a first histogram is calculated in the horizontal direction for the binarized second information area; the first histogram includes the vertical coordinate of each row of pixels and the accumulated value of foreground pixels in each row;
  • In step 203c, n lines of text regions are identified according to the continuous row sets formed by rows in which the accumulated value of foreground pixels in the first histogram is greater than a first threshold, where n is a positive integer;
  • the accumulated value of the foreground color pixel in each row of pixels can be obtained, and the accumulated value of the foreground color pixel in each row of pixels is compared with the first threshold, and the first histogram is A set of consecutive rows consisting of rows of scene pixels having a cumulative value greater than the first threshold is determined to be the row in which the text region is located.
  • the continuous line set means that the line whose accumulated value of the foreground color pixel is larger than the first threshold is a continuous m line, and the set of the consecutive m line pixel points is as shown in FIG. 3D, for the m line pixels in the figure. Point, the accumulated value of the foreground color pixel in the left histogram is greater than the first threshold.
  • the m-line pixel points correspond to the text area "Village Dongwang 126" in the ID image.
  • Each successive row set is identified as a line of text, and n consecutive rows are identified as n lines of text.
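Steps 203b and 203c can be sketched together as below: a hedged illustration in plain Python, assuming the binarized area is a list of pixel rows with foreground pixels equal to 255, and assuming an illustrative first threshold (the disclosure does not fix its value).

```python
def row_text_regions(binary, first_threshold):
    """Step 203b: accumulate foreground (255) pixels per row; step 203c:
    group consecutive rows whose accumulated value exceeds the threshold
    into (top, bottom) row sets, one per row of text area."""
    hist = [sum(1 for px in row if px == 255) for row in binary]
    regions, start = [], None
    for y, count in enumerate(hist + [0]):       # sentinel closes a trailing run
        if count > first_threshold and start is None:
            start = y
        elif count <= first_threshold and start is not None:
            regions.append((start, y - 1))
            start = None
    return regions
```

For example, two separated runs of foreground rows yield two text-area row sets: `row_text_regions([[0, 0], [255, 255], [255, 255], [0, 0], [255, 255]], 1)` returns `[(1, 2), (4, 4)]`.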
  • In step 203d, a second histogram is calculated in the vertical direction for the i-th row of text area, where the second histogram includes the abscissa of each column of pixels and the accumulated value of foreground pixels in each column, n ≥ i ≥ 1, i being a positive integer.
  • For the i-th row of text area, the second histogram represents the abscissa of each column of pixels in the horizontal direction and the accumulated number of foreground pixels in each column in the vertical direction, as shown in FIG. 3E.
  • In step 203e, n_i character areas are identified according to the consecutive column sets formed by columns whose accumulated foreground-pixel value in the second histogram is greater than a second threshold.
  • From the second histogram, the accumulated foreground-pixel value of each column of pixels can be obtained; this accumulated value is compared with the second threshold, and the consecutive column sets formed by columns whose accumulated value exceeds the second threshold are determined to be the columns occupied by character areas.
  • A consecutive column set means p consecutive columns whose accumulated foreground-pixel values are all greater than the second threshold; the set formed by these p columns of pixels appears as a continuous white region in the second histogram, as shown in FIG. 3F.
  • For the p columns of pixels in the figure, the accumulated values of foreground pixels in the lower histogram are all greater than the second threshold.
  • The p columns of pixels correspond to the character area "Zhe" in the ID image.
  • Each consecutive column set is identified as one character area, and n_i consecutive column sets are identified as n_i character areas.
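By symmetry with the horizontal projection, steps 203d and 203e project one row of text vertically. A minimal sketch under the same assumptions (foreground pixels are 255; the second threshold is illustrative):

```python
def char_regions(line_image, second_threshold):
    """Steps 203d-203e: accumulate foreground (255) pixels per column of one
    row of text area, then group consecutive columns above the threshold into
    (left, right) column sets, one per character area."""
    width = len(line_image[0])
    hist = [sum(1 for row in line_image if row[c] == 255) for c in range(width)]
    regions, start = [], None
    for x, count in enumerate(hist + [0]):       # sentinel closes a trailing run
        if count > second_threshold and start is None:
            start = x
        elif count <= second_threshold and start is not None:
            regions.append((start, x - 1))
            start = None
    return regions
```

A blank column between two foreground columns splits them into two character areas: `char_regions([[255, 0, 255], [255, 0, 255]], 1)` returns `[(0, 0), (2, 2)]`.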
  • In summary, the region extraction method provided in this embodiment binarizes the second information area, calculates the first histogram in the horizontal direction for the binarized second information area to determine the n rows of text areas, then calculates the second histogram in the vertical direction for each of the n rows of text areas, and identifies the character area corresponding to each character.
  • The following steps may also be included after step 203c, as shown in FIG. 4A:
  • In step 401, the line spacing between two adjacent rows of text areas is identified according to the consecutive row sets formed by rows whose accumulated foreground-pixel value in the first histogram is greater than the first threshold.
  • The address information area usually includes one to three rows of text areas, and these rows have a relatively short line spacing between them; at the same time, the line spacing between these rows and the text areas of other information areas is large. This step uses this line-spacing feature to discard text areas that do not belong to the second information area.
  • According to the identified rows of text areas, the line spacing between two adjacent rows of text areas is obtained.
  • Line spacing refers to the interval between two rows of text areas in the first histogram; the line spacing between a row of text and its adjacent row is h1.
  • In step 402, when the line spacing is greater than a third threshold, the row of text area closer to the edge of the second information area is discarded, where the edge is the upper edge or the lower edge.
  • When the edge is the lower edge, the text areas are searched from bottom to top. When the line spacing of the first pair of adjacent rows found is greater than the third threshold, the lower row of text area is discarded and the search continues upward; when a pair of adjacent rows whose line spacing is greater than the third threshold is found again, the upper row of text area is discarded and the search ends. The remaining text areas are determined to belong to the second information area.
  • When the edge is the upper edge, the text areas are searched from top to bottom. When the line spacing of the first pair of adjacent rows found is greater than the third threshold, the upper row of text area is discarded and the search continues downward; when such a pair is found again, the lower row of text area is discarded and the search ends. The remaining text areas are determined to belong to the second information area.
  • In summary, the region extraction method provided in this embodiment identifies the line spacing between two adjacent rows of text areas according to the consecutive row sets formed by rows whose accumulated foreground-pixel value in the first histogram is greater than the first threshold; when the line spacing is greater than the third threshold, the row of text area closer to the edge of the second information area is discarded, where the edge is the upper edge or the lower edge. The rows of text areas belonging to the second information area are thereby determined, so that the second information area is located more accurately.
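One way to realize steps 401 and 402 is sketched below. Rather than reproducing the bidirectional edge search verbatim, this simplified version splits the rows of text areas at any line spacing greater than the third threshold and keeps the largest tightly spaced cluster, which is assumed to be the one-to-three-row address block; both the threshold value and the "keep the largest cluster" rule are illustrative assumptions.

```python
def keep_address_lines(lines, third_threshold):
    """lines: (top, bottom) row bounds of text areas, sorted top to bottom.
    Split wherever the spacing between adjacent lines exceeds the threshold,
    then keep the largest tightly spaced cluster."""
    clusters, current = [], [lines[0]]
    for prev, nxt in zip(lines, lines[1:]):
        if nxt[0] - prev[1] > third_threshold:   # large line spacing: new cluster
            clusters.append(current)
            current = []
        current.append(nxt)
    clusters.append(current)
    return max(clusters, key=len)
```

For instance, with rows `[(0, 5), (40, 45), (50, 55), (90, 95)]` and a threshold of 10, the isolated top and bottom rows are discarded and the tightly spaced pair `[(40, 45), (50, 55)]` is kept.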
  • The following steps may also be included after step 203e, as shown in FIG. 5A:
  • In step 501, the word spacing between two adjacent character areas is identified according to the consecutive column sets formed by columns whose accumulated foreground-pixel value in the second histogram is greater than the second threshold.
  • According to the n_i character areas identified in step 203e, the word spacing between two adjacent character areas is obtained; within each row of text area, the word spacing between adjacent character areas is relatively small.
  • Word spacing refers to the interval between two character areas in the second histogram; as shown in FIG. 5B, the word spacing between characters is h2.
  • In step 502, when the two adjacent character areas are located on the left side of the second information area and the word spacing is greater than a fourth threshold, the character area on the right of the two adjacent character areas is identified as the first character area in the current row of text area.
  • When the two adjacent character areas are located on the left side of the second information area and the word spacing is not greater than the fourth threshold, both adjacent character areas are determined to belong to the current row of text area.
  • In step 503, when the two adjacent character areas are located on the right side of the second information area and the word spacing is greater than a fifth threshold, the character area on the left of the two adjacent character areas is identified as the last character area in the current row of text area.
  • When the two adjacent character areas are located on the right side of the second information area and the word spacing is not greater than the fifth threshold, both adjacent character areas are determined to belong to the current row of text area.
  • In summary, the region extraction method provided in this embodiment identifies the word spacing between two adjacent character areas according to the consecutive column sets formed by columns whose accumulated foreground-pixel value in the second histogram is greater than the second threshold; when the two adjacent character areas are on the left side of the second information area and the word spacing is greater than the fourth threshold, the character area on the right is identified as the first character area in the current row of text area; when the two adjacent character areas are on the right side of the second information area and the word spacing is greater than the fifth threshold, the character area on the left is identified as the last character area in the current row of text area. The character areas belonging to the second information area are thereby determined according to the word spacing, so that each character area in the second information area is accurately located.
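Steps 501 to 503 can be sketched as follows, assuming character areas are given as (left, right) column bounds sorted left to right, that "left/right side of the second information area" is taken as the left/right half of its width, and that the fourth and fifth thresholds are illustrative values:

```python
def trim_line(chars, area_width, fourth_threshold, fifth_threshold):
    """Drop character areas separated from the line by a large word spacing:
    a wide gap between two areas on the left side makes the right one the
    first character area; a wide gap on the right side makes the left one
    the last character area."""
    mid = area_width / 2
    first, last = 0, len(chars) - 1
    for i in range(len(chars) - 1):
        gap = chars[i + 1][0] - chars[i][1]      # word spacing between neighbors
        if chars[i + 1][1] < mid and gap > fourth_threshold:
            first = i + 1                        # step 502: first character area
        if chars[i][0] > mid and gap > fifth_threshold and last == len(chars) - 1:
            last = i                             # step 503: last character area
    return chars[first:last + 1]
```

For example, with width 100 and both thresholds 10, `trim_line([(2, 6), (30, 34), (40, 44), (52, 56), (85, 89)], 100, 10, 10)` drops the outlying leading and trailing areas and keeps `[(30, 34), (40, 44), (52, 56)]`.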
  • It should be noted that, after the character areas are obtained in the embodiments shown in FIG. 1, FIG. 2A, FIG. 2B, FIG. 3A, FIG. 4A, and FIG. 5A, the character areas can be further processed according to an existing character recognition algorithm to recognize the characters in the character areas.
  • It should also be noted that the ID card image referred to in the above method embodiments is a schematic illustration used in the present disclosure and is not a true ID card image.
  • FIG. 6 is a block diagram of an area extracting apparatus according to an exemplary embodiment. As shown in FIG. 6, the area extracting apparatus includes but is not limited to:
  • the obtaining module 610 is configured to obtain an area location of the first information area in the ID image
  • The document image is an image obtained by directly photographing the document, such as an ID card image, a social security card image, and the like.
  • The first information area refers to an area in the document image that carries text information, such as at least one of a name information area, a date of birth information area, a gender information area, an address information area, a citizenship number information area, an issuing authority information area, an effective date information area, and the like.
  • the determining module 620 is configured to determine the second information area according to the location of the area of the first information area
  • the identification module 630 is configured to perform area cutting on the second information area to obtain at least one character area.
  • In summary, the area extracting apparatus provided in this embodiment obtains the area position of the first information area in the document image, determines the second information area according to the area position of the first information area, and performs area cutting on the second information area to obtain at least one character area. This solves the problem in the related art that certain information areas in a document image obtained by direct shooting are difficult to identify and are inaccurately located; by determining the second information area from the area position of the first information area and cutting the second information area, the second information area is accurately located and the character areas in it are accurately identified.
  • FIG. 7 is a block diagram of an area extracting apparatus according to another exemplary embodiment. As shown in FIG. 7, the area extracting apparatus includes, but is not limited to:
  • the obtaining module 610 is configured to obtain an area location of the first information area in the ID image
  • The document image is an image obtained by directly photographing the document, such as an ID card image, a social security card image, and the like.
  • When acquiring the area position of the first information area in the document image, the obtaining module 610 acquires the vertex coordinates of each vertex of the first information area; that is, the area position is represented by vertex coordinates.
  • Optionally, a rectangular coordinate system is established with the upper left corner of the document image as the origin, the upper edge as the positive half axis of the abscissa x, and the left edge as the positive half axis of the ordinate y. The vertex coordinates corresponding to each vertex of the first information area are obtained in this Cartesian coordinate system, and the area position of the first information area is represented by these vertex coordinates.
  • the determining module 620 is configured to determine the second information area according to the location of the area of the first information area
  • The determining module 620 is further configured to determine the second information area according to at least two vertex coordinates of the first information area and a predetermined relative positional relationship, where the relative positional relationship is the relative positional relationship between the vertex coordinates and the second information area.
  • the predetermined relative positional relationship refers to the relative position between the vertex coordinates of the first information region and the upper edge, the lower edge, the left edge, and the right edge of the second information region.
  • the determining module 620 can determine the regional location of the second information region according to the at least two vertex coordinates acquired in the first information region and the predetermined relative positional relationship.
  • the determining module 620 can include the following submodules:
  • The first determining sub-module 621 is configured to determine the lower edge of the address information area according to the ordinate of the vertex coordinate, among the two vertex coordinates, that is closest to the address information area.
  • In the ID card, the address information area is above the citizenship number area. According to the way the Cartesian coordinate system is established, the higher of the two vertices has the smaller ordinate and is closer to the address information area; therefore, the first determining sub-module 621 takes the horizontal line through the higher of the two vertices as the lower edge of the address information area.
  • The second determining sub-module 622 is configured to determine the upper edge of the address information area according to the ordinate of the closest vertex coordinate and a predetermined height.
  • The second determining sub-module 622 takes the ordinate of that vertex coordinate as the starting position, translates it upward by the predetermined height, and takes the horizontal line at the translated ordinate as the upper edge of the address information area.
  • The third determining sub-module 623 is configured to determine the left edge of the address information area according to the abscissa of either of the two vertex coordinates and a first predetermined width.
  • The third determining sub-module 623 translates the abscissa of either vertex coordinate to the left by the first predetermined width and takes the vertical line at the translated abscissa as the left edge of the address information area.
  • The fourth determining sub-module 624 is configured to determine the right edge of the address information area according to the abscissa of either of the two vertex coordinates and a second predetermined width.
  • The fourth determining sub-module 624 translates the abscissa of either vertex coordinate by the second predetermined width and takes the vertical line at the translated abscissa as the right edge of the address information area.
  • The cropping sub-module 625 is configured to crop out the address information area according to the lower edge, the upper edge, the left edge, and the right edge.
  • According to the lower edge, upper edge, left edge, and right edge of the address information area determined by the first determining sub-module 621 through the fourth determining sub-module 624, the cropping sub-module 625 crops out the address information area.
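The four edge computations performed by sub-modules 621 through 625 can be sketched as follows. The coordinate convention matches the one above (origin at the top-left corner, y increasing downward); the concrete predetermined height and widths, and the direction of the right-edge translation, are assumptions for illustration.

```python
def address_box(vertex, predetermined_height, first_width, second_width):
    """vertex: (x, y) of the vertex coordinate closest to the address area.
    Returns (left, top, right, bottom) bounds of the address information area."""
    x, y = vertex
    bottom = y                          # 621: horizontal line through the vertex
    top = y - predetermined_height      # 622: translate upward by the height
    left = x - first_width              # 623: translate left by the first width
    right = x + second_width            # 624: translate right (assumed direction)
    return left, top, right, bottom
```

For example, `address_box((100, 200), 50, 80, 120)` yields the box `(20, 150, 220, 200)`, which a cropping step such as sub-module 625 would then cut out of the document image.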
  • the identification module 630 is configured to perform area cutting on the second information area to obtain at least one character area.
  • After the cropping sub-module 625 determines the area position of the second information area, the identification module 630 performs area cutting on the second information area; after the cutting, the second information area is divided into at least one character area.
  • the character area is an image area including a single character.
  • the identification module 630 can include the following sub-modules, as shown in FIG. 8:
  • the binarization sub-module 631 is configured to perform binarization on the second information region to obtain a binarized second information region;
  • The second information area determined by the cropping sub-module 625 is first pre-processed, where the pre-processing may include denoising, filtering, edge extraction, and the like; the pre-processed second information area is then binarized.
  • Binarization refers to comparing the gray value of each pixel in the second information area with a preset gray threshold and dividing the pixels of the second information area into two groups: pixels above the preset gray threshold and pixels below it. The two groups are rendered in two different colors in the second information area, yielding the binarized second information area.
  • The first calculation sub-module 632 is configured to calculate a first histogram in the horizontal direction for the binarized second information area, where the first histogram includes the ordinate of each row of pixels and the accumulated value of foreground pixels in each row.
  • The first calculation sub-module 632 calculates the first histogram in the horizontal direction for the second information area processed by the binarization sub-module 631; the first histogram represents the ordinate of each row of pixels in the vertical direction and the accumulated number of foreground pixels in each row in the horizontal direction.
  • The row identification sub-module 633 is configured to identify n rows of text areas according to the consecutive row sets formed by rows whose accumulated foreground-pixel value in the first histogram is greater than the first threshold, where n is a positive integer.
  • From the first histogram, the accumulated foreground-pixel value of each row of pixels can be obtained; the row identification sub-module 633 compares the accumulated value of each row with the first threshold and determines the consecutive row sets formed by rows whose accumulated value exceeds the first threshold to be the rows occupied by text areas.
  • A consecutive row set refers to the set formed by m consecutive rows whose accumulated foreground-pixel values are all greater than the first threshold.
  • Each consecutive row set is identified as one row of text area, and n consecutive row sets are identified as n rows of text areas.
  • The second calculation sub-module 634 is configured to calculate a second histogram in the vertical direction for the i-th row of text area, where the second histogram includes the abscissa of each column of pixels and the accumulated value of foreground pixels in each column, n ≥ i ≥ 1, i being a positive integer.
  • For the i-th row of text area, the second calculation sub-module 634 calculates the second histogram in the vertical direction; the second histogram represents the abscissa of each column of pixels in the horizontal direction and the accumulated number of foreground pixels in each column in the vertical direction.
  • The character recognition sub-module 635 is configured to identify n_i character areas according to the consecutive column sets formed by columns whose accumulated foreground-pixel value in the second histogram is greater than the second threshold.
  • From the second histogram, the accumulated foreground-pixel value of each column of pixels can be obtained; the character recognition sub-module 635 compares the accumulated value of each column with the second threshold and determines the consecutive column sets formed by columns whose accumulated value exceeds the second threshold to be the columns occupied by character areas.
  • A consecutive column set refers to the set formed by p consecutive columns whose accumulated foreground-pixel values are all greater than the second threshold.
  • Each consecutive column set is identified as one character area, and n_i consecutive column sets are identified as n_i character areas.
  • In summary, the area extracting apparatus provided in this embodiment binarizes the second information area, calculates the first histogram in the horizontal direction for the binarized second information area to determine the n rows of text areas, then calculates the second histogram in the vertical direction for each of the n rows of text areas, and identifies the character area corresponding to each character.
  • the apparatus may further comprise the following modules, as shown in Figure 9:
  • The line spacing identification module 910 is configured to identify the line spacing between two adjacent rows of text areas according to the consecutive row sets formed by rows whose accumulated foreground-pixel value in the first histogram is greater than the first threshold.
  • According to the identified rows of text areas, the line spacing identification module 910 obtains the line spacing between two adjacent rows of text areas.
  • Line spacing refers to the spacing between two lines of text in the first histogram.
  • The discarding module 920 is configured to discard, when the line spacing is greater than the third threshold, the row of text area closer to the edge of the second information area, where the edge is the upper edge or the lower edge.
  • When the edge is the lower edge, the text areas are searched from bottom to top. When the line spacing of the first pair of adjacent rows found is greater than the third threshold, the discarding module 920 discards the lower row of text area and continues the search upward; when a pair of adjacent rows whose line spacing is greater than the third threshold is found again, the upper row of text area is discarded and the search ends. At the same time, the remaining text areas are determined to belong to the second information area.
  • the word spacing identification module 930 is configured to identify a word spacing between adjacent two character regions according to a continuous column set consisting of columns in which the accumulated value of the foreground color pixel points in the second histogram is greater than the second threshold value;
  • According to the identified character areas, the word spacing recognition module 930 obtains the word spacing between two adjacent character areas; within each row of text area, the word spacing between adjacent character areas is relatively small.
  • Word spacing refers to the spacing between two character regions in the second histogram.
  • The character recognition module 940 is configured to identify, when the two adjacent character areas are located on the left side of the second information area and the word spacing is greater than the fourth threshold, the character area on the right of the two adjacent character areas as the first character area in the current row of text area.
  • The single-character identification module 950 is configured to identify, when the two adjacent character areas are located on the right side of the second information area and the word spacing is greater than the fifth threshold, the character area on the left of the two adjacent character areas as the last character area in the current row of text area.
  • In summary, the area extracting apparatus provided in this embodiment identifies the word spacing between two adjacent character areas according to the consecutive column sets formed by columns whose accumulated foreground-pixel value in the second histogram is greater than the second threshold; when the two adjacent character areas are on the left side of the second information area and the word spacing is greater than the fourth threshold, the character area on the right is identified as the first character area in the current row of text area; when the two adjacent character areas are on the right side of the second information area and the word spacing is greater than the fifth threshold, the character area on the left is identified as the last character area in the current row of text area. The character areas belonging to the second information area are thereby determined according to the word spacing, so that each character area in the second information area is accurately located.
  • An exemplary embodiment of the present disclosure provides an area extracting apparatus capable of implementing the area extracting method provided by the present disclosure. The area extracting apparatus includes a processor and a memory for storing processor-executable instructions.
  • The processor is configured to: obtain the area position of the first information area in the document image; determine the second information area according to the area position of the first information area; and perform area cutting on the second information area to obtain at least one character area.
  • FIG. 10 is a block diagram of an apparatus for using a region extraction method, according to an exemplary embodiment.
  • device 1000 can be a mobile phone, a computer, a digital broadcast terminal, a messaging device, a gaming console, a tablet device, a medical device, a fitness device, a personal digital assistant, and the like.
  • apparatus 1000 can include one or more of the following components: processing component 1002, memory 1004, power component 1006, multimedia component 1008, audio component 1010, input/output (I/O) interface 1012, sensor component 1014, and Communication component 1016.
  • Processing component 1002 typically controls the overall operation of device 1000, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations.
  • Processing component 1002 can include one or more processors 1018 to execute instructions to perform all or part of the steps of the above described methods.
  • processing component 1002 can include one or more modules to facilitate interaction between component 1002 and other components.
  • processing component 1002 can include a multimedia module to facilitate interaction between multimedia component 1008 and processing component 1002.
  • the memory 1004 is configured to store various types of data to support operation at the device 1000. Examples of such data include instructions for any application or method operating on device 1000, contact data, phone book data, messages, pictures, videos, and the like.
  • The memory 1004 can be implemented by any type of volatile or non-volatile memory device or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic disk, or optical disk.
  • Power component 1006 provides power to various components of device 1000.
  • Power component 1006 can include a power management system, one or more power sources, and other components associated with generating, managing, and distributing power for device 1000.
  • The multimedia component 1008 includes a screen that provides an output interface between the device 1000 and the user.
  • the screen can include a liquid crystal display (LCD) and a touch panel (TP). If the screen includes a touch panel, the screen can be implemented as a touch screen to receive input signals from the user.
  • the touch panel includes one or more touch sensors to sense touches, slides, and gestures on the touch panel. The touch sensor can sense not only the boundaries of the touch or sliding action, but also the duration and pressure associated with the touch or slide operation.
  • the multimedia component 1008 includes a front camera and/or a rear camera. When the device 1000 is in an operation mode, such as a shooting mode or a video mode, the front camera and/or the rear camera can receive external multimedia data. Each front and rear camera can be a fixed optical lens system or have focal length and optical zoom capabilities.
  • the audio component 1010 is configured to output and/or input an audio signal.
  • the audio component 1010 includes a microphone (MIC) that is configured to receive an external audio signal when the device 1000 is in an operational mode, such as a call mode, a recording mode, and a voice recognition mode.
  • the received audio signal may be further stored in memory 1004 or transmitted via communication component 1016.
  • the audio component 1010 also includes a speaker for outputting an audio signal.
  • the I/O interface 1012 provides an interface between the processing component 1002 and the peripheral interface module, which may be a keyboard, a click wheel, a button, or the like. These buttons may include, but are not limited to, a home button, a volume button, a start button, and a lock button.
  • Sensor assembly 1014 includes one or more sensors for providing device 1000 with various aspects of state assessment.
  • the sensor assembly 1014 can detect an open/closed state of the device 1000, the relative positioning of the components, such as a display and a keypad of the device 1000, and the sensor assembly 1014 can also detect a change in position of a component of the device 1000 or device 1000, the user The presence or absence of contact with device 1000, device 1000 orientation or acceleration/deceleration and temperature variation of device 1000.
  • Sensor assembly 1014 can include a proximity sensor configured to detect the presence of a nearby object without any physical contact.
  • Sensor assembly 1014 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications.
  • the sensor assembly 1014 can also include an acceleration sensor, a gyro sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
  • Communication component 1016 is configured to facilitate wired or wireless communication between device 1000 and other devices.
  • the device 1000 can access a wireless network based on a communication standard, such as Wi-Fi, 2G or 3G, or a combination thereof.
  • communication component 1016 receives broadcast signals or broadcast associated information from an external broadcast management system via a broadcast channel.
  • communication component 1016 also includes a near field communication (NFC) module to facilitate short range communication.
  • the NFC module can be implemented based on radio frequency identification (RFID) technology, infrared data association (IRDA) technology, ultra-wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
  • In an exemplary embodiment, the apparatus 1000 may be implemented by one or more application-specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field-programmable gate arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic components, for performing the above-described region extraction method.
  • In an exemplary embodiment, there is also provided a non-transitory computer-readable storage medium comprising instructions, such as the memory 1004 comprising instructions executable by the processor 1018 of the apparatus 1000 to perform the region extraction method described above.
  • For example, the non-transitory computer-readable storage medium can be a ROM, a random access memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, or the like.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Computer Graphics (AREA)
  • Geometry (AREA)
  • Computing Systems (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Databases & Information Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Character Input (AREA)
  • Image Analysis (AREA)

Abstract

The present disclosure provides a region extraction method and device, belonging to the field of image processing. The region extraction method includes: acquiring the region position of a first information region in a document image; determining a second information region according to the region position of the first information region; and cutting the second information region to obtain at least one character region. The method solves the problems in the related art that certain information regions in a directly photographed document image are difficult to recognize and are located inaccurately, and achieves the effect of determining the second information region from the region position of the first information region in the document image and cutting the second information region, thereby locating the second information region accurately and recognizing the character regions within it accurately.

Description

Region extraction method and device
This application is based on and claims priority to Chinese Patent Application No. 201510726272.4, filed on October 30, 2015, the entire contents of which are incorporated herein by reference.
Technical field
The present disclosure relates to the field of image processing, and in particular to a region extraction method and device.
Background
Automatic identity card recognition is a technique for recognizing the text information on an identity card through image processing.
The related art provides an automatic identity card recognition method in which the identity card is scanned by a dedicated scanning device at a fixed relative position to obtain a scanned image of the card; text recognition is then performed on n predetermined regions of the scanned image to obtain at least one of name information, gender information, ethnicity information, date-of-birth information, address information, and citizen ID number information. However, an identity card image obtained by direct photographing remains considerably more difficult to recognize.
Summary
To solve the problems in the related art, the present disclosure provides a region extraction method and device. The technical solutions are as follows:
According to a first aspect of the embodiments of the present disclosure, a region extraction method is provided, the method including:
acquiring the region position of a first information region in a document image;
determining a second information region according to the region position of the first information region;
cutting the second information region to obtain at least one character region.
In an optional embodiment, the region position is represented by vertex coordinates;
determining the second information region according to the region position of the first information region includes:
determining the second information region according to at least two vertex coordinates of the first information region and a predetermined relative position relationship, the relative position relationship being the relative position between the vertex coordinates and the second information region.
In an optional embodiment, the first information region is the citizen ID number region of a second-generation identity card, the at least two vertex coordinates are two vertex coordinates of the citizen ID number region, and the second information region is the address information region of the second-generation identity card;
determining the second information region according to the at least two vertex coordinates of the first information region and the predetermined relative position relationship includes:
determining the lower edge of the address information region according to the vertical coordinate of the one of the two vertex coordinates that is closest to the address information region;
determining the upper edge of the address information region according to the vertical coordinate of the closest vertex coordinate and a predetermined height;
determining the left edge of the address information region according to the horizontal coordinate of either of the two vertex coordinates and a first predetermined width;
determining the right edge of the address information region according to the horizontal coordinate of either of the two vertex coordinates and a second predetermined width;
cropping out the address information region according to the lower edge, the upper edge, the left edge, and the right edge.
In an optional embodiment, cutting the second information region to obtain at least one character region includes:
binarizing the second information region to obtain a binarized second information region;
computing a first histogram of the binarized second information region in the horizontal direction, the first histogram including the vertical coordinate of each row of pixels and the accumulated number of foreground pixels in each row;
identifying n rows of text regions according to consecutive row sets in the first histogram whose accumulated foreground pixel counts are greater than a first threshold, n being a positive integer;
for the i-th row of text regions, computing a second histogram in the vertical direction, the second histogram including the horizontal coordinate of each column of pixels and the accumulated number of foreground pixels in each column, where n≥i≥1 and i is a positive integer;
identifying ni character regions according to consecutive column sets in the second histogram whose accumulated foreground pixel counts are greater than a second threshold.
In an optional embodiment, the method further includes:
identifying the row spacing between two adjacent rows of text regions according to the consecutive row sets in the first histogram whose accumulated foreground pixel counts are greater than the first threshold;
when the row spacing is greater than a third threshold, discarding the row of text regions closer to an edge of the second information region, the edge being the upper edge or the lower edge.
In an optional embodiment, the method further includes:
identifying the character spacing between two adjacent character regions according to the consecutive column sets in the second histogram whose accumulated foreground pixel counts are greater than the second threshold;
when two adjacent character regions are located on the left side of the second information region and the character spacing is greater than a fourth threshold, recognizing the right one of the two adjacent character regions as the first character region in the current row of text regions;
when two adjacent character regions are located on the right side of the second information region and the character spacing is greater than a fifth threshold, recognizing the left one of the two adjacent character regions as the last character region in the current row of text regions.
According to a second aspect of the embodiments of the present disclosure, a region extraction device is provided, the device including:
an acquiring module configured to acquire the region position of a first information region in a document image;
a determining module configured to determine a second information region according to the region position of the first information region;
a recognizing module configured to cut the second information region to obtain at least one character region.
In an optional embodiment, the region position is represented by vertex coordinates;
the determining module is configured to determine the second information region according to at least two vertex coordinates of the first information region and a predetermined relative position relationship, the relative position relationship being the relative position between the vertex coordinates and the second information region.
In an optional embodiment, the first information region is the citizen ID number region of a second-generation identity card, the at least two vertex coordinates are two vertex coordinates of the citizen ID number region, and the second information region is the address information region of the second-generation identity card;
the determining module includes:
a first determining submodule configured to determine the lower edge of the address information region according to the vertical coordinate of the one of the two vertex coordinates that is closest to the address information region;
a second determining submodule configured to determine the upper edge of the address information region according to the vertical coordinate of the closest vertex coordinate and a predetermined height;
a third determining submodule configured to determine the left edge of the address information region according to the horizontal coordinate of either of the two vertex coordinates and a first predetermined width;
a fourth determining submodule configured to determine the right edge of the address information region according to the horizontal coordinate of either of the two vertex coordinates and a second predetermined width;
a cropping submodule configured to crop out the address information region according to the lower edge, the upper edge, the left edge, and the right edge.
In an optional embodiment, the recognizing module includes:
a binarization submodule configured to binarize the second information region to obtain a binarized second information region;
a first computing submodule configured to compute a first histogram of the binarized second information region in the horizontal direction, the first histogram including the vertical coordinate of each row of pixels and the accumulated number of foreground pixels in each row;
a row recognizing submodule configured to identify n rows of text regions according to consecutive row sets in the first histogram whose accumulated foreground pixel counts are greater than a first threshold, n being a positive integer;
a second computing submodule configured to compute, for the i-th row of text regions, a second histogram in the vertical direction, the second histogram including the horizontal coordinate of each column of pixels and the accumulated number of foreground pixels in each column, where n≥i≥1 and i is a positive integer;
a character recognizing submodule configured to identify ni character regions according to consecutive column sets in the second histogram whose accumulated foreground pixel counts are greater than a second threshold.
In an optional embodiment, the device further includes:
a row spacing recognizing module configured to identify the row spacing between two adjacent rows of text regions according to the consecutive row sets in the first histogram whose accumulated foreground pixel counts are greater than the first threshold;
a discarding module configured to discard, when the row spacing is greater than a third threshold, the row of text regions closer to an edge of the second information region, the edge being the upper edge or the lower edge.
In an optional embodiment, the device further includes:
a character spacing recognizing module configured to identify the character spacing between two adjacent character regions according to the consecutive column sets in the second histogram whose accumulated foreground pixel counts are greater than the second threshold;
a text recognizing module configured to recognize, when two adjacent character regions are located on the left side of the second information region and the character spacing is greater than a fourth threshold, the right one of the two adjacent character regions as the first character region in the current row of text regions;
a single-character recognizing module configured to recognize, when two adjacent character regions are located on the right side of the second information region and the character spacing is greater than a fifth threshold, the left one of the two adjacent character regions as the last character region in the current row of text regions.
According to a third aspect of the embodiments of the present disclosure, a region extraction device is provided, the device including:
a processor;
a memory for storing instructions executable by the processor;
wherein the processor is configured to:
acquire the region position of a first information region in a document image;
determine a second information region according to the region position of the first information region;
cut the second information region to obtain at least one character region.
The technical solutions provided by the embodiments of the present disclosure may include the following beneficial effects:
By acquiring the region position of a first information region in a document image, determining a second information region according to that region position, and cutting the second information region into at least one character region, the method solves the problems in the related art that certain information regions in a directly photographed document image are difficult to recognize and are located inaccurately; it achieves the effect of determining the second information region from the region position of the first information region, cutting the second information region, and thereby locating the second information region accurately and recognizing the character regions within it accurately.
It should be understood that the above general description and the following detailed description are exemplary only and do not limit the present disclosure.
Brief description of the drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and, together with the description, serve to explain the principles of the present disclosure.
Fig. 1 is a flowchart of a region extraction method according to an exemplary embodiment;
Fig. 2A is a flowchart of a region extraction method according to another exemplary embodiment;
Fig. 2B is a flowchart of a region extraction method according to another exemplary embodiment;
Fig. 2C is a schematic diagram of determining the lower edge of the address information region according to an exemplary embodiment;
Fig. 2D is a schematic diagram of determining the upper edge of the address information region according to an exemplary embodiment;
Fig. 2E is a schematic diagram of determining the left edge of the address information region according to an exemplary embodiment;
Fig. 2F is a schematic diagram of determining the right edge of the address information region according to an exemplary embodiment;
Fig. 2G is a schematic diagram of determining the address information region according to an exemplary embodiment;
Fig. 3A is a flowchart of a region extraction method according to another exemplary embodiment;
Fig. 3B is a schematic diagram of binarizing the second information region according to an exemplary embodiment;
Fig. 3C is a schematic diagram of computing the first histogram in the horizontal direction according to an exemplary embodiment;
Fig. 3D is a schematic diagram of a consecutive row set according to an exemplary embodiment;
Fig. 3E is a schematic diagram of computing the second histogram in the vertical direction according to an exemplary embodiment;
Fig. 3F is a schematic diagram of a consecutive column set according to an exemplary embodiment;
Fig. 4A is a flowchart of a region extraction method according to another exemplary embodiment;
Fig. 4B is a schematic diagram of the row spacing between two adjacent rows of text regions according to an exemplary embodiment;
Fig. 5A is a flowchart of a region extraction method according to another exemplary embodiment;
Fig. 5B is a schematic diagram of the character spacing between two adjacent character regions according to an exemplary embodiment;
Fig. 6 is a block diagram of a region extraction device according to an exemplary embodiment;
Fig. 7 is a block diagram of a region extraction device according to another exemplary embodiment;
Fig. 8 is a block diagram of a region extraction device according to another exemplary embodiment;
Fig. 9 is a block diagram of a region extraction device according to yet another exemplary embodiment;
Fig. 10 is a block diagram of a region extraction device according to an exemplary embodiment.
Detailed description
Exemplary embodiments will now be described in detail, examples of which are illustrated in the accompanying drawings. Where the following description refers to the drawings, the same numbers in different drawings denote the same or similar elements unless indicated otherwise. The implementations described in the following exemplary embodiments do not represent all implementations consistent with the present disclosure; rather, they are merely examples of devices and methods consistent with some aspects of the present disclosure as detailed in the appended claims.
Fig. 1 is a flowchart of a region extraction method according to an exemplary embodiment. As shown in Fig. 1, the region extraction method includes the following steps.
In step 101, the region position of a first information region in a document image is acquired.
The document image is an image obtained by directly photographing a document, such as an identity card image or a social security card image.
The first information region is a region of the document image carrying text information, for example at least one of a name information region, a date-of-birth information region, a gender region, an address information region, a citizen ID number information region, a serial number information region, an issuing authority information region, an expiry date information region, and so on.
In step 102, a second information region is determined according to the region position of the first information region.
Optionally, the first information region is easier to locate than the second information region.
In step 103, the second information region is cut into at least one character region.
In summary, the region extraction method provided in this embodiment of the present disclosure acquires the region position of the first information region in the document image, determines the second information region according to that region position, and cuts the second information region into at least one character region; it thereby solves the problems in the related art that certain information regions in a directly photographed document image are difficult to recognize and are located inaccurately, and achieves the effect of determining the second information region from the region position of the first information region, cutting the second information region, and thus locating the second information region accurately and recognizing the character regions within it accurately.
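The three steps above (steps 101 to 103) can be sketched as a minimal pipeline. All function names, the placeholder coordinates, and the offset values below are illustrative assumptions, not values from the disclosure; the concrete behavior of each stage is spelled out in the later embodiments.

```python
def locate_first_region(image):
    # Placeholder for step 101: in the disclosure the first information
    # region (e.g. the citizen ID number region) is located by a trained
    # detector. Returns (x_left, y_top, x_right, y_bottom).
    return (10, 40, 90, 50)

def derive_second_region(first_region, offsets):
    # Step 102: derive the second region from vertex coordinates of the
    # first region plus a predetermined relative position relationship
    # (here: a hypothetical upward/left/right offset triple).
    x_left, y_top, x_right, y_bottom = first_region
    dh, dl, dr = offsets  # assumed height / width offsets
    return (x_left - dl, y_top - dh, x_right - dr, y_top)

def cut_character_regions(image, region):
    # Placeholder for step 103: histogram-based row and character
    # segmentation, detailed in the later embodiments.
    return []

def extract(image):
    first = locate_first_region(image)
    second = derive_second_region(first, offsets=(30, 5, 8))
    return cut_character_regions(image, second)
```

The point of the skeleton is only the data flow: the easy-to-locate first region anchors the second, which is then cut into character regions.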
Fig. 2A is a flowchart of a region extraction method according to another exemplary embodiment. As shown in Fig. 2A, the region extraction method includes the following steps.
In step 201, the region position of a first information region in a document image is acquired, the region position being represented by vertex coordinates.
The document image is an image obtained by directly photographing a document, such as an identity card image or a social security card image. Optionally, when the document image is photographed, the capture interface provides a rectangular frame for guiding the shot; the user aligns the rectangular frame with the document and takes the photograph to obtain the document image.
The terminal acquires the region position of the first information region in the document image and obtains, from that region position, the vertex coordinates of each vertex of the first information region. In other words, the region position is represented by vertex coordinates.
For example, a rectangular coordinate system is established with the top-left corner of the document image as the origin, the upper edge as the positive half-axis of the horizontal coordinate x, and the left edge as the positive half-axis of the vertical coordinate y. The vertex coordinates of each vertex of the first information region are obtained from its position in this coordinate system, and the region position of the first information region is expressed by these vertex coordinates.
In step 202, a second information region is determined according to at least two vertex coordinates of the first information region and a predetermined relative position relationship, the relative position relationship being the relative position between the vertex coordinates and the second information region.
The predetermined relative position relationship is the relative position between the vertex coordinates of the first information region and the upper, lower, left, and right edges of the second information region.
From at least two vertex coordinates obtained from the first information region and the predetermined relative position relationship, the terminal can determine the region position of the second information region.
Optionally, the first information region has four vertices; which two of the four vertices are used is not specifically limited. Optionally, the larger the distance between the two chosen vertex coordinates of the first information region, the smaller the error in the determined second information region.
In step 203, the second information region is cut into at least one character region.
After the region position of the second information region is determined, the second information region is cut. After region cutting, the second information region is divided into at least one character region, a character region being an image region containing a single character.
In summary, the region extraction method provided in this embodiment of the present disclosure acquires the region position of the first information region in the document image, determines the second information region according to at least two vertex coordinates of the first information region and a predetermined relative position relationship (the relative position between the vertex coordinates and the second information region), and cuts the second information region into at least one character region. This solves the problems that automatic identity card recognition methods find it difficult to recognize the identity card information in a directly photographed identity card image and locate that information inaccurately, and achieves the effect of determining the second information region from the region position of the first information region, cutting the second information region, and thus locating the second information region accurately and recognizing the character regions within it accurately.
In an optional embodiment based on the one shown in Fig. 2A, the first information region is the citizen ID number region of a second-generation identity card, the at least two vertex coordinates are those of the upper-left and upper-right vertices of the citizen ID number region, and the second information region is the address information region of the second-generation identity card. Step 202 can be replaced by the following steps 202a to 202e, as shown in Fig. 2B:
In step 202a, the lower edge of the address information region is determined according to the vertical coordinate of the one of the two vertex coordinates that is closest to the address information region.
From the predetermined relative position relationship between the citizen ID number region and the address information region, the address information region lies above the citizen ID number region. Therefore, given the way the coordinate system is established, the higher of the two vertices has the smaller vertical coordinate and is closer to the address information region, so the horizontal line through the vertical coordinate of the higher vertex is taken as the lower edge of the address information region. As shown in Fig. 2C, the horizontal line through the vertical coordinate of the first digit 3 of the citizen ID number region is taken as the lower edge m1 of the address information region.
In step 202b, the upper edge of the address information region is determined according to the vertical coordinate of the closest vertex coordinate and a predetermined height.
After the vertical coordinate of the vertex closest to the address information region is determined, that vertical coordinate is taken as the starting position and shifted upward by the predetermined height, and the horizontal line through the shifted vertical coordinate is taken as the upper edge of the address information region.
Optionally, the predetermined height is a relatively generous height; it is only required that the region covered by the shift includes the address information region. As shown in Fig. 2D, starting from the vertical coordinate of the first digit 3 of the citizen ID number region and shifting upward by the height h, the horizontal line through the vertical coordinate corresponding to h is taken as the upper edge m2 of the address information region.
In step 202c, the left edge of the address information region is determined according to the horizontal coordinate of either of the two vertex coordinates and a first predetermined width.
The horizontal coordinate of either vertex is shifted leftward by the first predetermined width, and the vertical line through the shifted horizontal coordinate is taken as the left edge of the address information region. As shown in Fig. 2E, starting from the horizontal coordinate of the first digit 3 of the citizen ID number region and shifting leftward by the width r*w, where r is a percentage and w is the length of the citizen ID number region, the vertical line through the horizontal coordinate corresponding to r*w is taken as the left edge m3 of the address information region.
Optionally, different vertex coordinates correspond to different first predetermined widths; that is, the leftward shift applied to the horizontal coordinate differs from vertex to vertex.
Optionally, the first predetermined width is a percentage of the length of the citizen ID number region.
In step 202d, the right edge of the address information region is determined according to the horizontal coordinate of either of the two vertex coordinates and a second predetermined width.
The horizontal coordinate of either vertex is shifted by the second predetermined width, and the vertical line through the shifted horizontal coordinate is taken as the right edge of the address information region. As shown in Fig. 2F, starting from the horizontal coordinate of the last digit 4 of the citizen ID number region and shifting leftward by the width d, the vertical line through the horizontal coordinate corresponding to d is taken as the right edge m4 of the address information region.
Optionally, different vertex coordinates correspond to different second predetermined widths, i.e., the shift applied to the horizontal coordinate differs from vertex to vertex; moreover, when determining the right edge, some vertices' horizontal coordinates need to be shifted leftward and others rightward, the shift direction differing between vertices. All of this is defined by the relative position relationship described above.
Optionally, the second predetermined width is a percentage of the length of the citizen ID number region.
In step 202e, the address information region is cropped out according to the lower edge, the upper edge, the left edge, and the right edge.
The address information region is cropped out according to the lower, upper, left, and right edges determined in steps 202a to 202d, as shown in Fig. 2G.
In summary, the region extraction method provided in this embodiment determines the upper, lower, left, and right edges of the second information region from two vertex coordinates of the first information region and a predetermined relative position relationship, so that the approximate position of the second information region can be cropped out, which benefits accurate localization when cutting characters in the second information region.
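As a sketch, the edge computations of steps 202a to 202e can be written out in the coordinate system used above (origin at the top-left corner, y increasing downward). The function name and the concrete values of the predetermined height h, the percentage r, and the width d are assumptions for illustration, not values from the disclosure:

```python
def address_region_from_id_number(top_left, top_right, h, r, d):
    """Derive the address region's edges from the two upper vertex
    coordinates of the citizen ID number region.

    top_left, top_right: (x, y) upper vertices of the ID number region.
    h: predetermined height shifted upward (step 202b).
    r: first predetermined width, as a fraction of the ID number
       region's length (step 202c).
    d: second predetermined width (step 202d).
    """
    x1, y1 = top_left
    x2, y2 = top_right
    w = x2 - x1                 # length of the citizen ID number region
    lower = min(y1, y2)         # step 202a: higher vertex = smaller y
    upper = lower - h           # step 202b: shift upward by h
    left = x1 - r * w           # step 202c: shift leftward by r*w
    right = x2 - d              # step 202d: shift leftward by d
    return lower, upper, left, right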
One point to note: in the embodiment of Fig. 2B, as for how step 201 obtains the region position of the citizen ID number region in the second-generation identity card image, the format of that region is relatively fixed and the acquisition methods of the related art are fairly mature, so they are not described further in this embodiment. As one exemplary approach: haar features or other features are extracted from second-generation identity card images and fed into Adaboost or an SVM (Support Vector Machine) for training to obtain a trained model; the trained model is then used to recognize the citizen ID number region in the identity card image to be recognized, thereby determining its region position.
In an optional embodiment based on the one shown in Fig. 2A, step 203 can be replaced by the following steps 203a to 203e, as shown in Fig. 3A:
In step 203a, the second information region is binarized to obtain a binarized second information region.
Optionally, the second information region determined in step 202 is preprocessed, where preprocessing may include operations such as denoising, filtering, and edge extraction; the preprocessed second information region is then binarized.
Binarization means comparing the gray value of each pixel in the second information region against a preset gray threshold and dividing the pixels into two groups: those above the threshold and those below it. The two groups are rendered in black and white respectively, yielding the binarized second information region, as shown in Fig. 3B. The pixels of the foreground color are called foreground pixels (the white pixels in Fig. 3B); the pixels of the background color are called background pixels (the black pixels in Fig. 3B).
In step 203b, a first histogram is computed for the binarized second information region in the horizontal direction, the first histogram including the vertical coordinate of each row of pixels and the accumulated number of foreground pixels in each row.
The first histogram is computed for the binarized second information region in the horizontal direction; it represents, in the vertical direction, the vertical coordinate of each row of pixels and, in the horizontal direction, the accumulated number of foreground pixels in each row, as shown in Fig. 3C.
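Steps 203a and 203b reduce to thresholding followed by a horizontal projection. A minimal sketch, with the image represented as a list of rows of gray values and the default threshold value an assumption for illustration:

```python
def binarize(gray, threshold=128):
    # Step 203a: split pixels into foreground (1) and background (0)
    # by comparing each gray value against a preset gray threshold.
    return [[1 if v > threshold else 0 for v in row] for row in gray]

def horizontal_histogram(binary):
    # Step 203b: for each row, accumulate the number of foreground pixels.
    return [sum(row) for row in binary]
```

The row index of each histogram entry plays the role of the row's vertical coordinate.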
In step 203c, n rows of text regions are identified according to consecutive row sets in the first histogram whose accumulated foreground pixel counts are greater than a first threshold, n being a positive integer.
From the first histogram, the accumulated foreground pixel count of each row can be obtained; this count is compared against the first threshold, and the consecutive row sets whose counts exceed the first threshold are determined to be the rows in which text regions lie.
A consecutive row set means a set of m consecutive rows whose accumulated foreground pixel counts all exceed the first threshold. As shown in Fig. 3D, for the m rows of pixels in the figure, the accumulated foreground pixel counts in the histogram on the left all exceed the first threshold, and these m rows correspond to the text region "村大东王126号" in the document image.
Each consecutive row set is identified as one row of text regions; n consecutive row sets are identified as n rows of text regions.
In step 203d, for the i-th row of text regions, a second histogram is computed in the vertical direction, the second histogram including the horizontal coordinate of each column of pixels and the accumulated number of foreground pixels in each column, where n≥i≥1 and i is a positive integer.
After the n rows of text regions are determined, the second histogram is computed in the vertical direction; it represents, in the horizontal direction, the horizontal coordinate of each column of pixels and, in the vertical direction, the accumulated number of foreground pixels in each column, as shown in Fig. 3E.
In step 203e, ni character regions are identified according to consecutive column sets in the second histogram whose accumulated foreground pixel counts are greater than a second threshold.
From the second histogram, the accumulated foreground pixel count of each column can be obtained; this count is compared against the second threshold, and the consecutive column sets whose counts exceed the second threshold are determined to be the columns in which character regions lie.
A consecutive column set means a set of p consecutive columns whose accumulated foreground pixel counts all exceed the second threshold. As shown in Fig. 3F, the consecutive column set is p, i.e., the continuous white area formed in the second histogram. For the p columns of pixels in the figure, the accumulated foreground pixel counts in the histogram below all exceed the second threshold, and these p columns correspond to the character region "浙" in the document image.
Each consecutive column set is identified as one character region; n consecutive column sets are identified as n character regions.
In summary, the region extraction method provided in this embodiment binarizes the second information region, computes the first histogram of the binarized region in the horizontal direction to determine the n rows of text regions in the second information region, and then computes the second histogram in the vertical direction for each of the n rows of text regions to identify the character region corresponding to each character. By first determining the rows in which text regions lie and then determining the character regions within each row, the character regions in the second information region are located more accurately.
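Steps 203c to 203e amount to finding maximal runs of histogram entries above a threshold, first over rows and then over columns within each row. A sketch under the same 0/1 image representation as above; the function names are assumptions:

```python
def runs_above(hist, threshold):
    # A consecutive set of rows/columns whose accumulated foreground
    # count exceeds the threshold forms one text row / character region.
    runs, start = [], None
    for i, v in enumerate(hist):
        if v > threshold and start is None:
            start = i
        elif v <= threshold and start is not None:
            runs.append((start, i - 1))
            start = None
    if start is not None:
        runs.append((start, len(hist) - 1))
    return runs

def segment(binary, row_threshold, col_threshold):
    # Step 203c: rows of text regions from the horizontal histogram.
    rows = runs_above([sum(r) for r in binary], row_threshold)
    out = []
    for top, bottom in rows:
        # Steps 203d-203e: vertical histogram per text row -> characters.
        cols = [sum(r[j] for r in binary[top:bottom + 1])
                for j in range(len(binary[0]))]
        out.append([(top, bottom, left, right)
                    for left, right in runs_above(cols, col_threshold)])
    return out
```

Each inner list is one row of text regions; each tuple bounds a single character region.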
In the embodiment shown in Fig. 3A, errors may occur when determining the second information region from the region position of the first information region and the predetermined relative position relationship, so that text or noise not belonging to the second information region is included in its range. Text regions outside the second information region can therefore be discarded based on row spacing; see the following embodiment:
In an optional embodiment based on the one shown in Fig. 3A, the following steps may be included after step 203c, as shown in Fig. 4A:
In step 401, the row spacing between two adjacent rows of text regions is identified according to the consecutive row sets in the first histogram whose accumulated foreground pixel counts are greater than the first threshold.
It should be noted that the address information region usually contains one to three rows of text regions with relatively small row spacing, while the spacing between these rows and the text regions of other information regions is relatively large. This step uses this row-spacing property to discard text regions that do not belong to the second information region.
For the n rows of text regions identified in step 203c, the row spacing between adjacent rows is obtained. Row spacing is the gap between two rows of text regions in the first histogram; as shown in Fig. 4B, the row spacing between one row of text regions and the adjacent row is h1.
In step 402, when the row spacing is greater than a third threshold, the row of text regions closer to an edge of the second information region is discarded, the edge being the upper edge or the lower edge.
According to the first histogram, the text regions are searched from bottom to top. When the first pair of adjacent rows whose spacing exceeds the third threshold is found, the lower row is discarded and the search continues upward; when a pair of adjacent rows whose spacing exceeds the third threshold is found again, the search ends and the upper row is discarded. The remaining text regions are determined to belong to the second information region.
When the spacing of the first pair of adjacent rows found is smaller than the third threshold, both rows are determined to belong to the second information region. The search continues upward until a pair of adjacent rows whose spacing exceeds the third threshold is found, at which point the upper row is discarded and the search ends; or, if the search continues upward without finding a pair whose spacing exceeds the third threshold, the search ends.
Optionally, according to the first histogram, the text regions may instead be searched from top to bottom. When the first pair of adjacent rows whose spacing exceeds the third threshold is found, the upper row is discarded and the search continues; when a pair whose spacing exceeds the third threshold is found again, the search ends and the lower row is discarded. The remaining text regions are determined to belong to the second information region.
When the spacing of the first pair of adjacent rows found is smaller than the third threshold, both rows are determined to belong to the second information region; the search continues downward until a pair whose spacing exceeds the third threshold is found, at which point the lower row is discarded and the search ends; or, if the search continues downward without finding a pair whose spacing exceeds the third threshold, the search ends.
In summary, the region extraction method provided in this embodiment identifies the row spacing between adjacent rows of text regions from the consecutive row sets in the first histogram whose accumulated foreground pixel counts exceed the first threshold, and, when the row spacing exceeds the third threshold, discards the row of text regions closer to the upper or lower edge of the second information region; by determining the rows of text regions of the second information region from the row spacing, the second information region is located more accurately.
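The bottom-up search of step 402 can be sketched on the row runs produced earlier. This is one reading of the step, under two assumptions labeled here: the threshold value is illustrative, and all rows above the large gap bounding the top of the block are discarded (the disclosure's search ends at that gap):

```python
def filter_rows_by_spacing(rows, threshold):
    """rows: (top, bottom) text-row runs, ordered top to bottom.
    Returns the rows kept as belonging to the second information
    region, scanning bottom-up as in step 402."""
    kept = list(rows)
    # If the bottom-most gap is already large, the bottom row lies
    # outside the region (e.g. noise below the address block).
    if len(kept) >= 2 and kept[-1][0] - kept[-2][1] > threshold:
        kept.pop()
    # Continue upward: the next large gap marks the top of the region;
    # everything above it (other information regions) is dropped.
    for i in range(len(kept) - 1, 0, -1):
        if kept[i][0] - kept[i - 1][1] > threshold:
            kept = kept[i:]
            break
    return kept
```

With the one-to-three-row address block noted above, the rows that survive are exactly the tightly spaced block nearest the bottom of the cropped region.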
In the embodiment shown in Fig. 3A, errors may also occur when determining the left and right edges of the second information region, so that character regions not belonging to the second information region are included in its range. Character regions outside the second information region can therefore be discarded based on character spacing; see the following embodiment:
In an optional embodiment based on the one shown in Fig. 4A, the following steps may be included after step 203e, as shown in Fig. 5A:
In step 501, the character spacing between two adjacent character regions is identified according to the consecutive column sets in the second histogram whose accumulated foreground pixel counts are greater than the second threshold.
For the ni character regions identified in step 203e, the character spacing between adjacent character regions is obtained; within a row of text regions, the spacing between adjacent character regions is relatively small.
Character spacing is the gap between two character regions in the second histogram; as shown in Fig. 5B, the spacing between characters is h2.
In step 502, when two adjacent character regions are located on the left side of the second information region and the character spacing is greater than a fourth threshold, the right one of the two adjacent character regions is recognized as the first character region in the current row of text regions.
Starting from a character in the middle of the current text row and searching leftward, when the first pair of adjacent character regions whose spacing exceeds the fourth threshold is found, the left character region of the pair (together with all character regions to the left of that gap) is discarded, and the right character region of the pair is recognized as the first character region in the current row. After the first character region is recognized, the search proceeds rightward from its position until a pair of adjacent character regions whose spacing exceeds the fourth threshold is found, at which point the search ends.
When the spacing of the first pair of adjacent character regions found is smaller than the fourth threshold, both character regions are determined to belong to the current row of text regions.
In step 503, when two adjacent character regions are located on the right side of the second information region and the character spacing is greater than a fifth threshold, the left one of the two adjacent character regions is recognized as the last character region in the current row of text regions.
Starting from a character in the middle of the current text row and searching rightward, when the first pair of adjacent character regions whose spacing exceeds the fifth threshold is found, the right character region of the pair (together with all character regions to the right of that gap) is discarded, and the left character region of the pair is recognized as the last character region in the current row. After the last character region is recognized, the search proceeds leftward from its position until a pair of adjacent character regions whose spacing exceeds the fifth threshold is found, at which point the search ends.
When the spacing of the first pair of adjacent character regions found is smaller than the fifth threshold, both character regions are determined to belong to the current row of text regions.
In summary, the region extraction method provided in this embodiment identifies the character spacing between adjacent character regions from the consecutive column sets in the second histogram whose accumulated foreground pixel counts exceed the second threshold; when two adjacent character regions lie on the left side of the second information region and their spacing exceeds the fourth threshold, the right one is recognized as the first character region of the current row; when two adjacent character regions lie on the right side of the second information region and their spacing exceeds the fifth threshold, the left one is recognized as the last character region of the current row. By determining the character regions of the second information region from the character spacing, each character region in the second information region is located accurately.
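The outward scans of steps 502 and 503 can be sketched on the (left, right) character runs of one text row. The function name, the choice of the middle character as the start position, and the threshold values in the usage are assumptions for illustration:

```python
def first_last_characters(chars, fourth_threshold, fifth_threshold):
    """chars: (left, right) character runs of one text row, left to
    right. Scanning leftward from a middle character, a gap wider than
    the fourth threshold marks the first character (step 502); scanning
    rightward, a gap wider than the fifth threshold marks the last
    character (step 503). Returns the kept character runs."""
    mid = len(chars) // 2
    first = 0
    for i in range(mid, 0, -1):
        if chars[i][0] - chars[i - 1][1] > fourth_threshold:
            first = i          # right-hand character of the wide gap
            break
    last = len(chars) - 1
    for i in range(mid, len(chars) - 1):
        if chars[i + 1][0] - chars[i][1] > fifth_threshold:
            last = i           # left-hand character of the wide gap
            break
    return chars[first:last + 1]
```

Anything outside the first wide gap on either side (stray marks or neighboring fields picked up by an imprecise left or right edge) is dropped.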
One point to note: after the character regions have been recognized in the embodiments shown in Fig. 1, Fig. 2A, Fig. 2B, Fig. 3A, Fig. 4A, and Fig. 5A, the character regions can be further processed with existing character recognition algorithms to recognize the characters within them.
Another point to note: the identity card images involved in the above method embodiments are schematic illustrations for the present disclosure, not real identity card images.
The following are device embodiments of the present disclosure, which can be used to perform the method embodiments of the present disclosure. For details not disclosed in the device embodiments, please refer to the method embodiments of the present disclosure.
Fig. 6 is a block diagram of a region extraction device according to an exemplary embodiment. As shown in Fig. 6, the region extraction device includes, but is not limited to:
an acquiring module 610 configured to acquire the region position of a first information region in a document image.
The document image is an image obtained by directly photographing a document, such as an identity card image or a social security card image.
The first information region is a region of the document image carrying text information, for example at least one of a name information region, a date-of-birth information region, a gender region, an address information region, a citizen ID number information region, a serial number information region, an issuing authority information region, an expiry date information region, and so on.
a determining module 620 configured to determine a second information region according to the region position of the first information region;
a recognizing module 630 configured to cut the second information region into at least one character region.
In summary, the region extraction device provided in this embodiment of the present disclosure acquires the region position of the first information region in the document image, determines the second information region according to that region position, and cuts the second information region into at least one character region; it thereby solves the problems in the related art that certain information regions in a directly photographed document image are difficult to recognize and are located inaccurately, and achieves the effect of determining the second information region from the region position of the first information region, cutting the second information region, and thus locating the second information region accurately and recognizing the character regions within it accurately.
Fig. 7 is a block diagram of a region extraction device according to another exemplary embodiment. As shown in Fig. 7, the region extraction device includes, but is not limited to:
an acquiring module 610 configured to acquire the region position of a first information region in a document image.
The document image is an image obtained by directly photographing a document, such as an identity card image or a social security card image.
When acquiring the region position of the first information region in the document image, the acquiring module 610 obtains the vertex coordinates of each vertex of the first information region from that region position; in other words, the region position is represented by vertex coordinates.
For example, a rectangular coordinate system is established with the top-left corner of the document image as the origin, the upper edge as the positive half-axis of the horizontal coordinate x, and the left edge as the positive half-axis of the vertical coordinate y; the vertex coordinates of each vertex of the first information region are obtained from its position in this coordinate system, and the region position of the first information region is expressed by these vertex coordinates.
a determining module 620 configured to determine a second information region according to the region position of the first information region.
The determining module 620 is further configured to determine the second information region according to at least two vertex coordinates of the first information region and a predetermined relative position relationship, the relative position relationship being the relative position between the vertex coordinates and the second information region.
The predetermined relative position relationship is the relative position between the vertex coordinates of the first information region and the upper, lower, left, and right edges of the second information region.
From at least two vertex coordinates obtained from the first information region and the predetermined relative position relationship, the determining module 620 can determine the region position of the second information region.
In this embodiment, the determining module 620 may include the following submodules:
a first determining submodule 621 configured to determine the lower edge of the address information region according to the vertical coordinate of the one of the two vertex coordinates that is closest to the address information region.
From the predetermined relative position relationship between the citizen ID number region and the address information region, the address information region lies above the citizen ID number region. Therefore, given the way the coordinate system is established, the higher of the two vertices has the smaller vertical coordinate and is closer to the address information region, so the first determining submodule 621 takes the horizontal line through the vertical coordinate of the higher vertex as the lower edge of the address information region.
a second determining submodule 622 configured to determine the upper edge of the address information region according to the vertical coordinate of the closest vertex coordinate and a predetermined height.
After the first determining submodule 621 determines the vertical coordinate of the vertex closest to the address information region, the second determining submodule 622 takes that vertical coordinate as the starting position, shifts it upward by the predetermined height, and takes the horizontal line through the shifted vertical coordinate as the upper edge of the address information region.
a third determining submodule 623 configured to determine the left edge of the address information region according to the horizontal coordinate of either of the two vertex coordinates and a first predetermined width.
The third determining submodule 623 shifts the horizontal coordinate of either vertex leftward by the first predetermined width and takes the vertical line through the shifted horizontal coordinate as the left edge of the address information region.
a fourth determining submodule 624 configured to determine the right edge of the address information region according to the horizontal coordinate of either of the two vertex coordinates and a second predetermined width.
The fourth determining submodule 624 shifts the horizontal coordinate of either vertex by the second predetermined width and takes the vertical line through the shifted horizontal coordinate as the right edge of the address information region.
a cropping submodule 625 configured to crop out the address information region according to the lower edge, the upper edge, the left edge, and the right edge.
The cropping submodule 625 crops out the address information region according to the lower, upper, left, and right edges determined by the first determining submodule 621 through the fourth determining submodule 624.
a recognizing module 630 configured to cut the second information region into at least one character region.
After the cropping submodule 625 determines the region position of the second information region, the recognizing module 630 cuts the second information region. After region cutting, the second information region is divided into at least one character region, a character region being an image region containing a single character.
In an optional embodiment based on the one shown in Fig. 7, the recognizing module 630 may include the following submodules, as shown in Fig. 8:
a binarization submodule 631 configured to binarize the second information region to obtain a binarized second information region.
Optionally, the second information region determined by the cropping submodule 625 is preprocessed, where preprocessing may include operations such as denoising, filtering, and edge extraction; the preprocessed second information region is then binarized.
Binarization means comparing the gray value of each pixel in the second information region against a preset gray threshold and dividing the pixels into two groups: those above the threshold and those below it; the two groups are rendered in black and white respectively, yielding the binarized second information region.
a first computing submodule 632 configured to compute a first histogram of the binarized second information region in the horizontal direction, the first histogram including the vertical coordinate of each row of pixels and the accumulated number of foreground pixels in each row.
The first computing submodule 632 computes the first histogram of the second information region processed by the binarization submodule 631 in the horizontal direction; the first histogram represents, in the vertical direction, the vertical coordinate of each row of pixels and, in the horizontal direction, the accumulated number of foreground pixels in each row.
a row recognizing submodule 633 configured to identify n rows of text regions according to consecutive row sets in the first histogram whose accumulated foreground pixel counts are greater than a first threshold, n being a positive integer.
From the first histogram, the accumulated foreground pixel count of each row can be obtained; the row recognizing submodule 633 compares this count against the first threshold and determines the consecutive row sets whose counts exceed the first threshold to be the rows in which text regions lie.
A consecutive row set means a set of m consecutive rows whose accumulated foreground pixel counts all exceed the first threshold.
Each consecutive row set is identified as one row of text regions; n consecutive row sets are identified as n rows of text regions.
a second computing submodule 634 configured to compute, for the i-th row of text regions, a second histogram in the vertical direction, the second histogram including the horizontal coordinate of each column of pixels and the accumulated number of foreground pixels in each column, where n≥i≥1 and i is a positive integer.
After the row recognizing submodule 633 determines the n rows of text regions, the second computing submodule 634 computes the second histogram in the vertical direction; the second histogram represents, in the horizontal direction, the horizontal coordinate of each column of pixels and, in the vertical direction, the accumulated number of foreground pixels in each column.
a character recognizing submodule 635 configured to identify ni character regions according to consecutive column sets in the second histogram whose accumulated foreground pixel counts are greater than a second threshold.
From the second histogram, the accumulated foreground pixel count of each column can be obtained; the character recognizing submodule 635 compares this count against the second threshold and determines the consecutive column sets whose counts exceed the second threshold to be the columns in which character regions lie.
A consecutive column set means a set of p consecutive columns whose accumulated foreground pixel counts all exceed the second threshold.
Each consecutive column set is identified as one character region; n consecutive column sets are identified as n character regions.
In summary, the region extraction device provided in this embodiment binarizes the second information region, computes the first histogram of the binarized region in the horizontal direction to determine the n rows of text regions in the second information region, and then computes the second histogram in the vertical direction for each of the n rows of text regions to identify the character region corresponding to each character. By first determining the rows in which text regions lie and then determining the character regions within each row, the character regions in the second information region are located more accurately.
In an optional embodiment based on the one shown in Fig. 8, the device may further include the following modules, as shown in Fig. 9:
a row spacing recognizing module 910 configured to identify the row spacing between two adjacent rows of text regions according to the consecutive row sets in the first histogram whose accumulated foreground pixel counts are greater than the first threshold.
For the n rows of text regions identified by the row recognizing submodule 633, the row spacing recognizing module 910 obtains the row spacing between adjacent rows. Row spacing is the gap between two rows of text regions in the first histogram.
a discarding module 920 configured to discard, when the row spacing is greater than a third threshold, the row of text regions closer to an edge of the second information region, the edge being the upper edge or the lower edge.
According to the first histogram, the text regions are searched from bottom to top. When the first pair of adjacent rows whose spacing exceeds the third threshold is found, the discarding module 920 discards the lower row and the search continues upward; when a pair whose spacing exceeds the third threshold is found again, the search ends and the upper row is discarded. The remaining text regions are determined to belong to the second information region.
a character spacing recognizing module 930 configured to identify the character spacing between two adjacent character regions according to the consecutive column sets in the second histogram whose accumulated foreground pixel counts are greater than the second threshold.
For the ni character regions identified by the character recognizing submodule 635, the character spacing recognizing module 930 obtains the character spacing between adjacent character regions; within a row of text regions, the spacing between adjacent character regions is relatively small.
Character spacing is the gap between two character regions in the second histogram.
a text recognizing module 940 configured to recognize, when two adjacent character regions are located on the left side of the second information region and the character spacing is greater than a fourth threshold, the right one of the two adjacent character regions as the first character region in the current row of text regions.
a single-character recognizing module 950 configured to recognize, when two adjacent character regions are located on the right side of the second information region and the character spacing is greater than a fifth threshold, the left one of the two adjacent character regions as the last character region in the current row of text regions.
In summary, the region extraction device provided in this embodiment identifies the character spacing between adjacent character regions from the consecutive column sets in the second histogram whose accumulated foreground pixel counts exceed the second threshold; when two adjacent character regions lie on the left side of the second information region and their spacing exceeds the fourth threshold, the right one is recognized as the first character region of the current row; when two adjacent character regions lie on the right side of the second information region and their spacing exceeds the fifth threshold, the left one is recognized as the last character region of the current row. By determining the character regions of the second information region from the character spacing, each character region in the second information region is located accurately.
With regard to the devices in the above embodiments, the specific manner in which each module performs its operations has been described in detail in the embodiments of the method, and will not be elaborated here.
An exemplary embodiment of the present disclosure provides a region extraction device capable of implementing the region extraction method provided by the present disclosure, the device including: a processor and a memory for storing instructions executable by the processor;
wherein the processor is configured to:
acquire the region position of a first information region in a document image;
determine a second information region according to the region position of the first information region;
cut the second information region to obtain at least one character region.
Fig. 10 is a block diagram of a device for the region extraction method according to an exemplary embodiment. For example, the device 1000 may be a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, a fitness device, a personal digital assistant, or the like.
Referring to Fig. 10, the device 1000 may include one or more of the following components: a processing component 1002, a memory 1004, a power component 1006, a multimedia component 1008, an audio component 1010, an input/output (I/O) interface 1012, a sensor assembly 1014, and a communication component 1016.
The processing component 1002 generally controls the overall operations of the device 1000, such as operations associated with display, telephone calls, data communication, camera operation, and recording. The processing component 1002 may include one or more processors 1018 to execute instructions so as to complete all or part of the steps of the above methods. In addition, the processing component 1002 may include one or more modules to facilitate interaction between the processing component 1002 and other components; for example, it may include a multimedia module to facilitate interaction between the multimedia component 1008 and the processing component 1002.
The memory 1004 is configured to store various types of data to support operation on the device 1000. Examples of such data include instructions for any application or method operating on the device 1000, contact data, phone book data, messages, pictures, videos, and so on. The memory 1004 may be implemented by any type of volatile or non-volatile storage device or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic disk, or optical disk.
The power component 1006 provides power to the various components of the device 1000. The power component 1006 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the device 1000.
The multimedia component 1008 includes a screen providing an output interface between the device 1000 and the user. In some embodiments, the screen may include a liquid crystal display (LCD) and a touch panel (TP). If the screen includes a touch panel, it may be implemented as a touch screen to receive input signals from the user. The touch panel includes one or more touch sensors to sense touches, swipes, and gestures on the touch panel. The touch sensors may not only sense the boundary of a touch or swipe action but also detect the duration and pressure associated with it. In some embodiments, the multimedia component 1008 includes a front camera and/or a rear camera. When the device 1000 is in an operating mode, such as shooting mode or video mode, the front camera and/or the rear camera can receive external multimedia data. Each front or rear camera may be a fixed optical lens system or have focal length and optical zoom capability.
The audio component 1010 is configured to output and/or input audio signals. For example, the audio component 1010 includes a microphone (MIC) configured to receive external audio signals when the device 1000 is in an operating mode, such as call mode, recording mode, or speech recognition mode. The received audio signals may be further stored in the memory 1004 or transmitted via the communication component 1016. In some embodiments, the audio component 1010 also includes a speaker for outputting audio signals.
The I/O interface 1012 provides an interface between the processing component 1002 and peripheral interface modules, such as a keyboard, a click wheel, or buttons. These buttons may include, but are not limited to, a home button, a volume button, a start button, and a lock button.
The sensor assembly 1014 includes one or more sensors that provide the device 1000 with status assessments of various aspects. For example, the sensor assembly 1014 can detect the open/closed state of the device 1000 and the relative positioning of components (e.g., the display and keypad of the device 1000); it can also detect a change in position of the device 1000 or one of its components, the presence or absence of user contact with the device 1000, the orientation or acceleration/deceleration of the device 1000, and temperature changes of the device 1000. The sensor assembly 1014 may include a proximity sensor configured to detect the presence of nearby objects without any physical contact, and may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor assembly 1014 may also include an acceleration sensor, a gyro sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
The communication component 1016 is configured to facilitate wired or wireless communication between the device 1000 and other devices. The device 1000 can access a wireless network based on a communication standard, such as Wi-Fi, 2G, or 3G, or a combination thereof. In one exemplary embodiment, the communication component 1016 receives broadcast signals or broadcast-related information from an external broadcast management system via a broadcast channel. In one exemplary embodiment, the communication component 1016 also includes a near field communication (NFC) module to facilitate short-range communication. For example, the NFC module can be implemented based on radio frequency identification (RFID) technology, Infrared Data Association (IrDA) technology, ultra-wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
In an exemplary embodiment, the device 1000 may be implemented by one or more application-specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field-programmable gate arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic components, for performing the above region extraction method.
In an exemplary embodiment, there is also provided a non-transitory computer-readable storage medium comprising instructions, such as the memory 1004 comprising instructions executable by the processor 1018 of the device 1000 to complete the above region extraction method. For example, the non-transitory computer-readable storage medium may be a ROM, a random access memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, or the like.
Other embodiments of the present disclosure will readily occur to those skilled in the art upon consideration of the specification and practice of the invention disclosed herein. This application is intended to cover any variations, uses, or adaptations of the present disclosure that follow its general principles and include common knowledge or customary technical means in the art not disclosed by the present disclosure. The specification and embodiments are to be regarded as exemplary only, with the true scope and spirit of the present disclosure being indicated by the following claims.
It should be understood that the present disclosure is not limited to the precise structures described above and shown in the accompanying drawings, and that various modifications and changes may be made without departing from its scope. The scope of the present disclosure is limited only by the appended claims.

Claims (13)

  1. A region extraction method, characterized in that the method comprises:
    acquiring the region position of a first information region in a document image;
    determining a second information region according to the region position of the first information region;
    cutting the second information region to obtain at least one character region.
  2. The method according to claim 1, characterized in that the region position is represented by vertex coordinates;
    determining the second information region according to the region position of the first information region comprises:
    determining the second information region according to at least two of the vertex coordinates of the first information region and a predetermined relative position relationship, the relative position relationship being the relative position between the vertex coordinates and the second information region.
  3. The method according to claim 2, characterized in that the first information region is the citizen ID number region of a second-generation identity card, the at least two vertex coordinates are two vertex coordinates of the citizen ID number region, and the second information region is the address information region of the second-generation identity card;
    determining the second information region according to the at least two vertex coordinates of the first information region and the predetermined relative position relationship comprises:
    determining the lower edge of the address information region according to the vertical coordinate of the one of the two vertex coordinates that is closest to the address information region;
    determining the upper edge of the address information region according to the vertical coordinate of the closest vertex coordinate and a predetermined height;
    determining the left edge of the address information region according to the horizontal coordinate of either of the two vertex coordinates and a first predetermined width;
    determining the right edge of the address information region according to the horizontal coordinate of either of the two vertex coordinates and a second predetermined width;
    cropping out the address information region according to the lower edge, the upper edge, the left edge, and the right edge.
  4. The method according to any one of claims 1 to 3, characterized in that cutting the second information region to obtain at least one character region comprises:
    binarizing the second information region to obtain a binarized second information region;
    computing a first histogram of the binarized second information region in the horizontal direction, the first histogram comprising the vertical coordinate of each row of pixels and the accumulated number of foreground pixels in each row;
    identifying n rows of text regions according to consecutive row sets in the first histogram whose accumulated foreground pixel counts are greater than a first threshold, n being a positive integer;
    for the i-th row of text regions, computing a second histogram in the vertical direction, the second histogram comprising the horizontal coordinate of each column of pixels and the accumulated number of foreground pixels in each column, where n≥i≥1 and i is a positive integer;
    identifying ni character regions according to consecutive column sets in the second histogram whose accumulated foreground pixel counts are greater than a second threshold.
  5. The method according to claim 4, characterized in that the method further comprises:
    identifying the row spacing between two adjacent rows of text regions according to the consecutive row sets in the first histogram whose accumulated foreground pixel counts are greater than the first threshold;
    when the row spacing is greater than a third threshold, discarding the row of text regions closer to an edge of the second information region, the edge being the upper edge or the lower edge.
  6. The method according to claim 4, characterized in that the method further comprises:
    identifying the character spacing between two adjacent character regions according to the consecutive column sets in the second histogram whose accumulated foreground pixel counts are greater than the second threshold;
    when two adjacent character regions are located on the left side of the second information region and the character spacing is greater than a fourth threshold, recognizing the right one of the two adjacent character regions as the first character region in the current row of text regions;
    when two adjacent character regions are located on the right side of the second information region and the character spacing is greater than a fifth threshold, recognizing the left one of the two adjacent character regions as the last character region in the current row of text regions.
  7. A region extraction device, characterized in that the device comprises:
    an acquiring module configured to acquire the region position of a first information region in a document image;
    a determining module configured to determine a second information region according to the region position of the first information region;
    a recognizing module configured to cut the second information region to obtain at least one character region.
  8. The device according to claim 7, characterized in that the region position is represented by vertex coordinates;
    the determining module is configured to determine the second information region according to at least two of the vertex coordinates of the first information region and a predetermined relative position relationship, the relative position relationship being the relative position between the vertex coordinates and the second information region.
  9. The device according to claim 8, characterized in that the first information region is the citizen ID number region of a second-generation identity card, the at least two vertex coordinates are two vertex coordinates of the citizen ID number region, and the second information region is the address information region of the second-generation identity card;
    the determining module comprises:
    a first determining submodule configured to determine the lower edge of the address information region according to the vertical coordinate of the one of the two vertex coordinates that is closest to the address information region;
    a second determining submodule configured to determine the upper edge of the address information region according to the vertical coordinate of the closest vertex coordinate and a predetermined height;
    a third determining submodule configured to determine the left edge of the address information region according to the horizontal coordinate of either of the two vertex coordinates and a first predetermined width;
    a fourth determining submodule configured to determine the right edge of the address information region according to the horizontal coordinate of either of the two vertex coordinates and a second predetermined width;
    a cropping submodule configured to crop out the address information region according to the lower edge, the upper edge, the left edge, and the right edge.
  10. The device according to any one of claims 7 to 9, characterized in that the recognizing module comprises:
    a binarization submodule configured to binarize the second information region to obtain a binarized second information region;
    a first computing submodule configured to compute a first histogram of the binarized second information region in the horizontal direction, the first histogram comprising the vertical coordinate of each row of pixels and the accumulated number of foreground pixels in each row;
    a row recognizing submodule configured to identify n rows of text regions according to consecutive row sets in the first histogram whose accumulated foreground pixel counts are greater than a first threshold, n being a positive integer;
    a second computing submodule configured to compute, for the i-th row of text regions, a second histogram in the vertical direction, the second histogram comprising the horizontal coordinate of each column of pixels and the accumulated number of foreground pixels in each column, where n≥i≥1 and i is a positive integer;
    a character recognizing submodule configured to identify ni character regions according to consecutive column sets in the second histogram whose accumulated foreground pixel counts are greater than a second threshold.
  11. The device according to claim 10, characterized in that the device further comprises:
    a row spacing recognizing module configured to identify the row spacing between two adjacent rows of text regions according to the consecutive row sets in the first histogram whose accumulated foreground pixel counts are greater than the first threshold;
    a discarding module configured to discard, when the row spacing is greater than a third threshold, the row of text regions closer to an edge of the second information region, the edge being the upper edge or the lower edge.
  12. The device according to claim 10, characterized in that the device further comprises:
    a character spacing recognizing module configured to identify the character spacing between two adjacent character regions according to the consecutive column sets in the second histogram whose accumulated foreground pixel counts are greater than the second threshold;
    a text recognizing module configured to recognize, when two adjacent character regions are located on the left side of the second information region and the character spacing is greater than a fourth threshold, the right one of the two adjacent character regions as the first character region in the current row of text regions;
    a single-character recognizing module configured to recognize, when two adjacent character regions are located on the right side of the second information region and the character spacing is greater than a fifth threshold, the left one of the two adjacent character regions as the last character region in the current row of text regions.
  13. A region extraction device, characterized in that the device comprises:
    a processor;
    a memory for storing instructions executable by the processor;
    wherein the processor is configured to:
    acquire the region position of a first information region in a document image;
    determine a second information region according to the region position of the first information region;
    cut the second information region to obtain at least one character region.
PCT/CN2015/099298 2015-10-30 2015-12-29 区域提取方法及装置 WO2017071062A1 (zh)

Priority Applications (4)

Application Number Priority Date Filing Date Title
MX2016003769A MX364147B (es) 2015-10-30 2015-12-29 Método y dispositivo para extracción de región.
JP2017547045A JP6396605B2 (ja) 2015-10-30 2015-12-29 領域抽出方法及び装置
KR1020167005538A KR101760109B1 (ko) 2015-10-30 2015-12-29 영역 추출 방법 및 장치
RU2016110818A RU2642404C2 (ru) 2015-10-30 2015-12-29 Способ и устройство для извлечения области изображения

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201510726272.4 2015-10-30
CN201510726272.4A CN105426818B (zh) 2015-10-30 2015-10-30 区域提取方法及装置

Publications (1)

Publication Number Publication Date
WO2017071062A1 true WO2017071062A1 (zh) 2017-05-04

Family

ID=55505018

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2015/099298 WO2017071062A1 (zh) 2015-10-30 2015-12-29 区域提取方法及装置

Country Status (8)

Country Link
US (1) US10127471B2 (zh)
EP (1) EP3163504B1 (zh)
JP (1) JP6396605B2 (zh)
KR (1) KR101760109B1 (zh)
CN (1) CN105426818B (zh)
MX (1) MX364147B (zh)
RU (1) RU2642404C2 (zh)
WO (1) WO2017071062A1 (zh)

Families Citing this family (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107229932B (zh) * 2016-03-25 2021-05-28 阿里巴巴集团控股有限公司 一种图像文本的识别方法和装置
CN106547912A (zh) * 2016-11-25 2017-03-29 西安理工大学 身份证数据库中非二代身份证照片的识别和剔除方法
CN108388872B (zh) * 2018-02-28 2021-10-22 北京奇艺世纪科技有限公司 一种基于字体颜色的新闻标题识别方法及装置
CN108764240A (zh) * 2018-03-28 2018-11-06 中科博宏(北京)科技有限公司 基于字符相对大小的计算机视觉身份证字符分割识别技术
KR102063036B1 (ko) * 2018-04-19 2020-01-07 한밭대학교 산학협력단 딥러닝과 문자인식으로 구현한 시각주의 모델 기반의 문서 종류 자동 분류 장치 및 방법
CN109977959B (zh) * 2019-03-29 2021-07-06 国家电网有限公司 一种火车票字符区域分割方法及装置
US11501548B2 (en) * 2019-04-02 2022-11-15 Edgeverve Systems Limited Method and system for determining one or more target objects in an image
CN110321895A (zh) * 2019-04-30 2019-10-11 北京市商汤科技开发有限公司 证件识别方法和装置、电子设备、计算机可读存储介质
CN110378340A (zh) * 2019-07-23 2019-10-25 上海秒针网络科技有限公司 地址合规识别方法、装置、存储介质及电子装置
WO2021145466A1 (ko) * 2020-01-13 2021-07-22 엘지전자 주식회사 객체의 정보를 확인하는 이동 단말기 및 그 제어 방법
CN111539269A (zh) * 2020-04-07 2020-08-14 北京达佳互联信息技术有限公司 文本区域的识别方法、装置、电子设备和存储介质
CN111582085B (zh) * 2020-04-26 2023-10-10 中国工商银行股份有限公司 单据拍摄图像识别方法及装置
CN111639648B (zh) * 2020-05-26 2023-09-19 浙江大华技术股份有限公司 证件识别方法、装置、计算设备和存储介质
CN111898601A (zh) * 2020-07-14 2020-11-06 浙江大华技术股份有限公司 一种身份证要素提取方法及装置
CN112633193A (zh) * 2020-12-28 2021-04-09 深圳壹账通智能科技有限公司 地址信息的提取方法、装置、设备及介质
WO2022173415A1 (en) * 2021-02-09 2022-08-18 Hewlett-Packard Development Company, L.P. Edge identification of documents within captured image
CN113592877B (zh) * 2021-03-25 2024-04-12 国网新源控股有限公司 一种抽水蓄能电站红线超标识别方法及装置
CN115082919B (zh) * 2022-07-22 2022-11-29 平安银行股份有限公司 一种地址识别方法、电子设备及存储介质
CN115862041B (zh) * 2023-02-13 2023-05-09 武汉天恒信息技术有限公司 一种基于神经网络的不动产证书识别方法

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101561876A (zh) * 2009-06-05 2009-10-21 四川泸州航天金穗高技术有限公司 一种身份证信息采集与识别方法及系统
CN102222241A (zh) * 2010-04-19 2011-10-19 日本电产三协株式会社 字符串识别装置及字符串识别方法
US20140219561A1 (en) * 2013-02-06 2014-08-07 Nidec Sankyo Corporation Character segmentation device and character segmentation method
CN104408450A (zh) * 2014-11-21 2015-03-11 深圳天源迪科信息技术股份有限公司 身份证识别方法、装置及系统
CN104573616A (zh) * 2013-10-29 2015-04-29 腾讯科技(深圳)有限公司 一种信息识别方法、相关装置及系统

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP3795238B2 (ja) * 1998-10-01 2006-07-12 シャープ株式会社 文書画像処理装置及び文書画像処理方法
RU2329535C2 (ru) * 2006-05-24 2008-07-20 Самсунг Электроникс Ко., Лтд. Способ автоматического кадрирования фотографий
JP2010231541A (ja) * 2009-03-27 2010-10-14 Oki Electric Ind Co Ltd 情報処理装置、文字認識方法、およびプログラム
CN102955941A (zh) * 2011-08-31 2013-03-06 汉王科技股份有限公司 身份信息录入方法和装置
KR101295000B1 (ko) * 2013-01-22 2013-08-09 주식회사 케이지모빌리언스 카드 번호의 영역 특성을 이용하는 신용 카드의 번호 인식 시스템 및 신용 카드의 번호 인식 방법
JP6188052B2 (ja) * 2013-02-26 2017-08-30 Kddi株式会社 情報システム及びサーバー
CN103488984B (zh) * 2013-10-11 2017-04-12 瑞典爱立信有限公司 基于智能移动设备的二代身份证识别方法及装置
KR20150047060A (ko) 2013-10-23 2015-05-04 주식회사 디오텍 명함 이미지 여부를 판별하는 장치 및 방법

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101561876A (zh) * 2009-06-05 2009-10-21 四川泸州航天金穗高技术有限公司 一种身份证信息采集与识别方法及系统
CN102222241A (zh) * 2010-04-19 2011-10-19 日本电产三协株式会社 字符串识别装置及字符串识别方法
US20140219561A1 (en) * 2013-02-06 2014-08-07 Nidec Sankyo Corporation Character segmentation device and character segmentation method
CN104573616A (zh) * 2013-10-29 2015-04-29 腾讯科技(深圳)有限公司 一种信息识别方法、相关装置及系统
CN104408450A (zh) * 2014-11-21 2015-03-11 深圳天源迪科信息技术股份有限公司 身份证识别方法、装置及系统

Also Published As

Publication number Publication date
US20170124718A1 (en) 2017-05-04
US10127471B2 (en) 2018-11-13
EP3163504A1 (en) 2017-05-03
EP3163504B1 (en) 2019-01-02
CN105426818A (zh) 2016-03-23
MX364147B (es) 2019-04-12
JP2018500704A (ja) 2018-01-11
CN105426818B (zh) 2019-07-02
RU2642404C2 (ru) 2018-01-24
JP6396605B2 (ja) 2018-09-26
KR20170061630A (ko) 2017-06-05
KR101760109B1 (ko) 2017-07-31
RU2016110818A (ru) 2017-10-02
MX2016003769A (es) 2017-05-30

Similar Documents

Publication Publication Date Title
WO2017071062A1 (zh) 区域提取方法及装置
JP6401873B2 (ja) 領域認識方法及び装置
JP6392468B2 (ja) 領域認識方法及び装置
JP6400226B2 (ja) 領域認識方法及び装置
JP6392467B2 (ja) 領域識別方法及び装置
WO2017071064A1 (zh) 区域提取方法、模型训练方法及装置
US20170032219A1 (en) Methods and devices for picture processing
WO2017031901A1 (zh) 人脸识别方法、装置及终端
KR101734860B1 (ko) 화상들을 분류하기 위한 방법 및 디바이스
US20220292293A1 (en) Character recognition method and apparatus, electronic device, and storage medium
CN108010009B (zh) 一种去除干扰图像的方法及装置
CN112954110A (zh) 图像处理方法、装置及存储介质

Legal Events

Date Code Title Description
ENP Entry into the national phase

Ref document number: 2017547045

Country of ref document: JP

Kind code of ref document: A

ENP Entry into the national phase

Ref document number: 20167005538

Country of ref document: KR

Kind code of ref document: A

WWE Wipo information: entry into national phase

Ref document number: MX/A/2016/003769

Country of ref document: MX

ENP Entry into the national phase

Ref document number: 2016110818

Country of ref document: RU

Kind code of ref document: A

121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 15907123

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 15907123

Country of ref document: EP

Kind code of ref document: A1