WO2017071063A1 - Area recognition method and apparatus - Google Patents

Info

Publication number
WO2017071063A1
Authority
WO
WIPO (PCT)
Prior art keywords
abscissa
histogram
accumulated value
threshold
text
Prior art date
Application number
PCT/CN2015/099299
Other languages
English (en)
French (fr)
Inventor
龙飞
张涛
陈志军
Original Assignee
小米科技有限责任公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 小米科技有限责任公司
Priority to RU2016110434A (patent RU2639668C2, ru)
Priority to JP2017547046A (patent JP6392468B2, ja)
Priority to MX2016003679A (patent MX2016003679A, es)
Priority to KR1020167005567A (patent KR101805090B1, ko)
Publication of WO2017071063A1 (zh)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/26 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V10/267 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/14 Image acquisition
    • G06V30/148 Segmentation of character regions
    • G06V30/153 Segmentation of character regions using recognition of characters or words
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/40 Document-oriented image-based pattern recognition
    • G06V30/41 Analysis of document content
    • G06V30/414 Extracting the geometrical structure, e.g. layout tree; Block segmentation, e.g. bounding boxes for graphics or text
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00 Image enhancement or restoration
    • G06T5/40 Image enhancement or restoration using histogram techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/50 Extraction of image or video features by performing operations within image blocks; by using histograms, e.g. histogram of oriented gradients [HoG]; by summing image-intensity values; Projection analysis
    • G06V10/507 Summing image-intensity values; Histogram projection analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74 Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/75 Organisation of the matching processes, e.g. simultaneous or sequential comparisons of image or video features; Coarse-fine approaches, e.g. multi-scale approaches; using context analysis; Selection of dictionaries
    • G06V10/758 Involving statistics of pixels or of feature values, e.g. histogram matching
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/14 Image acquisition
    • G06V30/146 Aligning or centring of the image pick-up or image-field
    • G06V30/1475 Inclination or skew detection or correction of characters or of image to be recognised
    • G06V30/1478 Inclination or skew detection or correction of characters or of image to be recognised of characters or characters lines
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10004 Still image; Photographic image
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/28 Character recognition specially adapted to the type of the alphabet, e.g. Latin alphabet
    • G06V30/287 Character recognition specially adapted to the type of the alphabet, e.g. Latin alphabet of Kanji, Hiragana or Katakana characters

Definitions

  • The present disclosure relates to the field of image processing, and in particular to an area recognition method and apparatus.
  • Before a terminal can recognize the text in an image, it must first identify the character region of each character.
  • An area recognition method in the related art works as follows: the terminal removes the background of the image and extracts the foreground image, identifies the edges of the extracted foreground image using an edge enhancement technique, and determines the character region of each character from the identified edges.
  • The present disclosure provides an area recognition method and apparatus.
  • The technical solution is as follows:
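  • An area recognition method, comprising:
  • binarizing a text area to obtain a binarized text area, the text area including a plurality of characters belonging to the same line;
  • calculating a histogram of the binarized text area in the vertical direction, the histogram comprising: the abscissa of each column of pixels and the accumulated value of foreground pixels in each column of pixels;
  • identifying the character area of the text in the text area based on the distribution information of the accumulated values in the histogram.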
  • Identifying the character area of the text in the text area according to the distribution information of the accumulated values in the histogram includes:
  • determining several sets of abscissas according to the distribution information of the accumulated values in the histogram, each set including a first abscissa and the first second abscissa located to the right of the first abscissa; the accumulated values corresponding to the first abscissa and to its right-hand neighbor are greater than a first threshold, and the accumulated value corresponding to its left-hand neighbor is smaller than a second threshold; the accumulated values corresponding to the second abscissa and to its left-hand neighbor are greater than the first threshold, and the accumulated value corresponding to its right-hand neighbor is smaller than the second threshold;
  • for each set of abscissas, recognizing the pixel column at the first abscissa as the left edge of one character region, and the pixel column at the second abscissa as the right edge of that character region.
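  • As an illustration only, the two-threshold edge conditions above can be sketched in Python. The function names, the list-based histogram, and the choice to treat columns outside the histogram as having an accumulated value of 0 are assumptions of this sketch, not part of the disclosure.

```python
def column_value(hist, x):
    # Assumption: abscissas outside the histogram have an accumulated value of 0.
    return hist[x] if 0 <= x < len(hist) else 0


def is_left_edge(hist, x, t1, t2):
    # x and its right-hand neighbor exceed t1; its left-hand neighbor is below t2.
    return (column_value(hist, x) > t1
            and column_value(hist, x + 1) > t1
            and column_value(hist, x - 1) < t2)


def is_right_edge(hist, x, t1, t2):
    # x and its left-hand neighbor exceed t1; its right-hand neighbor is below t2.
    return (column_value(hist, x) > t1
            and column_value(hist, x - 1) > t1
            and column_value(hist, x + 1) < t2)


def find_abscissa_pairs(hist, t1, t2):
    """Return (first abscissa, second abscissa) pairs, one per character region."""
    pairs = []
    x = 0
    while x < len(hist):
        if is_left_edge(hist, x, t1, t2):
            # Take the first right edge located to the right of this left edge.
            for r in range(x + 1, len(hist)):
                if is_right_edge(hist, r, t1, t2):
                    pairs.append((x, r))
                    x = r
                    break
        x += 1
    return pairs


if __name__ == "__main__":
    # Hypothetical histogram with two characters separated by a gap.
    print(find_abscissa_pairs([0, 5, 6, 4, 0, 0, 7, 8, 0], t1=2, t2=1))
```

  • Note that, under these conditions as stated, a character only one pixel column wide would not be detected, because each edge requires two adjacent columns above the first threshold.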
  • Determining several sets of abscissas according to the distribution information of the accumulated values in the histogram includes:
  • identifying a third abscissa in the histogram according to the distribution information of the accumulated values, where the third abscissa is either the abscissa in the histogram corresponding to the left edge of the character region of the first valid character among the plurality of characters, or the abscissa in the histogram corresponding to the right edge of the character region of the last valid character among the plurality of characters;
  • taking the third abscissa as the search starting point, searching for the sets of abscissas in a predetermined direction based on the distribution information of the accumulated values.
  • When the third abscissa is the abscissa in the histogram corresponding to the left edge of the character region of the first valid character among the plurality of characters, searching for the sets of abscissas in the predetermined direction from the third abscissa, based on the distribution information of the accumulated values, includes:
  • taking the first abscissa of the i-th set of abscissas in the histogram as the search starting point, searching to the right for the first fourth abscissa, where the accumulated values corresponding to the fourth abscissa and to its left-hand neighbor are greater than the first threshold, and the accumulated value corresponding to its right-hand neighbor is smaller than the second threshold; 1 ≤ i ≤ n, i is a positive integer with an initial value of 1, and n is the number of valid characters among the plurality of characters; the first abscissa of the first set is the third abscissa;
  • taking the fourth abscissa in the histogram as the search starting point, searching to the right for the first fifth abscissa, where the accumulated values corresponding to the fifth abscissa and to its right-hand neighbor are greater than the first threshold, and the accumulated value corresponding to its left-hand neighbor is smaller than the second threshold;
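  • A minimal sketch of this rightward search follows. It reads the fourth abscissa as the right edge of the i-th set and the fifth abscissa as the left edge of the next set, which is this sketch's interpretation of the passage; the function name, the list-based histogram, and treating out-of-range columns as 0 are likewise assumptions.

```python
def search_groups_rightward(hist, third_x, n, t1, t2):
    """Starting at the third abscissa, collect up to n (left, right) edge pairs."""

    def val(x):
        # Assumption: abscissas outside the histogram have an accumulated value of 0.
        return hist[x] if 0 <= x < len(hist) else 0

    groups = []
    left = third_x  # first abscissa of the first set
    for _ in range(n):
        # "Fourth abscissa": first column to the right that satisfies the right-edge condition.
        right = next((x for x in range(left + 1, len(hist))
                      if val(x) > t1 and val(x - 1) > t1 and val(x + 1) < t2), None)
        if right is None:
            break
        groups.append((left, right))
        # "Fifth abscissa": first column further right that satisfies the left-edge condition.
        left = next((x for x in range(right + 1, len(hist))
                     if val(x) > t1 and val(x + 1) > t1 and val(x - 1) < t2), None)
        if left is None:
            break
    return groups


if __name__ == "__main__":
    # Hypothetical histogram: invalid text, a wide gap, then three valid characters.
    hist = [4, 4, 4, 0, 0, 0, 0, 0, 5, 5, 5, 0, 0, 5, 5, 5, 0, 0, 5, 5, 5]
    print(search_groups_rightward(hist, third_x=8, n=3, t1=2, t2=1))
```

  • The leftward search described next mirrors this loop: it starts from a right edge, looks to the left for a left edge, and then for the previous right edge.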
  • When the third abscissa is the abscissa in the histogram corresponding to the right edge of the character region of the last valid character among the plurality of characters, searching for the sets of abscissas in the predetermined direction from the third abscissa, based on the distribution information of the accumulated values, includes:
  • taking the second abscissa of the j-th set of abscissas in the histogram as the search starting point, searching to the left for the first sixth abscissa, where the accumulated values corresponding to the sixth abscissa and to its right-hand neighbor are greater than the first threshold, and the accumulated value corresponding to its left-hand neighbor is smaller than the second threshold; 1 ≤ j ≤ n, j is a positive integer with an initial value of n, and n is the number of valid characters among the plurality of characters;
  • the second abscissa of the n-th set is the third abscissa;
  • taking the sixth abscissa in the histogram as the search starting point, searching to the left for the first seventh abscissa, where the accumulated values corresponding to the seventh abscissa and to its left-hand neighbor are greater than the first threshold, and the accumulated value corresponding to its right-hand neighbor is smaller than the second threshold; 1 ≤ j ≤ n, and j is a positive integer with an initial value of n;
  • When the third abscissa is the abscissa in the histogram corresponding to the left edge of the character region of the first valid character among the plurality of characters, identifying the third abscissa in the histogram according to the distribution information of the accumulated values includes:
  • when the plurality of characters include valid text and invalid text, and the first distance between the valid text and the invalid text is greater than the second distance between two adjacent valid characters: starting from a preset abscissa in the histogram, searching to the left for the first gap whose width is greater than the second distance, and determining as the third abscissa the first abscissa to the right of the gap whose accumulated value of foreground pixels is greater than the first threshold; the preset abscissa belongs to a preset interval, and the preset interval is set according to empirical values; the accumulated value of foreground pixels in the gap is smaller than the second threshold;
  • when the plurality of characters are all valid characters, determining as the third abscissa the first abscissa from the left of the histogram whose accumulated value of foreground pixels is greater than the first threshold.
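  • The gap-based search for the third abscissa might look like the following sketch. The function names are hypothetical, the histogram is assumed to be a plain list of per-column accumulated values, and returning `None` when no qualifying gap or column exists is a choice made here rather than something the text specifies.

```python
def find_third_abscissa_left(hist, preset_x, second_distance, t1, t2):
    """Scan left from preset_x for the first gap wider than second_distance,
    then return the first column to the right of that gap whose value exceeds t1."""
    run = 0  # width of the current run of gap columns (accumulated value < t2)
    for x in range(preset_x, -1, -1):
        if hist[x] < t2:
            run += 1
            if run > second_distance:
                gap_right = x + run - 1  # rightmost column of the gap run
                for c in range(gap_right + 1, len(hist)):
                    if hist[c] > t1:
                        return c
                return None
        else:
            run = 0
    return None


def first_column_above(hist, t1):
    """All-valid case: first column from the left whose value exceeds t1."""
    return next((x for x, v in enumerate(hist) if v > t1), None)


if __name__ == "__main__":
    # Hypothetical histogram: invalid text in columns 0-2, a 5-column gap,
    # then valid characters separated by 2-column gaps.
    hist = [4, 4, 4, 0, 0, 0, 0, 0, 5, 5, 5, 0, 0, 5, 5, 5, 0, 0, 5, 5, 5]
    print(find_third_abscissa_left(hist, preset_x=14, second_distance=2, t1=2, t2=1))
```

  • The preset abscissa is expected to fall inside the valid text, so that the narrow gaps between valid characters are skipped and only the wider gap before the invalid text stops the scan.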
  • When the third abscissa is the abscissa in the histogram corresponding to the right edge of the character region of the last valid character among the plurality of characters, identifying the third abscissa in the histogram according to the distribution information of the accumulated values includes:
  • when the plurality of characters include valid text and invalid text, and the first distance between the valid text and the invalid text is greater than the second distance between two adjacent valid characters: starting from a preset abscissa in the histogram, searching to the right for a gap whose width is greater than the second distance, and determining as the third abscissa the first abscissa to the left of the gap whose accumulated value of foreground pixels is greater than the first threshold; the preset abscissa belongs to a preset interval, and the preset interval is set according to empirical values; the accumulated value of foreground pixels in the gap is smaller than the second threshold;
  • when the plurality of characters are all valid characters, determining as the third abscissa the first abscissa from the right of the histogram whose accumulated value of foreground pixels is greater than the first threshold.
  • The method further includes:
  • binarizing a target image region to obtain a binarized target image region;
  • calculating a horizontal histogram of the binarized target image region in the horizontal direction, the horizontal histogram comprising: the vertical coordinate of each row of pixels and the accumulated value of foreground pixels in each row of pixels;
  • determining several sets of vertical coordinates according to the distribution information of the accumulated values in the horizontal histogram, each set including a first vertical coordinate and a second vertical coordinate located below the first vertical coordinate; for each set of vertical coordinates, recognizing the pixel row at the first vertical coordinate as the upper edge of a line of text, and the pixel row at the second vertical coordinate as the lower edge of that text area; the accumulated values corresponding to the first vertical coordinate and to its lower neighbor are greater than the first threshold, and the accumulated value corresponding to its upper neighbor is smaller than the second threshold; the accumulated values corresponding to the second vertical coordinate and to its upper neighbor are greater than the first threshold, and the accumulated value corresponding to its lower neighbor is smaller than the second threshold;
  • for the k-th text area, performing the step of binarizing the text area to obtain the binarized text area, where m ≥ k ≥ 1, k is a positive integer, and m is the total number of lines identified.
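  • The row-wise counterpart can be sketched in the same way. The names below are hypothetical, the binarized image is assumed to be a list of pixel rows with 255 as the foreground value, and rows outside the image are treated as having an accumulated value of 0.

```python
def horizontal_histogram(binary, foreground=255):
    """Accumulated value of foreground pixels in each pixel row."""
    return [sum(1 for p in row if p == foreground) for row in binary]


def find_text_lines(hist, t1, t2):
    """Return (upper edge, lower edge) row pairs, one per line of text."""

    def val(y):
        # Assumption: rows outside the image have an accumulated value of 0.
        return hist[y] if 0 <= y < len(hist) else 0

    lines, top = [], None
    for y in range(len(hist)):
        if top is None:
            # Upper edge: this row and the row below exceed t1, the row above is below t2.
            if val(y) > t1 and val(y + 1) > t1 and val(y - 1) < t2:
                top = y
        elif val(y) > t1 and val(y - 1) > t1 and val(y + 1) < t2:
            # Lower edge: this row and the row above exceed t1, the row below is below t2.
            lines.append((top, y))
            top = None
    return lines


if __name__ == "__main__":
    # Hypothetical 7x3 binarized region containing two lines of text.
    image = [[0, 0, 0], [255, 255, 255], [255, 255, 0], [0, 0, 0],
             [255, 255, 255], [255, 255, 255], [0, 0, 0]]
    print(find_text_lines(horizontal_histogram(image), t1=1, t2=1))
```

  • Each returned pair bounds one text area, which can then be binarized and segmented column-wise as described for a single line.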
  • an area identifying apparatus comprising:
  • the first binarization module is configured to perform binarization on the text area to obtain a binarized text area, where the text area includes a plurality of characters belonging to the same line;
  • a first calculating module configured to calculate a histogram according to a vertical direction of the binarized text area, the histogram comprising: an abscissa of each column of pixels and an accumulated value of foreground pixels in each column of pixels;
  • the area identification module is configured to recognize the character area of the text in the text area based on the distribution information of the accumulated values in the histogram.
  • the area identification module includes:
  • a coordinate determination submodule configured to determine several sets of abscissas according to the distribution information of the accumulated values in the histogram, each set including a first abscissa and the first second abscissa located to the right of the first abscissa; the accumulated values corresponding to the first abscissa and to its right-hand neighbor are greater than the first threshold, and the accumulated value corresponding to its left-hand neighbor is smaller than the second threshold; the accumulated values corresponding to the second abscissa and to its left-hand neighbor are greater than the first threshold, and the accumulated value corresponding to its right-hand neighbor is smaller than the second threshold;
  • the region identification sub-module is configured to identify, for each set of abscissas, a pixel column in which the first abscissa is located as a left edge of one character region, and a pixel column in which the second abscissa is located as a right edge of the character region.
  • the coordinate determination sub-module includes:
  • the coordinate recognition sub-module is configured to identify a third abscissa in the histogram according to the distribution information of the accumulated values, where the third abscissa is either the abscissa in the histogram corresponding to the left edge of the character region of the first valid character among the plurality of characters, or the abscissa in the histogram corresponding to the right edge of the character region of the last valid character among the plurality of characters;
  • the coordinate search sub-module is configured to search for a plurality of sets of abscissas based on the distribution information of the accumulated values according to the predetermined direction with the third abscissa as the search starting point.
  • the third abscissa is an abscissa corresponding to a left edge of a character region of the first valid text of the plurality of characters in the histogram;
  • the coordinate search submodule is also configured to:
  • take the first abscissa of the i-th set of abscissas in the histogram as the search starting point and search to the right for the first fourth abscissa, where the accumulated values corresponding to the fourth abscissa and to its left-hand neighbor are greater than the first threshold, and the accumulated value corresponding to its right-hand neighbor is smaller than the second threshold; 1 ≤ i ≤ n, i is a positive integer with an initial value of 1, and n is the number of valid characters among the plurality of characters; the first abscissa of the first set is the third abscissa;
  • take the fourth abscissa in the histogram as the search starting point and search to the right for the first fifth abscissa, where the accumulated values corresponding to the fifth abscissa and to its right-hand neighbor are greater than the first threshold, and the accumulated value corresponding to its left-hand neighbor is smaller than the second threshold;
  • the third abscissa is an abscissa corresponding to a right edge of a character region of a last valid text of the plurality of characters in the histogram;
  • the coordinate search submodule is also configured to:
  • take the second abscissa of the j-th set of abscissas in the histogram as the search starting point and search to the left for the first sixth abscissa, where the accumulated values corresponding to the sixth abscissa and to its right-hand neighbor are greater than the first threshold, and the accumulated value corresponding to its left-hand neighbor is smaller than the second threshold; 1 ≤ j ≤ n, j is a positive integer with an initial value of n, and n is the number of valid characters among the plurality of characters;
  • the second abscissa of the n-th set is the third abscissa;
  • take the sixth abscissa in the histogram as the search starting point and search to the left for the first seventh abscissa, where the accumulated values corresponding to the seventh abscissa and to its left-hand neighbor are greater than the first threshold, and the accumulated value corresponding to its right-hand neighbor is smaller than the second threshold; 1 ≤ j ≤ n, and j is a positive integer with an initial value of n;
  • the third abscissa is an abscissa corresponding to a left edge of a character region of the first valid text of the plurality of characters in the histogram;
  • the coordinate recognition submodule is also configured to:
  • when the plurality of characters include valid text and invalid text, and the first distance between the valid text and the invalid text is greater than the second distance between two adjacent valid characters: starting from a preset abscissa in the histogram, search to the left for the first gap whose width is greater than the second distance, and determine as the third abscissa the first abscissa to the right of the gap whose accumulated value of foreground pixels is greater than the first threshold; the preset abscissa belongs to a preset interval, and the preset interval is set according to empirical values; the accumulated value of foreground pixels in the gap is smaller than the second threshold;
  • when the plurality of characters are all valid characters, determine as the third abscissa the first abscissa from the left of the histogram whose accumulated value of foreground pixels is greater than the first threshold.
  • the third abscissa is an abscissa corresponding to a right edge of a character region of a last valid text of the plurality of characters in the histogram;
  • the coordinate recognition submodule is also configured to:
  • when the plurality of characters include valid text and invalid text, and the first distance between the valid text and the invalid text is greater than the second distance between two adjacent valid characters: starting from a preset abscissa in the histogram, search to the right for a gap whose width is greater than the second distance, and determine as the third abscissa the first abscissa to the left of the gap whose accumulated value of foreground pixels is greater than the first threshold; the preset abscissa belongs to a preset interval, and the preset interval is set according to empirical values; the accumulated value of foreground pixels in the gap is smaller than the second threshold;
  • when the plurality of characters are all valid characters, determine as the third abscissa the first abscissa from the right of the histogram whose accumulated value of foreground pixels is greater than the first threshold.
  • the device further includes:
  • a second binarization module configured to perform binarization on the target image region to obtain a binarized target image region
  • a second calculating module configured to calculate a horizontal histogram of the binarized target image region in the horizontal direction, the horizontal histogram comprising: the vertical coordinate of each row of pixels and the accumulated value of foreground pixels in each row of pixels;
  • an edge determining module configured to determine several sets of vertical coordinates according to the distribution information of the accumulated values in the horizontal histogram, each set including a first vertical coordinate and a second vertical coordinate located below the first vertical coordinate; and, for each set of vertical coordinates, to recognize the pixel row at the first vertical coordinate as the upper edge of a line of text and the pixel row at the second vertical coordinate as the lower edge of that text area; the accumulated values corresponding to the first vertical coordinate and to its lower neighbor are greater than the first threshold, and the accumulated value corresponding to its upper neighbor is smaller than the second threshold; the accumulated values corresponding to the second vertical coordinate and to its upper neighbor are greater than the first threshold, and the accumulated value corresponding to its lower neighbor is smaller than the second threshold;
  • the first binarization module is further configured to perform, for the k-th text area, the step of binarizing the text area to obtain a binarized text area, where m ≥ k ≥ 1, k is a positive integer, and m is the total number of lines identified.
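  • An area identifying apparatus, comprising:
  • a processor;
  • a memory for storing processor-executable instructions;
  • wherein the processor is configured to:
  • binarize a text area to obtain a binarized text area, the text area including a plurality of characters belonging to the same line;
  • calculate a histogram of the binarized text area in the vertical direction, the histogram including: the abscissa of each column of pixels and the accumulated value of foreground pixels in each column of pixels;
  • identify the character area of the text in the text area based on the distribution information of the accumulated values in the histogram.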
  • A histogram is calculated for the binarized text area in the vertical direction, and the character area of the text in the text area is identified according to the distribution information in the histogram. This solves the problem of low positioning accuracy of the text area in the related art, and achieves accurate location of the character region of each character based on the distribution information of the accumulated values of foreground pixels in the histogram.
  • FIG. 1 is a schematic diagram of a text area shown in accordance with some exemplary embodiments
  • FIG. 2 is a flowchart of an area identification method according to an exemplary embodiment
  • FIG. 3A is a flowchart of an area identification method according to another exemplary embodiment
  • FIG. 3B is a schematic diagram of binarizing a text area according to another exemplary embodiment
  • FIG. 3C is a schematic diagram of a histogram calculated in a vertical direction, according to another exemplary embodiment
  • FIG. 3D is a flowchart of a method for determining a certain group of abscissas by a terminal according to another exemplary embodiment
  • FIG. 3E is a schematic diagram of a third abscissa in a histogram obtained by a terminal according to another exemplary embodiment
  • FIG. 3F is a schematic diagram showing each group of abscissas determined according to a histogram according to another exemplary embodiment
  • FIG. 3G is a schematic diagram of a preset abscissa in a histogram according to another exemplary embodiment
  • FIG. 3H is a schematic diagram showing a third abscissa determined according to a histogram according to another exemplary embodiment
  • FIG. 4 is a flowchart showing a method for identifying each group of abscissas according to distribution information of a third abscissa and an accumulated value, according to an exemplary embodiment
  • FIG. 5 is a flowchart showing another method for identifying each group of abscissas according to distribution information of a third abscissa and an accumulated value, according to another exemplary embodiment
  • FIG. 6 is a flowchart of a method for identifying a text area obtained by a terminal according to still another exemplary embodiment
  • FIG. 7 is a block diagram of an area identifying apparatus according to an exemplary embodiment
  • FIG. 8 is a block diagram of an area identifying apparatus according to another exemplary embodiment.
  • FIG. 9 is a block diagram of an area identification apparatus, according to an exemplary embodiment.
  • the text area includes a plurality of characters belonging to the same line, and the text area may be an area in the image of the document, or may be an area in the scanned image of the article, or may be an area in the electronic document, which is not limited in this embodiment.
  • In this embodiment, the upper edge of the text area lies between the upper edge of the line of text and the lower edge of the previous line of text; the lower edge of the text area lies between the lower edge of the line of text and the upper edge of the next line of text.
  • the text area is an area containing the citizenship number belonging to the same line in the second generation ID card.
  • the upper edge of the text area is located between l₁ and l₂, and the lower edge is located between l₃ and l₄.
  • the text area is the area 11 shown in FIG.
  • the text in the text area can be either a valid text or a combination of valid text and invalid text.
  • the valid text is the text that needs to recognize the character area
  • the invalid text is the text that does not need to recognize the character area.
  • The text in the text area may include only the number '330421199012162834', or may include the number '330421199012162834' together with at least one Chinese character located to the left of the number, such as the text 'number 330421199012162834'.
  • the character area refers to the area corresponding to a single character.
  • Characters in the various embodiments of the present disclosure may be numbers, letters, Chinese characters, photos, or any other content whose accumulated value of foreground pixels in the calculated histogram is greater than a threshold.
  • FIG. 2 is a flowchart of an area identification method according to an exemplary embodiment. As shown in FIG. 2, the area identification method includes the following steps.
  • In step 201, the text area is binarized to obtain a binarized text area, the text area including a plurality of characters belonging to the same line.
  • In step 202, a histogram is calculated for the binarized text area in the vertical direction, the histogram comprising: the abscissa of each column of pixels and the accumulated value of foreground pixels in each column of pixels.
  • In step 203, the character area of the text in the text area is identified based on the distribution information of the accumulated values in the histogram.
  • In summary, the area recognition method provided in this embodiment of the present disclosure calculates a histogram of the binarized text area in the vertical direction and identifies the character area of the text in the text area according to the distribution information in the histogram.
  • This solves the problem of low positioning accuracy of the text area in the related art, and achieves accurate location of the character area of each character according to the distribution information of the accumulated values of foreground pixels in the histogram.
  • FIG. 3A is a flowchart of an area identification method according to another exemplary embodiment. As shown in FIG. 3A, the area identification method includes the following steps.
  • In step 301, the text area is binarized to obtain a binarized text area; the text area includes several characters belonging to the same line.
  • The terminal pre-processes the text area, where the pre-processing may include operations such as denoising, filtering, and edge extraction, and then binarizes the pre-processed text area.
  • Binarization refers to comparing the gray value of each pixel in the text area with a preset gray threshold and dividing the pixels in the text area into two groups: pixels whose gray value is greater than the preset gray threshold and pixels whose gray value is less than the preset gray threshold.
  • The two pixel groups are rendered in two different colors, black and white, to obtain the binarized text area, as shown in FIG. 3B.
  • The pixels of the color located in the foreground are called foreground pixels, that is, the white pixels in FIG. 3B; the pixels of the color located in the background are called background pixels, that is, the black pixels in FIG. 3B.
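  • The thresholding described above can be sketched as follows. This is a minimal illustration, not the method of the disclosure itself: the function name `binarize`, the list-of-lists image layout, and the default threshold of 128 are assumptions chosen for the example, and the brighter group is mapped to the foreground to match the white foreground of FIG. 3B.

```python
def binarize(gray, threshold=128):
    """Split pixels into two groups by comparing with a preset gray threshold.

    gray: list of rows, each row a list of gray values (0-255).
    Returns rows of 1 (foreground, white) and 0 (background, black).
    The threshold value 128 is a hypothetical default.
    """
    return [[1 if px > threshold else 0 for px in row] for row in gray]


# A tiny 2x3 grayscale patch used only for illustration.
gray_patch = [
    [10, 200, 30],
    [220, 15, 240],
]
binary_patch = binarize(gray_patch)
```

  • In practice the preset gray threshold would be tuned to the capture conditions; the sketch only shows the comparison step.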
  • In step 302, a histogram is calculated in the vertical direction for the binarized text area; the histogram includes the abscissa of each column of pixels and the accumulated value of foreground pixels in each column of pixels.
  • The histogram is calculated in the vertical direction.
  • The horizontal axis of the histogram represents the abscissa of each column of pixels, and the vertical axis represents the accumulated value of foreground pixels in each column of pixels; the foreground pixels are the white pixels in the binarized text area, as opposed to the background pixels.
  • the terminal calculates a histogram as shown in FIG. 3C.
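  • A minimal sketch of the vertical-direction histogram follows. It assumes the binarized text area is stored as rows of 0/1 values (1 for a foreground pixel); the function name `vertical_histogram` is hypothetical.

```python
def vertical_histogram(binary):
    """Return, for each column abscissa, the accumulated count of foreground pixels.

    binary: list of rows, each row a list of 0 (background) or 1 (foreground).
    The list index of the result is the abscissa of the pixel column.
    """
    if not binary:
        return []
    width = len(binary[0])
    return [sum(row[x] for row in binary) for x in range(width)]


# Three pixel rows, four pixel columns.
area = [
    [0, 1, 1, 0],
    [0, 1, 0, 0],
    [0, 1, 1, 0],
]
hist = vertical_histogram(area)
```

  • Columns that contain no foreground pixels accumulate to 0, which is what makes the gaps between characters visible in the histogram.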
  • In step 303, a plurality of groups of abscissas are determined based on the distribution information of the accumulated values in the histogram.
  • This step may include:
  • In step 303a, the third abscissa in the histogram is identified based on the distribution information of the accumulated values.
  • The third abscissa is the abscissa in the histogram corresponding to the left edge of the character region of the first valid character among the several characters, or the abscissa in the histogram corresponding to the right edge of the character region of the last valid character among the several characters.
  • For example, the third abscissa may be the abscissa X1 in the histogram corresponding to the left edge of the first valid digit '3', or the abscissa X2 in the histogram corresponding to the right edge of the last valid digit '4'.
  • In step 303b, with the third abscissa as the search starting point, a plurality of groups of abscissas are searched for in a predetermined direction based on the distribution information of the accumulated values.
  • The terminal may take the third abscissa in the histogram as the search starting point and search for the groups of abscissas in the predetermined direction based on the distribution information of the accumulated values.
  • When the third abscissa is the abscissa in the histogram corresponding to the left edge of the character region of the first valid character, the predetermined direction is rightward; when the third abscissa is the abscissa in the histogram corresponding to the right edge of the character region of the last valid character, the predetermined direction is leftward.
  • The number of groups of abscissas corresponds to the number of valid characters in the text area; that is, each group of abscissas includes a first abscissa, which corresponds in the histogram to the left edge of the character area of one valid character, and a second abscissa, which corresponds in the histogram to the right edge of the character area of that valid character. In other words, each group of abscissas includes a first abscissa and the first second abscissa located to the right of the first abscissa.
  • The accumulated values corresponding to the first abscissa and to the adjacent abscissa on the right side of the first abscissa are greater than the first threshold, and the accumulated value corresponding to the adjacent abscissa on the left side of the first abscissa is smaller than the second threshold; the accumulated values corresponding to the second abscissa and to the adjacent abscissa on the left side of the second abscissa are greater than the first threshold, and the accumulated value corresponding to the adjacent abscissa on the right side of the second abscissa is smaller than the second threshold.
  • the terminal can recognize each group of abscissas shown in FIG. 3F. It should be noted that, in FIG. 3F, only a certain set of abscissas are obtained for exemplification, and the actual implementation also includes more sets of abscissas, which is not limited in this embodiment.
  • The first threshold and the second threshold mentioned above may both be small values.
  • the first threshold and the second threshold are values that are slightly greater than zero.
  • the first threshold may be 0, and the second threshold may be a value close to 0.
  • the accumulated value corresponding to the first abscissa and the adjacent abscissa on the right side of the first abscissa is not 0, and the accumulated value corresponding to the adjacent abscissa on the left side of the first abscissa is 0;
  • the cumulative value corresponding to the adjacent abscissa of the second abscissa and the left side of the second abscissa is not 0, and the accumulated value corresponding to the adjacent abscissa of the right side of the second abscissa is 0.
  • In step 304, for each group of abscissas, the pixel column in which the first abscissa is located is identified as the left edge of one character region, and the pixel column in which the second abscissa is located is identified as the right edge of that character region.
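  • The edge conditions of steps 303 and 304 can be illustrated with the sketch below. It is an assumption-laden example rather than the disclosed implementation: the helper names are hypothetical, the defaults `t1=0` and `t2=1` follow the option in which the first threshold is 0 and the second threshold is close to 0, and columns outside the histogram are treated as having an accumulated value of 0.

```python
def value_at(hist, x):
    """Accumulated value at abscissa x; positions outside the histogram count as 0."""
    return hist[x] if 0 <= x < len(hist) else 0


def is_first_abscissa(hist, x, t1, t2):
    # x and its right neighbour exceed the first threshold,
    # and its left neighbour is below the second threshold.
    return (value_at(hist, x) > t1 and value_at(hist, x + 1) > t1
            and value_at(hist, x - 1) < t2)


def is_second_abscissa(hist, x, t1, t2):
    # x and its left neighbour exceed the first threshold,
    # and its right neighbour is below the second threshold.
    return (value_at(hist, x) > t1 and value_at(hist, x - 1) > t1
            and value_at(hist, x + 1) < t2)


def find_abscissa_groups(hist, t1=0, t2=1):
    """Return (first_abscissa, second_abscissa) pairs, one per character region."""
    groups = []
    first = None
    for x in range(len(hist)):
        if first is None:
            if is_first_abscissa(hist, x, t1, t2):
                first = x
        elif is_second_abscissa(hist, x, t1, t2):
            groups.append((first, x))  # left edge column, right edge column
            first = None
    return groups


example_hist = [0, 0, 3, 5, 4, 0, 0, 2, 6, 0]
example_groups = find_abscissa_groups(example_hist)
```

  • Because each condition looks at a pair of adjacent columns above the first threshold, a stroke that is only one pixel wide would not be reported by this sketch.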
  • The area identification method provided in this embodiment of the present disclosure calculates a histogram in the vertical direction for the binarized text area and identifies the character areas of the text in the text area according to the distribution information in the histogram.
  • This solves the problem that the text area is positioned with low accuracy in the related art, and achieves the effect of accurately locating the character area of each character according to the distribution information of the accumulated values of foreground pixels in the histogram.
  • Step 303a may include:
  • Starting from the preset abscissa in the histogram, the terminal queries leftward for the first gap whose width is greater than the second distance, and determines the first abscissa on the right side of the gap whose accumulated value of foreground pixels is greater than the first threshold as the third abscissa.
  • The preset abscissa is a coordinate belonging to a preset interval, and the preset interval lies within the interval onto which the valid text in the text area is mapped in the histogram. For example, taking the case where the valid text is the citizenship number, in conjunction with FIG. 3E, the preset interval lies within the interval [X1, X2] in the figure. The preset interval is usually set according to empirical values. The accumulated value of foreground pixels in the gap is less than the second threshold.
  • For example, the preset abscissa may be the abscissa in the histogram corresponding to the midpoint of the second-generation ID card in the horizontal direction.
  • The preset abscissa is X0 in the figure. The terminal can start the query from X0 and proceed leftward.
  • Since the second distance between two adjacent digits in the citizenship number is much smaller than the first distance between the Chinese character 'code' and the first digit, the terminal queries leftward for a gap whose width is greater than the second distance. After finding it, that is, gap d in the figure, the terminal may determine the first abscissa on the right side of the gap whose accumulated value of foreground pixels is greater than the first threshold as the third abscissa, that is, determine the third abscissa X1.
  • The above takes the terminal querying leftward from the preset abscissa as an example.
  • The terminal can also query rightward from the preset abscissa and, after finding a gap whose width is greater than the second distance, determine the first abscissa on the left side of the gap whose accumulated value of foreground pixels is greater than the first threshold as the third abscissa. This case is not illustrated in this embodiment.
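  • The leftward gap query of step 303a might look like the sketch below. The function name, the default thresholds, and the choice to return `None` when no sufficiently wide gap exists are all assumptions made for the example; `second_distance` is measured in pixel columns.

```python
def find_third_abscissa_leftward(hist, preset_x, second_distance, t1=0, t2=1):
    """Scan left from preset_x for the first gap wider than second_distance.

    A gap is a run of columns whose accumulated value is below t2.
    Returns the first abscissa to the right of that gap whose accumulated
    value exceeds t1, or None if no such gap is found (an assumed convention).
    """
    gap_width = 0
    for x in range(preset_x, -1, -1):
        if hist[x] < t2:
            gap_width += 1
            if gap_width > second_distance:
                # The gap currently spans columns x .. x + gap_width - 1.
                for right in range(x + gap_width, len(hist)):
                    if hist[right] > t1:
                        return right
                return None
        else:
            gap_width = 0  # a narrow gap between valid characters ended
    return None


# Columns 0-2: invalid text; 3-7: wide gap; 8-15: valid digits with 1-column gaps.
id_hist = [4, 4, 4, 0, 0, 0, 0, 0, 5, 5, 0, 6, 6, 0, 7, 7]
third = find_third_abscissa_leftward(id_hist, preset_x=12, second_distance=2)
```

  • The narrow gaps between adjacent digits reset the counter, so only the wider gap that separates the invalid text from the first valid character triggers the result.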
  • step 303a may include the following steps.
  • After the terminal calculates the histogram of the binarized text area, the first abscissa from the left side of the calculated histogram whose accumulated value of foreground pixels is greater than the first threshold is the abscissa corresponding to the first valid character in the histogram, so the terminal can determine that abscissa as the third abscissa.
  • the terminal can determine X 1 in the figure as the third abscissa.
  • the terminal may determine X 2 in the figure as the third abscissa.
  • step 303b may be replaced with steps 401 to 404.
  • In step 401, for the i-th group of abscissas, the first abscissa in the i-th group is used as the search starting point in the histogram, and the first fourth abscissa is searched for to the right.
  • Here 1 ≤ i ≤ n, i is a positive integer with an initial value of 1, and n is the number of valid characters among the several characters. For example, taking the case where the valid text is the citizenship number on a second-generation ID card, the number n of valid characters is 18.
  • The first abscissa in the first group of abscissas is the third abscissa.
  • The accumulated values corresponding to the fourth abscissa and to the adjacent abscissa on the left side of the fourth abscissa are greater than the first threshold, and the accumulated value corresponding to the adjacent abscissa on the right side of the fourth abscissa is smaller than the second threshold.
  • For example, the terminal can start from X1 in the histogram and search to the right to determine the first fourth abscissa.
  • step 402 the fourth abscissa is determined as the second abscissa in the i-th set of coordinates.
  • In step 403, if i < n, the fourth abscissa in the histogram is used as the search starting point, and the first fifth abscissa is searched for to the right.
  • In this case the terminal can determine that there are still valid characters with unrecognized character areas on the right side. The terminal can then take the abscissa in the histogram corresponding to the right edge of the character area of the current valid character as the search starting point and continue to search for the abscissa in the histogram corresponding to the left edge of the character area of the next valid character.
  • The terminal may use the fourth abscissa in the histogram as the search starting point and search to the right for the first fifth abscissa; the accumulated values corresponding to the fifth abscissa and to the adjacent abscissa on the right side of the fifth abscissa are greater than the first threshold, and the accumulated value corresponding to the adjacent abscissa on the left side of the fifth abscissa is smaller than the second threshold.
  • the terminal determines the abscissa obtained by the search as the abscissa corresponding to the left edge of the character region of the next valid character in the histogram.
  • the terminal continues to search to the right according to the above method to determine a set of abscissas corresponding to the character regions of the respective valid characters.
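  • The left-to-right search of steps 401 to 403 can be sketched as follows. The names and default thresholds are hypothetical, and treating the fifth abscissa as the first abscissa of the next group is an assumption about how the loop continues, since that step is not spelled out above.

```python
def value_at(hist, x):
    """Accumulated value at abscissa x; positions outside the histogram count as 0."""
    return hist[x] if 0 <= x < len(hist) else 0


def search_left_to_right(hist, third_abscissa, n, t1=0, t2=1):
    """Collect up to n (first, second) abscissa groups, starting at third_abscissa."""
    groups = []
    first = third_abscissa  # first abscissa of group 1 is the third abscissa
    for i in range(1, n + 1):
        # Step 401: search right for the fourth abscissa (a right edge).
        fourth = next(
            (x for x in range(first + 1, len(hist))
             if value_at(hist, x) > t1 and value_at(hist, x - 1) > t1
             and value_at(hist, x + 1) < t2),
            None,
        )
        if fourth is None:
            break
        groups.append((first, fourth))  # step 402: fourth becomes the second abscissa
        if i < n:
            # Step 403: search right for the fifth abscissa (the next left edge).
            fifth = next(
                (x for x in range(fourth + 1, len(hist))
                 if value_at(hist, x) > t1 and value_at(hist, x + 1) > t1
                 and value_at(hist, x - 1) < t2),
                None,
            )
            if fifth is None:
                break
            first = fifth  # assumed: fifth is the first abscissa of group i + 1
    return groups


forward_groups = search_left_to_right([0, 0, 3, 5, 4, 0, 0, 2, 6, 0], 2, 2)
```

  • Knowing n in advance (18 for a citizenship number) lets the loop stop after the last valid character instead of running on into any trailing noise.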
  • step 303b may be replaced with steps 501 to 504.
  • In step 501, for the j-th group of abscissas, the second abscissa in the j-th group is used as the search starting point in the histogram, and the first sixth abscissa is searched for to the left.
  • The accumulated values corresponding to the sixth abscissa and to the adjacent abscissa on the right side of the sixth abscissa are greater than the first threshold, and the accumulated value corresponding to the adjacent abscissa on the left side of the sixth abscissa is smaller than the second threshold; 1 ≤ j ≤ n, where j is a positive integer with an initial value of n and n is the number of valid characters among the several characters; the second abscissa in the n-th group of abscissas is the third abscissa.
  • In step 502, the sixth abscissa is determined as the first abscissa in the j-th group of abscissas.
  • In step 503, if j>0, the sixth abscissa in the histogram is used as the search starting point, and the first seventh abscissa is searched for to the left; the accumulated values corresponding to the seventh abscissa and to the adjacent abscissa on the left side of the seventh abscissa are greater than the first threshold, and the accumulated value corresponding to the adjacent abscissa on the right side of the seventh abscissa is smaller than the second threshold; 1 ≤ j ≤ n, where j is a positive integer with an initial value of n.
  • Steps 501 to 504 are similar to steps 401 to 404 above. The difference is that steps 401 to 404 search from left to right, whereas steps 501 to 504 search from right to left, so they are not described again here.
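  • A mirrored sketch of the right-to-left search follows. As before, the names and default thresholds are hypothetical. The sketch only continues to the next group while j is greater than 1, which is one reading of the step 503 condition, and it assumes that the seventh abscissa becomes the second abscissa of group j - 1.

```python
def value_at(hist, x):
    """Accumulated value at abscissa x; positions outside the histogram count as 0."""
    return hist[x] if 0 <= x < len(hist) else 0


def search_right_to_left(hist, third_abscissa, n, t1=0, t2=1):
    """Collect up to n (first, second) groups, starting at the last character's right edge."""
    groups = []
    second = third_abscissa  # second abscissa of group n is the third abscissa
    for j in range(n, 0, -1):
        # Step 501: search left for the sixth abscissa (a left edge).
        sixth = next(
            (x for x in range(second - 1, -1, -1)
             if value_at(hist, x) > t1 and value_at(hist, x + 1) > t1
             and value_at(hist, x - 1) < t2),
            None,
        )
        if sixth is None:
            break
        groups.append((sixth, second))  # step 502: sixth becomes the first abscissa
        if j > 1:
            # Step 503: search left for the seventh abscissa (the previous right edge).
            seventh = next(
                (x for x in range(sixth - 1, -1, -1)
                 if value_at(hist, x) > t1 and value_at(hist, x - 1) > t1
                 and value_at(hist, x + 1) < t2),
                None,
            )
            if seventh is None:
                break
            second = seventh  # assumed: seventh is the second abscissa of group j - 1
    groups.reverse()  # report groups in left-to-right order
    return groups


backward_groups = search_right_to_left([0, 0, 3, 5, 4, 0, 0, 2, 6, 0], 8, 2)
```

  • On the same histogram the two search directions yield the same groups, which is the sense in which the two procedures are said to be similar.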
  • the terminal may perform the following steps before binarizing the text area to obtain the binarized text area:
  • step 601 the target image region is binarized to obtain a binarized target image region.
  • the target image area may be an area including a plurality of lines of text.
  • This step is similar to the step 301 in the foregoing embodiment.
  • In step 602, a horizontal histogram is calculated in the horizontal direction for the binarized target image area; the horizontal histogram includes the vertical coordinate of each row of pixels and the accumulated value of foreground pixels in each row of pixels.
  • This step is similar to step 302 in the above embodiment.
  • The difference is that step 302 calculates the histogram in the vertical direction for the binarized text area, whereas this step calculates the histogram in the horizontal direction for the binarized target image area.
  • In step 603, a plurality of groups of vertical coordinates are determined according to the distribution information of the accumulated values in the horizontal histogram, each group of vertical coordinates including a first vertical coordinate and a second vertical coordinate located below the first vertical coordinate; for each group of vertical coordinates, the pixel row in which the first vertical coordinate is located is identified as the upper edge of one line of text area, and the pixel row in which the second vertical coordinate is located is identified as the lower edge of that text area.
  • the terminal may determine a plurality of sets of vertical coordinates according to the distribution information of the accumulated values in the horizontal histogram, and then determine the area of each row according to each set of vertical coordinates.
  • This step is similar to the procedure in the above embodiment of determining a plurality of groups of abscissas according to the distribution information of the accumulated values in the vertical-direction histogram and then determining the left and right edges of the text according to each group of abscissas.
  • The accumulated values corresponding to the first vertical coordinate and to the adjacent vertical coordinate below the first vertical coordinate are greater than the first threshold, and the accumulated value corresponding to the adjacent vertical coordinate above the first vertical coordinate is smaller than the second threshold;
  • the accumulated values corresponding to the second vertical coordinate and to the adjacent vertical coordinate above the second vertical coordinate are greater than the first threshold, and the accumulated value corresponding to the adjacent vertical coordinate below the second vertical coordinate is smaller than the second threshold.
  • In step 604, for the k-th line of text area, the step of binarizing the text area to obtain the binarized text area is performed, where 1 ≤ k ≤ m, k is a positive integer, and m is the total number of lines identified.
  • the terminal can perform an operation of binarizing the text area to obtain a binarized text area.
  • the terminal can recognize the character area of each valid character in each line in the target image area.
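  • The line segmentation of steps 602 and 603 can be sketched in the same spirit. The function names are hypothetical, the default thresholds of 0 and 1 are assumptions, and rows outside the image are treated as having an accumulated value of 0.

```python
def horizontal_histogram(binary):
    """Return, for each row's vertical coordinate, the count of foreground pixels."""
    return [sum(row) for row in binary]


def find_text_lines(binary, t1=0, t2=1):
    """Return (upper_edge_row, lower_edge_row) pairs, one per line of text."""
    hist = horizontal_histogram(binary)

    def value_at(y):
        return hist[y] if 0 <= y < len(hist) else 0

    lines = []
    top = None
    for y in range(len(hist)):
        if top is None:
            # First vertical coordinate: y and the row below exceed t1, row above is below t2.
            if value_at(y) > t1 and value_at(y + 1) > t1 and value_at(y - 1) < t2:
                top = y
        elif value_at(y) > t1 and value_at(y - 1) > t1 and value_at(y + 1) < t2:
            # Second vertical coordinate: y and the row above exceed t1, row below is below t2.
            lines.append((top, y))
            top = None
    return lines


target = [
    [0, 0, 0],
    [1, 1, 0],
    [0, 1, 1],
    [0, 0, 0],
    [1, 0, 1],
    [1, 1, 1],
    [0, 0, 0],
]
line_edges = find_text_lines(target)
# Each line of text can then be cropped and handed to the per-line character segmentation.
text_areas = [target[top:bottom + 1] for top, bottom in line_edges]
```

  • Each cropped text area corresponds to one pass of the per-line procedure in step 604.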
  • the terminal may also determine the text area by other determining manners.
  • the terminal can locate the text area by image positioning technology.
  • Take the case where the text area is the citizenship number on a second-generation ID card as an example.
  • The location of the citizenship number on the second-generation ID card is relatively fixed, and the citizenship number is relatively far from the address and the photo above it. Therefore, the terminal can directly locate the lower 1/5 of the ID image and use the image area obtained by this positioning as the text area, which is not limited in this embodiment.
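  • Taking the lower 1/5 of the ID image can be sketched as a simple row slice. The function name and the list-of-rows image layout are assumptions for the example, and the rounding of the cut position is an arbitrary choice.

```python
def crop_lower_fifth(image):
    """Return the rows that make up roughly the lower 1/5 of the image.

    image: list of pixel rows, top row first.
    """
    height = len(image)
    start = (height * 4) // 5  # rows above this index form the upper 4/5
    return image[start:]


# A 10-row image whose pixel values equal the row index, for illustration only.
id_image = [[r] * 3 for r in range(10)]
number_area = crop_lower_fifth(id_image)
```

  • The resulting rows would then be used as the text area for binarization in step 301.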
  • FIG. 7 is a block diagram of an area identifying apparatus according to an exemplary embodiment.
  • The area identifying apparatus includes, but is not limited to, a first binarization module 710, a first calculating module 720, and an area identification module 730.
  • the first binarization module 710 is configured to binarize the text region to obtain a binarized text region, the text region including a plurality of characters belonging to the same row.
  • the first calculating module 720 is configured to calculate a histogram in the vertical direction for the binarized text region, the histogram comprising: an abscissa of each column of pixels and an accumulated value of foreground pixels in each column of pixels.
  • the area identification module 730 is configured to recognize the character area of the text in the text area based on the distribution information of the accumulated values in the histogram.
  • The area identifying apparatus calculates the histogram in the vertical direction for the binarized text area and identifies the character areas of the text in the text area according to the distribution information in the histogram.
  • This solves the problem that the text area is positioned with low accuracy in the related art, and achieves the effect of accurately locating the character area of each character according to the distribution information of the accumulated values of foreground pixels in the histogram.
  • FIG. 8 is a block diagram of an area identifying apparatus according to another exemplary embodiment.
  • The area identifying apparatus includes, but is not limited to, a first binarization module 810, a first calculating module 820, and an area identification module 830.
  • the first binarization module 810 is configured to binarize the text region to obtain a binarized text region, the text region including a plurality of characters belonging to the same row.
  • The first binarization module 810 pre-processes the text area, where the pre-processing may include operations such as denoising, filtering, and edge extraction, and then binarizes the pre-processed text area.
  • Binarization refers to comparing the gray value of each pixel in the text area with a preset gray threshold and dividing the pixels in the text area into two groups: pixels whose gray value is greater than the preset gray threshold and pixels whose gray value is less than the preset gray threshold.
  • The two pixel groups are rendered in two different colors, black and white, in the text area to obtain the binarized text area.
  • the first calculating module 820 is configured to calculate a histogram in a vertical direction for the binarized text area
  • the histogram includes: the abscissa of each column of pixels and the accumulated value of the foreground pixels in each column of pixels.
  • the first calculation module 820 calculates the histogram in the vertical direction.
  • The horizontal axis of the histogram represents the abscissa of each column of pixels, and the vertical axis represents the accumulated value of foreground pixels in each column of pixels; the foreground pixels are the white pixels in the binarized text area, as opposed to the background pixels.
  • the area identification module 830 is configured to recognize the character area of the text in the text area based on the distribution information of the accumulated values in the histogram.
  • the area identification module 830 includes a coordinate determination sub-module 831 and an area identification sub-module 832.
  • the coordinate determining sub-module 831 is configured to determine a plurality of sets of abscissas according to the distribution information of the accumulated values in the histogram, each set of abscissas including a first abscissa and a first second abscissa located on a right side of the first abscissa;
  • the accumulated value corresponding to the first abscissa and the adjacent abscissa on the right side of the first abscissa is greater than the first threshold, and the accumulated value corresponding to the adjacent abscissa on the left side of the first abscissa is smaller than the second threshold;
  • the accumulated values corresponding to the second abscissa and to the adjacent abscissa on the left side of the second abscissa are greater than the first threshold, and the accumulated value corresponding to the adjacent abscissa on the right side of the second abscissa is smaller than the second threshold.
  • the region identification sub-module 832 is configured to identify, for each set of abscissas, a pixel column in which the first abscissa is located as a left edge of one character region, and a pixel column in which the second abscissa is located as a right edge of the character region.
  • the coordinate determination sub-module 831 includes: a coordinate recognition sub-module 831a and a coordinate search sub-module 831b.
  • The coordinate recognition sub-module 831a is configured to identify a third abscissa in the histogram according to the distribution information of the accumulated values, the third abscissa being the abscissa in the histogram corresponding to the left edge of the character region of the first valid character among the several characters, or the abscissa in the histogram corresponding to the right edge of the character region of the last valid character among the several characters.
  • the coordinate search sub-module 831b is configured to search for a plurality of sets of abscissas based on the distribution information of the accumulated values in a predetermined direction with the third abscissa as a search starting point.
  • the coordinate search sub-module 831b may search for a plurality of sets of abscissas based on the distribution information of the accumulated values in the predetermined direction using the third abscissa in the histogram as the search starting point.
  • When the third abscissa is the abscissa in the histogram corresponding to the left edge of the character region of the first valid character, the predetermined direction is rightward; when the third abscissa is the abscissa in the histogram corresponding to the right edge of the character region of the last valid character, the predetermined direction is leftward.
  • The number of groups of abscissas corresponds to the number of valid characters in the text area; that is, each group of abscissas includes a first abscissa, which corresponds in the histogram to the left edge of the character area of one valid character, and a second abscissa, which corresponds in the histogram to the right edge of the character area of that valid character. In other words, each group of abscissas includes a first abscissa and the first second abscissa located to the right of the first abscissa.
  • the accumulated value corresponding to the first abscissa and the adjacent abscissa on the right side of the first abscissa is greater than the first threshold, and the accumulated value corresponding to the adjacent abscissa on the left side of the first abscissa is smaller than the second threshold;
  • the cumulative value corresponding to the adjacent abscissa of the second abscissa and the left side of the second abscissa is greater than the first threshold, and the accumulated value corresponding to the adjacent abscissa of the right side of the second abscissa is less than the second threshold.
  • The first threshold and the second threshold mentioned above may both be small values.
  • the first threshold and the second threshold are values that are slightly greater than zero.
  • the first threshold may be 0, and the second threshold may be a value close to 0.
  • the accumulated value corresponding to the first abscissa and the adjacent abscissa on the right side of the first abscissa is not 0, and the accumulated value corresponding to the adjacent abscissa on the left side of the first abscissa is 0;
  • the cumulative value corresponding to the adjacent abscissa of the second abscissa and the left side of the second abscissa is not 0, and the accumulated value corresponding to the adjacent abscissa of the right side of the second abscissa is 0.
  • the third abscissa is an abscissa corresponding to a left edge of a character region of the first valid text of the plurality of characters in the histogram;
  • the coordinate search sub-module 831b is also configured to:
  • For the i-th group of abscissas, the first abscissa in the i-th group is used as the search starting point in the histogram, and the first fourth abscissa is searched for to the right; the accumulated values corresponding to the fourth abscissa and to the adjacent abscissa on the left side of the fourth abscissa are greater than the first threshold, and the accumulated value corresponding to the adjacent abscissa on the right side of the fourth abscissa is smaller than the second threshold; 1 ≤ i ≤ n, where i is a positive integer with an initial value of 1 and n is the number of valid characters among the several characters; the first abscissa in the first group of abscissas is the third abscissa.
  • the fourth abscissa is determined as the second abscissa in the i-th group of coordinates.
  • The fourth abscissa in the histogram is used as the search starting point, and the first fifth abscissa is searched for to the right; the accumulated values corresponding to the fifth abscissa and to the adjacent abscissa on the right side of the fifth abscissa are greater than the first threshold, and the accumulated value corresponding to the adjacent abscissa on the left side of the fifth abscissa is smaller than the second threshold.
  • The coordinate search sub-module 831b determines the abscissa found by the search as the abscissa in the histogram corresponding to the left edge of the character area of the next valid character.
  • the coordinate search sub-module 831b continues to search to the right according to the above method to determine a set of abscissas corresponding to the character regions of the respective valid characters.
  • the third abscissa is an abscissa corresponding to a right edge of a character region of a last valid text of the plurality of characters in the histogram;
  • the coordinate search sub-module 831b is also configured to:
  • For the j-th group of abscissas, the second abscissa in the j-th group is used as the search starting point in the histogram, and the first sixth abscissa is searched for to the left; the accumulated values corresponding to the sixth abscissa and to the adjacent abscissa on the right side of the sixth abscissa are greater than the first threshold, and the accumulated value corresponding to the adjacent abscissa on the left side of the sixth abscissa is smaller than the second threshold; 1 ≤ j ≤ n, where j is a positive integer with an initial value of n and n is the number of valid characters among the several characters; the second abscissa in the n-th group of abscissas is the third abscissa.
  • the sixth abscissa is determined as the first abscissa in the jth group of coordinates.
  • The sixth abscissa in the histogram is used as the search starting point, and the first seventh abscissa is searched for to the left; the accumulated values corresponding to the seventh abscissa and to the adjacent abscissa on the left side of the seventh abscissa are greater than the first threshold, and the accumulated value corresponding to the adjacent abscissa on the right side of the seventh abscissa is smaller than the second threshold; 1 ≤ j ≤ n, where j is a positive integer with an initial value of n.
  • the third abscissa is an abscissa corresponding to a left edge of a character region of the first valid text of the plurality of characters in the histogram;
  • The steps performed by the coordinate search sub-module 831b are similar to those performed when the third abscissa is the abscissa in the histogram corresponding to the left edge of the character area of the first valid character among the several characters, so they are not described again here.
  • the coordinate recognition sub-module 831a is also configured to:
  • When the first distance between the valid text and the invalid text is greater than the second distance between two adjacent valid characters, start from the preset abscissa in the histogram, query leftward for the first gap whose width is greater than the second distance, and determine the first abscissa on the right side of the gap whose accumulated value of foreground pixels is greater than the first threshold as the third abscissa;
  • the preset abscissa is a coordinate belonging to a preset interval, and the preset interval is an interval set according to empirical values; the accumulated value of foreground pixels in the gap is smaller than the second threshold.
  • Or, the first abscissa from the left side of the histogram whose accumulated value of foreground pixels is greater than the first threshold is determined as the third abscissa.
  • The coordinate recognition sub-module 831a starts from the preset abscissa in the histogram, queries leftward for the first gap whose width is greater than the second distance, and determines the first abscissa on the right side of the gap whose accumulated value of foreground pixels is greater than the first threshold as the third abscissa.
  • the preset abscissa is the coordinate belonging to the preset interval, and the preset interval belongs to the mapping interval corresponding to the effective text in the text region in the histogram.
  • The coordinate recognition sub-module 831a may also query rightward from the preset abscissa and, after finding a gap whose width is greater than the second distance, determine the first abscissa on the left side of the gap whose accumulated value of foreground pixels is greater than the first threshold as the third abscissa, which is not described again here.
  • After the histogram is calculated, the first abscissa from the left side of the calculated histogram whose accumulated value of foreground pixels is greater than the first threshold is the abscissa corresponding to the first valid character in the histogram, so the coordinate recognition sub-module 831a can determine that abscissa as the third abscissa.
  • optionally, the third abscissa is the abscissa corresponding, in the histogram, to the right edge of the character region of the last valid character among the characters;
  • the coordinate recognition sub-module 831a is also configured to:
  • when the characters include valid text and invalid text, and the first distance between the valid text and the invalid text is greater than the second distance between two adjacent valid characters: starting from the preset abscissa in the histogram, search to the right for a gap whose width is greater than the second distance, and determine the first abscissa to the left of the gap whose accumulated foreground-pixel value is greater than the first threshold as the third abscissa; the preset abscissa is a coordinate belonging to the preset interval, and the preset interval is an interval set according to empirical values; the accumulated foreground-pixel value within the gap is smaller than the second threshold;
  • or, when all of the characters are valid text, determine the first abscissa from the right of the histogram whose accumulated foreground-pixel value is greater than the first threshold as the third abscissa.
  • when the third abscissa corresponds to the right edge of the character region of the last valid character, the steps performed by the coordinate recognition sub-module 831a are similar to those performed when the third abscissa corresponds to the left edge of the character region of the first valid character, so they are not described again here.
  • the device further includes: a second binarization module 840, a second calculation module 850, and an edge determination module 860.
  • the second binarization module 840 is configured to binarize the target image region to obtain a binarized target image region.
  • the target image area may be an area including a plurality of lines of text.
  • the second binarization module 840 is similar to the first binarization module 810. For details, refer to the first binarization module 810.
  • the second binarization module 840 is not limited in this embodiment.
  • the second calculating module 850 is configured to calculate a horizontal histogram of the binarized target image region in the horizontal direction, where the horizontal histogram comprises the vertical coordinate of each row of pixels and the accumulated value of foreground pixels in each row of pixels.
  • the second calculation module 850 is similar to the first calculation module 820. The difference is that the first calculation module 820 calculates a histogram of the binarized text region in the vertical direction, whereas the second calculation module 850 calculates a histogram of the binarized target image region in the horizontal direction.
  • the edge determining module 860 is configured to determine, according to the distribution information of the accumulated values in the horizontal histogram, several sets of vertical coordinates, where each set includes a first vertical coordinate and a second vertical coordinate located below the first vertical coordinate; for each set, the pixel row at the first vertical coordinate is recognized as the upper edge of a line of text region, and the pixel row at the second vertical coordinate is recognized as the lower edge of that text region; the accumulated values corresponding to the first vertical coordinate and to the adjacent vertical coordinate below it are greater than the first threshold, and the accumulated value corresponding to the adjacent vertical coordinate above it is smaller than the second threshold; the accumulated values corresponding to the second vertical coordinate and to the adjacent vertical coordinate above it are greater than the first threshold, and the accumulated value corresponding to the adjacent vertical coordinate below it is smaller than the second threshold.
  • after the horizontal histogram is calculated, the edge determination module 860 can determine several sets of vertical coordinates according to the distribution information of the accumulated values in the horizontal histogram, and then determine the region of each line according to each set of vertical coordinates.
  • the first binarization module 810 is further configured to perform, for the k-th line of text region, the step of binarizing the text region to obtain a binarized text region, where m ≥ k ≥ 1, k is a positive integer, and m is the total number of lines identified.
  • the area identifying means can recognize the character area of each valid character in each line in the target image area.
  • in summary, the area identification apparatus provided in this embodiment calculates a histogram of the binarized text region in the vertical direction and identifies the character regions of the text in the text region according to the distribution information in the histogram.
  • this addresses the low positioning accuracy of the related art and makes it possible to locate the character regions precisely from the distribution information of the accumulated foreground-pixel values in the histogram.
  • An exemplary embodiment of the present disclosure provides an area identification apparatus capable of implementing the area identification method provided by the present disclosure, the apparatus comprising: a processor, and a memory for storing processor-executable instructions;
  • the processor is configured to:
  • binarize a text region to obtain a binarized text region, the text region including several characters belonging to the same line;
  • calculate a histogram of the binarized text region in the vertical direction, the histogram comprising the abscissa of each column of pixels and the accumulated value of foreground pixels in each column of pixels;
  • identify the character regions of the text in the text region based on the distribution information of the accumulated values in the histogram.
  • FIG. 9 is a block diagram of an area identification apparatus, according to an exemplary embodiment.
  • device 900 can be a mobile phone, a computer, a digital broadcast terminal, a messaging device, a gaming console, a tablet device, a medical device, a fitness device, a personal digital assistant, and the like.
  • device 900 can include one or more of the following components: processing component 902, memory 904, power component 906, multimedia component 908, audio component 910, input/output (I/O) interface 912, sensor component 914, and communication component 916.
  • Processing component 902 typically controls the overall operation of device 900, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations.
  • Processing component 902 can include one or more processors 918 to execute instructions to perform all or part of the steps of the above described methods.
  • processing component 902 can include one or more modules to facilitate interaction between component 902 and other components.
  • processing component 902 can include a multimedia module to facilitate interaction between multimedia component 908 and processing component 902.
  • Memory 904 is configured to store various types of data to support operation at device 900. Examples of such data include instructions for any application or method operating on device 900, contact data, phone book data, messages, pictures, videos, and the like.
  • the memory 904 can be implemented by any type of volatile or non-volatile storage device, or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read only memory (EEPROM), erasable programmable read only memory (EPROM), programmable read only memory (PROM), read only memory (ROM), magnetic memory, flash memory, a magnetic disk, or an optical disk.
  • Power component 906 provides power to various components of device 900.
  • Power component 906 can include a power management system, one or more power sources, and other components associated with generating, managing, and distributing power for device 900.
  • the multimedia component 908 includes a screen between the device 900 and the user that provides an output interface.
  • the screen can include a liquid crystal display (LCD) and a touch panel (TP). If the screen includes a touch panel, the screen can be implemented as a touch screen to receive input signals from the user.
  • the touch panel includes one or more touch sensors to sense touches, slides, and gestures on the touch panel. The touch sensor can sense not only the boundaries of the touch or sliding action, but also the duration and pressure associated with the touch or slide operation.
  • the multimedia component 908 includes a front camera and/or a rear camera. When the device 900 is in an operation mode, such as a shooting mode or a video mode, the front camera and/or the rear camera can receive external multimedia data. Each front and rear camera can be a fixed optical lens system or have focal length and optical zoom capabilities.
  • the audio component 910 is configured to output and/or input an audio signal.
  • audio component 910 includes a microphone (MIC) that is configured to receive an external audio signal when device 900 is in an operational mode, such as a call mode, a recording mode, and a voice recognition mode.
  • the received audio signal can be further stored in memory 904 or transmitted via communication component 916.
  • the audio component 910 also includes a speaker for outputting an audio signal.
  • the I/O interface 912 provides an interface between the processing component 902 and the peripheral interface module, which may be a keyboard, a click wheel, a button, or the like. These buttons may include, but are not limited to, a home button, a volume button, a start button, and a lock button.
  • Sensor component 914 includes one or more sensors for providing device 900 with status assessments of various aspects.
  • sensor component 914 can detect the open/closed state of device 900 and the relative positioning of components, such as the display and the keypad of device 900; sensor component 914 can also detect a change in position of device 900 or of one component of device 900, the presence or absence of user contact with device 900, the orientation or acceleration/deceleration of device 900, and a change in temperature of device 900.
  • Sensor component 914 can include a proximity sensor configured to detect the presence of nearby objects without any physical contact.
  • Sensor component 914 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications.
  • the sensor component 914 can also include an acceleration sensor, a gyro sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
  • Communication component 916 is configured to facilitate wired or wireless communication between device 900 and other devices.
  • the device 900 can access a wireless network based on a communication standard, such as Wi-Fi, 2G or 3G, or a combination thereof.
  • communication component 916 receives broadcast signals or broadcast associated information from an external broadcast management system via a broadcast channel.
  • communication component 916 also includes a near field communication (NFC) module to facilitate short range communication.
  • the NFC module can be implemented based on radio frequency identification (RFID) technology, infrared data association (IrDA) technology, ultra-wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
  • device 900 may be implemented by one or more application specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field programmable gate arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic components, for performing the above-described area identification method.
  • in an exemplary embodiment, a non-transitory computer readable storage medium comprising instructions is also provided, such as the memory 904 comprising instructions executable by processor 918 of device 900 to perform the area identification method described above.
  • the non-transitory computer readable storage medium can be a ROM, a random access memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, or an optical data storage device.

Abstract

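The present disclosure discloses an area identification method and apparatus, belonging to the field of image processing. The area identification method includes: binarizing a text region to obtain a binarized text region, the text region including several characters belonging to the same line; calculating a histogram of the binarized text region in the vertical direction, the histogram including the abscissa of each column of pixels and the accumulated value of foreground pixels in each column of pixels; and identifying the character regions of the text in the text region according to the distribution information of the accumulated values in the histogram. By calculating a histogram of the binarized text region in the vertical direction and identifying the character regions according to the distribution information in the histogram, the disclosure addresses the low region-positioning accuracy of the related art and makes it possible to locate character regions precisely from the distribution information of the accumulated foreground-pixel values in the histogram.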

Description

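Area identification method and apparatus
This application is based on and claims priority to Chinese Patent Application No. 201510726153.9, filed on October 30, 2015, the entire contents of which are incorporated herein by reference.
Technical Field
The present disclosure relates to the field of image processing, and in particular to an area identification method and apparatus.
Background
In the field of image processing, before a terminal recognizes the text in an image, it first needs to identify the character regions of the text.
An area identification method provided in the related art includes: the terminal removes the background from the image and extracts a foreground image; it then identifies the edges of the text in the extracted foreground image by edge enhancement; and it determines the character region of each character from the identified edges.
In this scheme, because edge enhancement can only provide coarse positioning, the character regions it locates have low accuracy.
Summary
To solve the problem of low character-region positioning accuracy in the related art, the present disclosure provides an area identification method and apparatus. The technical solutions are as follows: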
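According to a first aspect of the embodiments of the present disclosure, an area identification method is provided, the method including:
binarizing a text region to obtain a binarized text region, the text region including several characters belonging to the same line;
calculating a histogram of the binarized text region in the vertical direction, the histogram including the abscissa of each column of pixels and the accumulated value of foreground pixels in each column of pixels;
identifying the character regions of the text in the text region according to the distribution information of the accumulated values in the histogram.
Optionally, identifying the character regions of the text in the text region according to the distribution information of the accumulated values in the histogram includes:
determining several groups of abscissas according to the distribution information of the accumulated values in the histogram, each group including a first abscissa and the first second abscissa located to the right of the first abscissa; the accumulated values corresponding to the first abscissa and to the adjacent abscissa on its right are greater than a first threshold, and the accumulated value corresponding to the adjacent abscissa on its left is smaller than a second threshold; the accumulated values corresponding to the second abscissa and to the adjacent abscissa on its left are greater than the first threshold, and the accumulated value corresponding to the adjacent abscissa on its right is smaller than the second threshold;
for each group of abscissas, identifying the pixel column at the first abscissa as the left edge of a character region, and identifying the pixel column at the second abscissa as the right edge of that character region.
Optionally, determining several groups of abscissas according to the distribution information of the accumulated values in the histogram includes:
identifying a third abscissa in the histogram according to the distribution information of the accumulated values, the third abscissa being either the abscissa corresponding, in the histogram, to the left edge of the character region of the first valid character among the characters, or the abscissa corresponding, in the histogram, to the right edge of the character region of the last valid character among the characters;
taking the third abscissa as the search starting point and searching out the groups of abscissas in a predetermined direction based on the distribution information of the accumulated values.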
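Optionally, when the third abscissa is the abscissa corresponding, in the histogram, to the left edge of the character region of the first valid character among the characters, taking the third abscissa as the search starting point and searching out the groups of abscissas in a predetermined direction based on the distribution information of the accumulated values includes:
for the i-th group of abscissas, taking the first abscissa of the i-th group in the histogram as the search starting point and searching to the right for the first fourth abscissa, where the accumulated values corresponding to the fourth abscissa and to the adjacent abscissa on its left are greater than the first threshold, and the accumulated value corresponding to the adjacent abscissa on its right is smaller than the second threshold; 1 ≤ i ≤ n, i is a positive integer with an initial value of 1, and n is the number of valid characters among the characters; the first abscissa of the 1st group is the third abscissa;
determining the fourth abscissa as the second abscissa of the i-th group;
if i < n, taking the fourth abscissa in the histogram as the search starting point and searching to the right for the first fifth abscissa, where the accumulated values corresponding to the fifth abscissa and to the adjacent abscissa on its right are greater than the first threshold, and the accumulated value corresponding to the adjacent abscissa on its left is smaller than the second threshold;
letting i = i + 1 and determining the fifth abscissa as the first abscissa of the i-th group.
Optionally, when the third abscissa is the abscissa corresponding, in the histogram, to the right edge of the character region of the last valid character among the characters, taking the third abscissa as the search starting point and searching out the groups of abscissas in a predetermined direction based on the distribution information of the accumulated values includes:
for the j-th group of abscissas, taking the second abscissa of the j-th group in the histogram as the search starting point and searching to the left for the first sixth abscissa, where the accumulated values corresponding to the sixth abscissa and to the adjacent abscissa on its right are greater than the first threshold, and the accumulated value corresponding to the adjacent abscissa on its left is smaller than the second threshold; 1 ≤ j ≤ n, j is a positive integer with an initial value of n, and n is the number of valid characters among the characters; the second abscissa of the n-th group is the third abscissa;
determining the sixth abscissa as the first abscissa of the j-th group;
if j > 0, taking the sixth abscissa in the histogram as the search starting point and searching to the left for the first seventh abscissa, where the accumulated values corresponding to the seventh abscissa and to the adjacent abscissa on its left are greater than the first threshold, and the accumulated value corresponding to the adjacent abscissa on its right is smaller than the second threshold; 1 ≤ j ≤ n, and j is a positive integer with an initial value of n;
letting j = j - 1 and determining the seventh abscissa as the second abscissa of the j-th group.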
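Optionally, when the third abscissa is the abscissa corresponding, in the histogram, to the left edge of the character region of the first valid character among the characters, identifying the third abscissa in the histogram according to the distribution information of the accumulated values includes:
when the characters include valid text and invalid text, and the first distance between the valid text and the invalid text is greater than the second distance between two adjacent valid characters: starting from a preset abscissa in the histogram, searching to the left for the first gap whose width is greater than the second distance, and determining the first abscissa to the right of the gap whose accumulated foreground-pixel value is greater than the first threshold as the third abscissa; the preset abscissa is a coordinate belonging to a preset interval, and the preset interval is an interval set according to empirical values; the accumulated foreground-pixel value within the gap is smaller than the second threshold;
or,
when all of the characters are valid text, determining the first abscissa from the left of the histogram whose accumulated foreground-pixel value is greater than the first threshold as the third abscissa.
Optionally, when the third abscissa is the abscissa corresponding, in the histogram, to the right edge of the character region of the last valid character among the characters, identifying the third abscissa in the histogram according to the distribution information of the accumulated values includes:
when the characters include valid text and invalid text, and the first distance between the valid text and the invalid text is greater than the second distance between two adjacent valid characters: starting from the preset abscissa in the histogram, searching to the right for a gap whose width is greater than the second distance, and determining the first abscissa to the left of the gap whose accumulated foreground-pixel value is greater than the first threshold as the third abscissa; the preset abscissa is a coordinate belonging to the preset interval, and the preset interval is an interval set according to empirical values; the accumulated foreground-pixel value within the gap is smaller than the second threshold;
or,
when all of the characters are valid text, determining the first abscissa from the right of the histogram whose accumulated foreground-pixel value is greater than the first threshold as the third abscissa.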
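Optionally, the method further includes:
binarizing a target image region to obtain a binarized target image region;
calculating a horizontal histogram of the binarized target image region in the horizontal direction, the horizontal histogram including the vertical coordinate of each row of pixels and the accumulated value of foreground pixels in each row of pixels;
determining several groups of vertical coordinates according to the distribution information of the accumulated values in the horizontal histogram, each group including a first vertical coordinate and a second vertical coordinate located below the first vertical coordinate; for each group, identifying the pixel row at the first vertical coordinate as the upper edge of a line of text region, and identifying the pixel row at the second vertical coordinate as the lower edge of that text region; the accumulated values corresponding to the first vertical coordinate and to the adjacent vertical coordinate below it are greater than the first threshold, and the accumulated value corresponding to the adjacent vertical coordinate above it is smaller than the second threshold; the accumulated values corresponding to the second vertical coordinate and to the adjacent vertical coordinate above it are greater than the first threshold, and the accumulated value corresponding to the adjacent vertical coordinate below it is smaller than the second threshold;
for the k-th line of text region, performing the step of binarizing the text region to obtain a binarized text region, where m ≥ k ≥ 1, k is a positive integer, and m is the total number of lines identified.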
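According to a second aspect of the embodiments of the present disclosure, an area identification apparatus is provided, the apparatus including:
a first binarization module, configured to binarize a text region to obtain a binarized text region, the text region including several characters belonging to the same line;
a first calculation module, configured to calculate a histogram of the binarized text region in the vertical direction, the histogram including the abscissa of each column of pixels and the accumulated value of foreground pixels in each column of pixels;
a region identification module, configured to identify the character regions of the text in the text region according to the distribution information of the accumulated values in the histogram.
Optionally, the region identification module includes:
a coordinate determination sub-module, configured to determine several groups of abscissas according to the distribution information of the accumulated values in the histogram, each group including a first abscissa and the first second abscissa located to the right of the first abscissa; the accumulated values corresponding to the first abscissa and to the adjacent abscissa on its right are greater than a first threshold, and the accumulated value corresponding to the adjacent abscissa on its left is smaller than a second threshold; the accumulated values corresponding to the second abscissa and to the adjacent abscissa on its left are greater than the first threshold, and the accumulated value corresponding to the adjacent abscissa on its right is smaller than the second threshold;
a region identification sub-module, configured to identify, for each group of abscissas, the pixel column at the first abscissa as the left edge of a character region and the pixel column at the second abscissa as the right edge of that character region.
Optionally, the coordinate determination sub-module includes:
a coordinate recognition sub-module, configured to identify a third abscissa in the histogram according to the distribution information of the accumulated values, the third abscissa being either the coordinate corresponding, in the histogram, to the left edge of the character region of the first valid character among the characters, or the coordinate corresponding, in the histogram, to the right edge of the character region of the last valid character among the characters;
a coordinate search sub-module, configured to take the third abscissa as the search starting point and search out the groups of abscissas in a predetermined direction based on the distribution information of the accumulated values.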
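Optionally, the third abscissa is the abscissa corresponding, in the histogram, to the left edge of the character region of the first valid character among the characters;
the coordinate search sub-module is further configured to:
for the i-th group of abscissas, take the first abscissa of the i-th group in the histogram as the search starting point and search to the right for the first fourth abscissa, where the accumulated values corresponding to the fourth abscissa and to the adjacent abscissa on its left are greater than the first threshold, and the accumulated value corresponding to the adjacent abscissa on its right is smaller than the second threshold; 1 ≤ i ≤ n, i is a positive integer with an initial value of 1, and n is the number of valid characters among the characters; the first abscissa of the 1st group is the third abscissa;
determine the fourth abscissa as the second abscissa of the i-th group;
if i < n, take the fourth abscissa in the histogram as the search starting point and search to the right for the first fifth abscissa, where the accumulated values corresponding to the fifth abscissa and to the adjacent abscissa on its right are greater than the first threshold, and the accumulated value corresponding to the adjacent abscissa on its left is smaller than the second threshold;
let i = i + 1 and determine the fifth abscissa as the first abscissa of the i-th group.
Optionally, the third abscissa is the abscissa corresponding, in the histogram, to the right edge of the character region of the last valid character among the characters;
the coordinate search sub-module is further configured to:
for the j-th group of abscissas, take the second abscissa of the j-th group in the histogram as the search starting point and search to the left for the first sixth abscissa, where the accumulated values corresponding to the sixth abscissa and to the adjacent abscissa on its right are greater than the first threshold, and the accumulated value corresponding to the adjacent abscissa on its left is smaller than the second threshold; 1 ≤ j ≤ n, j is a positive integer with an initial value of n, and n is the number of valid characters among the characters; the second abscissa of the n-th group is the third abscissa;
determine the sixth abscissa as the first abscissa of the j-th group;
if j > 0, take the sixth abscissa in the histogram as the search starting point and search to the left for the first seventh abscissa, where the accumulated values corresponding to the seventh abscissa and to the adjacent abscissa on its left are greater than the first threshold, and the accumulated value corresponding to the adjacent abscissa on its right is smaller than the second threshold; 1 ≤ j ≤ n, and j is a positive integer with an initial value of n;
let j = j - 1 and determine the seventh abscissa as the second abscissa of the j-th group.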
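Optionally, the third abscissa is the abscissa corresponding, in the histogram, to the left edge of the character region of the first valid character among the characters;
the coordinate recognition sub-module is further configured to:
when the characters include valid text and invalid text, and the first distance between the valid text and the invalid text is greater than the second distance between two adjacent valid characters, start from a preset abscissa in the histogram, search to the left for the first gap whose width is greater than the second distance, and determine the first abscissa to the right of the gap whose accumulated foreground-pixel value is greater than the first threshold as the third abscissa; the preset abscissa is a coordinate belonging to a preset interval, and the preset interval is an interval set according to empirical values; the accumulated foreground-pixel value within the gap is smaller than the second threshold;
or,
when all of the characters are valid text, determine the first abscissa from the left of the histogram whose accumulated foreground-pixel value is greater than the first threshold as the third abscissa.
Optionally, the third abscissa is the abscissa corresponding, in the histogram, to the right edge of the character region of the last valid character among the characters;
the coordinate recognition sub-module is further configured to:
when the characters include valid text and invalid text, and the first distance between the valid text and the invalid text is greater than the second distance between two adjacent valid characters, start from the preset abscissa in the histogram, search to the right for a gap whose width is greater than the second distance, and determine the first abscissa to the left of the gap whose accumulated foreground-pixel value is greater than the first threshold as the third abscissa; the preset abscissa is a coordinate belonging to the preset interval, and the preset interval is an interval set according to empirical values; the accumulated foreground-pixel value within the gap is smaller than the second threshold;
or,
when all of the characters are valid text, determine the first abscissa from the right of the histogram whose accumulated foreground-pixel value is greater than the first threshold as the third abscissa.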
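Optionally, the apparatus further includes:
a second binarization module, configured to binarize a target image region to obtain a binarized target image region;
a second calculation module, configured to calculate a horizontal histogram of the binarized target image region in the horizontal direction, the horizontal histogram including the vertical coordinate of each row of pixels and the accumulated value of foreground pixels in each row of pixels;
an edge determination module, configured to determine several groups of vertical coordinates according to the distribution information of the accumulated values in the horizontal histogram, each group including a first vertical coordinate and a second vertical coordinate located below the first vertical coordinate; for each group, the pixel row at the first vertical coordinate is identified as the upper edge of a line of text region, and the pixel row at the second vertical coordinate is identified as the lower edge of that text region; the accumulated values corresponding to the first vertical coordinate and to the adjacent vertical coordinate below it are greater than the first threshold, and the accumulated value corresponding to the adjacent vertical coordinate above it is smaller than the second threshold; the accumulated values corresponding to the second vertical coordinate and to the adjacent vertical coordinate above it are greater than the first threshold, and the accumulated value corresponding to the adjacent vertical coordinate below it is smaller than the second threshold;
the first binarization module is further configured to perform, for the k-th line of text region, the step of binarizing the text region to obtain a binarized text region, where m ≥ k ≥ 1, k is a positive integer, and m is the total number of lines identified.
According to a third aspect of the embodiments of the present disclosure, an area identification apparatus is provided, the apparatus including:
a processor;
a memory for storing processor-executable instructions;
wherein the processor is configured to:
binarize a text region to obtain a binarized text region, the text region including several characters belonging to the same line;
calculate a histogram of the binarized text region in the vertical direction, the histogram including the abscissa of each column of pixels and the accumulated value of foreground pixels in each column of pixels;
identify the character regions of the text in the text region according to the distribution information of the accumulated values in the histogram.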
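The technical solutions provided by the embodiments of the present disclosure may have the following beneficial effects:
By calculating a histogram of the binarized text region in the vertical direction and identifying the character regions of the text in the text region according to the distribution information in the histogram, the disclosure addresses the low region-positioning accuracy of the related art and makes it possible to locate character regions precisely from the distribution information of the accumulated foreground-pixel values in the histogram.
It should be understood that the general description above and the detailed description below are exemplary only and do not limit the present disclosure.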
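Brief Description of the Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and, together with the specification, serve to explain the principles of the present disclosure.
FIG. 1 is a schematic diagram of a text region according to some exemplary embodiments;
FIG. 2 is a flowchart of an area identification method according to an exemplary embodiment;
FIG. 3A is a flowchart of an area identification method according to another exemplary embodiment;
FIG. 3B is a schematic diagram of a text region after binarization according to another exemplary embodiment;
FIG. 3C is a schematic diagram of a histogram calculated in the vertical direction according to another exemplary embodiment;
FIG. 3D is a flowchart of a method by which a terminal determines several groups of abscissas according to another exemplary embodiment;
FIG. 3E is a schematic diagram of the third abscissa identified by the terminal in the histogram according to another exemplary embodiment;
FIG. 3F is a schematic diagram of the groups of abscissas determined from the histogram according to another exemplary embodiment;
FIG. 3G is a schematic diagram of a preset abscissa in the histogram according to another exemplary embodiment;
FIG. 3H is a schematic diagram of a third abscissa determined from the histogram according to another exemplary embodiment;
FIG. 4 is a flowchart of a method for identifying the groups of abscissas from the third abscissa and the distribution information of the accumulated values according to an exemplary embodiment;
FIG. 5 is a flowchart of another method for identifying the groups of abscissas from the third abscissa and the distribution information of the accumulated values according to another exemplary embodiment;
FIG. 6 is a flowchart of a method by which the terminal identifies text regions according to yet another exemplary embodiment;
FIG. 7 is a block diagram of an area identification apparatus according to an exemplary embodiment;
FIG. 8 is a block diagram of an area identification apparatus according to another exemplary embodiment;
FIG. 9 is a block diagram of an area identification apparatus according to an exemplary embodiment.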
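Detailed Description
Exemplary embodiments are described in detail here, and examples of them are shown in the accompanying drawings. When the following description refers to the drawings, the same numerals in different drawings denote the same or similar elements unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with the present disclosure. Rather, they are merely examples of apparatuses and methods consistent with some aspects of the present disclosure as detailed in the appended claims.
For ease of understanding, the terms used in the exemplary embodiments of the present disclosure are first introduced briefly.
A text region includes several characters belonging to the same line. The text region may be a region in an image of a certificate, in a scanned image of an article, or in an electronic document; this embodiment does not limit this. In addition, the upper edge of the text region in this embodiment lies between the upper edge of the characters in the line and the lower edge of the previous line of text; the lower edge of the text region lies between the lower edge of the characters in the line and the upper edge of the next line of text.
For example, taking as the text region the region of a second-generation identity card that contains the citizen identity number on a single line, and referring to FIG. 1, the upper edge of the text region lies between l1 and l2 and the lower edge lies between l3 and l4. For instance, the text region is region 11 shown in FIG. 1.
The characters in the text region may all be valid text, or may be a combination of valid text and invalid text. Valid text is text whose character regions need to be identified; invalid text is text whose character regions do not need to be identified. For example, suppose the region of each digit of the citizen identity number in FIG. 1 needs to be identified, that is, the citizen identity number is the valid text. The text in the text region may then consist only of the digits '330421199012162834', or may include those digits together with at least one Chinese character to their left, for example '号码330421199012162834'. Here, a character region is the region corresponding to a single character.
In addition, the characters referred to in the embodiments of the present disclosure may be digits, letters, Chinese characters, photographs, or any other content whose foreground-pixel value in the calculated histogram is greater than a threshold.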
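FIG. 2 is a flowchart of an area identification method according to an exemplary embodiment. As shown in FIG. 2, the area identification method includes the following steps.
In step 201, a text region is binarized to obtain a binarized text region, the text region including several characters belonging to the same line.
In step 202, a histogram of the binarized text region is calculated in the vertical direction, the histogram including the abscissa of each column of pixels and the accumulated value of foreground pixels in each column of pixels.
In step 203, the character regions of the text in the text region are identified according to the distribution information of the accumulated values in the histogram.
In summary, the area identification method provided in this embodiment calculates a histogram of the binarized text region in the vertical direction and identifies the character regions of the text in the text region according to the distribution information in the histogram. This addresses the low region-positioning accuracy of the related art and makes it possible to locate character regions precisely from the distribution information of the accumulated foreground-pixel values in the histogram.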
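FIG. 3A is a flowchart of an area identification method according to another exemplary embodiment. As shown in FIG. 3A, the area identification method includes the following steps.
In step 301, a text region is binarized to obtain a binarized text region, the text region including several characters belonging to the same line.
Optionally, the terminal preprocesses the text region, where the preprocessing may include operations such as denoising, filtering, and edge extraction, and then binarizes the preprocessed text region.
Binarization means comparing the gray value of each pixel in the text region with a preset gray threshold and dividing the pixels into two groups: pixels above the preset gray threshold and pixels below it. The two groups are rendered in two different colors, black and white, which gives the binarized text region shown in FIG. 3B. Pixels of the color that lies in the foreground are called foreground pixels, namely the white pixels in FIG. 3B; pixels of the color that lies in the background are called background pixels, namely the black pixels in FIG. 3B.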
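The thresholding described above can be sketched as follows. This is a minimal illustration, not the disclosure's implementation: the function name `binarize`, the default threshold of 128, and the convention that brighter pixels become foreground (value 1) are all assumptions made for the example.

```python
def binarize(gray, threshold=128):
    """Map a grayscale image (rows of 0-255 ints) to 1 = foreground, 0 = background.

    Assumption: pixels brighter than `threshold` are foreground (white in FIG. 3B).
    """
    return [[1 if px > threshold else 0 for px in row] for row in gray]


gray = [
    [10, 200, 30],
    [220, 15, 240],
]
binary = binarize(gray)
```

If the text is dark on a light background, the comparison would simply be inverted so that text pixels still end up as foreground.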
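In step 302, a histogram of the binarized text region is calculated in the vertical direction, the histogram including the abscissa of each column of pixels and the accumulated value of foreground pixels in each column of pixels.
After the text region is binarized, the histogram is calculated in the vertical direction. The horizontal axis of the histogram represents the abscissa of each column of pixels, and the vertical axis represents the accumulated count of foreground pixels in each column. Foreground pixels are the pixels of the white areas in the binarized text region, as opposed to background pixels. For example, the terminal calculates the histogram shown in FIG. 3C.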
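A vertical-direction histogram of this kind is just a per-column count of foreground pixels. The sketch below assumes the binarized region is a list of rows of 0/1 values; the name `column_histogram` is hypothetical.

```python
def column_histogram(binary):
    """Return, for each column x, the number of foreground (1) pixels in that column."""
    if not binary:
        return []
    # zip(*binary) iterates over columns of the row-major image.
    return [sum(column) for column in zip(*binary)]


binary = [
    [0, 1, 1, 0],
    [0, 1, 0, 0],
    [0, 1, 1, 0],
]
hist = column_histogram(binary)
```

The index of each entry in `hist` plays the role of the abscissa, and the entry itself is the accumulated value.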
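In step 303, several groups of abscissas are determined according to the distribution information of the accumulated values in the histogram.
Optionally, referring to FIG. 3D, this step may include:
In step 303a, a third abscissa in the histogram is identified according to the distribution information of the accumulated values.
The third abscissa is either the abscissa corresponding, in the histogram, to the left edge of the character region of the first valid character among the characters, or the abscissa corresponding, in the histogram, to the right edge of the character region of the last valid character among the characters.
For example, taking the citizen identity number in FIG. 1 as the valid text of the text region, and referring to FIG. 3E, the third abscissa may be the abscissa X1 corresponding to the left edge of the first valid digit '3', or the abscissa X2 corresponding to the right edge of the last valid digit '4'.
In step 303b, the third abscissa is taken as the search starting point, and the groups of abscissas are searched out in a predetermined direction based on the distribution information of the accumulated values.
After identifying the third abscissa, the terminal can take it as the search starting point in the histogram and search out the groups of abscissas in a predetermined direction based on the distribution information of the accumulated values. When the third abscissa corresponds to the left edge of the character region of the first valid character, the predetermined direction is rightward; when it corresponds to the right edge of the character region of the last valid character, the predetermined direction is leftward.
The number of groups of abscissas corresponds to the number of valid characters in the text region. That is, each group includes a first abscissa, corresponding to the left edge of the character region of one valid character, and a second abscissa, corresponding to the right edge of the character region of that valid character; in other words, each group includes a first abscissa and the first second abscissa located to its right. The accumulated values corresponding to the first abscissa and to the adjacent abscissa on its right are greater than the first threshold, and the accumulated value corresponding to the adjacent abscissa on its left is smaller than the second threshold; the accumulated values corresponding to the second abscissa and to the adjacent abscissa on its left are greater than the first threshold, and the accumulated value corresponding to the adjacent abscissa on its right is smaller than the second threshold.
For example, denoting the first abscissa as x1 and the second abscissa as x2, the terminal can identify the groups of abscissas shown in FIG. 3F. Note that FIG. 3F illustrates only some of the identified groups; an actual implementation includes more groups, and this embodiment does not limit this.
The first threshold and the second threshold may both be small values. For example, both are values slightly greater than 0. Optionally, the first threshold may be 0 and the second threshold may be a value close to 0. In an actual implementation, the accumulated values corresponding to the first abscissa and to the adjacent abscissa on its right are non-zero, and the accumulated value corresponding to the adjacent abscissa on its left is 0; the accumulated values corresponding to the second abscissa and to the adjacent abscissa on its left are non-zero, and the accumulated value corresponding to the adjacent abscissa on its right is 0.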
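The edge conditions above translate directly into a scan over the histogram. The following is a sketch under stated assumptions: `find_character_spans`, `t1` (first threshold, default 0), and `t2` (second threshold, default 1) are names and values chosen for illustration, and positions outside the histogram are treated as having an accumulated value of 0.

```python
def find_character_spans(hist, t1=0, t2=1):
    """Return (first_abscissa, second_abscissa) pairs found by the threshold rules."""
    def value(x):
        # Assumption: columns outside the region count as empty.
        return hist[x] if 0 <= x < len(hist) else 0

    spans = []
    left = None
    for x in range(len(hist)):
        # Left edge: x and its right neighbour exceed t1, left neighbour is below t2.
        if left is None and value(x) > t1 and value(x + 1) > t1 and value(x - 1) < t2:
            left = x
        # Right edge: x and its left neighbour exceed t1, right neighbour is below t2.
        if left is not None and value(x) > t1 and value(x - 1) > t1 and value(x + 1) < t2:
            spans.append((left, x))
            left = None
    return spans


spans = find_character_spans([0, 0, 3, 5, 2, 0, 0, 4, 4, 0])
```

Because both edge rules require a neighbouring column above the first threshold, a stroke that is only one pixel wide is not reported by this literal reading of the conditions.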
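In step 304, for each group of abscissas, the pixel column at the first abscissa is identified as the left edge of a character region, and the pixel column at the second abscissa is identified as the right edge of that character region.
In summary, the area identification method provided in this embodiment calculates a histogram of the binarized text region in the vertical direction and identifies the character regions of the text in the text region according to the distribution information in the histogram. This addresses the low region-positioning accuracy of the related art and makes it possible to locate character regions precisely from the distribution information of the accumulated foreground-pixel values in the histogram.
In the embodiment shown in FIG. 3A, when the text in the text region includes both valid text and invalid text, and the first distance between the valid text and the invalid text is greater than the second distance between two adjacent valid characters, step 303a may include:
The terminal starts from the preset abscissa in the histogram, searches to the left for the first gap whose width is greater than the second distance, and determines the first abscissa to the right of the gap whose accumulated foreground-pixel value is greater than the first threshold as the third abscissa.
The preset abscissa is a coordinate belonging to a preset interval, and the preset interval lies within the mapping interval that corresponds to the valid text of the text region in the histogram. For example, taking the citizen identity number as the valid text, and referring to FIG. 3E, the preset interval lies within the interval [X1, X2] in the figure. The preset interval is usually set according to empirical values. The accumulated foreground-pixel value within the gap is smaller than the second threshold.
Taking the citizen identity number in FIG. 1 as the valid text: the horizontal midpoint of a second-generation identity card is always occupied by the citizen identity number, so the preset abscissa may be the abscissa corresponding, in the histogram, to the horizontal midpoint of the card. For example, if the width of the text region is the full width of the card, then referring to FIG. 3G, the preset abscissa is X0 in the figure, and the terminal can start searching to the left from X0. Because the second distance between two adjacent digits of the citizen identity number is far smaller than the first distance between the Chinese character '码' and the first digit, once the terminal finds a gap to the left whose width is greater than the second distance, namely gap d in the figure, it can determine the first abscissa to the right of the gap whose accumulated foreground-pixel value is greater than the first threshold as the third abscissa, that is, the third abscissa X1.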
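The leftward gap search can be sketched as below. The function name `third_abscissa_left_of`, the parameter `min_gap` (standing in for the second distance, in pixels), and the threshold defaults are assumptions for illustration; the sketch also treats a run of empty columns that reaches the left border as a gap.

```python
def third_abscissa_left_of(hist, start, min_gap, t1=0, t2=1):
    """Walk left from `start`; after the first gap wider than `min_gap`,
    return the first column to the right of that gap whose value exceeds t1."""
    x = start
    while x >= 0:
        if hist[x] < t2:
            gap_right = x                    # right-most column of this gap
            while x >= 0 and hist[x] < t2:   # extend the gap to the left
                x -= 1
            if gap_right - x > min_gap:      # gap width in columns
                for c in range(gap_right + 1, len(hist)):
                    if hist[c] > t1:
                        return c
                return None
        else:
            x -= 1
    return None


hist = [2, 3, 0, 0, 0, 0, 4, 5, 0, 6, 6, 0, 7]
x_third = third_abscissa_left_of(hist, start=10, min_gap=2)
```

Here the one-column gap at index 8 is skipped as ordinary character spacing, while the four-column gap at indices 2 to 5 separates the leading invalid text from the valid text that starts at index 6.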
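The above describes only the case in which the terminal searches to the left from the preset abscissa. Similarly, the terminal can search to the right from the preset abscissa and, after finding a gap whose width is greater than the second distance, determine the first abscissa to the left of the gap whose accumulated foreground-pixel value is greater than the first threshold as the third abscissa; this embodiment does not give a further example.
In the optional embodiment shown in FIG. 3A, if all of the text in the text region is valid text, step 303a may include the following.
When all of the text in the text region is valid text, after the terminal calculates the histogram of the binarized text region, the first abscissa from the left of the histogram whose accumulated foreground-pixel value is greater than the first threshold is the abscissa corresponding to the first valid character, so the terminal can determine that abscissa as the third abscissa.
For example, referring to FIG. 3H, the terminal can determine X1 in the figure as the third abscissa.
Similarly, the first abscissa from the right of the histogram whose accumulated foreground-pixel value is greater than the first threshold is the abscissa corresponding to the last valid character, so the terminal can determine that abscissa as the third abscissa. Still referring to FIG. 3H, the terminal can determine X2 in the figure as the third abscissa.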
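When every character is valid, the third abscissa is simply the first column from one end that exceeds the first threshold. A short sketch, with hypothetical function names and an assumed default threshold of 0:

```python
def first_valid_left_edge(hist, t1=0):
    """First abscissa from the left whose accumulated value exceeds t1 (X1 in FIG. 3H)."""
    return next((x for x, v in enumerate(hist) if v > t1), None)


def last_valid_right_edge(hist, t1=0):
    """First abscissa from the right whose accumulated value exceeds t1 (X2 in FIG. 3H)."""
    return next((x for x in range(len(hist) - 1, -1, -1) if hist[x] > t1), None)


hist = [0, 0, 3, 4, 0, 5, 2, 0]
x1 = first_valid_left_edge(hist)
x2 = last_valid_right_edge(hist)
```

Both helpers return `None` for an empty line, which lets a caller skip lines that contain no foreground pixels.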
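In the embodiment shown in FIG. 3A, after the terminal obtains the third abscissa as described above, if the third abscissa corresponds to the left edge of the character region of the first valid character among the characters, then, referring to FIG. 4, step 303b may be replaced by steps 401 to 404.
In step 401, for the i-th group of abscissas, the first abscissa of the i-th group in the histogram is taken as the search starting point, and the first fourth abscissa is searched for to the right.
Here 1 ≤ i ≤ n, i is a positive integer with an initial value of 1, and n is the number of valid characters among the characters. For example, if the valid text is the citizen identity number of a second-generation identity card, the number of valid characters n is 18.
In addition, the first abscissa of the 1st group is the third abscissa; the accumulated values corresponding to the fourth abscissa and to the adjacent abscissa on its left are greater than the first threshold, and the accumulated value corresponding to the adjacent abscissa on its right is smaller than the second threshold.
Taking i = 1 as an example: because the first abscissa x1 of the first group is the third abscissa X1, the terminal can take X1 in the histogram as the search starting point and search to the right to find the first fourth abscissa.
In step 402, the fourth abscissa is determined as the second abscissa of the i-th group.
In step 403, if i < n, the fourth abscissa in the histogram is taken as the search starting point, and the first fifth abscissa is searched for to the right.
If i < n, the terminal can conclude that there are still valid characters to the right whose character regions have not been identified. The terminal can then take the abscissa corresponding to the right edge of the current valid character's region as the search starting point and continue searching to the right for the abscissa corresponding to the left edge of the next valid character's region.
Optionally, the terminal takes the fourth abscissa in the histogram as the search starting point and searches to the right for the first fifth abscissa, where the accumulated values corresponding to the fifth abscissa and to the adjacent abscissa on its right are greater than the first threshold, and the accumulated value corresponding to the adjacent abscissa on its left is smaller than the second threshold.
In step 404, i is set to i + 1, and the fifth abscissa is determined as the first abscissa of the i-th group.
The terminal determines the abscissa found by the search as the abscissa corresponding to the left edge of the next valid character's region. Optionally, the terminal sets i = i + 1 and determines the fifth abscissa as the first abscissa of the i-th group.
The terminal then continues searching to the right in the same way to determine the group of abscissas corresponding to the character region of each valid character.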
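Steps 401 to 404 can be sketched as a loop over the n valid characters. The function name `search_right`, the threshold defaults, and the early exit when no further edge is found are assumptions made for this illustration.

```python
def search_right(hist, x_third, n, t1=0, t2=1):
    """Starting at the third abscissa, collect up to n (first, second) abscissa groups."""
    def value(x):
        return hist[x] if 0 <= x < len(hist) else 0

    groups = []
    first = x_third  # first abscissa of group 1 is the third abscissa
    for i in range(1, n + 1):
        # Step 401: search right for the fourth abscissa (a right edge).
        fourth = next((x for x in range(first, len(hist))
                       if value(x) > t1 and value(x - 1) > t1 and value(x + 1) < t2), None)
        if fourth is None:
            break
        groups.append((first, fourth))  # step 402
        if i < n:
            # Step 403: search right for the fifth abscissa (the next left edge).
            fifth = next((x for x in range(fourth + 1, len(hist))
                          if value(x) > t1 and value(x + 1) > t1 and value(x - 1) < t2), None)
            if fifth is None:
                break
            first = fifth               # step 404
    return groups


hist = [0, 0, 3, 5, 2, 0, 0, 4, 4, 0, 1, 1, 0]
groups = search_right(hist, x_third=2, n=3)
```

Passing a smaller `n` stops the search early, which is how trailing invalid text to the right of the last valid character would be ignored.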
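In the embodiment shown in FIG. 3A, after the terminal obtains the third abscissa as described above, if the third abscissa corresponds to the right edge of the character region of the last valid character among the characters, then, referring to FIG. 5, step 303b may be replaced by steps 501 to 504.
In step 501, for the j-th group of abscissas, the second abscissa of the j-th group in the histogram is taken as the search starting point, and the first sixth abscissa is searched for to the left.
The accumulated values corresponding to the sixth abscissa and to the adjacent abscissa on its right are greater than the first threshold, and the accumulated value corresponding to the adjacent abscissa on its left is smaller than the second threshold; 1 ≤ j ≤ n, j is a positive integer with an initial value of n, and n is the number of valid characters among the characters; the second abscissa of the n-th group is the third abscissa.
In step 502, the sixth abscissa is determined as the first abscissa of the j-th group.
In step 503, if j > 0, the sixth abscissa in the histogram is taken as the search starting point, and the first seventh abscissa is searched for to the left, where the accumulated values corresponding to the seventh abscissa and to the adjacent abscissa on its left are greater than the first threshold, and the accumulated value corresponding to the adjacent abscissa on its right is smaller than the second threshold; 1 ≤ j ≤ n, and j is a positive integer with an initial value of n.
In step 504, j is set to j - 1, and the seventh abscissa is determined as the second abscissa of the j-th group.
Note that steps 501 to 504 are similar to steps 401 to 404 above. The difference is that steps 401 to 404 search from left to right, whereas steps 501 to 504 search from right to left, so they are not described again here.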
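The right-to-left variant mirrors the previous search. In the sketch below, `search_left` and its defaults are hypothetical names and values; as an assumption, the loop continues only while j > 1 (the mirror of the i < n test in step 403), because after the first valid character has been found there is no further valid character to its left.

```python
def search_left(hist, x_third, n, t1=0, t2=1):
    """Starting at the third abscissa (right edge of the last valid character),
    collect up to n (first, second) abscissa groups, returned left to right."""
    def value(x):
        return hist[x] if 0 <= x < len(hist) else 0

    groups = []
    second = x_third  # second abscissa of group n is the third abscissa
    for j in range(n, 0, -1):
        # Step 501: search left for the sixth abscissa (a left edge).
        sixth = next((x for x in range(second, -1, -1)
                      if value(x) > t1 and value(x + 1) > t1 and value(x - 1) < t2), None)
        if sixth is None:
            break
        groups.append((sixth, second))  # step 502
        if j > 1:                       # assumption: mirror of "i < n"
            # Step 503: search left for the seventh abscissa (the previous right edge).
            seventh = next((x for x in range(sixth - 1, -1, -1)
                            if value(x) > t1 and value(x - 1) > t1 and value(x + 1) < t2), None)
            if seventh is None:
                break
            second = seventh            # step 504
    groups.reverse()
    return groups


hist = [0, 0, 3, 5, 2, 0, 0, 4, 4, 0, 1, 1, 0]
groups = search_left(hist, x_third=11, n=3)
```

On the same histogram, both search directions yield the same three groups.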
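Based on the embodiments above, and referring to FIG. 6, before binarizing the text region to obtain the binarized text region, the terminal may also perform the following steps:
In step 601, a target image region is binarized to obtain a binarized target image region.
The target image region may be a region that includes multiple lines of text.
This step is similar to step 301 in the embodiment above; for technical details, refer to that embodiment. This embodiment does not limit this.
In step 602, a horizontal histogram of the binarized target image region is calculated in the horizontal direction, the horizontal histogram including the vertical coordinate of each row of pixels and the accumulated value of foreground pixels in each row of pixels.
This step is similar to step 302 in the embodiment above. The difference is that step 302 calculates a histogram of the binarized text region in the vertical direction, whereas this step calculates a histogram of the binarized target image region in the horizontal direction.
In step 603, several groups of vertical coordinates are determined according to the distribution information of the accumulated values in the horizontal histogram, each group including a first vertical coordinate and a second vertical coordinate located below the first vertical coordinate; for each group, the pixel row at the first vertical coordinate is identified as the upper edge of a line of text region, and the pixel row at the second vertical coordinate is identified as the lower edge of that text region.
After calculating the horizontal histogram, the terminal can determine several groups of vertical coordinates according to the distribution information of the accumulated values in the horizontal histogram, and then determine the region of each line from each group of vertical coordinates.
This step is similar to the procedure in the embodiment above in which several groups of abscissas are determined from the distribution information of the accumulated values in the vertical-direction histogram and the left and right edges of each character are then determined from each group; for technical details, refer to that embodiment.
The accumulated values corresponding to the first vertical coordinate and to the adjacent vertical coordinate below it are greater than the first threshold, and the accumulated value corresponding to the adjacent vertical coordinate above it is smaller than the second threshold; the accumulated values corresponding to the second vertical coordinate and to the adjacent vertical coordinate above it are greater than the first threshold, and the accumulated value corresponding to the adjacent vertical coordinate below it is smaller than the second threshold.
In step 604, for the k-th line of text region, the step of binarizing the text region to obtain a binarized text region is performed, where m ≥ k ≥ 1, k is a positive integer, and m is the total number of lines identified.
After the text region of each line is identified in step 603, the terminal can perform, for each line of text region, the operation of binarizing the text region to obtain a binarized text region.
At this point, the terminal can identify the character region of every valid character in every line of the target image region.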
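The two-stage procedure (rows first, then columns within each row band) can be sketched end to end. This is a simplified illustration: `runs_above` and `character_boxes` are hypothetical names, the image is assumed to be already binarized, and edges are taken as runs of values above a single threshold rather than the two-threshold neighbour test described above.

```python
def runs_above(hist, t1=0):
    """Return (start, end) index pairs of maximal runs whose values exceed t1."""
    runs, start = [], None
    for i, v in enumerate(hist):
        if v > t1 and start is None:
            start = i
        elif v <= t1 and start is not None:
            runs.append((start, i - 1))
            start = None
    if start is not None:
        runs.append((start, len(hist) - 1))
    return runs


def character_boxes(binary):
    """Return (top, bottom, left, right) boxes: text lines first, then characters."""
    row_hist = [sum(row) for row in binary]            # horizontal histogram
    boxes = []
    for top, bottom in runs_above(row_hist):           # one band per text line
        band = binary[top:bottom + 1]
        col_hist = [sum(col) for col in zip(*band)]    # vertical histogram of the band
        for left, right in runs_above(col_hist):
            boxes.append((top, bottom, left, right))
    return boxes


binary = [
    [0, 0, 0, 0, 0, 0],
    [1, 1, 0, 1, 0, 0],
    [1, 1, 0, 1, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 1, 1],
]
boxes = character_boxes(binary)
```

Computing the column histogram per band, rather than once for the whole image, keeps characters in different lines from merging when they overlap horizontally.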
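The embodiment above describes only the case in which the terminal determines the text region from the horizontal-direction histogram. Optionally, the terminal can determine the text region in other ways. For example, the terminal can locate the text region by image positioning. Taking the citizen identity number of a second-generation identity card as the text region: because the position of the citizen identity number on the card is relatively fixed, and the number is relatively far from the address and the portrait above it, the terminal can directly locate the lower 1/5 of the certificate image and use the located image region as the text region. This embodiment does not limit this.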
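The fixed-position alternative amounts to a crop. A minimal sketch, assuming the certificate image is already aligned and stored as a list of pixel rows; the name `lower_fifth` is hypothetical.

```python
def lower_fifth(image):
    """Return the bottom fifth of the rows of an aligned certificate image."""
    height = len(image)
    band = max(1, height // 5)   # keep at least one row for very small images
    return image[height - band:]


image = [[row] * 3 for row in range(10)]   # 10 rows, each filled with its row index
region = lower_fifth(image)
```

The cropped region would then be passed to the binarization and vertical-histogram steps as the text region.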
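The following are apparatus embodiments of the present disclosure, which can be used to carry out the method embodiments of the present disclosure. For details not disclosed in the apparatus embodiments, refer to the method embodiments.
FIG. 7 is a block diagram of an area identification apparatus according to an exemplary embodiment. As shown in FIG. 7, the apparatus includes, but is not limited to: a first binarization module 710, a first calculation module 720, and a region identification module 730.
The first binarization module 710 is configured to binarize a text region to obtain a binarized text region, the text region including several characters belonging to the same line.
The first calculation module 720 is configured to calculate a histogram of the binarized text region in the vertical direction, the histogram including the abscissa of each column of pixels and the accumulated value of foreground pixels in each column of pixels.
The region identification module 730 is configured to identify the character regions of the text in the text region according to the distribution information of the accumulated values in the histogram.
In summary, the area identification apparatus provided in this embodiment calculates a histogram of the binarized text region in the vertical direction and identifies the character regions of the text in the text region according to the distribution information in the histogram. This addresses the low region-positioning accuracy of the related art and makes it possible to locate character regions precisely from the distribution information of the accumulated foreground-pixel values in the histogram.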
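FIG. 8 is a block diagram of an area identification apparatus according to another exemplary embodiment. As shown in FIG. 8, the apparatus includes, but is not limited to: a first binarization module 810, a first calculation module 820, and a region identification module 830.
The first binarization module 810 is configured to binarize a text region to obtain a binarized text region, the text region including several characters belonging to the same line.
Optionally, the first binarization module 810 preprocesses the text region, where the preprocessing may include operations such as denoising, filtering, and edge extraction, and then binarizes the preprocessed text region.
Binarization means comparing the gray value of each pixel in the text region with a preset gray threshold and dividing the pixels into two groups: pixels above the preset gray threshold and pixels below it. The two groups are rendered in two different colors, black and white, which gives the binarized text region.
The first calculation module 820 is configured to calculate a histogram of the binarized text region in the vertical direction, the histogram including the abscissa of each column of pixels and the accumulated value of foreground pixels in each column of pixels.
After the first binarization module 810 binarizes the text region, the first calculation module 820 calculates the histogram in the vertical direction. The horizontal axis of the histogram represents the abscissa of each column of pixels, and the vertical axis represents the accumulated count of foreground pixels in each column. Foreground pixels are the pixels of the white areas in the binarized text region, as opposed to background pixels.
The region identification module 830 is configured to identify the character regions of the text in the text region according to the distribution information of the accumulated values in the histogram.
In one possible implementation, the region identification module 830 includes a coordinate determination sub-module 831 and a region identification sub-module 832.
The coordinate determination sub-module 831 is configured to determine several groups of abscissas according to the distribution information of the accumulated values in the histogram, each group including a first abscissa and the first second abscissa located to the right of the first abscissa; the accumulated values corresponding to the first abscissa and to the adjacent abscissa on its right are greater than the first threshold, and the accumulated value corresponding to the adjacent abscissa on its left is smaller than the second threshold; the accumulated values corresponding to the second abscissa and to the adjacent abscissa on its left are greater than the first threshold, and the accumulated value corresponding to the adjacent abscissa on its right is smaller than the second threshold.
The region identification sub-module 832 is configured to identify, for each group of abscissas, the pixel column at the first abscissa as the left edge of a character region and the pixel column at the second abscissa as the right edge of that character region.
Optionally, the coordinate determination sub-module 831 includes a coordinate recognition sub-module 831a and a coordinate search sub-module 831b.
The coordinate recognition sub-module 831a is configured to identify a third abscissa in the histogram according to the distribution information of the accumulated values, the third abscissa being either the coordinate corresponding, in the histogram, to the left edge of the character region of the first valid character among the characters, or the coordinate corresponding, in the histogram, to the right edge of the character region of the last valid character among the characters.
The coordinate search sub-module 831b is configured to take the third abscissa as the search starting point and search out the groups of abscissas in a predetermined direction based on the distribution information of the accumulated values.
After the coordinate recognition sub-module 831a identifies the third abscissa, the coordinate search sub-module 831b can take it as the search starting point in the histogram and search out the groups of abscissas in a predetermined direction based on the distribution information of the accumulated values. When the third abscissa corresponds to the left edge of the character region of the first valid character, the predetermined direction is rightward; when it corresponds to the right edge of the character region of the last valid character, the predetermined direction is leftward.
The number of groups of abscissas corresponds to the number of valid characters in the text region. That is, each group includes a first abscissa, corresponding to the left edge of the character region of one valid character, and a second abscissa, corresponding to the right edge of the character region of that valid character; in other words, each group includes a first abscissa and the first second abscissa located to its right. The accumulated values corresponding to the first abscissa and to the adjacent abscissa on its right are greater than the first threshold, and the accumulated value corresponding to the adjacent abscissa on its left is smaller than the second threshold; the accumulated values corresponding to the second abscissa and to the adjacent abscissa on its left are greater than the first threshold, and the accumulated value corresponding to the adjacent abscissa on its right is smaller than the second threshold.
The first threshold and the second threshold may both be small values. For example, both are values slightly greater than 0. Optionally, the first threshold may be 0 and the second threshold may be a value close to 0. In an actual implementation, the accumulated values corresponding to the first abscissa and to the adjacent abscissa on its right are non-zero, and the accumulated value corresponding to the adjacent abscissa on its left is 0; the accumulated values corresponding to the second abscissa and to the adjacent abscissa on its left are non-zero, and the accumulated value corresponding to the adjacent abscissa on its right is 0.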
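Optionally, the third abscissa is the abscissa corresponding, in the histogram, to the left edge of the character region of the first valid character among the characters;
the coordinate search sub-module 831b is further configured to:
for the i-th group of abscissas, take the first abscissa of the i-th group in the histogram as the search starting point and search to the right for the first fourth abscissa, where the accumulated values corresponding to the fourth abscissa and to the adjacent abscissa on its left are greater than the first threshold, and the accumulated value corresponding to the adjacent abscissa on its right is smaller than the second threshold; 1 ≤ i ≤ n, i is a positive integer with an initial value of 1, and n is the number of valid characters among the characters; the first abscissa of the 1st group is the third abscissa.
Determine the fourth abscissa as the second abscissa of the i-th group.
If i < n, take the fourth abscissa in the histogram as the search starting point and search to the right for the first fifth abscissa, where the accumulated values corresponding to the fifth abscissa and to the adjacent abscissa on its right are greater than the first threshold, and the accumulated value corresponding to the adjacent abscissa on its left is smaller than the second threshold.
Let i = i + 1 and determine the fifth abscissa as the first abscissa of the i-th group.
The coordinate search sub-module 831b determines the abscissa found by the search as the abscissa corresponding to the left edge of the next valid character's region. Optionally, the coordinate search sub-module 831b sets i = i + 1 and determines the fifth abscissa as the first abscissa of the i-th group.
The coordinate search sub-module 831b then continues searching to the right in the same way to determine the group of abscissas corresponding to the character region of each valid character.
Optionally, the third abscissa is the abscissa corresponding, in the histogram, to the right edge of the character region of the last valid character among the characters;
the coordinate search sub-module 831b is further configured to:
for the j-th group of abscissas, take the second abscissa of the j-th group in the histogram as the search starting point and search to the left for the first sixth abscissa, where the accumulated values corresponding to the sixth abscissa and to the adjacent abscissa on its right are greater than the first threshold, and the accumulated value corresponding to the adjacent abscissa on its left is smaller than the second threshold; 1 ≤ j ≤ n, j is a positive integer with an initial value of n, and n is the number of valid characters among the characters; the second abscissa of the n-th group is the third abscissa.
Determine the sixth abscissa as the first abscissa of the j-th group.
If j > 0, take the sixth abscissa in the histogram as the search starting point and search to the left for the first seventh abscissa, where the accumulated values corresponding to the seventh abscissa and to the adjacent abscissa on its left are greater than the first threshold, and the accumulated value corresponding to the adjacent abscissa on its right is smaller than the second threshold; 1 ≤ j ≤ n, and j is a positive integer with an initial value of n.
Let j = j - 1 and determine the seventh abscissa as the second abscissa of the j-th group.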
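Note that when the third abscissa corresponds to the right edge of the character region of the last valid character among the characters, the steps performed by the coordinate search sub-module 831b are similar to those performed when the third abscissa corresponds to the left edge of the character region of the first valid character, so they are not described again here.
Optionally, the third abscissa is the abscissa corresponding, in the histogram, to the left edge of the character region of the first valid character among the characters;
the coordinate recognition sub-module 831a is further configured to:
when the characters include valid text and invalid text, and the first distance between the valid text and the invalid text is greater than the second distance between two adjacent valid characters, start from the preset abscissa in the histogram, search to the left for the first gap whose width is greater than the second distance, and determine the first abscissa to the right of the gap whose accumulated foreground-pixel value is greater than the first threshold as the third abscissa; the preset abscissa is a coordinate belonging to a preset interval, and the preset interval is an interval set according to empirical values; the accumulated foreground-pixel value within the gap is smaller than the second threshold.
Or,
when all of the characters are valid text, determine the first abscissa from the left of the histogram whose accumulated foreground-pixel value is greater than the first threshold as the third abscissa.
Optionally, when the text in the text region includes both valid text and invalid text, and the first distance between the valid text and the invalid text is greater than the second distance between two adjacent valid characters, the coordinate recognition sub-module 831a starts from the preset abscissa in the histogram, searches to the left for the first gap whose width is greater than the second distance, and determines the first abscissa to the right of the gap whose accumulated foreground-pixel value is greater than the first threshold as the third abscissa.
The preset abscissa is a coordinate belonging to the preset interval, and the preset interval lies within the mapping interval that corresponds to the valid text of the text region in the histogram.
The above describes only the case in which the coordinate recognition sub-module 831a searches to the left from the preset abscissa. Similarly, the coordinate recognition sub-module 831a can search to the right from the preset abscissa and, after finding a gap whose width is greater than the second distance, determine the first abscissa to the left of the gap whose accumulated foreground-pixel value is greater than the first threshold as the third abscissa; this is not described again here.
Optionally, when all of the text in the text region is valid text, after the histogram of the binarized text region is calculated, the first abscissa from the left of the histogram whose accumulated foreground-pixel value is greater than the first threshold is the abscissa corresponding to the first valid character, so the coordinate recognition sub-module 831a can determine that abscissa as the third abscissa.
Optionally, the third abscissa is the abscissa corresponding, in the histogram, to the right edge of the character region of the last valid character among the characters;
the coordinate recognition sub-module 831a is further configured to:
when the characters include valid text and invalid text, and the first distance between the valid text and the invalid text is greater than the second distance between two adjacent valid characters, start from the preset abscissa in the histogram, search to the right for a gap whose width is greater than the second distance, and determine the first abscissa to the left of the gap whose accumulated foreground-pixel value is greater than the first threshold as the third abscissa; the preset abscissa is a coordinate belonging to the preset interval, and the preset interval is an interval set according to empirical values; the accumulated foreground-pixel value within the gap is smaller than the second threshold.
Or,
when all of the characters are valid text, determine the first abscissa from the right of the histogram whose accumulated foreground-pixel value is greater than the first threshold as the third abscissa.
Note that when the third abscissa corresponds to the right edge of the character region of the last valid character among the characters, the steps performed by the coordinate recognition sub-module 831a are similar to those performed when the third abscissa corresponds to the left edge of the character region of the first valid character, so they are not described again here.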
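Optionally, the apparatus further includes a second binarization module 840, a second calculation module 850, and an edge determination module 860.
The second binarization module 840 is configured to binarize a target image region to obtain a binarized target image region.
The target image region may be a region that includes multiple lines of text.
The second binarization module 840 is similar to the first binarization module 810; for technical details, refer to the first binarization module 810. This embodiment does not limit the second binarization module 840.
The second calculation module 850 is configured to calculate a horizontal histogram of the binarized target image region in the horizontal direction, the horizontal histogram including the vertical coordinate of each row of pixels and the accumulated value of foreground pixels in each row of pixels.
The second calculation module 850 is similar to the first calculation module 820. The difference is that the first calculation module 820 calculates a histogram of the binarized text region in the vertical direction, whereas the second calculation module 850 calculates a histogram of the binarized target image region in the horizontal direction.
The edge determination module 860 is configured to determine several groups of vertical coordinates according to the distribution information of the accumulated values in the horizontal histogram, each group including a first vertical coordinate and a second vertical coordinate located below the first vertical coordinate; for each group, the pixel row at the first vertical coordinate is identified as the upper edge of a line of text region, and the pixel row at the second vertical coordinate is identified as the lower edge of that text region; the accumulated values corresponding to the first vertical coordinate and to the adjacent vertical coordinate below it are greater than the first threshold, and the accumulated value corresponding to the adjacent vertical coordinate above it is smaller than the second threshold; the accumulated values corresponding to the second vertical coordinate and to the adjacent vertical coordinate above it are greater than the first threshold, and the accumulated value corresponding to the adjacent vertical coordinate below it is smaller than the second threshold.
After the horizontal histogram is calculated, the edge determination module 860 can determine several groups of vertical coordinates according to the distribution information of the accumulated values in the horizontal histogram, and then determine the region of each line from each group of vertical coordinates.
The first binarization module 810 is further configured to perform, for the k-th line of text region, the step of binarizing the text region to obtain a binarized text region, where m ≥ k ≥ 1, k is a positive integer, and m is the total number of lines identified.
At this point, the area identification apparatus can identify the character region of every valid character in every line of the target image region.
In summary, the area identification apparatus provided in this embodiment calculates a histogram of the binarized text region in the vertical direction and identifies the character regions of the text in the text region according to the distribution information in the histogram. This addresses the low region-positioning accuracy of the related art and makes it possible to locate character regions precisely from the distribution information of the accumulated foreground-pixel values in the histogram.
As for the apparatus in the embodiments above, the specific manner in which each module performs its operations has been described in detail in the method embodiments and is not elaborated here.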
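An exemplary embodiment of the present disclosure provides a region recognition device capable of implementing the region recognition method provided by the present disclosure. The region recognition device includes a processor and a memory for storing instructions executable by the processor;
wherein the processor is configured to:
binarize a text region to obtain a binarized text region, the text region including several characters belonging to the same line;
compute a histogram of the binarized text region along the vertical direction, the histogram including the abscissa of each column of pixels and the accumulated value of foreground pixels in each column of pixels;
recognize the character regions of the characters in the text region according to the distribution information of the accumulated values in the histogram.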
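The three processor steps above can be sketched end to end in Python. This is a simplified illustration under several assumptions that are not taken from the disclosure: the input is a grayscale image given as a list of rows, text is darker than the background, binarization uses a fixed threshold of 128, and columns outside the image count as empty. The names `binarize`, `vertical_histogram`, and `find_character_edges` are hypothetical.

```python
from typing import List, Tuple


def binarize(gray: List[List[int]], threshold: int = 128) -> List[List[int]]:
    """Mark pixels darker than the threshold as foreground (1)."""
    return [[1 if p < threshold else 0 for p in row] for row in gray]


def vertical_histogram(binary: List[List[int]]) -> List[int]:
    """Accumulated number of foreground pixels in each pixel column."""
    return [sum(col) for col in zip(*binary)]


def find_character_edges(
    hist: List[int], first_threshold: int, second_threshold: int
) -> List[Tuple[int, int]]:
    """Return (left edge, right edge) abscissas of each character region."""
    regions = []
    left = None
    n = len(hist)
    for x in range(n):
        prev = hist[x - 1] if x > 0 else 0     # assumption: empty outside image
        nxt = hist[x + 1] if x + 1 < n else 0  # assumption: empty outside image
        if left is None:
            # First abscissa: this column and its right neighbour exceed the
            # first threshold, the left neighbour is under the second threshold.
            if (hist[x] > first_threshold and nxt > first_threshold
                    and prev < second_threshold):
                left = x
        elif (hist[x] > first_threshold and prev > first_threshold
                and nxt < second_threshold):
            # Second abscissa: mirror condition marking the right edge.
            regions.append((left, x))
            left = None
    return regions


# Hypothetical 3 x 9 grayscale line: two dark characters on a white background.
gray = [[255, 0, 0, 255, 255, 0, 0, 0, 255] for _ in range(3)]
hist = vertical_histogram(binarize(gray))
print(hist)                              # [0, 3, 3, 0, 0, 3, 3, 3, 0]
print(find_character_edges(hist, 1, 1))  # [(1, 2), (5, 7)]
```

A fixed threshold is used only to keep the example short; any binarization method that yields a 0/1 foreground mask could be substituted without changing the histogram and edge-finding steps.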
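FIG. 9 is a block diagram of a region recognition device according to an exemplary embodiment. For example, the device 900 may be a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, fitness equipment, a personal digital assistant, or the like.
Referring to FIG. 9, the device 900 may include one or more of the following components: a processing component 902, a memory 904, a power component 906, a multimedia component 908, an audio component 910, an input/output (I/O) interface 912, a sensor component 914, and a communication component 916.
The processing component 902 typically controls the overall operation of the device 900, such as operations associated with display, telephone calls, data communication, camera operation, and recording. The processing component 902 may include one or more processors 918 to execute instructions so as to perform all or some of the steps of the method described above. In addition, the processing component 902 may include one or more modules that facilitate interaction between the processing component 902 and other components. For example, the processing component 902 may include a multimedia module to facilitate interaction between the multimedia component 908 and the processing component 902.
The memory 904 is configured to store various types of data to support operation of the device 900. Examples of such data include instructions for any application or method operating on the device 900, contact data, phone book data, messages, pictures, videos, and so on. The memory 904 may be implemented by any type of volatile or non-volatile storage device or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, a magnetic disk, or an optical disc.
The power component 906 supplies power to the various components of the device 900. The power component 906 may include a power management system, one or more power sources, and other components associated with generating, managing, and distributing power for the device 900.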
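The multimedia component 908 includes a screen that provides an output interface between the device 900 and the user. In some embodiments, the screen may include a liquid crystal display (LCD) and a touch panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive input signals from the user. The touch panel includes one or more touch sensors to sense touches, swipes, and gestures on the touch panel. The touch sensors may sense not only the boundary of a touch or swipe action but also the duration and pressure associated with the touch or swipe operation. In some embodiments, the multimedia component 908 includes a front camera and/or a rear camera. When the device 900 is in an operating mode, such as a shooting mode or a video mode, the front camera and/or the rear camera can receive external multimedia data. Each front camera and rear camera may be a fixed optical lens system or may have focal length and optical zoom capability.
The audio component 910 is configured to output and/or input audio signals. For example, the audio component 910 includes a microphone (MIC), which is configured to receive external audio signals when the device 900 is in an operating mode, such as a call mode, a recording mode, or a voice recognition mode. The received audio signals may be further stored in the memory 904 or transmitted via the communication component 916. In some embodiments, the audio component 910 also includes a speaker for outputting audio signals.
The I/O interface 912 provides an interface between the processing component 902 and peripheral interface modules, which may be a keyboard, a click wheel, buttons, and the like. The buttons may include, but are not limited to, a home button, volume buttons, a start button, and a lock button.
The sensor component 914 includes one or more sensors for providing status assessments of various aspects of the device 900. For example, the sensor component 914 can detect the open/closed state of the device 900 and the relative positioning of components, such as the display and keypad of the device 900. The sensor component 914 can also detect a change in position of the device 900 or of one of its components, the presence or absence of user contact with the device 900, the orientation or acceleration/deceleration of the device 900, and changes in the temperature of the device 900. The sensor component 914 may include a proximity sensor configured to detect the presence of nearby objects without any physical contact. The sensor component 914 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor component 914 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
The communication component 916 is configured to facilitate wired or wireless communication between the device 900 and other devices. The device 900 can access a wireless network based on a communication standard, such as Wi-Fi, 2G, or 3G, or a combination thereof. In one exemplary embodiment, the communication component 916 receives broadcast signals or broadcast-related information from an external broadcast management system via a broadcast channel. In one exemplary embodiment, the communication component 916 also includes a near field communication (NFC) module to facilitate short-range communication. For example, the NFC module may be implemented based on radio frequency identification (RFID) technology, Infrared Data Association (IrDA) technology, ultra-wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.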
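In an exemplary embodiment, the device 900 may be implemented by one or more application-specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field-programmable gate arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic components, for performing the region recognition method described above.
In an exemplary embodiment, a non-transitory computer-readable storage medium including instructions is also provided, for example the memory 904 including instructions, which can be executed by the processor 918 of the device 900 to perform the region recognition method described above. For example, the non-transitory computer-readable storage medium may be a ROM, a random access memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, or the like.
Other embodiments of the present disclosure will readily occur to those skilled in the art upon consideration of the specification and practice of the invention disclosed herein. This application is intended to cover any variations, uses, or adaptations of the present disclosure that follow its general principles and include common knowledge or customary technical means in the art that are not disclosed herein. The specification and embodiments are to be regarded as exemplary only, with the true scope and spirit of the present disclosure being indicated by the following claims.
It should be understood that the present disclosure is not limited to the precise structures described above and shown in the drawings, and that various modifications and changes may be made without departing from its scope. The scope of the present disclosure is limited only by the appended claims.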

Claims (17)

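  1. A region recognition method, wherein the method comprises:
    binarizing a text region to obtain a binarized text region, the text region comprising several characters belonging to the same line;
    computing a histogram of the binarized text region along the vertical direction, the histogram comprising: the abscissa of each column of pixels and the accumulated value of foreground pixels in each column of pixels;
    recognizing the character regions of the characters in the text region according to distribution information of the accumulated values in the histogram.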
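  2. The method according to claim 1, wherein recognizing the character regions of the characters in the text region according to the distribution information of the accumulated values in the histogram comprises:
    determining several groups of abscissas according to the distribution information of the accumulated values in the histogram, each group of abscissas comprising a first abscissa and the first second abscissa encountered to the right of the first abscissa; the accumulated values corresponding to the first abscissa and to the adjacent abscissa on its right are greater than a first threshold, and the accumulated value corresponding to the adjacent abscissa on its left is less than a second threshold; the accumulated values corresponding to the second abscissa and to the adjacent abscissa on its left are greater than the first threshold, and the accumulated value corresponding to the adjacent abscissa on its right is less than the second threshold;
    for each group of abscissas, recognizing the pixel column at the first abscissa as the left edge of a character region, and recognizing the pixel column at the second abscissa as the right edge of the character region.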
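  3. The method according to claim 2, wherein determining several groups of abscissas according to the distribution information of the accumulated values in the histogram comprises:
    identifying a third abscissa in the histogram according to the distribution information of the accumulated values, the third abscissa being: the abscissa in the histogram corresponding to the left edge of the character region of the first valid character among the several characters, or the abscissa in the histogram corresponding to the right edge of the character region of the last valid character among the several characters;
    taking the third abscissa as a search starting point, and searching out the several groups of abscissas in a predetermined direction based on the distribution information of the accumulated values.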
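  4. The method according to claim 3, wherein the third abscissa is the abscissa in the histogram corresponding to the left edge of the character region of the first valid character among the several characters;
    taking the third abscissa as a search starting point and searching out the several groups of abscissas in a predetermined direction based on the distribution information of the accumulated values comprises:
    for the i-th group of abscissas, taking the first abscissa of the i-th group in the histogram as the search starting point and searching rightward for the first fourth abscissa encountered, wherein the accumulated values corresponding to the fourth abscissa and to the adjacent abscissa on its left are greater than the first threshold, and the accumulated value corresponding to the adjacent abscissa on its right is less than the second threshold; 1 ≤ i ≤ n, i is a positive integer with an initial value of 1, and n is the number of valid characters among the several characters; the first abscissa of the 1st group is the third abscissa;
    determining the fourth abscissa as the second abscissa of the i-th group;
    if i < n, taking the fourth abscissa in the histogram as the search starting point and searching rightward for the first fifth abscissa encountered, wherein the accumulated values corresponding to the fifth abscissa and to the adjacent abscissa on its right are greater than the first threshold, and the accumulated value corresponding to the adjacent abscissa on its left is less than the second threshold;
    letting i = i + 1, and determining the fifth abscissa as the first abscissa of the i-th group.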
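  5. The method according to claim 3, wherein the third abscissa is the abscissa in the histogram corresponding to the right edge of the character region of the last valid character among the several characters;
    taking the third abscissa as a search starting point and searching out the several groups of abscissas in a predetermined direction based on the distribution information of the accumulated values comprises:
    for the j-th group of abscissas, taking the second abscissa of the j-th group in the histogram as the search starting point and searching leftward for the first sixth abscissa encountered, wherein the accumulated values corresponding to the sixth abscissa and to the adjacent abscissa on its right are greater than the first threshold, and the accumulated value corresponding to the adjacent abscissa on its left is less than the second threshold; 1 ≤ j ≤ n, j is a positive integer with an initial value of n, and n is the number of valid characters among the several characters; the second abscissa of the n-th group is the third abscissa;
    determining the sixth abscissa as the first abscissa of the j-th group;
    if j > 0, taking the sixth abscissa in the histogram as the search starting point and searching leftward for the first seventh abscissa encountered, wherein the accumulated values corresponding to the seventh abscissa and to the adjacent abscissa on its left are greater than the first threshold, and the accumulated value corresponding to the adjacent abscissa on its right is less than the second threshold; 1 ≤ j ≤ n, and j is a positive integer with an initial value of n;
    letting j = j - 1, and determining the seventh abscissa as the second abscissa of the j-th group.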
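  6. The method according to claim 3, wherein the third abscissa is the abscissa in the histogram corresponding to the left edge of the character region of the first valid character among the several characters;
    identifying the third abscissa in the histogram according to the distribution information of the accumulated values comprises:
    when the several characters comprise valid characters and invalid characters, and the first distance between a valid character and an invalid character is greater than the second distance between two adjacent valid characters, starting from a preset abscissa in the histogram, searching leftward for the first gap whose width is greater than the second distance, and determining as the third abscissa the first abscissa to the right of the gap at which the accumulated value of foreground pixels is greater than the first threshold; the preset abscissa is a coordinate within a preset interval, and the preset interval is an interval set according to empirical values; the accumulated value of foreground pixels within the gap is less than the second threshold;
    or,
    when all of the several characters are valid characters, determining as the third abscissa the first abscissa, counted from the left side of the histogram, at which the accumulated value of foreground pixels is greater than the first threshold.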
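  7. The method according to claim 3, wherein the third abscissa is the abscissa in the histogram corresponding to the right edge of the character region of the last valid character among the several characters;
    identifying the third abscissa in the histogram according to the distribution information of the accumulated values comprises:
    when the several characters comprise valid characters and invalid characters, and the first distance between a valid character and an invalid character is greater than the second distance between two adjacent valid characters, starting from a preset abscissa in the histogram, searching rightward for a gap whose width is greater than the second distance, and determining as the third abscissa the first abscissa to the left of the gap at which the accumulated value of foreground pixels is greater than the first threshold; the preset abscissa is a coordinate within a preset interval, and the preset interval is an interval set according to empirical values; the accumulated value of foreground pixels within the gap is less than the second threshold;
    or,
    when all of the several characters are valid characters, determining as the third abscissa the first abscissa, counted from the right side of the histogram, at which the accumulated value of foreground pixels is greater than the first threshold.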
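  8. The method according to any one of claims 1 to 7, wherein the method further comprises:
    binarizing a target image region to obtain a binarized target image region;
    computing a horizontal histogram of the binarized target image region along the horizontal direction, the horizontal histogram comprising: the vertical coordinate of each row of pixels and the accumulated value of foreground pixels in each row of pixels;
    determining several groups of vertical coordinates according to the distribution information of the accumulated values in the horizontal histogram, each group of vertical coordinates comprising a first vertical coordinate and a second vertical coordinate located below the first vertical coordinate; for each group of vertical coordinates, recognizing the pixel row at the first vertical coordinate as the upper edge of one line text region, and recognizing the pixel row at the second vertical coordinate as the lower edge of the text region; the accumulated values corresponding to the first vertical coordinate and to the adjacent vertical coordinate below it are greater than the first threshold, and the accumulated value corresponding to the adjacent vertical coordinate above it is less than the second threshold; the accumulated values corresponding to the second vertical coordinate and to the adjacent vertical coordinate above it are greater than the first threshold, and the accumulated value corresponding to the adjacent vertical coordinate below it is less than the second threshold;
    for the k-th line text region, performing the step of binarizing the text region to obtain a binarized text region, where m ≥ k ≥ 1, k is a positive integer, and m is the total number of lines recognized.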
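  9. A region recognition device, wherein the device comprises:
    a first binarization module, configured to binarize a text region to obtain a binarized text region, the text region comprising several characters belonging to the same line;
    a first calculation module, configured to compute a histogram of the binarized text region along the vertical direction, the histogram comprising: the abscissa of each column of pixels and the accumulated value of foreground pixels in each column of pixels;
    a region recognition module, configured to recognize the character regions of the characters in the text region according to distribution information of the accumulated values in the histogram.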
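  10. The device according to claim 9, wherein the region recognition module comprises:
    a coordinate determination submodule, configured to determine several groups of abscissas according to the distribution information of the accumulated values in the histogram, each group of abscissas comprising a first abscissa and the first second abscissa encountered to the right of the first abscissa; the accumulated values corresponding to the first abscissa and to the adjacent abscissa on its right are greater than a first threshold, and the accumulated value corresponding to the adjacent abscissa on its left is less than a second threshold; the accumulated values corresponding to the second abscissa and to the adjacent abscissa on its left are greater than the first threshold, and the accumulated value corresponding to the adjacent abscissa on its right is less than the second threshold;
    a region recognition submodule, configured to, for each group of abscissas, recognize the pixel column at the first abscissa as the left edge of a character region and recognize the pixel column at the second abscissa as the right edge of the character region.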
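  11. The device according to claim 10, wherein the coordinate determination submodule comprises:
    a coordinate identification submodule, configured to identify a third abscissa in the histogram according to the distribution information of the accumulated values, the third abscissa being the abscissa in the histogram corresponding to the left edge of the character region of the first valid character among the several characters, or the abscissa in the histogram corresponding to the right edge of the character region of the last valid character among the several characters;
    a coordinate search submodule, configured to take the third abscissa as a search starting point and search out the several groups of abscissas in a predetermined direction based on the distribution information of the accumulated values.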
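  12. The device according to claim 11, wherein the third abscissa is the abscissa in the histogram corresponding to the left edge of the character region of the first valid character among the several characters;
    the coordinate search submodule is further configured to:
    for the i-th group of abscissas, take the first abscissa of the i-th group in the histogram as the search starting point and search rightward for the first fourth abscissa encountered, wherein the accumulated values corresponding to the fourth abscissa and to the adjacent abscissa on its left are greater than the first threshold, and the accumulated value corresponding to the adjacent abscissa on its right is less than the second threshold; 1 ≤ i ≤ n, i is a positive integer with an initial value of 1, and n is the number of valid characters among the several characters; the first abscissa of the 1st group is the third abscissa;
    determine the fourth abscissa as the second abscissa of the i-th group;
    if i < n, take the fourth abscissa in the histogram as the search starting point and search rightward for the first fifth abscissa encountered, wherein the accumulated values corresponding to the fifth abscissa and to the adjacent abscissa on its right are greater than the first threshold, and the accumulated value corresponding to the adjacent abscissa on its left is less than the second threshold;
    let i = i + 1, and determine the fifth abscissa as the first abscissa of the i-th group.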
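  13. The device according to claim 11, wherein the third abscissa is the abscissa in the histogram corresponding to the right edge of the character region of the last valid character among the several characters;
    the coordinate search submodule is further configured to:
    for the j-th group of abscissas, take the second abscissa of the j-th group in the histogram as the search starting point and search leftward for the first sixth abscissa encountered, wherein the accumulated values corresponding to the sixth abscissa and to the adjacent abscissa on its right are greater than the first threshold, and the accumulated value corresponding to the adjacent abscissa on its left is less than the second threshold; 1 ≤ j ≤ n, j is a positive integer with an initial value of n, and n is the number of valid characters among the several characters; the second abscissa of the n-th group is the third abscissa;
    determine the sixth abscissa as the first abscissa of the j-th group;
    if j > 0, take the sixth abscissa in the histogram as the search starting point and search leftward for the first seventh abscissa encountered, wherein the accumulated values corresponding to the seventh abscissa and to the adjacent abscissa on its left are greater than the first threshold, and the accumulated value corresponding to the adjacent abscissa on its right is less than the second threshold; 1 ≤ j ≤ n, and j is a positive integer with an initial value of n;
    let j = j - 1, and determine the seventh abscissa as the second abscissa of the j-th group.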
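  14. The device according to claim 11, wherein the third abscissa is the abscissa in the histogram corresponding to the left edge of the character region of the first valid character among the several characters;
    the coordinate identification submodule is further configured to:
    when the several characters comprise valid characters and invalid characters, and the first distance between a valid character and an invalid character is greater than the second distance between two adjacent valid characters, start from a preset abscissa in the histogram, search leftward for the first gap whose width is greater than the second distance, and determine as the third abscissa the first abscissa to the right of the gap at which the accumulated value of foreground pixels is greater than the first threshold; the preset abscissa is a coordinate within a preset interval, and the preset interval is an interval set according to empirical values; the accumulated value of foreground pixels within the gap is less than the second threshold;
    or,
    when all of the several characters are valid characters, determine as the third abscissa the first abscissa, counted from the left side of the histogram, at which the accumulated value of foreground pixels is greater than the first threshold.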
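  15. The device according to claim 11, wherein the third abscissa is the abscissa in the histogram corresponding to the right edge of the character region of the last valid character among the several characters;
    the coordinate identification submodule is further configured to:
    when the several characters comprise valid characters and invalid characters, and the first distance between a valid character and an invalid character is greater than the second distance between two adjacent valid characters, start from a preset abscissa in the histogram, search rightward for a gap whose width is greater than the second distance, and determine as the third abscissa the first abscissa to the left of the gap at which the accumulated value of foreground pixels is greater than the first threshold; the preset abscissa is a coordinate within a preset interval, and the preset interval is an interval set according to empirical values; the accumulated value of foreground pixels within the gap is less than the second threshold;
    or,
    when all of the several characters are valid characters, determine as the third abscissa the first abscissa, counted from the right side of the histogram, at which the accumulated value of foreground pixels is greater than the first threshold.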
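  16. The device according to any one of claims 9 to 15, wherein the device further comprises:
    a second binarization module, configured to binarize a target image region to obtain a binarized target image region;
    a second calculation module, configured to compute a horizontal histogram of the binarized target image region along the horizontal direction, the horizontal histogram comprising: the vertical coordinate of each row of pixels and the accumulated value of foreground pixels in each row of pixels;
    an edge determination module, configured to determine several groups of vertical coordinates according to the distribution information of the accumulated values in the horizontal histogram, each group of vertical coordinates comprising a first vertical coordinate and a second vertical coordinate located below the first vertical coordinate; and, for each group of vertical coordinates, to recognize the pixel row at the first vertical coordinate as the upper edge of one line text region and the pixel row at the second vertical coordinate as the lower edge of the text region; the accumulated values corresponding to the first vertical coordinate and to the adjacent vertical coordinate below it are greater than the first threshold, and the accumulated value corresponding to the adjacent vertical coordinate above it is less than the second threshold; the accumulated values corresponding to the second vertical coordinate and to the adjacent vertical coordinate above it are greater than the first threshold, and the accumulated value corresponding to the adjacent vertical coordinate below it is less than the second threshold;
    the first binarization module is further configured to perform, for the k-th line text region, the step of binarizing the text region to obtain a binarized text region, where m ≥ k ≥ 1, k is a positive integer, and m is the total number of lines recognized.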
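  17. A region recognition device, wherein the device comprises:
    a processor;
    a memory for storing instructions executable by the processor;
    wherein the processor is configured to:
    binarize a text region to obtain a binarized text region, the text region comprising several characters belonging to the same line;
    compute a histogram of the binarized text region along the vertical direction, the histogram comprising: the abscissa of each column of pixels and the accumulated value of foreground pixels in each column of pixels;
    recognize the character regions of the characters in the text region according to distribution information of the accumulated values in the histogram.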
PCT/CN2015/099299 2015-10-30 2015-12-29 区域识别方法及装置 WO2017071063A1 (zh)

Priority Applications (4)

Application Number Priority Date Filing Date Title
RU2016110434A RU2639668C2 (ru) 2015-10-30 2015-12-29 Способ и устройство для идентификации области
JP2017547046A JP6392468B2 (ja) 2015-10-30 2015-12-29 領域認識方法及び装置
MX2016003679A MX2016003679A (es) 2015-10-30 2015-12-29 Metodo y dispositivo para identificacion de region.
KR1020167005567A KR101805090B1 (ko) 2015-10-30 2015-12-29 영역 인식 방법 및 장치

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201510726153.9 2015-10-30
CN201510726153.9A CN105528606B (zh) 2015-10-30 2015-10-30 区域识别方法及装置

Publications (1)

Publication Number Publication Date
WO2017071063A1 true WO2017071063A1 (zh) 2017-05-04

Family

ID=55770820

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2015/099299 WO2017071063A1 (zh) 2015-10-30 2015-12-29 区域识别方法及装置

Country Status (8)

Country Link
US (1) US10157326B2 (zh)
EP (1) EP3163502A1 (zh)
JP (1) JP6392468B2 (zh)
KR (1) KR101805090B1 (zh)
CN (1) CN105528606B (zh)
MX (1) MX2016003679A (zh)
RU (1) RU2639668C2 (zh)
WO (1) WO2017071063A1 (zh)

Families Citing this family (27)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7873200B1 (en) 2006-10-31 2011-01-18 United Services Automobile Association (Usaa) Systems and methods for remote deposit of checks
US8708227B1 (en) 2006-10-31 2014-04-29 United Services Automobile Association (Usaa) Systems and methods for remote deposit of checks
US9058512B1 (en) 2007-09-28 2015-06-16 United Services Automobile Association (Usaa) Systems and methods for digital signature detection
US9159101B1 (en) 2007-10-23 2015-10-13 United Services Automobile Association (Usaa) Image processing
US10380562B1 (en) 2008-02-07 2019-08-13 United Services Automobile Association (Usaa) Systems and methods for mobile deposit of negotiable instruments
US10504185B1 (en) 2008-09-08 2019-12-10 United Services Automobile Association (Usaa) Systems and methods for live video financial deposit
US8452689B1 (en) 2009-02-18 2013-05-28 United Services Automobile Association (Usaa) Systems and methods of check detection
US10956728B1 (en) 2009-03-04 2021-03-23 United Services Automobile Association (Usaa) Systems and methods of check processing with background removal
US9779392B1 (en) 2009-08-19 2017-10-03 United Services Automobile Association (Usaa) Apparatuses, methods and systems for a publishing and subscribing platform of depositing negotiable instruments
US8977571B1 (en) 2009-08-21 2015-03-10 United Services Automobile Association (Usaa) Systems and methods for image monitoring of check during mobile deposit
US9129340B1 (en) 2010-06-08 2015-09-08 United Services Automobile Association (Usaa) Apparatuses, methods and systems for remote deposit capture with enhanced image detection
US10380565B1 (en) 2012-01-05 2019-08-13 United Services Automobile Association (Usaa) System and method for storefront bank deposits
US9286514B1 (en) * 2013-10-17 2016-03-15 United Services Automobile Association (Usaa) Character count determination for a digital image
US10506281B1 (en) 2015-12-22 2019-12-10 United Services Automobile Association (Usaa) System and method for capturing audio or video data
US11030752B1 (en) 2018-04-27 2021-06-08 United Services Automobile Association (Usaa) System, computing device, and method for document detection
CN109145891B (zh) * 2018-06-27 2022-08-02 上海携程商务有限公司 客户端及其识别身份证的方法、识别身份证的系统
CN109635807A (zh) * 2018-10-16 2019-04-16 深圳壹账通智能科技有限公司 信息录入方法、装置、设备及计算机可读存储介质
CN111104940A (zh) * 2018-10-26 2020-05-05 深圳怡化电脑股份有限公司 图像旋转校正方法、装置、电子设备及存储介质
CN111223104B (zh) * 2018-11-23 2023-10-10 杭州海康威视数字技术股份有限公司 一种包裹提取及跟踪方法、装置及电子设备
CN110533030B (zh) * 2019-08-19 2023-07-14 三峡大学 基于深度学习的太阳胶片图像时间戳信息提取方法
CN111291750B (zh) * 2020-01-21 2023-03-24 河南大学 一种基于空间近邻关系的甲骨文自动标注方法
CN111898602B (zh) * 2020-08-10 2024-04-16 赞同科技股份有限公司 一种图像中的凭证号码区域识别方法、装置及设备
US11900755B1 (en) 2020-11-30 2024-02-13 United Services Automobile Association (Usaa) System, computing device, and method for document detection and deposit processing
CN113723301B (zh) * 2021-08-31 2024-08-30 广州新丝路信息科技有限公司 一种进口货物报关单ocr识别分行处理方法及装置
CN117351438B (zh) * 2023-10-24 2024-06-04 武汉无线飞翔科技有限公司 一种基于图像识别的车辆实时位置跟踪方法及系统
CN117274267B (zh) * 2023-11-22 2024-04-05 合肥晶合集成电路股份有限公司 掩膜版图的自动检测方法、装置、处理器以及电子设备
CN117727059B (zh) * 2024-02-18 2024-05-03 蓝色火焰科技成都有限公司 汽车金融发票信息核验方法、装置、电子设备及存储介质

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101408933A (zh) * 2008-05-21 2009-04-15 浙江师范大学 基于粗网格特征提取和bp神经网络的车牌字符识别方法
CN102184399A (zh) * 2011-03-31 2011-09-14 上海名图信息技术有限公司 基于水平投影和连通域分析的字符分割方法
CN103310435A (zh) * 2012-03-21 2013-09-18 华中科技大学 将垂直投影和最优路径相结合对车牌字符进行分割的方法
CN104156704A (zh) * 2014-08-04 2014-11-19 胡艳艳 一种新的车牌识别方法及系统
US9158986B2 (en) * 2013-02-06 2015-10-13 Nidec Sankyo Corporation Character segmentation device and character segmentation method

Family Cites Families (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR0186172B1 (ko) 1995-12-06 1999-05-15 구자홍 문자 인식장치의 접촉문자 분리 및 특징 추출방법
JP3452774B2 (ja) * 1997-10-16 2003-09-29 富士通株式会社 文字認識方法
WO2003051031A2 (en) * 2001-12-06 2003-06-19 The Trustees Of Columbia University In The City Of New York Method and apparatus for planarization of a material by growing and removing a sacrificial film
RU2234126C2 (ru) * 2002-09-09 2004-08-10 Аби Софтвер Лтд. Способ распознавания текста с применением настраиваемого классификатора
US7302098B2 (en) * 2004-12-03 2007-11-27 Motorola, Inc. Character segmentation method and apparatus
JP2007206985A (ja) * 2006-02-01 2007-08-16 Sharp Corp 文字列抽出装置、文字列抽出方法、そのプログラムおよび記録媒体
JP4991411B2 (ja) * 2006-07-28 2012-08-01 キヤノン株式会社 画像処理方法
JP5334042B2 (ja) * 2008-11-23 2013-11-06 日本電産サンキョー株式会社 文字列認識方法及び文字列認識装置
KR20110087620A (ko) 2010-01-26 2011-08-03 광주과학기술원 레이아웃 기반의 인쇄매체 페이지 인식방법
JP5591578B2 (ja) * 2010-04-19 2014-09-17 日本電産サンキョー株式会社 文字列認識装置および文字列認識方法
JP6161484B2 (ja) * 2013-09-19 2017-07-12 株式会社Pfu 画像処理装置、画像処理方法及びコンピュータプログラム

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
XIONG, ZHEYUAN ET AL.: "Segmentation Algorithm of License Plate Characters Based on Mathematical Morphology Edge Detection", COMPUTER SYSTEMS & APPLICATIONS, vol. 19, no. 9, 30 September 2010 (2010-09-30) *

Also Published As

Publication number Publication date
JP6392468B2 (ja) 2018-09-19
JP2018500705A (ja) 2018-01-11
EP3163502A1 (en) 2017-05-03
US20170124414A1 (en) 2017-05-04
KR20170061631A (ko) 2017-06-05
RU2639668C2 (ru) 2017-12-21
RU2016110434A (ru) 2017-09-26
MX2016003679A (es) 2018-06-22
CN105528606B (zh) 2019-08-06
US10157326B2 (en) 2018-12-18
CN105528606A (zh) 2016-04-27
KR101805090B1 (ko) 2017-12-05

Legal Events

Date Code Title Description
ENP Entry into the national phase

Ref document number: 2017547046

Country of ref document: JP

Kind code of ref document: A

WWE Wipo information: entry into national phase

Ref document number: MX/A/2016/003679

Country of ref document: MX

ENP Entry into the national phase

Ref document number: 2016110434

Country of ref document: RU

Kind code of ref document: A

121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 15907124

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 15907124

Country of ref document: EP

Kind code of ref document: A1