WO2017094156A1 - Character recognition device and character recognition method - Google Patents

Character recognition device and character recognition method

Info

Publication number
WO2017094156A1
WO2017094156A1 (PCT/JP2015/083948)
Authority
WO
WIPO (PCT)
Prior art keywords
character string
line
histogram
boundary
string region
Prior art date
Application number
PCT/JP2015/083948
Other languages
French (fr)
Japanese (ja)
Inventor
Yusuke Itani (伊谷 裕介)
Original Assignee
Mitsubishi Electric Corporation (三菱電機株式会社)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Mitsubishi Electric Corporation
Priority to JP2017553564A (granted as patent JP6493559B2)
Priority to PCT/JP2015/083948
Publication of WO2017094156A1


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/10: Image acquisition

Definitions

  • the present invention relates to a character recognition device and a character recognition method.
  • In a conventional approach, a histogram indicating the frequency of black pixels in the line direction is created from the binarized image data, and positions that form valleys of the histogram are estimated to be inter-line positions, that is, candidates for line boundaries.
  • The conventional character recognition device estimates character rectangles, each enclosing a character area assumed to be recognized as one character, and calculates a certainty factor C indicating how character-like the image in each rectangle is. An inter-line position is determined to be a line boundary only if it does not divide a character rectangle with a high certainty factor C, and the lines are separated accordingly.
  • This approach thus relies on the certainty factor C of the character rectangles estimated when the line boundaries are determined.
  • Consequently, if a character rectangle with a high certainty factor C straddles two lines, the inter-line position cannot be determined to be a line boundary, and the lines cannot be divided appropriately.
  • The present invention has been made to solve the above-described problem, and an object of the present invention is to obtain a character recognition device that can accurately separate lines.
  • In the present invention, a histogram indicating the frequency of black pixels in the line direction of the character string region extracted from the input image data by the character string region extraction unit is generated, a line determination threshold is calculated from the generated histogram, and boundaries between different lines in the character string region are determined based on the calculated line determination threshold.
  • Because the character string region is extracted from the image data and the line determination threshold is set from the histogram of black pixel frequency in the line direction, the threshold for determining line boundaries is obtained from the entire line direction, and the character string in the character string region can be separated into appropriate lines.
  • FIG. 1 is a block diagram of the character recognition device according to Embodiment 1.
  • FIG. 2 is an explanatory diagram showing an example of the operation in which the character string region extraction unit extracts a character string region.
  • FIG. 3(a) is an explanatory diagram showing an example of a character string region, and FIG. 3(b) is an explanatory diagram showing an example of the histogram generated by the histogram generation unit of the character recognition device according to Embodiment 1.
  • FIG. 4 is an explanatory diagram of an example image for which it is useful to change the weighting factor for each region.
  • FIG. 5 is a flowchart illustrating the operation of the character recognition device according to Embodiment 1.
  • FIG. 6 is a detailed flowchart illustrating the operation in which the threshold calculation unit of the character recognition device according to Embodiment 1 calculates the line determination threshold.
  • FIG. 7 is a detailed flowchart illustrating the operation of estimating line boundaries in the line boundary determination unit of the character recognition device according to Embodiment 1.
  • FIG. 8 is a detailed flowchart illustrating the operation of estimating line boundaries in the line boundary determination unit of the character recognition device according to Embodiment 2.
  • FIG. 9 is an explanatory diagram of an example image and histogram for a character string region in which two lines have different lengths.
  • Further detailed flowcharts illustrate the operation of estimating line boundaries in the line boundary determination units of the character recognition devices according to Embodiments 3 and 4, and a hardware configuration diagram shows how the character recognition device is implemented.
  • FIG. 1 is a configuration diagram of the character recognition device 1 according to Embodiment 1.
  • The character recognition device 1 includes: a binarization processing unit 2 that binarizes image data to generate binarized data; a character string region extraction unit 3 that extracts a character string region from the binarized data and generates character string region data; a histogram generation unit 4 that generates, from the binarized data and the character string region data, a histogram indicating the frequency of black pixels in the line direction; a threshold calculation unit 5 that calculates a line determination threshold from the histogram; a line boundary determination unit 6 that determines boundaries between different lines in the character string region using the line determination threshold; and a character recognition unit 7 that recognizes the characters in the character string region based on the line boundaries determined by the line boundary determination unit 6.
  • the binarization processing unit 2 performs binarization processing on image data sent from an image capturing device such as a scanner or a camera to generate binarized data.
  • The binarization process converts a grayscale image into a two-tone image of white and black. If the value of the binarized data at coordinates (x, y) is denoted f(x, y), then f(x, y) = 1 for a black pixel and f(x, y) = 0 for a white pixel.
  • For example, a single threshold is determined, and each pixel whose value is at or above the threshold is replaced with white, while each pixel below it is replaced with black.
  • The binarization method is not limited to a single threshold; the image may be divided into a plurality of regions according to the luminance range, and binarization may be performed with a different threshold for each region.
  • By this process, the margins of the image data, including the background, are converted into white pixels, while characters, ruled lines, symbols, and other figures are converted into black pixels.
  • The image data input to the binarization processing unit 2 may be in any format that represents images and characters, for example JPEG (Joint Photographic Experts Group), TIFF (Tagged Image File Format), or BMP (BitMaP).
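As a minimal sketch of the global-threshold binarization described above (the function name `binarize`, the default threshold of 128, and the sample pixel values are illustrative assumptions, not taken from the patent):

```python
def binarize(gray, threshold=128):
    # Global-threshold binarization following the convention in the text:
    # f(x, y) = 1 for a black pixel, 0 for a white pixel.
    # Pixels at or above the threshold are treated as white (0).
    return [[1 if v < threshold else 0 for v in row] for row in gray]

# Hypothetical 4x4 grayscale patch: dark strokes on a light background.
gray = [[250, 10, 10, 250],
        [250, 10, 10, 250],
        [250, 250, 250, 250],
        [250, 10, 10, 250]]
binary = binarize(gray)  # dark pixels become 1 (black)
```

A per-region variant, as the text allows, would simply apply `binarize` with a different threshold to each luminance region.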
  • the character string area extraction unit 3 extracts a character string area from the binarized data generated by the binarization processing unit 2.
  • a character string refers to a set of black pixels of binarized data estimated to have characters.
  • Extracting the character string region means that, when a set of black pixels is detected in the binarized data, characters are estimated to exist in a range including that set, and the rectangular region described by the position information from the first coordinate xstart to the last coordinate xend on the x-axis and from the first coordinate ystart to the last coordinate yend on the y-axis is extracted as the character string region.
  • The character string region extraction unit 3 extracts the character string region and generates, as character string region data, the coordinate data xstart, xend, ystart, and yend indicating the rectangular range of the extracted region, together with the binarized data.
  • FIG. 2 is an explanatory diagram illustrating an example of an operation in which the character string region extraction unit 3 of the character recognition device 1 according to the first embodiment extracts the character string region 3b.
  • The character string region extraction unit 3 detects a set of black pixels in the binarized image 3a, extracts a rectangular range including the detected set as the character string region 3b, and generates data indicating that rectangular range as character string region data.
  • The character string region data indicates, for example, the range of the character string region 3b in the binarized image 3a: from the first coordinate xstart to the last coordinate xend on the x-axis, and from the first coordinate ystart to the last coordinate yend on the y-axis.
  • The extraction method for the character string region 3b described above is an example; other methods may be used as long as a character string region is extracted. The character string region extraction unit 3 also estimates the direction of the lines in the extracted character string region from the shape of the region 3b or from attribute information of the binarized image.
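The bounding-rectangle extraction described above can be sketched as follows for the simple single-region case (the function name and sample data are illustrative assumptions; real implementations would handle multiple regions and line-direction estimation):

```python
def extract_string_region(binary):
    # Bounding rectangle (xstart, xend, ystart, yend) of the black pixels
    # (value 1) in the binarized data; a sketch of the single-region case.
    xs = [x for row in binary for x, v in enumerate(row) if v == 1]
    ys = [y for y, row in enumerate(binary) if 1 in row]
    if not xs:
        return None  # no black pixel, hence no character string region
    return (min(xs), max(xs), min(ys), max(ys))

binary = [[0, 0, 0, 0],
          [0, 1, 1, 0],
          [0, 1, 0, 0],
          [0, 0, 0, 0]]
region = extract_string_region(binary)  # (xstart, xend, ystart, yend)
```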
  • The histogram generation unit 4 generates a histogram using the character string region extracted by the character string region extraction unit 3. Specifically, from the binarized data generated by the binarization processing unit 2 and the character string region data generated by the character string region extraction unit 3, it generates a histogram indicating the frequency of black pixels along the line direction estimated when the character string region was extracted.
  • FIG. 3A is an explanatory diagram illustrating an example of a character string region.
  • FIG. 3B is an explanatory diagram illustrating an example of a histogram generated by the histogram generation unit 4 of the character recognition device 1 according to the first embodiment.
  • the histogram generation unit 4 determines the total number of black pixels in the x-axis direction for each coordinate of the y-axis pixel unit, and generates a histogram.
  • An example of an expression for generating this histogram H(y) is:

      H(y) = Σ (x = xstart to xend) f(x, y)

  • Here f(x, y) represents the value of the binarized data at the coordinates (x, y), and is 1 for a black pixel and 0 for a white pixel.
  • In the histogram of FIG. 3B, the black pixel frequency is high at y-coordinates where characters exist, while between the lines it can be seen to fall to a small value of about 20.
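The row-wise histogram H(y) defined above can be sketched directly from the formula (function name and sample data are illustrative assumptions):

```python
def row_histogram(binary, xstart, xend, ystart, yend):
    # H(y) = sum of f(x, y) for x in [xstart, xend], one value per row y
    # of the character string region.
    return [sum(binary[y][xstart:xend + 1]) for y in range(ystart, yend + 1)]

# Two short "lines" of black pixels separated by a blank row: the blank
# row produces the valley that marks the line boundary.
binary = [[1, 1, 1, 0],
          [0, 0, 0, 0],
          [1, 1, 0, 0]]
hist = row_histogram(binary, 0, 3, 0, 2)
```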
  • the threshold value calculation unit 5 calculates a threshold value for determining a line boundary from the histogram in the character string region generated by the histogram generation unit 4.
  • The threshold for this determination is obtained from the histogram of each individual recognition-target image; it is therefore based on the characteristics of each image, in particular on characteristics obtained from the entire line direction rather than on microscopic criteria such as character units.
  • An example of how the threshold calculation unit 5 calculates the line determination threshold th1 is:

      th1 = α × P

  • Here P is the peak (maximum) value of the histogram H(y), and α is a parameter indicating a weighting factor.
  • The weighting factor α is a parameter that is set by the user or automatically.
  • To make line boundaries easier to detect, that is, to raise the threshold so that positions with few black pixels are more readily considered boundaries, the weighting factor α is increased.
  • Conversely, to lower the threshold so that such positions are not considered line boundaries, the weighting factor α is decreased.
  • The weighting factor α may be constant for the entire image, or the image may be divided into regions and α changed for each region.
  • As methods of dividing the image into regions when the weighting factor α is changed per region, there are, for example, regions surrounded by ruled lines and regions surrounded by blank space. Various conventional methods exist to detect such regions automatically, such as ruled line detection, blank detection, and symbol detection; for example, Reference 1 below describes ruled line and blank detection, and Reference 2 below describes symbol detection.
  • Reference 1: Takashi Hirano, Yasuhiro Okada, Fumio Yoda, "Ruled Line Extraction Method from Document Images", IEICE General Conference, March 1998
  • Reference 2: Noboru Yoneyama, Takashi Hirano, Yasuhiro Okada, "Examination of a Symbol Extraction Method for Drawing Images", IEICE General Conference, March 2006
  • FIG. 4 shows an example of an image for which it is useful to change the weighting factor α for each region.
  • The image contains a line-determination-difficult region 5a, where the line spacing is narrow and line boundaries are estimated to be difficult to determine, and a line-determination-easy region 5b, where the line spacing is wide and line boundaries are easy to determine.
  • The user sets the weighting factor α of the difficult region 5a large in advance, and the weighting factor α of the easy region 5b small.
  • By setting the weighting factor α in this way, an appropriate line determination threshold can be set for each region, for example according to how difficult it is to determine line boundaries given the spacing of the character lines.
  • The weighting factor α may also be set automatically based on the frequency and distribution of black pixels in the region. In the example of FIG. 3 described above, when the peak value P of the histogram is 102 and the weighting factor α is 0.22, the line determination threshold th1 is 22.
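The threshold formula th1 = α × P can be sketched as follows, mirroring the worked numbers in the text (P = 102, α = 0.22 giving th1 of about 22); the function name is an illustrative assumption:

```python
def line_threshold(hist, alpha):
    # th1 = alpha * P, where P is the peak (maximum) value of H(y).
    return alpha * max(hist)

# Mirroring the example in the text: a histogram whose peak P is 102
# with alpha = 0.22 yields th1 of roughly 22 (22.44 before truncation).
th1 = line_threshold([0, 30, 102, 40, 5], 0.22)
```

Per-region weighting, as described above, would call `line_threshold` with a different `alpha` for each region's histogram.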
  • the line boundary determination unit 6 determines a boundary between different lines in the character string area based on the line determination threshold th1 calculated by the threshold calculation unit 5.
  • The determined line boundary is the position information of the coordinate determined to be a boundary between lines.
  • The determination condition that the line boundary determination unit 6 uses for a line boundary is the following: when H(y) is smaller than the line determination threshold th1, it is estimated that there is a line boundary at the coordinate y; when H(y) is greater than or equal to th1, it is estimated that there is no line boundary at the coordinate y.
  • If a single coordinate y is estimated to be a line boundary, that coordinate is determined to be the boundary. If a plurality of adjacent coordinates are estimated, the coordinate y located at the center of those coordinates is determined to be the line boundary; the choice is not limited to the center, and any of the adjacent coordinates may be selected. Coordinates y that are not adjacent to any other estimated coordinate are each determined to be a line boundary. A character string region is not limited to one line boundary; there may be a plurality of them.
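The determination rule above (threshold test, then taking the center of each run of adjacent candidate coordinates) can be sketched as follows; the function name and sample histogram are illustrative assumptions:

```python
def find_line_boundaries(hist, th1, ystart=0):
    # Coordinates y where H(y) < th1 are boundary candidates; runs of
    # adjacent candidates are merged and the center of each run is
    # returned as the line boundary, as described in the text.
    candidates = [ystart + i for i, h in enumerate(hist) if h < th1]
    boundaries, run = [], []
    for y in candidates:
        if run and y != run[-1] + 1:        # a run of adjacent candidates ended
            boundaries.append(run[len(run) // 2])
            run = []
        run.append(y)
    if run:
        boundaries.append(run[len(run) // 2])
    return boundaries

hist = [30, 25, 5, 3, 4, 28, 31]   # valley at indices 2-4
bounds = find_line_boundaries(hist, th1=22)
```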
  • The character recognition unit 7 performs character recognition processing on the character string region based on the line boundaries determined by the line boundary determination unit 6 and the character string region extracted by the character string region extraction unit 3.
  • There are various methods for performing character recognition; for example, the following reference describes a technique that improves robustness against image degradation by using run-length correction.
  • Reference 3 Minoru Mori, Minako Sawaki, Norihiro Hamada, Hiroshi Murase, Naoki Takekawa, “Robust Feature Extraction for Image Degradation Using Run-Length Correction”, IEICE Transactions, Vol. J86-D2, No. 7, pp. 1049-1057, July 2003
  • the character recognition unit 7 outputs a character recognition result.
  • the above is the configuration related to the character recognition device 1.
  • FIG. 5 is a flowchart showing the operation of the character recognition device 1 according to the present embodiment.
  • In step S1, the binarization processing unit 2 performs binarization processing on the image data to generate binarized data.
  • the generated binarized data is sent to the character string region extraction unit 3.
  • In step S2, the character string region extraction unit 3 extracts a character string region from the binarized data generated in step S1 and generates character string region data indicating the extracted region.
  • the character string region extraction unit 3 estimates the direction of the line in the extracted character string region from the shape of the extracted character string region or the attribute information of the binarized image.
  • the character string region data generated by the character string region extraction unit 3 is sent to the character recognition unit 7 together with the input binarized data and data indicating the estimated line direction.
  • The number of character string regions extracted from one set of binarized data is not limited to one; a plurality of regions may be extracted. The following steps describe the character recognition process for one character string region extracted in step S2.
  • In step S3, the histogram generation unit 4 generates a histogram using the character string region extracted in step S2.
  • Specifically, the histogram generation unit 4 generates, from the binarized data generated in step S1 and the character string region data extracted in step S2, a histogram indicating the frequency of black pixels along the line direction estimated when the character string region was extracted.
  • FIG. 3A shows an example of an image normalized with the estimated row direction as the x-axis, and an example of a histogram generated based on this image is shown in FIG.
  • Data indicating the histogram generated by the histogram generation unit 4 is sent to the threshold value calculation unit 5 together with the character string region data generated in step S2.
  • In step S4, the threshold calculation unit 5 calculates the line determination threshold th1 for determining line boundaries, using the histogram generated in step S3.
  • the calculated line determination threshold th1 is sent to the line boundary determination unit 6 together with the character string region data generated in step S2 and the histogram data generated in step S3.
  • FIG. 6 is a detailed flowchart illustrating an operation in which the threshold calculation unit 5 calculates a row determination threshold.
  • In step S41, the threshold calculation unit 5 detects the peak value P in the histogram generated by the histogram generation unit 4 in step S3.
  • In step S42, the threshold calculation unit 5 calculates the line determination threshold th1 using the peak value P detected in step S41 and the weighting factor α.
  • The example shown in FIG. 3 is a case where the peak value P of the histogram is 102; setting the weighting factor α to 0.22 gives a line determination threshold th1 of 22.
  • In FIG. 3B, H(y) falls below th1 = 22 in the line boundary area 4a, which can be recognized as corresponding to the boundary between the lines of the character region shown in FIG. 3A.
  • Since the weighting factor α can be adjusted to a value suitable for each type of image, or for each region within an image, more appropriate line determination can be performed.
  • In step S5, the line boundary determination unit 6 determines the boundaries of different lines in the character string region using the line determination threshold th1 calculated in step S4.
  • The line boundary determination unit 6 compares the line determination threshold th1 with the histogram values H(y) and stores one or more coordinates y estimated to be line boundaries.
  • FIG. 7 is a detailed flowchart showing the operation of estimating the line boundary in the line boundary determination unit 6.
  • In step S51-1, ystart included in the character string region data, that is, the coordinate y at which the character string region starts, is set as the initial value of y.
  • In step S51-2, the line determination threshold th1 is compared with the histogram value H(y) at the current coordinate y.
  • If H(y) is smaller than the line determination threshold th1 (H(y) < th1), there is a high possibility that there is a line boundary at this coordinate y, and the process proceeds to step S51-3.
  • In step S51-3, the coordinate y is stored as a coordinate estimated to have a line boundary, and the process proceeds to step S51-4.
  • After step S51-3, or when H(y) is greater than or equal to th1 in step S51-2, the process proceeds to step S51-4, where y is incremented to the next coordinate. The operations from step S51-2 to step S51-4 are repeated until it is determined in step S51-5 that y has reached the coordinate yend at which the character string region ends.
  • By this operation, one or more coordinates y estimated to be line boundaries in the character string region are extracted and stored.
  • In step S51-6, if a single coordinate y is estimated to be a line boundary, the line boundary determination unit 6 determines that coordinate to be the boundary; if a plurality of adjacent coordinates are estimated, it determines the coordinate y located at the center of those coordinates to be the boundary of the lines.
  • the y coordinate determined to be a line boundary is sent to the character recognition unit 7 as line boundary data together with the character string area data generated in step S2.
  • In step S6, the character recognition unit 7 performs character recognition processing based on the character string region data generated in step S2 and the line boundary data determined in step S5.
  • the character recognition unit 7 outputs a character recognition result.
  • The above is how line boundary determination and character recognition are performed by the character recognition device 1 according to the present embodiment.
  • As described above, according to the present embodiment, a histogram indicating the frequency of black pixels in the line direction of the character string region extracted from the input image data by the character string region extraction unit 3 is generated, a line determination threshold is calculated from the generated histogram, and boundaries between different lines in the character string region are determined based on the calculated threshold.
  • The threshold for determining line boundaries is therefore set appropriately based on characteristics obtained from the entire line direction, and the character string in the character string region can be separated into appropriate lines.
  • Embodiment 2. The character recognition device 1 according to Embodiment 2 will now be described.
  • In Embodiment 1, line determination used the line determination threshold th1 on the frequency of black pixels as the criterion of the line boundary determination unit 6; in Embodiment 2, in addition to the frequency of black pixels, the gradient g(y) of the histogram is used as a line determination criterion.
  • the detailed configurations and operations of the threshold value calculation unit 5 and the row boundary determination unit 6 are different from those in the first embodiment, and other parts are the same as those in the first embodiment.
  • the threshold calculation unit 5 calculates a line determination threshold th1 for determining a line boundary from the histogram in the character string region generated by the histogram generation unit 4 as in the first embodiment.
  • the threshold value calculation unit 5 stores in advance a row determination threshold value th2 relating to the histogram inclination g (y).
  • The gradient of the histogram is g(y) = dH(y)/dy. At a line boundary the histogram changes steeply, so the gradient becomes large in magnitude, whereas within an area where characters exist it stays small. Therefore, by setting a threshold on the gradient of the histogram, line boundaries can be determined.
  • This again gives a threshold based on the characteristics of each image, obtained from the entire line direction rather than from microscopic criteria such as character units.
  • the line boundary determination unit 6 determines a boundary between different lines in the character string area based on the line determination threshold th2 in addition to the line determination threshold th1 calculated by the threshold calculation unit 5.
  • Among the coordinates y estimated to have a line boundary based on the line determination threshold th1 calculated by the threshold calculation unit 5, those that further satisfy the following condition are estimated to be line boundaries; otherwise it is estimated that there is no line boundary at the coordinate y.
  • If H(y) is smaller than the line determination threshold th1 and H(y) − H(y−1) is larger than the line determination threshold th2, it is estimated that there is a line boundary at the coordinate y; otherwise, it is estimated that there is none.
  • Here the difference between H(y) and H(y−1) is used as the gradient of the histogram: g(y) ≈ H(y) − H(y−1).
  • If a single coordinate y is estimated to be a line boundary, the line boundary determination unit 6 determines that coordinate to be the boundary. If a plurality of adjacent coordinates are estimated, the coordinate y at the center of those coordinates is determined to be the boundary of the lines; the choice is not limited to the center, and any of the adjacent coordinates may be selected. Coordinates y that are not adjacent to any other estimated coordinate are each determined to be a line boundary.
  • A character string region is not limited to one line boundary; there may be a plurality of them.
  • the determined line boundary indicates position information of a line determined to have a line boundary, and is sent to the character recognition unit 7 together with the character string area data as line boundary data. Other configurations are the same as those in the first embodiment.
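The Embodiment 2 criterion can be sketched as follows. The function name and sample histogram are illustrative assumptions, and the condition H(y) − H(y−1) > th2 is implemented literally as stated in the text:

```python
def find_boundaries_with_slope(hist, th1, th2):
    # y is a boundary candidate when H(y) < th1 and the backward difference
    # H(y) - H(y-1), used as the gradient g(y), exceeds th2. Runs of
    # adjacent candidates are merged to their center coordinate.
    candidates = [y for y in range(1, len(hist))
                  if hist[y] < th1 and hist[y] - hist[y - 1] > th2]
    boundaries, run = [], []
    for y in candidates:
        if run and y != run[-1] + 1:
            boundaries.append(run[len(run) // 2])
            run = []
        run.append(y)
    if run:
        boundaries.append(run[len(run) // 2])
    return boundaries

# The valley bottom (H = 2) fails the slope test (difference -28); the
# rising edge (H = 8, difference +6) satisfies both conditions.
bounds = find_boundaries_with_slope([30, 2, 8, 30], th1=22, th2=3)
```

As the text notes, the two tests can also be applied in the opposite order, or combined into a single cost function.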
  • In this embodiment, the operations of step S4 and step S5 differ from those in Embodiment 1.
  • In step S4, the threshold calculation unit 5 calculates the line determination threshold th1 for determining line boundaries, using the histogram generated in step S3.
  • the method for calculating the row determination threshold th1 is the same as in the first embodiment.
  • the threshold calculation unit 5 has a row determination threshold th2 in addition to the row determination threshold th1.
  • the row determination threshold value th2 is a fixed value set in advance by the user, and is stored in the threshold value calculation unit 5 in advance.
  • the threshold calculation unit 5 sends the row determination thresholds th1 and th2 to the row boundary determination unit 6.
  • FIG. 8 is a detailed flowchart showing the operation of estimating the line boundary in the line boundary determination unit 6.
  • In step S52-1, ystart included in the character string region data, that is, the coordinate y at which the character string region starts, is set as the initial value of y.
  • In step S52-2, the line determination threshold th1 is compared with the histogram value H(y) at the current coordinate y. If H(y) is smaller than th1 (H(y) < th1), there is a high possibility that there is a line boundary at this coordinate y, and the process proceeds to step S52-3.
  • In step S52-3, the line determination threshold th2 is compared with the histogram gradient H(y) − H(y−1) at the current coordinate y.
  • If H(y) − H(y−1) is larger than th2 (H(y) − H(y−1) > th2), the histogram has a steep slope, so it can be estimated that there is a high possibility of a line boundary at this coordinate y, and the process proceeds to step S52-4.
  • In step S52-4, the coordinate y is stored as a coordinate estimated to have a line boundary, and the process proceeds to step S52-5.
  • When H(y) is determined to be greater than or equal to th1 in step S52-2, or H(y) − H(y−1) is determined to be less than or equal to th2 in step S52-3, the process also proceeds to step S52-5, where y is incremented to the next coordinate. The operations from step S52-2 to step S52-5 are repeated until it is determined in step S52-6 that y has reached the coordinate yend at which the character string region ends. By this operation, one or more coordinates y estimated to be line boundaries in the character string region are extracted and stored.
  • Finally, if a single coordinate y is estimated to be a line boundary, the line boundary determination unit 6 determines that coordinate to be the boundary; if a plurality of adjacent coordinates are estimated, it determines the coordinate y located at the center of those coordinates to be the boundary of the lines.
  • The estimation based on the number of black pixels performed in step S52-2 and the estimation based on the histogram gradient performed in step S52-3 may be interchanged.
  • In that case, candidate line-boundary coordinates are first estimated based on the gradient of the histogram, and the candidates are then further screened based on the number of black pixels to determine the line boundaries.
  • Alternatively, the estimation based on the number of black pixels (step S52-2) and the estimation based on the histogram gradient (step S52-3) can be integrated, using a cost function C as the determination expression.
  • As described above, according to the present embodiment as well, a histogram indicating the frequency of black pixels in the line direction of the character string region extracted from the input image data by the character string region extraction unit 3 is generated, a line determination threshold is calculated from the generated histogram, and boundaries between different lines in the character string region are estimated based on the calculated thresholds.
  • The threshold for determining line boundaries is set appropriately based on characteristics obtained from the entire line direction, and the character string in the character string region can be separated into appropriate lines.
  • In the first and second embodiments, line determination is performed using the row determination threshold th1 applied to the frequency of black pixels as the determination criterion of the line boundary determination unit 6. In the third embodiment, in addition to the frequency of black pixels, the peak values P(n) detected from the histogram are used as a line determination criterion.
  • the detailed configurations and operations of the threshold value calculation unit 5 and the row boundary determination unit 6 are different from those in the first embodiment, and other parts are the same as those in the first embodiment.
  • The threshold calculation unit 5 calculates the row determination threshold th1 for determining line boundaries from the histogram of the character string region generated by the histogram generation unit 4, as in the first embodiment. In addition, the threshold calculation unit 5 stores in advance a row determination threshold th3 relating to the difference P(n) − P(n−1) between adjacent peak values of the histogram.
  • FIG. 9 shows an example of an image and the histogram generated for a character string region in which the lengths of two lines differ. When there are a plurality of lines of different lengths, the difference P(n) − P(n−1) between peak values of the histogram becomes large.
  • By using the row determination threshold th1 obtained from the histogram of each individual recognition target image together with the row determination threshold th3 relating to the peak value difference P(n) − P(n−1), the thresholds are based not on microscopic criteria such as individual characters but on features obtained from the entire line direction, so line boundaries can be estimated accurately even when multiple lines have different lengths.
  • The line boundary determination unit 6 determines the boundaries between different lines in the character string region based on the row determination threshold th3 in addition to the row determination threshold th1 calculated by the threshold calculation unit 5. First, the line boundary determination unit 6 estimates that there is a line boundary at coordinate y when H(y) is smaller than th1, and that there is no line boundary at coordinate y when H(y) is equal to or greater than th1. Then, when there is a single coordinate y estimated to be a line boundary, it determines that coordinate to be the line boundary; when there are a plurality of adjacent estimated coordinates y, it determines the central position among them to be the line boundary.
  • The coordinate determined to be the line boundary among a plurality of adjacent candidate coordinates y is not limited to the center; any of the adjacent coordinates y may be selected. If the coordinates y estimated to be line boundaries are not adjacent to each other, each such coordinate y is determined to be a line boundary.
  • The number of line boundaries is not limited to one per character string region; there may be a plurality.
  • Next, the line boundary determination unit 6 calculates the peak difference P(n) − P(n−1). As shown in FIG. 9, the line boundary determination unit 6 detects all peaks of the generated histogram and calculates the difference between each peak value P(n) and the adjacent peak value P(n−1). When the peak value difference P(n) − P(n−1) is larger than the row determination threshold th3, the line boundary determination unit 6 estimates that there is a line boundary between the y coordinates of P(n) and P(n−1); when the difference P(n) − P(n−1) is smaller than th3, it estimates that there is no line boundary between those y coordinates.
  • When the peak value difference P(n) − P(n−1) is larger than the row determination threshold th3, it is estimated that there is a line boundary between P(n) and P(n−1), and the center position between the coordinate of the peak value P(n) and the coordinate of the peak value P(n−1) is determined to be the line boundary.
  • The position of the line boundary between the coordinate of P(n) and the coordinate of P(n−1) is not limited to the center; it only has to be selected from the coordinates between the two peaks.
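As a rough sketch of the peak-based estimation above, the following hypothetical helpers could be used. The names are invented, `detect_peaks` uses a simple local-maximum test, and the comparison follows the text's P(n) − P(n−1) > th3 criterion literally, placing the boundary at the center between the two peak coordinates rather than storing the whole range between them.

```python
def detect_peaks(H):
    """Indices y that are local maxima of the histogram."""
    return [y for y in range(1, len(H) - 1)
            if H[y - 1] < H[y] >= H[y + 1]]

def boundaries_from_peaks(H, th3):
    """Estimate a boundary between adjacent peaks whose height difference
    exceeds th3, placing it at the center between the peak coordinates."""
    peaks = detect_peaks(H)
    boundaries = []
    for prev, cur in zip(peaks, peaks[1:]):
        if H[cur] - H[prev] > th3:
            boundaries.append((prev + cur) // 2)
    return boundaries
```

In a real implementation, peak detection would likely need smoothing or a minimum-prominence test so that noise in the histogram does not produce spurious peaks.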
  • The determined line boundary is the position information of the coordinate determined to be a line boundary, and is sent to the character recognition unit 7 together with the character string region data as line boundary data.
  • Other configurations are the same as those in the first embodiment.
  • In the present embodiment, the operations of steps S4 and S5 differ from those in the first embodiment.
  • In step S4, the threshold calculation unit 5 calculates the row determination threshold th1 for determining line boundaries using the histogram generated in step S3.
  • The method for calculating the row determination threshold th1 is the same as in the first embodiment.
  • the threshold calculation unit 5 has a row determination threshold th3 in addition to the row determination threshold th1.
  • the row determination threshold th3 is a fixed value set in advance by the user, and is stored in the threshold calculation unit 5 in advance.
  • the threshold calculation unit 5 sends the row determination thresholds th1 and th3 to the row boundary determination unit 6.
  • FIG. 10 is a detailed flowchart showing the operation of estimating the line boundary in the line boundary determination unit 6.
  • the line boundary determination using the line boundary threshold th1 is the same as in the first embodiment.
  • In step S53-2, the row determination threshold th3 is compared with the adjacent peak value difference P(n) − P(n−1).
  • When the peak value difference P(n) − P(n−1) is larger than the row determination threshold th3 (P(n) − P(n−1) > th3), there is a high possibility that a line boundary exists between the y coordinates of the peak values P(n) and P(n−1); in this case, the process proceeds to step S53-3.
  • When the peak value difference P(n) − P(n−1) is equal to or less than the row determination threshold th3 (P(n) − P(n−1) ≤ th3), the presence or absence of a line boundary cannot be determined from P(n) − P(n−1); in this case, the process proceeds to step S53-4.
  • In step S53-3, the y coordinates between the peak values P(n) and P(n−1) are stored as coordinates estimated to contain a line boundary, and the process proceeds to step S53-4.
  • After step S53-3 is completed, or when P(n) − P(n−1) is determined in step S53-2 to be equal to or less than the row determination threshold th3, the process proceeds to step S53-4, where n is incremented. The operations from step S53-2 to step S53-5 are repeated until it is determined in step S53-5 that n has reached nend, the index of the last peak value P(nend) of the histogram.
  • In step S53-6, when there is a single coordinate y estimated to be a line boundary, the line boundary determination unit 6 determines that coordinate to be the line boundary; when a range of coordinates between the peak values P(n) and P(n−1) has been stored, the center of that range is determined to be the line boundary.
  • The line boundary data obtained by the determination based on the row determination threshold th1 in the flowchart of FIG. 7 and the line boundary data obtained by the determination based on the row determination threshold th3 in the flowchart of FIG. 10 are both sent to the character recognition unit 7.
  • the character recognition unit 7 performs character recognition using both of these line boundary data as in the first embodiment. Note that the order of the operation of the flowchart of FIG. 7 using the row determination threshold th1 and the operation of the flowchart of FIG. 10 using the row determination threshold th3 may be reversed. Other operations are the same as those in the first embodiment.
  • As described above, in the third embodiment, a histogram indicating the frequency of black pixels in the row direction of the character string region extracted from the input image data by the character string region extraction unit 3 is generated, a row determination threshold is calculated from the generated histogram, and the boundaries between different lines in the character string region are estimated based on the calculated row determination threshold th1.
  • Furthermore, since line boundaries are also estimated based on the differences between adjacent peak values of the histogram, character strings in the character string region can be separated into appropriate lines more reliably when the line lengths differ.
  • In the third embodiment, the line boundary is determined by comparing the row determination threshold th3 with the peak value difference P(n) − P(n−1).
  • The line boundary data obtained from the operation of the flowchart of FIG. 10 using the row determination threshold th3 need not indicate only the coordinates; it may also include data indicating the probability that a line boundary exists at those coordinates.
  • This probability is calculated, for example, by subtracting the peak value difference P(n) − P(n−1) from the row determination threshold th3 and dividing the result by th3. Based on this probability, the character recognition unit 7 can select whether to adopt the coordinates indicated in the line boundary data as a line boundary.
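Read literally, the probability described above might be computed as in the following sketch. This is a hypothetical normalization following the text's wording; the source does not specify whether the peak difference should be signed or absolute, nor how values outside [0, 1] are handled.

```python
def boundary_probability(peak_diff, th3):
    """(th3 - peak_diff) / th3, as described in the text."""
    return (th3 - peak_diff) / th3
```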
  • In the second embodiment, the line boundary determination unit 6 performs line boundary determination using, as determination criteria, the row determination threshold th1 applied to the frequency of black pixels and the gradient g(y) of the histogram.
  • In the fourth embodiment, a line boundary is determined using only the gradient g(y) of the histogram as the line determination criterion.
  • the detailed configurations and operations of the threshold value calculation unit 5 and the row boundary determination unit 6 are different from those in the second embodiment, and other parts are the same as those in the second embodiment.
  • the threshold value calculation unit 5 stores in advance a row determination threshold value th2 related to the gradient g (y) of the histogram.
  • The slope g(y) of the histogram is dH(y)/dy. Since the histogram changes steeply at a line boundary, g(y) takes a large value there, whereas in regions where characters exist the slope is small. Therefore, by setting a threshold on the slope of the histogram, line boundaries can be determined.
  • The line boundary determination unit 6 determines the boundaries between different lines in the character string region based on the row determination threshold th2. First, the line boundary determination unit 6 estimates that there is a line boundary at coordinate y when H(y) − H(y−1) is greater than th2, and that there is no line boundary at coordinate y when H(y) − H(y−1) is smaller than th2.
  • In practice, the slope g(y) of the histogram can be calculated as the difference between H(y) and H(y−1).
  • When there is a single coordinate y estimated to be a line boundary, the line boundary determination unit 6 determines that coordinate to be the line boundary; when there are a plurality of adjacent estimated coordinates y, it determines the central position among them to be the line boundary. The coordinate selected as the line boundary from among a plurality of adjacent candidates is not limited to the center; any of the adjacent coordinates y may be selected. If the coordinates y estimated to be line boundaries are not adjacent to each other, each such coordinate y is determined to be a line boundary.
  • The number of line boundaries is not limited to one per character string region; there may be a plurality.
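The slope-only criterion of the fourth embodiment, together with the selection of the center among adjacent candidates, could be sketched as follows. The function names are hypothetical, and the discrete slope H(y) − H(y−1) stands in for dH(y)/dy as described above.

```python
def slope_boundary_candidates(H, th2, y_start, y_end):
    """Coordinates y where the discrete slope H(y) - H(y-1) exceeds th2."""
    return [y for y in range(y_start + 1, y_end + 1)
            if H[y] - H[y - 1] > th2]

def centers_of_runs(candidates):
    """Collapse each run of adjacent candidate coordinates to its center."""
    runs, run = [], []
    for y in candidates:
        if run and y != run[-1] + 1:
            runs.append(run)
            run = []
        run.append(y)
    if run:
        runs.append(run)
    return [r[len(r) // 2] for r in runs]
```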
  • The determined line boundary is the position information of the coordinate determined to be a line boundary, and is sent to the character recognition unit together with the character string region data as line boundary data. Other configurations are the same as those in the first embodiment.
  • In the present embodiment, the operations of steps S4 and S5 differ from those in the first embodiment.
  • In step S4, the threshold calculation unit 5 sends the row determination threshold th2 to the line boundary determination unit 6.
  • The row determination threshold th2 is a fixed value set in advance by the user and stored in the threshold calculation unit 5.
  • FIG. 11 shows a detailed flowchart showing the operation of estimating the line boundary in the line boundary determination unit 6.
  • In step S54-1, ystart included in the character string region data, that is, the coordinate y at which the character string region starts, is set as the initial value.
  • In step S54-2, the row determination threshold th2 is compared with the slope H(y) − H(y−1) of the histogram at the current coordinate y.
  • When H(y) − H(y−1) is larger than the row determination threshold th2 (H(y) − H(y−1) > th2), the histogram has a steep slope at coordinate y, so it can be estimated that there is a high possibility of a line boundary at this coordinate; in this case, the process proceeds to step S54-3.
  • When H(y) − H(y−1) is equal to or less than the row determination threshold th2 (H(y) − H(y−1) ≤ th2), the slope is not steep enough for a line boundary at coordinate y to be estimated from the row determination threshold th2; in this case, the process proceeds to step S54-4.
  • In step S54-3, the coordinate y is stored as a coordinate estimated to be a line boundary, and the process proceeds to step S54-4.
  • After step S54-3 is completed, or when it is determined in step S54-2 that H(y) − H(y−1) is equal to or less than the row determination threshold th2, the process proceeds to step S54-4, where y is incremented to the next coordinate. The operations from step S54-2 to step S54-5 are repeated until it is determined in step S54-5 that y has reached yend, the end coordinate of the character string region 32.
  • In step S54-6, when there is a single coordinate y estimated to be a line boundary, the line boundary determination unit 6 determines that coordinate to be the line boundary; when there are a plurality of adjacent estimated coordinates, it determines the coordinate y located at their center to be the line boundary.
  • Other operations are the same as those in the first embodiment.
  • As described above, in the fourth embodiment, a histogram indicating the frequency of black pixels in the row direction of the character string region extracted from the input image data by the character string region extraction unit 3 is generated, and line boundaries are estimated from the slope of the histogram.
  • Therefore, the threshold for determining line boundaries is set appropriately based on features obtained from the entire line direction, and the character string in the character string region can be separated into appropriate lines.
  • In the above embodiments, the binarization processing unit 2 generates and uses binarized data; however, the target data is not limited to binarized data. As long as the data allows character portions and line boundaries to be distinguished, it is also possible to use, for example, multi-value data representing pixels with multiple values, or data indicating chromaticity.
  • In the above embodiments, the character string region data generated by the character string region extraction unit includes the binarized data.
  • However, the binarized data may instead be sent directly from the binarization processing unit 2 to each unit that requires it. Not only the binarized data but also other data may be sent directly from unit to unit as required.
  • In the above embodiments, the line boundary determination unit 6 determines the center position of a plurality of coordinates y to be the line boundary only when the coordinates y estimated to be line boundaries are adjacent to each other. However, a certain range may be defined so that coordinates y estimated to be line boundaries that fall within that range are regarded as adjacent, and the center position among them is determined to be the line boundary.
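The range-based variation described above might be sketched as follows, where `gap` is a hypothetical tolerance: candidate coordinates within `gap` of each other are treated as adjacent, and `gap=1` reduces to strict adjacency.

```python
def centers_with_gap(candidates, gap=1):
    """Group candidate coordinates that lie within `gap` of each other and
    take the center of each group as the line boundary."""
    groups, group = [], []
    for y in sorted(candidates):
        if group and y - group[-1] > gap:
            groups.append(group)
            group = []
        group.append(y)
    if group:
        groups.append(group)
    return [g[len(g) // 2] for g in groups]
```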
  • FIG. 12 is a hardware configuration diagram for realizing the character recognition apparatus according to the first embodiment by hardware.
  • An image is input by an image capture device 8 such as a scanner or a camera.
  • the binarization processing unit 2, the character string extraction unit 3, the histogram generation unit 4, the threshold value calculation unit 5, the line boundary determination unit 6, and the character recognition unit 7 are realized by a processing circuit 9.
  • the processing circuit 9 may be realized by, for example, a single circuit, a composite circuit, a programmed processor, a parallel programmed processor, or various electronic circuits combining these.
  • the display 10 displays the progress of the process.
  • Each program is stored in the hard disk 11.
  • By executing the programs, the processor 12 functions as the binarization processing unit 2, the character string extraction unit 3, the histogram generation unit 4, the threshold calculation unit 5, the line boundary determination unit 6, and the character recognition unit 7.
  • In this case, each unit is realized by software, firmware, or a combination of software and firmware. The software and firmware are stored in the hard disk 11 and are loaded from the hard disk 11 into the memory 13 when executed.

Abstract

According to the present invention, a character recognition device generates a histogram representing the number of black pixels in each pixel row in a character string region that is extracted from input image data by a character string region extraction unit, then calculates a row determination threshold value from the generated histogram, and determines the boundaries between adjacent rows of characters in the character string region on the basis of the calculated row determination threshold value. Thus, according to the present invention, the threshold value for determining the boundaries between adjacent rows of characters is appropriately set taking into account a feature of each pixel row, making it possible to appropriately divide the character string region into rows comprising individual character strings.

Description

Character recognition device and character recognition method
 The present invention relates to a character recognition device and a character recognition method.
 There is demand for digitizing and storing documents created on paper as image data, for example by reading them with a scanner. To make use of such digitized documents, there are document retrieval and management systems that allow documents to be searched by keyword. When documents are digitized in such a system, character recognition technology is used to automatically recognize the characters in the documents and turn them into keywords. In conventional character recognition technology, the rough range in which a character string exists in the image data is first identified, it is determined whether different lines exist within that range, and, when there are multiple lines, processing to separate the lines appropriately is performed, thereby improving character recognition accuracy.
 In the character recognition device of Patent Document 1, a histogram indicating the frequency of black pixels in the row direction is created from binarized image data, and positions at valleys of the histogram are estimated as inter-line positions that are candidates for line boundaries. The character recognition device also estimates in advance character rectangular regions, each a rectangle enclosing a character region assumed to be recognized as one character, and calculates a certainty factor C indicating how character-like the image in each character rectangular region is. If an inter-line position estimated as described above does not divide a character rectangle with a high certainty factor C, that inter-line position is determined to be a line boundary, thereby separating the lines when there are multiple lines.
JP 2009-211432 A
 In the line separation method described above, line boundaries are determined using the certainty factor C, which indicates how character-like the image in each estimated character rectangular region is. However, when the line spacing is narrow, if a character rectangular region with a high certainty factor C straddles the space between lines, that inter-line position cannot be determined to be a line boundary, and the lines cannot be divided appropriately.
 The present invention has been made to solve the above problem, and an object of the present invention is to obtain a character recognition device 1 that can separate lines accurately.
 In the character recognition device according to the present invention, a histogram indicating the frequency of black pixels in the row direction of the character string region extracted from the input image data by the character string region extraction unit is generated, a row determination threshold is calculated from the generated histogram, and the boundaries between different lines in the character string region are determined based on the calculated row determination threshold.
 According to the present invention, the character string region is extracted from the image data and the row determination threshold is set from the histogram indicating the frequency of black pixels in the line direction. Therefore, the threshold for determining line boundaries is set appropriately based on features obtained from the entire line direction, and the character string in the character string region can be separated into appropriate lines.
FIG. 1 is a configuration diagram of the character recognition device according to the first embodiment.
FIG. 2 is an explanatory diagram showing an example of the operation in which the character string region extraction unit of the character recognition device according to the first embodiment extracts a character string region.
FIG. 3(a) is an explanatory diagram showing an example of a character string region according to the first embodiment, and FIG. 3(b) is an explanatory diagram showing an example of the histogram generated by the histogram generation unit of the character recognition device according to the first embodiment.
FIG. 4 is an explanatory diagram of an example of an image for which it is useful for the character recognition device according to the first embodiment to change the weight coefficient ρ for each region.
FIG. 5 is a flowchart showing the operation of the character recognition device according to the first embodiment.
FIG. 6 is a detailed flowchart showing the operation in which the threshold calculation unit of the character recognition device according to the first embodiment calculates the row determination threshold.
FIG. 7 is a detailed flowchart showing the operation of estimating line boundaries in the line boundary determination unit of the character recognition device according to the first embodiment.
FIG. 8 is a detailed flowchart showing the operation of estimating line boundaries in the line boundary determination unit of the character recognition device according to the second embodiment.
FIG. 9 is an explanatory diagram of an example of an image and histogram generated for a character string region in which two lines have different lengths, according to the third embodiment.
FIG. 10 is a detailed flowchart showing the operation of estimating line boundaries in the line boundary determination unit of the character recognition device according to the third embodiment.
FIG. 11 is a detailed flowchart showing the operation of estimating line boundaries in the line boundary determination unit of the character recognition device according to the fourth embodiment.
FIG. 12 is a hardware configuration diagram for realizing the character recognition device according to the first embodiment by hardware.
FIG. 13 is a hardware configuration diagram for realizing the character recognition device according to the first embodiment by software.
Embodiment 1.
 FIG. 1 is a configuration diagram of a character recognition device 1 according to the first embodiment. The character recognition device 1 comprises: a binarization processing unit 2 that binarizes image data to generate binarized data; a character string region extraction unit 3 that extracts a character string region based on the binarized data and generates character string region data; a histogram generation unit 4 that generates, from the binarized data and the character string region data, a histogram indicating the frequency of black pixels in the row direction; a threshold calculation unit 5 that calculates a row determination threshold from the histogram; a line boundary determination unit 6 that determines the boundaries between different lines in the character string region using the row determination threshold; and a character recognition unit 7 that recognizes the characters in the character string region based on the line boundaries determined by the line boundary determination unit 6.
 Here, the binarization processing unit 2 binarizes image data sent from an image capture device such as a scanner or camera to generate binarized data. Binarization is a process of converting a grayscale image into a two-tone black-and-white image; for example, if the pixel value at a position in the binarized data is α(x, y), then α(x, y) is 1 for a black pixel and 0 for a white pixel. Specifically, a threshold is set, and each pixel whose value exceeds the threshold is replaced with white, while each pixel below it is replaced with black. The binarization method is not limited to setting a single threshold as described above; for example, the image may be divided into multiple regions according to the luminance range, with a different threshold for each region. For character recognition, there is also a binarization process that converts the margins of the image data, including the background, into white pixels, and converts everything other than the margins, such as characters, ruled lines, and symbols, into black pixels. The image data to be binarized that is input to the binarization processing unit 2 may be represented in any of various formats for images and characters, for example JPEG (Joint Photographic Experts Group), TIFF (Tagged Image File Format), or BMP (Bit MaP).
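A minimal sketch of the single-threshold binarization described above (a hypothetical function, not the patented implementation; it assumes an 8-bit grayscale image given as a list of rows, with the convention that dark pixels below the threshold become 1 = black and the rest become 0 = white):

```python
def binarize(gray, threshold=128):
    """Global-threshold binarization: alpha(x, y) = 1 for black pixels
    (value below threshold), 0 for white pixels."""
    return [[1 if value < threshold else 0 for value in row] for row in gray]
```

A region-wise variant, as mentioned above, would apply a different threshold to each sub-region of the image instead of a single global one.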
 文字列領域抽出部3では、二値化処理部2で生成された二値化データから文字列領域を抽出する。文字列とは、文字があると推定される二値化データの黒画素の集合を指すものである。また、文字列領域を抽出するとは、二値化データの中で黒画素の集合が検出された場合にその辺りに文字が存在すると推定し、検出した黒画素の集合を含む範囲であるx軸上の最初xstartから最後xendまでとy軸上の最初ystartから最後yendまでの位置情報を含む矩形状の領域を文字列領域として抽出することである。文字列領域抽出部3は文字列領域を抽出し、抽出した文字列領域の位置情報を含む矩形状の範囲を示すデータ、すなわち前記xstart、xend、ystart、yendの座標データ及び二値化データを文字列領域データとして生成する。 The character string area extraction unit 3 extracts a character string area from the binarized data generated by the binarization processing unit 2. A character string refers to a set of black pixels of binarized data estimated to have characters. The extraction of the character string region means that when a set of black pixels is detected in the binarized data, it is estimated that there is a character around the x-axis, which is a range including the detected set of black pixels. A rectangular area including position information from the first xstart to the last xend on the upper side and from the first ystart to the last yend on the y-axis is extracted as a character string area. The character string area extraction unit 3 extracts a character string area, and displays data indicating a rectangular range including position information of the extracted character string area, that is, coordinate data and binarized data of the xstart, xend, ystart, and yend. Generated as character string area data.
 FIG. 2 is an explanatory diagram illustrating an example of the operation in which the character string region extraction unit 3 of the character recognition device 1 according to the first embodiment extracts a character string region 3b. When the binarized data generated by the binarization processing unit 2 represents a binarized image 3a as shown in FIG. 2, the character string region extraction unit 3 detects a set of black pixels in the binarized image 3a, extracts a rectangular range containing the detected set of black pixels as the character string region 3b, and generates data indicating the rectangular range of the extracted character string region 3b as character string region data. The character string region data is, for example, data indicating the first coordinate xstart to the last coordinate xend on the x-axis and the first coordinate ystart to the last coordinate yend on the y-axis of the character string region 3b in the binarized image 3a. The extraction method of the character string region 3b described above is one example, and any other method may be used as long as a character string region is extracted.
 Further, the character string region extraction unit 3 estimates the direction of the lines in the extracted character string region from the shape of the extracted character string region 3b, the attribute information of the binarized image, or the like.
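As an illustrative sketch, not part of the disclosed embodiment, the bounding-rectangle extraction described above can be expressed as follows, assuming for simplicity that all black pixels in the data belong to one character string region:

```python
def extract_string_region(alpha):
    """Return (xstart, xend, ystart, yend): the rectangle bounding all
    black pixels (value 1) in the binarized data alpha[y][x].
    Returns None when the data contains no black pixels."""
    ys = [y for y, row in enumerate(alpha) if any(row)]
    xs = [x for row in alpha for x, v in enumerate(row) if v]
    if not ys:
        return None  # no black pixels: no character string region
    return (min(xs), max(xs), min(ys), max(ys))

# Toy binarized image: black pixels at (1,1), (2,1), (2,2), (3,2).
alpha = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 0, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
print(extract_string_region(alpha))  # (1, 3, 1, 2)
```

A real implementation would additionally separate distinct groups of black pixels into multiple regions, as the unit may extract more than one character string region per image.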
 The histogram generation unit 4 generates a histogram using the character string region extracted by the character string region extraction unit 3. Specifically, from the binarized data generated by the binarization processing unit 2 and the character string region data generated by the character string region extraction unit 3, the histogram generation unit 4 generates a histogram indicating the frequency at which black pixels appear along the line direction estimated when the character string region was extracted. FIG. 3(a) is an explanatory diagram illustrating an example of a character string region, and FIG. 3(b) is an explanatory diagram illustrating an example of a histogram generated by the histogram generation unit 4 of the character recognition device 1 according to the first embodiment. In FIG. 3(a), the x-axis of the binarized data and the x-axis of the character string region 3b coincide, but the estimated line direction of the character string region may be aligned with the x-axis of the binarized data. The histogram generation unit 4 obtains the total number of black pixels in the x-axis direction for each pixel coordinate on the y-axis, and thereby generates the histogram. An example of the expression for generating this histogram H(y) is shown below. α(x, y) represents the value of the binarized data at the coordinates (x, y), which is 1 for a black pixel and 0 for a white pixel. H(y), a function of the coordinate y, is the total number of black pixels at each coordinate y in the range from the first coordinate xstart to the last coordinate xend along the line direction (the x-axis direction) of the character string region.
H(y) = Σ_{x = xstart}^{xend} α(x, y)
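As an illustrative sketch, not part of the disclosed embodiment, the computation of H(y) over a character string region can be written as follows (the array layout and function name are assumptions for illustration):

```python
def row_histogram(alpha, xstart, xend, ystart, yend):
    """Compute H(y): the count of black pixels (alpha == 1) along the
    line direction (x-axis) for each y in the character string region.
    alpha is indexed as alpha[y][x], 1 for black and 0 for white."""
    return {
        y: sum(alpha[y][x] for x in range(xstart, xend + 1))
        for y in range(ystart, yend + 1)
    }

# Tiny 4x6 binarized region: two text rows separated by a blank row.
alpha = [
    [1, 1, 0, 1, 1, 1],  # y = 0: characters present
    [1, 1, 1, 1, 0, 1],  # y = 1: characters present
    [0, 0, 0, 0, 0, 0],  # y = 2: line boundary (no black pixels)
    [1, 0, 1, 1, 1, 1],  # y = 3: characters present
]
H = row_histogram(alpha, xstart=0, xend=5, ystart=0, yend=3)
print(H)  # {0: 5, 1: 5, 2: 0, 3: 5}
```

The valley at y = 2 is the kind of feature the line boundary determination described below detects.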
 In the example of FIG. 3, in the portions of the character string region 3b shown in FIG. 3(a) where characters are present, the black pixel frequency is high in the histogram of FIG. 3(b), whereas in the line boundary region 4a enclosed by the dotted lines the black pixel frequency is as small as about 20.
 The threshold calculation unit 5 calculates, from the histogram in the character string region generated by the histogram generation unit 4, a threshold for determining the boundaries between lines. Where characters are present the value of H(y) is large, and at a line boundary H(y) is small, so line boundaries can be determined from the value of H(y). By obtaining the threshold for this determination from the histogram of each individual recognition target image, a threshold that reflects the characteristics of each image can be set, in particular one based on features obtained from the entire line direction rather than on a microscopic criterion such as individual characters. In the first embodiment, the peak value P of the histogram generated by the histogram generation unit 4 is detected, and a line determination threshold th1 is calculated from the peak value P. The expression used by the threshold calculation unit 5 to calculate the line determination threshold th1 is shown below, where ρ is a parameter indicating a weighting factor.
th1 = ρ × P
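As an illustrative sketch of this calculation, not part of the disclosed embodiment: truncating the product to an integer is an assumption made here so that the result matches the value th1 = 22 quoted later in the text for P = 102 and ρ = 0.22.

```python
def line_threshold(hist_values, rho):
    """Detect the peak value P of the histogram and derive the line
    determination threshold th1 = rho * P (truncated to an integer)."""
    peak = max(hist_values)
    return int(rho * peak)

# Values matching the example of FIG. 3: peak P = 102, rho = 0.22.
th1 = line_threshold([3, 40, 102, 97, 15], rho=0.22)
print(th1)  # 22
```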
 The weighting factor ρ is a parameter set by the user or automatically. When the line spacing of the character strings in the image to be processed is narrow, the difference in the value of H(y) between positions where characters exist and positions estimated to be line boundaries is small, and the value of H(y) near a line boundary may not become sufficiently small; for an appropriate determination the threshold therefore needs to be set higher, and for this purpose the weighting factor ρ is increased. Conversely, when the line spacing is wide, even if H(y) becomes somewhat low in places where characters exist because of variations in H(y), such places should not be regarded as line boundaries, so the threshold needs to be set lower, and for this purpose the weighting factor ρ is decreased. The weighting factor ρ may be constant over the entire image, or the image may be divided into regions and ρ changed for each region. Examples of ways to divide the image when ρ is changed per region include regions enclosed by ruled lines and regions enclosed by blank space. To detect such regions automatically, ruled line detection, blank detection, symbol detection, and the like can be performed, and various conventional methods exist for this purpose. For example, the method of Reference 1 below can be used for ruled line detection and blank detection, and the method of Reference 2 below for symbol detection.
Reference 1: Takashi Hirano, Yasuhiro Okada, Fumio Yoda, "Ruled Line Extraction Method from Document Images", IEICE General Conference, March 1998
Reference 2: Shogo Yoneyama, Takashi Hirano, Yasuhiro Okada, "Examination of a Symbol Extraction Method for Drawing Images", IEICE General Conference, March 2006
 FIG. 4 shows an example of an image for which changing the weighting factor ρ for each region is useful. In an image such as that of FIG. 4, there is a line-determination-difficult region 5a, where the line spacing is narrow and determining the line boundaries is estimated to be difficult, and a line-determination-easy region 5b, where the line spacing is wide and determining the line boundaries is estimated to be easy. In this case, the user sets the weighting factor ρ of the line-determination-difficult region 5a to a large value and the weighting factor ρ of the line-determination-easy region 5b to a small value in advance. By setting the weighting factor ρ in this way, an appropriate line determination threshold can be set for each region, for example regions where line boundaries are easy or difficult to determine depending on the spacing between character lines. Note that the weighting factor ρ may also be set automatically from the frequency of black pixels in the region and the tendency of their distribution.
 In the example of FIG. 3 described above, the peak value P of the histogram is 102, and with the weighting factor ρ set to 0.22, the line determination threshold th1 is 22.
 The line boundary determination unit 6 determines the boundaries between different lines in the character string region based on the line determination threshold th1 calculated by the threshold calculation unit 5. A determined line boundary indicates the position information of the line at which a line boundary is determined to exist. The condition that the line boundary determination unit 6 uses to determine a line boundary is the following expression: when H(y) is smaller than the line determination threshold th1, it is estimated that there is a line boundary at the coordinate y, and when H(y) is greater than or equal to th1, it is estimated that there is no line boundary at the coordinate y.
H(y) < th1 : a line boundary is estimated at the coordinate y
H(y) ≥ th1 : no line boundary is estimated at the coordinate y
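As an illustrative sketch of this per-coordinate test, not part of the disclosed embodiment (the function name and data layout are assumptions):

```python
def boundary_candidates(H, th1):
    """Return the coordinates y whose black pixel count H(y) falls
    below th1, i.e. the positions estimated to lie on a line boundary."""
    return [y for y in sorted(H) if H[y] < th1]

# Histogram with a low-valued valley at y = 1..3.
H = {0: 90, 1: 15, 2: 10, 3: 18, 4: 95}
print(boundary_candidates(H, th1=22))  # [1, 2, 3]
```

The run of adjacent candidates [1, 2, 3] is then reduced to a single boundary coordinate as described next.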
 Then, when there is one coordinate y estimated to have a line boundary, that coordinate y is determined to be the line boundary; when a plurality of such estimated coordinates are adjacent to one another, the coordinate y located at the center of those coordinates is determined to be the line boundary. Which of a plurality of adjacent coordinates y determined to have a line boundary is taken as the line boundary is not limited to the center, and may be selected from among the adjacent coordinates y. When coordinates y determined to have a line boundary are not adjacent to one another, each such coordinate y is determined to be a line boundary. The number of line boundaries in one character string region is not limited to one; a plurality of line boundaries may exist.
 The character recognition unit 7 performs character recognition processing within the character string region based on the line boundaries determined by the line boundary determination unit 6 and the character string region extracted by the character string region extraction unit 3. Various conventional methods exist for performing character recognition. For example, there is the method of Reference 3 below, which improves robustness against image degradation by using run-length correction.
Reference 3: Minoru Mori, Minako Sawaki, Norihiro Hagita, Hiroshi Murase, Naoki Mukawa, "Robust Feature Extraction against Image Degradation Using Run-Length Correction", IEICE Transactions, Vol. J86-D2, No. 7, pp. 1049-1057, July 2003
When the character recognition processing is completed, the character recognition unit 7 outputs the character recognition result. The above is the configuration of the character recognition device 1.
 Next, the operation of the character recognition device 1 according to the first embodiment will be described. FIG. 5 is a flowchart showing the operation of the character recognition device 1 according to the present embodiment.
 First, in step S1, the binarization processing unit 2 performs binarization processing on the image data to generate binarized data. The generated binarized data is sent to the character string region extraction unit 3.
 In step S2, the character string region extraction unit 3 extracts a character string region from the binarized data generated in step S1 and generates character string region data indicating the extracted character string region. The character string region extraction unit 3 also estimates the direction of the lines in the extracted character string region from the shape of the extracted character string region or the attribute information of the binarized image. The character string region data generated by the character string region extraction unit 3 is sent to the character recognition unit 7 together with the input binarized data and data indicating the estimated line direction. Note that the number of character string regions extracted from one set of binarized data is not limited to one; a plurality of character string regions may be extracted. The following steps describe the character recognition processing for one of the character string regions extracted in step S2.
 In step S3, the histogram generation unit 4 generates a histogram using the character string region extracted in step S2. From the binarized data generated in step S1 and the character string region data extracted in step S2, the histogram generation unit 4 generates a histogram indicating the frequency at which black pixels appear along the line direction estimated when the character string region was extracted. FIG. 3(a) is an example of an image normalized with the estimated line direction as the x-axis, and FIG. 3(b) shows an example of a histogram generated based on this image.
 The data indicating the histogram generated by the histogram generation unit 4 is sent to the threshold calculation unit 5 together with the character string region data generated in step S2.
 In step S4, the threshold calculation unit 5 calculates the line determination threshold th1 for determining line boundaries using the histogram generated in step S3. The calculated line determination threshold th1 is sent to the line boundary determination unit 6 together with the character string region data generated in step S2 and the histogram data generated in step S3.
 Here, the detailed operation of step S4 for calculating the line determination threshold th1 will be described. FIG. 6 is a detailed flowchart showing the operation in which the threshold calculation unit 5 calculates the line determination threshold.
 In step S41, the threshold calculation unit 5 detects the peak value P in the histogram generated by the histogram generation unit 4 in step S3.
 In step S42, the threshold calculation unit 5 calculates the line determination threshold th1 using the peak value P detected in step S41 and the weighting factor ρ.
 The example shown in FIG. 3 is the case where the peak value P of the histogram is 102 and, as a result of setting the weighting factor ρ to 0.22, the line determination threshold th1 is 22. In FIG. 3(b), corresponding to the line boundary region 4a where H(y) is smaller than 22, the value of th1, it can be seen that a line boundary lies around the line boundary region 4a of the character string region shown in FIG. 3(a). As described above, more appropriate line determination can be performed by adjusting the weighting factor ρ to a value suited to each type of image or to each region within an image.
 In step S5, the line boundary determination unit 6 determines the boundaries between different lines in the character string region using the line determination threshold th1 calculated in step S4. The line boundary determination unit 6 compares the line determination threshold th1 with the histogram value H(y) and stores one or more coordinates y estimated to be line boundaries.
 FIG. 7 is a detailed flowchart showing the operation in which the line boundary determination unit 6 estimates line boundaries.
 First, in step S51-1, ystart included in the character string region data, that is, the coordinate y at which the character string region starts, is set as the initial value.
 Next, in step S51-2, the line determination threshold th1 is compared with the histogram value H(y) corresponding to the current coordinate y. When H(y) is smaller than th1 (H(y) < th1), there is a high possibility that there is a line boundary at this coordinate y, and the process proceeds to step S51-3. On the other hand, when H(y) is greater than or equal to th1 (H(y) ≥ th1), there is a high possibility that a character string exists at this coordinate y, and the process proceeds to step S51-4.
 In step S51-3, the coordinate y is stored as a coordinate at which a line boundary is estimated to exist, and the process proceeds to step S51-4.
 After step S51-3, or when H(y) is greater than or equal to th1 in step S51-2, the process proceeds to step S51-4, where y is incremented to the next coordinate y, and the operations from step S51-2 to step S51-5 are repeated until y is determined in step S51-5 to be the coordinate yend at which the character string region 3b ends.
 Through this operation, one or more coordinates y estimated to be line boundaries in the character string region are extracted and stored.
 Then, in step S51-6, when there is one coordinate y estimated to have a line boundary, the line boundary determination unit 6 determines that coordinate y to be the line boundary; when a plurality of estimated coordinates are adjacent to one another, it determines the coordinate y located at the center of those coordinates to be the line boundary.
 The y coordinates determined to be line boundaries are sent as line boundary data to the character recognition unit 7 together with the character string region data generated in step S2.
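As an illustrative sketch, not part of the disclosed embodiment, the loop of steps S51-1 to S51-5 and the selection of the central coordinate in step S51-6 can be written as follows (the function name and data layout are assumptions):

```python
def determine_line_boundaries(H, th1, ystart, yend):
    """Scan y from ystart to yend, collect coordinates with H(y) < th1
    (steps S51-1 to S51-5), then reduce each run of adjacent candidate
    coordinates to its central coordinate (step S51-6)."""
    candidates = [y for y in range(ystart, yend + 1) if H[y] < th1]

    boundaries = []
    run = []
    for y in candidates:
        if run and y != run[-1] + 1:      # a run of adjacent coordinates ended
            boundaries.append(run[len(run) // 2])
            run = []
        run.append(y)
    if run:
        boundaries.append(run[len(run) // 2])
    return boundaries

# Two text lines separated by a 3-pixel-high gap at y = 3..5.
H = {0: 80, 1: 85, 2: 78, 3: 5, 4: 0, 5: 4, 6: 81, 7: 84}
print(determine_line_boundaries(H, th1=22, ystart=0, yend=7))  # [4]
```

Here the adjacent candidates y = 3, 4, 5 are reduced to the central coordinate y = 4, which becomes the line boundary.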
 In step S6, the character recognition unit 7 performs character recognition processing based on the character string region data generated in step S2 and the line boundary data determined in step S5. When the character recognition processing is completed, the character recognition unit 7 outputs the character recognition result. As described above, line boundary determination and character recognition are performed by the character recognition device 1 according to the present embodiment.
 As described above, according to the character recognition device 1 of the first embodiment, a histogram indicating the frequency of black pixels along the line direction of the character string region extracted by the character string region extraction unit 3 from the input image data is generated, a line determination threshold is calculated from the generated histogram, and the boundaries between different lines in the character string region are determined based on the calculated line determination threshold. As a result, the threshold for determining line boundaries is set appropriately based on features obtained from the entire line direction, and the character strings in the character string region can be separated into appropriate lines.
Embodiment 2.
 Next, the character recognition device 1 according to Embodiment 2 will be described. In Embodiment 1, line determination is performed using the line determination threshold th1 applied to the frequency of black pixels as the determination criterion of the line boundary determination unit 6. In Embodiment 2, in addition to the frequency of black pixels, the gradient g(y) of the histogram is used as a line determination criterion to perform the line boundary determination.
 In Embodiment 2, the detailed configurations and operations of the threshold calculation unit 5 and the line boundary determination unit 6 differ from those of Embodiment 1, and the other parts are the same as in Embodiment 1.
 As in Embodiment 1, the threshold calculation unit 5 calculates the line determination threshold th1 for determining line boundaries from the histogram in the character string region generated by the histogram generation unit 4. In addition, a line determination threshold th2 relating to the gradient g(y) of the histogram is stored in the threshold calculation unit 5 in advance. The gradient of the histogram is g(y) = dH(y)/dy. The gradient g(y) of the histogram becomes steep at a line boundary, so its value becomes large there, and becomes gentle in regions where characters exist, so its value becomes small there; therefore, line boundaries can be determined by setting a threshold on the gradient of the histogram.
 By using the line determination threshold th1 obtained from the histogram of each individual recognition target image together with the line determination threshold th2 relating to the gradient g(y) of the histogram, a threshold that reflects the characteristics of each image is set, in particular one based on features obtained from the entire line direction rather than on a microscopic criterion such as individual characters. In addition, for the coordinates y estimated from the frequency of black pixels to be line boundaries, further applying the line determination threshold th2 to the gradient g(y) of the histogram removes coordinates y that are highly likely not to be line boundaries, so that the line boundaries can be estimated with higher accuracy.
 The line boundary determination unit 6 determines the boundaries between different lines in the character string region based on the line determination threshold th2 in addition to the line determination threshold th1 calculated by the threshold calculation unit 5.
 For each coordinate y estimated to have a line boundary based on the line determination threshold th1 calculated by the threshold calculation unit 5, the line boundary determination unit 6 further performs a determination using the following expression; when this expression holds, it is estimated that there is a line boundary at the coordinate y, and otherwise it is estimated that there is no line boundary at the coordinate y.
H(y) < th1 and H(y) - H(y-1) > th2
 When H(y) is smaller than the line determination threshold th1 and H(y) - H(y-1) is larger than the line determination threshold th2, it is estimated that there is a line boundary at the coordinate y; otherwise, it is estimated that there is no line boundary at the coordinate y. Note that here the difference between H(y) and H(y-1) is used in place of the gradient g(y) of the histogram, as in the following expression.
g(y) = H(y) - H(y-1)
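As an illustrative sketch of the combined Embodiment 2 criterion, not part of the disclosed embodiment; the sample values th1 = 22 and th2 = 5 are arbitrary assumptions chosen only for the example, and the condition is implemented literally as stated above:

```python
def gradient_filtered_candidates(H, th1, th2, ystart, yend):
    """Embodiment 2 test: coordinate y is kept as a line boundary
    candidate only when H(y) < th1 and g(y) = H(y) - H(y-1) > th2."""
    kept = []
    for y in range(ystart + 1, yend + 1):   # g(y) requires H(y-1)
        if H[y] < th1 and (H[y] - H[y - 1]) > th2:
            kept.append(y)
    return kept

# y = 2 has few black pixels but a gentle gradient (inside a character),
# while y = 3 combines a low count with a steep gradient.
H = {0: 80, 1: 2, 2: 3, 3: 15, 4: 85}
print(gradient_filtered_candidates(H, th1=22, th2=5, ystart=0, yend=4))  # [3]
```

Coordinates with low counts but gentle gradients (here y = 1 and y = 2 relative to their predecessors) are removed by the gradient test.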
 Then, when there is one coordinate y estimated to have a line boundary, the line boundary determination unit 6 determines that coordinate y to be the line boundary; when a plurality of coordinates y estimated to have a line boundary are adjacent to one another, it determines the coordinate y located at the center of those coordinates to be the line boundary. Which of a plurality of adjacent coordinates y determined to have a line boundary is taken as the line boundary is not limited to the center, and may be selected from among the adjacent coordinates y. When coordinates y determined to have a line boundary are not adjacent to one another, each such coordinate y is determined to be a line boundary. The number of line boundaries in one character string region is not limited to one; a plurality of line boundaries may exist.
 A determined line boundary indicates the position information of the line at which a line boundary is determined to exist, and is sent as line boundary data to the character recognition unit 7 together with the character string region data.
 The other configurations are the same as in Embodiment 1.
 Next, the operation of the character recognition device 1 according to Embodiment 2 will be described. Regarding the operation, the detailed operations of steps S4 and S5 differ from those of Embodiment 1.
 In step S4, the threshold calculation unit 5 calculates the line determination threshold th1 for determining line boundaries using the histogram generated in step S3. The method for calculating th1 is the same as in Embodiment 1. The threshold calculation unit 5 holds the line determination threshold th2 in addition to th1. The line determination threshold th2 is a fixed value set in advance by the user and stored in the threshold calculation unit 5 in advance. The threshold calculation unit 5 sends the line determination thresholds th1 and th2 to the line boundary determination unit 6.
 この行境界判定部6における行の境目を推定する動作を示す詳細フローチャートを図8に示す。
 まず、ステップS52-1にて文字列領域データに含まれるystartすなわち文字列領域が始まる座標yを初期値として設定する。
 次に、ステップS52-2にて行判定閾値th1と現在の座標yに対応するヒストグラムの値H(y)とを比較する。H(y)が行判定閾値th1よりも小さい場合(H(y)<th1)はこの座標yに行の境目がある可能性が高く、この場合はステップS52-3に進む。一方、H(y)が行判定閾値th1以上(H(y)≧th1)の場合はこの座標yには文字列が存在する可能性が高く、この場合はステップS52-5に進む。
 ステップS52-3にて行判定閾値th2と現在の座標yに対応するヒストグラムの傾きH(y)―H(y―1)とを比較する。H(y)―H(y―1)が行判定閾値th2よりも大きい場合(H(y)―H(y―1)>th2)は、ヒストグラムの傾きが急であることからこの座標yに行の境目がある可能性が高いと推定でき、この場合はステップS52-4に進む。一方、H(y)―H(y―1)が行判定閾値th2以下(H(y)―H(y―1)≦th2)の場合は、この座標yには行判定閾値th1より黒画素数が少ないものの、ヒストグラムの傾きが緩やかであることから行の境目ではない可能性が高く、この場合はステップS52-5に進む。
 ステップS52-4では、座標yを行の境界が存在すると推定される座標として記憶して、ステップS52-5に進む。
 ステップS52-4終了後、あるいはステップS52-2でH(y)が行判定閾値th1以上と判断、あるいはステップS52-3でH(y)―H(y―1)が行判定閾値th2以下と判断された場合は、ステップS52-5に進み、yをインクリメントして次の座標yとし、ステップS52-6でyの文字列領域32が終わる座標yendと判定されるまで、ステップS52-2からステップS52-5までの動作を繰り返す。
 このような動作により、文字列領域内において行の境目と推定される座標yが1つまたは複数抽出され、記憶される。
 そして、行境界判定部6はステップS52-7で、行の境目があると推定された座標yが1つの場合はその座標yを、また推定された座標が複数隣り合って存在する場合は、複数の座標yのうち中央に位置する座標yを行の境界であると判定する。
 なお、ステップS52-2で行った黒画素数による推定と、ステップS52-3で行ったヒストグラムの傾きによる推定は、入れ替えて行っても良い。この場合、まずヒストグラムの傾きにより行の境目の候補の座標を推定し、その推定された候補についてさらに黒画素数による推定を行うことで、行の境目の候補の座標が決まる。
 また、ステップS52-2で行った黒画素数による推定と、ステップS52-3で行ったヒストグラムの傾きによる推定とを統合し、以下のコスト関数Cの式を判定用の式として用いることも可能である。
FIG. 8 is a detailed flowchart showing the operation of estimating the line boundary in the line boundary determination unit 6.
First, in step S52-1, ystart included in the character string area data, that is, the coordinate y at which the character string area starts is set as an initial value.
In step S52-2, the line determination threshold th1 is compared with the histogram value H (y) corresponding to the current coordinate y. If H (y) is smaller than the line determination threshold th1 (H (y) <th1), there is a high possibility that there is a line boundary at this coordinate y. In this case, the process proceeds to step S52-3. On the other hand, if H (y) is greater than or equal to the line determination threshold th1 (H (y) ≧ th1), there is a high possibility that a character string exists at this coordinate y. In this case, the process proceeds to step S52-5.
In step S52-3, the line determination threshold th2 is compared with the slope H (y) -H (y-1) of the histogram corresponding to the current coordinate y. When H (y) -H (y-1) is larger than the row determination threshold th2 (H (y) -H (y-1)> th2), the histogram has a steep slope, so this coordinate y is set. It can be estimated that there is a high possibility that there is a line boundary. In this case, the process proceeds to step S52-4. On the other hand, when H (y) −H (y−1) is equal to or less than the row determination threshold value th2 (H (y) −H (y−1) ≦ th2), a black pixel is detected at the coordinate y from the row determination threshold value th1. Although the number is small, there is a high possibility that it is not a line boundary because the slope of the histogram is gentle. In this case, the process proceeds to step S52-5.
In step S52-4, the coordinate y is stored as a coordinate that is estimated to have a line boundary, and the process proceeds to step S52-5.
After step S52-4 is completed, or when it is determined in step S52-2 that H(y) is equal to or greater than the line determination threshold th1, or in step S52-3 that H(y)−H(y−1) is equal to or less than the line determination threshold th2, the process proceeds to step S52-5, where y is incremented to the next coordinate, and the operations from step S52-2 to step S52-5 are repeated until y is determined in step S52-6 to be the coordinate yend at which the character string region 32 ends.
By such an operation, one or a plurality of coordinates y estimated as a line boundary in the character string region are extracted and stored.
Then, in step S52-7, if there is only one coordinate y estimated to be a line boundary, the line boundary determination unit 6 determines that coordinate y to be the line boundary; if a plurality of estimated coordinates are adjacent to each other, it determines the coordinate y located at the center of those coordinates to be the line boundary.
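The scan of steps S52-1 through S52-7 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: all names (H, th1, th2, y_start, y_end) are invented for the sketch, and the histogram H is modeled as a list indexed by the coordinate y (y_start must be at least 1 so that H[y−1] exists).

```python
def estimate_line_boundaries(H, th1, th2, y_start, y_end):
    """Return the y coordinates judged to be line boundaries.

    A coordinate y is a candidate when H[y] < th1 (few black pixels,
    step S52-2) AND the slope H[y] - H[y-1] exceeds th2 (steep change,
    step S52-3). Runs of adjacent candidates are collapsed to their
    central coordinate (step S52-7).
    """
    candidates = []
    for y in range(y_start, y_end + 1):        # steps S52-2 .. S52-6
        if H[y] < th1 and H[y] - H[y - 1] > th2:
            candidates.append(y)               # step S52-4
    # Step S52-7: collapse each run of adjacent candidates to its center.
    boundaries, run = [], []
    for y in candidates + [None]:              # None flushes the last run
        if run and (y is None or y != run[-1] + 1):
            boundaries.append(run[len(run) // 2])
            run = []
        if y is not None:
            run.append(y)
    return boundaries
```

For example, with H = [5, 0, 2, 5, 5, 0, 2, 5], th1 = 3, and th2 = 1, the scan over y = 1..7 yields boundaries at y = 2 and y = 6, where the histogram is both low and rising steeply.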
Note that the estimation based on the number of black pixels performed in step S52-2 and the estimation based on the slope of the histogram performed in step S52-3 may be performed in the reverse order. In that case, candidate coordinates for line boundaries are first estimated from the slope of the histogram, and the estimation based on the number of black pixels is then applied to those candidates to determine the candidate coordinates for line boundaries.
Alternatively, the estimation based on the number of black pixels performed in step S52-2 and the estimation based on the slope of the histogram performed in step S52-3 can be integrated, and the following expression of the cost function C can be used as the determination expression.
Figure JPOXMLDOC01-appb-M000006
As described above, according to the character recognition device 1 of Embodiment 2, a histogram indicating the frequency of black pixels in the row direction of the character string region extracted from the input image data by the character string region extraction unit 3 is generated, a line determination threshold is calculated from the generated histogram, and boundaries between different lines in the character string region are estimated based on the calculated line determination threshold th1. Furthermore, by also estimating line boundaries based on the slope of the histogram, the threshold for judging line boundaries is set appropriately in light of features obtained from the entire row direction, and the character strings in the character string region can be separated into the appropriate lines.
Embodiment 3.
Next, the character recognition device 1 according to Embodiment 3 will be described. In Embodiment 1, line determination was performed using the line determination threshold th1 for the frequency of black pixels as the determination criterion of the line boundary determination unit 6; in Embodiment 3, in addition to the frequency of black pixels, line boundary determination is performed using the peak values P(n) detected from the histogram as a line determination criterion.
In the third embodiment, the detailed configurations and operations of the threshold value calculation unit 5 and the row boundary determination unit 6 are different from those in the first embodiment, and other parts are the same as those in the first embodiment.
The threshold calculation unit 5 calculates a line determination threshold th1 for determining a line boundary from the histogram in the character string region generated by the histogram generation unit 4 as in the first embodiment. Further, the threshold value calculation unit 5 stores in advance a row determination threshold value th3 relating to the difference P (n) −P (n−1) between the peak values of the histogram.
FIG. 9 shows an example of an image and its histogram for a character string region in which two lines have different lengths. When a plurality of lines exist and their lengths differ, the difference P(n)−P(n−1) between the peak values of the histogram becomes large. By exploiting this property and setting a threshold on the difference between the peak values of the histogram, line boundaries can be determined when a plurality of lines of different lengths exist.
By using the line determination threshold th1 obtained from the histogram of each individual recognition target image, and also using the line determination threshold th3 for the peak value difference P(n)−P(n−1) of the histogram, a threshold is set based on the features of each image, in particular features obtained from the entire row direction rather than microscopic criteria such as individual characters, and line boundaries can be estimated accurately even when a plurality of lines have different lengths.
The line boundary determination unit 6 determines a boundary between different lines in the character string area based on the line determination threshold th3 in addition to the line determination threshold th1 calculated by the threshold calculation unit 5.
First, the line boundary determination unit 6 estimates that there is a line boundary at coordinate y when H(y) is smaller than the line determination threshold th1, and that there is no line boundary at coordinate y when H(y) is equal to or greater than the line determination threshold th1.
Then, if there is only one coordinate y estimated to be a line boundary, the line boundary determination unit 6 determines that coordinate y to be the line boundary; if a plurality of such coordinates are adjacent to each other, it determines the central position among those coordinates y to be the line boundary. The position chosen among a plurality of adjacent coordinates y determined to contain a line boundary is not limited to the center and may be any of those coordinates. If a coordinate y determined to contain a line boundary has no adjacent counterpart, that coordinate y itself is determined to be the line boundary. A character string region is not limited to a single line boundary; a plurality of boundaries may exist.
On the other hand, the line boundary determination unit 6 calculates the peak difference P(n)−P(n−1). As shown in FIG. 9, the line boundary determination unit 6 detects all peaks of the generated histogram and calculates the difference between each peak value P(n) and the adjacent peak value P(n−1). Using the following expression, when the peak value difference P(n)−P(n−1) is larger than the line determination threshold th3, the line boundary determination unit 6 estimates that there is a line boundary between the y coordinates at which P(n) and P(n−1) occur; when the difference P(n)−P(n−1) is smaller than th3, it estimates that there is no line boundary between those y coordinates. When the peak value difference P(n)−P(n−1) is larger than the line determination threshold th3, a line boundary is estimated to exist between P(n) and P(n−1), so the central position between the coordinate yn at which the peak value P(n) occurs and the coordinate yn−1 at which the peak value P(n−1) occurs is determined to be the line boundary. The position chosen between the coordinate yn of P(n) and the coordinate yn−1 of P(n−1) is not limited to the center and may be any of the coordinates between them.
Figure JPOXMLDOC01-appb-M000007
The determined line boundary indicates position information of a line determined to have a line boundary, and is sent to the character recognition unit 7 together with the character string area data as line boundary data.
Other configurations are the same as those in the first embodiment.
Next, the operation of the character recognition device 1 according to Embodiment 3 will be described. Regarding the operation, the detailed operations in step S4 and step S5 differ from those in Embodiment 1.
In step S4, the threshold calculation unit 5 calculates the line determination threshold th1 for judging line boundaries using the histogram generated in step S3. The method for calculating the line determination threshold th1 is the same as in Embodiment 1. The threshold calculation unit 5 has a line determination threshold th3 in addition to the line determination threshold th1. The line determination threshold th3 is a fixed value set in advance by the user and stored in the threshold calculation unit 5 in advance. The threshold calculation unit 5 sends the line determination thresholds th1 and th3 to the line boundary determination unit 6.
FIG. 10 is a detailed flowchart showing the operation of estimating the line boundary in the line boundary determination unit 6.
The line boundary determination using the line determination threshold th1 is the same as in Embodiment 1. After finishing the operation of the flowchart of FIG. 7 described in Embodiment 1, the line boundary determination unit performs the operation of the flowchart shown in FIG. 10.
First, in step S53-1, n = 1 is set as the count initial value of the peak value included in the character string area data.
Next, in step S53-2, the line determination threshold th3 is compared with the difference P(n)−P(n−1) between adjacent peak values. When the peak value difference P(n)−P(n−1) is larger than the line determination threshold th3 (P(n)−P(n−1) > th3), there is a high possibility of a line boundary between the y coordinates at which the peak values P(n) and P(n−1) occur; in this case, the process proceeds to step S53-3. On the other hand, when the peak value difference P(n)−P(n−1) is equal to or less than the line determination threshold th3 (P(n)−P(n−1) ≤ th3), the presence or absence of a line boundary cannot be judged from the peak value difference P(n)−P(n−1); in this case, the process proceeds to step S53-4.
In step S53-3, the y-coordinates between the peak values P (n) and P (n-1) are stored as coordinates estimated to have a line boundary, and the process proceeds to step S53-4. .
After step S53-3 is completed, or when P(n)−P(n−1) is determined in step S53-2 to be equal to or less than the line determination threshold th3, the process proceeds to step S53-4, where n is incremented, and the operations from step S53-2 to step S53-5 are repeated until n is determined in step S53-5 to be nend, the count value of the last peak value P(nend) of the histogram.
By such an operation, one or a plurality of coordinates y estimated as a line boundary in the character string region are extracted and stored.
In step S53-6, if there is only one coordinate y estimated to be a line boundary, the line boundary determination unit 6 determines that coordinate y to be the line boundary; if there are a plurality of coordinates y between the peak values P(n) and P(n−1), it determines the coordinate y located at the center of those coordinates to be the line boundary.
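The peak-difference check of steps S53-1 through S53-6 can be sketched as follows. This is an illustrative sketch only: the names (peaks, th3) are assumptions, and the histogram peaks are modeled as a list of (y coordinate, peak value) pairs ordered by y. The midpoint is used for the boundary position, although the text above allows any position between the two peaks.

```python
def boundaries_from_peak_gaps(peaks, th3):
    """Estimate line boundaries from adjacent histogram peaks.

    When the difference P(n) - P(n-1) between adjacent peak values
    exceeds th3 (step S53-2), a line boundary is assumed between the
    two peaks and is placed at the midpoint of their y coordinates
    (step S53-6).
    """
    boundaries = []
    for n in range(1, len(peaks)):               # steps S53-2 .. S53-5
        (y_prev, p_prev), (y_cur, p_cur) = peaks[n - 1], peaks[n]
        if p_cur - p_prev > th3:                 # step S53-2
            boundaries.append((y_prev + y_cur) // 2)   # step S53-3/S53-6
    return boundaries
```

For example, two peaks (10, 50) and (30, 120) with th3 = 40 give a peak difference of 70 > 40, so a boundary is placed at y = 20.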
As described above, the line boundary data obtained by the determination based on the line determination threshold th1 in the flowchart of FIG. 7 and the line boundary data obtained by the determination based on the line determination threshold th3 in the flowchart of FIG. 10 are sent to the character recognition unit 7. The character recognition unit 7 performs character recognition using both sets of line boundary data, as in Embodiment 1.
Note that the order of the operation of the flowchart of FIG. 7 using the row determination threshold th1 and the operation of the flowchart of FIG. 10 using the row determination threshold th3 may be reversed.
Other operations are the same as those in the first embodiment.
As described above, according to the character recognition device 1 of Embodiment 3, a histogram indicating the frequency of black pixels in the row direction of the character string region extracted from the input image data by the character string region extraction unit 3 is generated, a line determination threshold is calculated from the generated histogram, and boundaries between different lines in the character string region are estimated based on the calculated line determination threshold th1. Furthermore, by also estimating line boundaries based on the difference between adjacent peak values of the histogram, the character strings in the character string region can be separated into the appropriate lines more clearly when the line lengths differ.
In this embodiment, line boundaries were determined by comparing the line determination threshold th3 with the peak value difference P(n)−P(n−1). However, the line boundary data obtained from the operation of the flowchart of FIG. 10 using the line determination threshold th3 need not indicate only coordinates; it may also include information indicating the probability that a line boundary exists at each coordinate. This probability is calculated, for example, by subtracting the peak value difference P(n)−P(n−1) from the line determination threshold th3 and dividing by the line determination threshold th3. Based on this probability, the character recognition unit 7 can select whether or not to adopt the coordinates indicated in the line boundary data as line boundaries.
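The probability calculation described above can be written as a one-line helper. The function name and argument names are assumptions for illustration only; the formula itself, subtracting the peak value difference from th3 and dividing by th3, is the one given in the text.

```python
def boundary_probability(p_n, p_prev, th3):
    # (th3 - (P(n) - P(n-1))) / th3, per the calculation described above
    return (th3 - (p_n - p_prev)) / th3
```

For example, with th3 = 50 and peak values 120 and 100, the peak difference is 20 and the computed value is (50 − 20) / 50 = 0.6.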
Embodiment 4.
Next, the character recognition device 1 according to Embodiment 4 will be described. In Embodiment 2, line boundary determination was performed using both the line determination threshold th1 for the frequency of black pixels and the slope g(y) of the histogram as determination criteria of the line boundary determination unit 6; in this embodiment, line boundaries are determined using only the slope g(y) of the histogram as the line determination criterion.
In the fourth embodiment, the detailed configurations and operations of the threshold value calculation unit 5 and the row boundary determination unit 6 are different from those in the second embodiment, and other parts are the same as those in the second embodiment.
The threshold calculation unit 5 stores in advance a line determination threshold th2 relating to the slope g(y) of the histogram. The slope of the histogram is g(y) = dH(y)/dy. The slope g(y) becomes steep at line boundaries, where its value is large, and gentle in regions where characters exist, where its value is small; therefore, by setting a threshold on the slope of the histogram, line boundaries can be determined.
The line boundary determination unit 6 determines a boundary between different lines in the character string area based on the line determination threshold th2.
First, the line boundary determination unit 6 estimates that there is a line boundary at coordinate y when H(y)−H(y−1) is larger than th2, and that there is no line boundary at coordinate y when H(y)−H(y−1) is smaller than th2.
The line boundary condition based on the slope can be written as:

    g(y) > th2: a line boundary exists at coordinate y
    g(y) ≤ th2: no line boundary exists at coordinate y

The slope g(y) of the histogram can be calculated by approximating it as the difference between H(y) and H(y−1):

    g(y) = H(y) − H(y−1)
Then, if there is only one coordinate y estimated to be a line boundary, the line boundary determination unit 6 determines that coordinate y to be the line boundary; if a plurality of coordinates y estimated to be line boundaries are adjacent to each other, it determines the central position among those coordinates to be the line boundary. The position chosen among a plurality of adjacent coordinates y determined to contain a line boundary is not limited to the center and may be any of those coordinates. If a coordinate y determined to contain a line boundary has no adjacent counterpart, that coordinate y itself is determined to be the line boundary. A character string region is not limited to a single line boundary; a plurality of boundaries may exist.
The determined line boundary indicates the position information of the line determined to have a line boundary, and is sent to the character recognition unit together with the character string area data as line boundary data.
Other configurations are the same as those in the first embodiment.
Next, the operation of the character recognition device 1 according to Embodiment 4 will be described. Regarding the operation, the detailed operations in step S4 and step S5 differ from those in Embodiment 1.
In step S4, the threshold calculation unit 5 sends the line determination threshold th2 to the line boundary determination unit 6. The line determination threshold th2 is a fixed value set in advance by the user and stored in the threshold calculation unit 5 in advance.
FIG. 11 shows a detailed flowchart showing the operation of estimating the line boundary in the line boundary determination unit 6.
First, in step S54-1, ystart included in the character string area data, that is, the coordinate y at which the character string area starts is set as an initial value.
In step S54-2, the line determination threshold th2 is compared with the slope H(y)−H(y−1) of the histogram corresponding to the current coordinate y. When H(y)−H(y−1) is larger than the line determination threshold th2 (H(y)−H(y−1) > th2), the slope of the histogram is steep, so it can be estimated that there is a high possibility of a line boundary at this coordinate y. In this case, the process proceeds to step S54-3. On the other hand, when H(y)−H(y−1) is equal to or less than the line determination threshold th2 (H(y)−H(y−1) ≤ th2), the slope of the histogram at this coordinate y is gentle, so it is unlikely to be a line boundary. In this case, the process proceeds to step S54-4.
In step S54-3, the coordinate y is stored as a coordinate that is estimated to have a line boundary, and the process proceeds to step S54-4.
After step S54-3 is completed, or when it is determined in step S54-2 that H(y)−H(y−1) is equal to or less than the line determination threshold th2, the process proceeds to step S54-4, where y is incremented to the next coordinate, and the operations from step S54-2 to step S54-5 are repeated until y is determined in step S54-5 to be the coordinate yend at which the character string region 32 ends.
By such an operation, one or a plurality of coordinates y estimated as a line boundary in the character string region are extracted and stored.
In step S54-6, if there is only one coordinate y estimated to be a line boundary, the line boundary determination unit 6 determines that coordinate y to be the line boundary; if a plurality of estimated coordinates are adjacent to each other, it determines the coordinate y located at the center of those coordinates to be the line boundary.
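The slope-only scan of steps S54-1 through S54-6 can be sketched as follows. As before, this is an illustrative sketch: the names (H, th2, y_start, y_end) are assumptions, H is a list indexed by y, and y_start must be at least 1 so that H[y−1] exists.

```python
def boundaries_by_slope(H, th2, y_start, y_end):
    """Slope-only line boundary scan.

    A coordinate y is a candidate when the histogram slope
    g(y) = H[y] - H[y-1] exceeds th2 (step S54-2); runs of adjacent
    candidates are collapsed to their central coordinate (step S54-6).
    """
    candidates = [y for y in range(y_start, y_end + 1)
                  if H[y] - H[y - 1] > th2]     # steps S54-2 .. S54-5
    boundaries, run = [], []
    for y in candidates + [None]:               # None flushes the last run
        if run and (y is None or y != run[-1] + 1):
            boundaries.append(run[len(run) // 2])
            run = []
        if y is not None:
            run.append(y)
    return boundaries
```

For example, with H = [0, 5, 6, 0, 7, 8] and th2 = 3, the slope exceeds th2 at y = 1 and y = 4, so those coordinates are returned as boundaries.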
Other operations are the same as those in the first embodiment.
As described above, according to the character recognition device 1 of Embodiment 4, a histogram indicating the frequency of black pixels in the row direction of the character string region extracted from the input image data by the character string region extraction unit 3 is generated, and line boundaries are estimated from the generated histogram based on its slope. The threshold for judging line boundaries is thereby set appropriately in light of features obtained from the entire row direction, and the character strings in the character string region can be separated into the appropriate lines.
In all of the above embodiments, the binarization processing unit 2 generates and uses binarized data; however, the target data is not limited to binarized data. Any data that can distinguish character portions from line boundary portions may be used, for example multi-valued data representing each pixel with multiple values, or data indicating chromaticity.
In all of the above embodiments, the character string region data produced by the character string region extraction unit includes the binarized data. However, the binarized data may instead be sent directly from the binarization processing unit 2 to each unit that requires it. Likewise, not only the binarized data but any other data may be sent directly from the unit that produces it to each unit that requires it.
In all of the above embodiments, the line boundary determination unit 6 determines the central position among a plurality of coordinates y to be the line boundary only when the coordinates y estimated to be line boundaries are directly adjacent. However, a fixed range may be used instead: if the coordinates y estimated to be line boundaries fall within the fixed range of one another, they may be treated as adjacent, and the central position determined to be the line boundary.
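The range-based grouping just described can be sketched as follows. This is a minimal illustration under stated assumptions: the function and parameter names are invented, candidates is a non-empty sorted list of candidate y coordinates, and gap is the fixed range within which candidates are treated as adjacent.

```python
def group_candidates(candidates, gap):
    """Group candidate coordinates whose spacing is at most `gap`
    and return the central coordinate of each group as a boundary."""
    groups, current = [], [candidates[0]]
    for y in candidates[1:]:
        if y - current[-1] <= gap:
            current.append(y)    # within the fixed range: same group
        else:
            groups.append(current)
            current = [y]
    groups.append(current)
    return [g[len(g) // 2] for g in groups]
```

For example, with candidates [10, 12, 13, 40] and gap = 3, the first three coordinates form one group with center 12, and 40 stands alone, giving boundaries [12, 40].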

All of the above embodiments are realized in the configurations shown in FIGS. 12 and 13. FIG. 12 is a hardware configuration diagram for realizing the character recognition device according to Embodiment 1 in hardware. Images are input by an image capture device 8 composed of a scanner or camera. The binarization processing unit 2, the character string extraction unit 3, the histogram generation unit 4, the threshold calculation unit 5, the line boundary determination unit 6, and the character recognition unit 7 are realized by a processing circuit 9. The processing circuit 9 may be realized by, for example, a single circuit, a composite circuit, a programmed processor, a parallel-programmed processor, or various electronic circuits combining these. The display 10 displays the progress of processing and the like. Each program is stored on the hard disk 11.
FIG. 13 is a hardware configuration diagram for the case where the character recognition device according to Embodiment 1 is realized in software. When the character recognition device 1 is configured as a computer in this way, the processor 12 functions as the binarization processing unit 2, the character string extraction unit 3, the histogram generation unit 4, the threshold calculation unit 5, the line boundary determination unit 6, and the character recognition unit 7. These functions are realized by software, firmware, or a combination of software and firmware. The software and firmware are stored on the hard disk 11 and function by being loaded from the hard disk 11 into the memory 13 when executed.

1. Character recognition device
2. Binarization processing unit
3. Character string region extraction unit
3a. Binarized image
3b. Character string region
4. Histogram generation unit
4a. Line boundary region
5. Threshold calculation unit
5a. Difficult line-determination region
5b. Easy line-determination region
6. Line boundary determination unit
7. Character recognition unit
8. Image capture device
9. Processing circuit
10. Display
11. Hard disk

Claims (6)

1. A character recognition device comprising:
a character string region extraction unit that extracts a character string region from input image data;
a histogram generation unit that generates a histogram indicating the frequency of black pixels in the row direction of the character string region extracted by the character string region extraction unit;
a threshold calculation unit that calculates a line determination threshold from the histogram generated by the histogram generation unit;
a line boundary determination unit that determines a boundary between different lines in the character string region based on the line determination threshold calculated by the threshold calculation unit; and
a character recognition unit that recognizes characters in the character string region based on the character string region extracted by the character string region extraction unit and the line boundary determined by the line boundary determination unit.
2. The character recognition device according to claim 1, wherein the line determination threshold calculated from the histogram by the threshold calculation unit is a threshold for the frequency of black pixels.
  3.  The character recognition device according to claim 2, wherein the threshold for the frequency of black pixels is obtained by multiplying a peak value of the histogram by a coefficient.
  4.  The character recognition device according to claim 1, wherein the line determination threshold calculated from the histogram by the threshold calculation unit is a threshold for the slope of the histogram.
  5.  The character recognition device according to claim 1, wherein the line determination threshold calculated from the histogram by the threshold calculation unit is a threshold for the difference between peak values of the black-pixel frequency in the histogram.
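Claims 2 through 5 recite alternative forms of the line determination threshold. The following is a minimal, illustrative sketch of how each variant might be computed from the row-direction black-pixel histogram; the coefficient value, the slope limit, and the function names are assumptions for illustration, not details specified in the claims.

```python
def freq_threshold(hist, coeff=0.2):
    # Claim-3 variant: black-pixel frequency threshold obtained by
    # multiplying the histogram's peak value by a coefficient
    # (coeff=0.2 is an assumed example value).
    return coeff * max(hist)

def flat_rows_by_slope(hist, slope_limit=1):
    # Claim-4 variant (illustrative): flag positions where the histogram's
    # slope stays within slope_limit, i.e. flat valleys between text bands.
    return [abs(hist[i + 1] - hist[i]) <= slope_limit
            for i in range(len(hist) - 1)]

def peak_value_difference(hist):
    # Claim-5 variant (illustrative): difference between the two largest
    # local peak values of the black-pixel frequency histogram.
    peaks = sorted((hist[i] for i in range(1, len(hist) - 1)
                    if hist[i - 1] <= hist[i] >= hist[i + 1]), reverse=True)
    return peaks[0] - peaks[1] if len(peaks) >= 2 else 0
```

In each case the resulting value (or mask) is what the line boundary determination unit would consult to decide whether a histogram valley separates two lines.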
  6.  A character recognition method comprising:
     a character string region extraction step of extracting a character string region from input image data;
     a histogram generation step of generating a histogram indicating the frequency of black pixels in the row direction of the character string region extracted in the character string region extraction step;
     a threshold calculation step of calculating a line determination threshold from the histogram generated in the histogram generation step;
     a line boundary determination step of determining boundaries between different lines in the character string region based on the line determination threshold calculated in the threshold calculation step; and
     a character recognition step of recognizing characters in the character string region based on the character string region extracted in the character string region extraction step and the line boundaries determined in the line boundary determination step.
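The claimed method amounts to a horizontal projection analysis of a binarized character string region. The sketch below assumes the claim-3 threshold variant (histogram peak times a coefficient); the coefficient value, the choice of the gap midpoint as the boundary position, and the function name are illustrative assumptions, not the patent's exact implementation.

```python
def find_line_boundaries(binary_img, coeff=0.2):
    """Illustrative sketch of the claimed line-separation method:
    project black pixels per row, derive a line determination threshold
    from the histogram peak, and report the midpoint of each
    low-frequency gap enclosed by text as a line boundary."""
    # Histogram generation step: black-pixel frequency per image row
    # (black pixels encoded as 1 in the binarized image).
    hist = [sum(row) for row in binary_img]
    # Threshold calculation step: peak value times a coefficient.
    threshold = coeff * max(hist)
    # Line boundary determination step: runs of rows at or below the
    # threshold, counted only when text has appeared above the gap so
    # that a top margin is not mistaken for a line boundary.
    boundaries = []
    in_gap, start, seen_text = False, 0, False
    for y, freq in enumerate(hist):
        if freq <= threshold:
            if not in_gap:
                in_gap, start = True, y
        else:
            if in_gap and seen_text:
                boundaries.append((start + y - 1) // 2)  # gap midpoint
            in_gap, seen_text = False, True
    return boundaries
```

For a two-line region the function returns one row index lying between the lines, which a downstream character recognition step could use to process each line separately.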
PCT/JP2015/083948 2015-12-03 2015-12-03 Character recognition device and character recognition method WO2017094156A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
JP2017553564A JP6493559B2 (en) 2015-12-03 2015-12-03 Character recognition device and character recognition method
PCT/JP2015/083948 WO2017094156A1 (en) 2015-12-03 2015-12-03 Character recognition device and character recognition method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/JP2015/083948 WO2017094156A1 (en) 2015-12-03 2015-12-03 Character recognition device and character recognition method

Publications (1)

Publication Number Publication Date
WO2017094156A1 true WO2017094156A1 (en) 2017-06-08

Family

ID=58796627

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2015/083948 WO2017094156A1 (en) 2015-12-03 2015-12-03 Character recognition device and character recognition method

Country Status (2)

Country Link
JP (1) JP6493559B2 (en)
WO (1) WO2017094156A1 (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH06195462A (en) * 1992-12-22 1994-07-15 Fujitsu Ltd Angle of inclination of image measuring system
JP2004171560A (en) * 2002-11-21 2004-06-17 Nariyuki Motoi Composited image providing system, composited image generation program, information processor and data carrier
JP2004341977A (en) * 2003-05-19 2004-12-02 Mitsubishi Electric Corp Character recognition device and portable information terminal
JP2007165983A (en) * 2005-12-09 2007-06-28 Nippon Telegr & Teleph Corp <Ntt> Metadata automatic generating apparatus, metadata automatic generating method, metadata automatic generating program, and recording medium for recording program

Also Published As

Publication number Publication date
JP6493559B2 (en) 2019-04-03
JPWO2017094156A1 (en) 2018-02-08

Similar Documents

Publication Publication Date Title
JP5844783B2 (en) Method for processing grayscale document image including text region, method for binarizing at least text region of grayscale document image, method and program for extracting table for forming grid in grayscale document image
KR100512831B1 (en) Image processing method, apparatus and program storage medium
CN109543501B (en) Image processing apparatus, image processing method, and storage medium
JP4522468B2 (en) Image discrimination device, image search device, image search program, and recording medium
JP5934762B2 (en) Document modification detection method by character comparison using character shape characteristics, computer program, recording medium, and information processing apparatus
TW201437925A (en) Object identification device, method, and storage medium
WO2014131339A1 (en) Character identification method and character identification apparatus
JP6268023B2 (en) Character recognition device and character cutout method thereof
JP4100885B2 (en) Form recognition apparatus, method, program, and storage medium
US10455163B2 (en) Image processing apparatus that generates a combined image, control method, and storage medium
JP2014131278A (en) Method of authenticating printed document
JP6177541B2 (en) Character recognition device, character recognition method and program
JP2011248702A (en) Image processing device, image processing method, image processing program, and program storage medium
JP6338429B2 (en) Subject detection apparatus, subject detection method, and program
US9167129B1 (en) Method and apparatus for segmenting image into halftone and non-halftone regions
JP6542230B2 (en) Method and system for correcting projected distortion
US6269186B1 (en) Image processing apparatus and method
JP5984880B2 (en) Image processing device
JP2021111228A (en) Learning device, learning method, and program
JP6493559B2 (en) Character recognition device and character recognition method
JP2010225047A (en) Noise component removing device, and medium with noise component removing program recorded thereon
JP6580201B2 (en) Subject detection apparatus, subject detection method, and program
JP2010250387A (en) Image recognition device and program
JP2019021085A (en) Image processing program, image processing method, and image processing device
JP2014127763A (en) Image processing apparatus, image processing method, and program

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 15909786

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2017553564

Country of ref document: JP

Kind code of ref document: A

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 15909786

Country of ref document: EP

Kind code of ref document: A1