WO2017094156A1 - Character recognition device and character recognition method - Google Patents

Character recognition device and character recognition method

Info

Publication number
WO2017094156A1
WO2017094156A1 (PCT/JP2015/083948)
Authority
WO
WIPO (PCT)
Prior art keywords
character string
line
histogram
boundary
string region
Prior art date
Application number
PCT/JP2015/083948
Other languages
French (fr)
Japanese (ja)
Inventor
Yusuke Itani (伊谷 裕介)
Original Assignee
Mitsubishi Electric Corporation (三菱電機株式会社)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Mitsubishi Electric Corporation
Priority to JP2017553564A (granted as patent JP6493559B2)
Priority to PCT/JP2015/083948
Publication of WO2017094156A1


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/10: Image acquisition

Definitions

  • the present invention relates to a character recognition device and a character recognition method.
  • In a conventional approach, a histogram indicating the frequency of black pixels in the line direction is created from the binarized image data, and positions that form valleys of the histogram are estimated to be inter-line positions, that is, candidates for line boundaries.
  • The conventional character recognition device estimates character rectangles, each enclosing a character area assumed to be recognized as one character, and calculates a certainty factor C indicating how character-like the image in each rectangle is. An inter-line position is determined to be a line boundary only if it does not divide a character rectangle with a high certainty factor C, and the lines are separated accordingly.
  • This approach thus relies on the certainty factor C of the character rectangles estimated when the line boundaries are determined.
  • Consequently, if a character rectangle with a high certainty factor C straddles two lines, the inter-line position cannot be determined to be a line boundary, and the lines cannot be divided appropriately.
  • The present invention has been made to solve the above-described problem, and an object of the present invention is to obtain a character recognition device that can accurately separate lines.
  • In the present invention, a histogram indicating the frequency of black pixels in the line direction of the character string region extracted from the input image data by the character string region extraction unit is generated, a line determination threshold is calculated from the generated histogram, and boundaries between different lines in the character string region are determined based on the calculated line determination threshold.
  • Because the character string region is extracted from the image data and the line determination threshold is set from the histogram of black pixel frequency in the line direction, the threshold for determining line boundaries is obtained from the entire line direction, and the character string in the character string region can be separated into appropriate lines.
  • FIG. 1 is a block diagram of the character recognition device according to Embodiment 1.
  • FIG. 2 is an explanatory diagram showing an example of the operation in which the character string region extraction unit extracts a character string region.
  • FIG. 3(a) is an explanatory diagram showing an example of a character string region, and FIG. 3(b) is an explanatory diagram showing an example of the histogram generated by the histogram generation unit of the character recognition device according to Embodiment 1.
  • FIG. 4 is an explanatory diagram of an example image for which it is useful to change the weighting factor for each region.
  • FIG. 5 is a flowchart illustrating the operation of the character recognition device according to Embodiment 1.
  • FIG. 6 is a detailed flowchart illustrating the operation in which the threshold calculation unit of the character recognition device according to Embodiment 1 calculates the line determination threshold.
  • FIG. 7 is a detailed flowchart illustrating the operation of estimating line boundaries in the line boundary determination unit of the character recognition device according to Embodiment 1.
  • FIG. 8 is a detailed flowchart illustrating the operation of estimating line boundaries in the line boundary determination unit of the character recognition device according to Embodiment 2.
  • FIG. 9 is an explanatory diagram of an example image and histogram for a character string region in which two lines have different lengths.
  • Further detailed flowcharts illustrate the operation of estimating line boundaries in the line boundary determination units of the character recognition devices according to Embodiments 3 and 4, and a hardware configuration diagram shows how the character recognition device is implemented.
  • FIG. 1 is a configuration diagram of the character recognition device 1 according to Embodiment 1.
  • The character recognition device 1 includes: a binarization processing unit 2 that binarizes image data to generate binarized data; a character string region extraction unit 3 that extracts a character string region from the binarized data and generates character string region data; a histogram generation unit 4 that generates, from the binarized data and the character string region data, a histogram indicating the frequency of black pixels in the line direction; a threshold calculation unit 5 that calculates a line determination threshold from the histogram; a line boundary determination unit 6 that determines boundaries between different lines in the character string region using the line determination threshold; and a character recognition unit 7 that recognizes the characters in the character string region based on the line boundaries determined by the line boundary determination unit 6.
  • the binarization processing unit 2 performs binarization processing on image data sent from an image capturing device such as a scanner or a camera to generate binarized data.
  • The binarization process converts a grayscale image into a two-tone image of white and black. If the value of the binarized data at coordinates (x, y) is denoted f(x, y), then f(x, y) = 1 for a black pixel and f(x, y) = 0 for a white pixel.
  • For example, a single threshold is determined, and each pixel whose value is at or above the threshold is replaced with white, while each pixel below it is replaced with black.
  • The binarization method is not limited to a single threshold; the image may be divided into a plurality of regions according to the luminance range, and binarization may be performed with a different threshold for each region.
  • By this process, the margins of the image data, including the background, are converted into white pixels, while characters, ruled lines, symbols, and other figures are converted into black pixels.
  • The image data input to the binarization processing unit 2 may be in any format that represents images and characters, for example JPEG (Joint Photographic Experts Group), TIFF (Tagged Image File Format), or BMP (BitMaP).
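As a minimal sketch of the global-threshold binarization described above (the function name `binarize`, the default threshold of 128, and the sample pixel values are illustrative assumptions, not taken from the patent):

```python
def binarize(gray, threshold=128):
    # Global-threshold binarization following the convention in the text:
    # f(x, y) = 1 for a black pixel, 0 for a white pixel.
    # Pixels at or above the threshold are treated as white (0).
    return [[1 if v < threshold else 0 for v in row] for row in gray]

# Hypothetical 4x4 grayscale patch: dark strokes on a light background.
gray = [[250, 10, 10, 250],
        [250, 10, 10, 250],
        [250, 250, 250, 250],
        [250, 10, 10, 250]]
binary = binarize(gray)  # dark pixels become 1 (black)
```

A per-region variant, as the text allows, would simply apply `binarize` with a different threshold to each luminance region.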
  • the character string area extraction unit 3 extracts a character string area from the binarized data generated by the binarization processing unit 2.
  • a character string refers to a set of black pixels of binarized data estimated to have characters.
  • Extracting the character string region means that, when a set of black pixels is detected in the binarized data, characters are estimated to exist in a range including that set, and the rectangular region described by the position information from the first coordinate xstart to the last coordinate xend on the x-axis and from the first coordinate ystart to the last coordinate yend on the y-axis is extracted as the character string region.
  • The character string region extraction unit 3 extracts the character string region and generates, as character string region data, the coordinate data xstart, xend, ystart, and yend indicating the rectangular range of the extracted region, together with the binarized data.
  • FIG. 2 is an explanatory diagram illustrating an example of an operation in which the character string region extraction unit 3 of the character recognition device 1 according to the first embodiment extracts the character string region 3b.
  • The character string region extraction unit 3 detects a set of black pixels in the binarized image 3a, extracts a rectangular range including the detected set as the character string region 3b, and generates data indicating that rectangular range as character string region data.
  • The character string region data indicates, for example, the range of the character string region 3b in the binarized image 3a: from the first coordinate xstart to the last coordinate xend on the x-axis, and from the first coordinate ystart to the last coordinate yend on the y-axis.
  • The extraction method for the character string region 3b described above is an example; other methods may be used as long as a character string region is extracted. The character string region extraction unit 3 also estimates the direction of the lines in the extracted character string region from the shape of the region 3b or from attribute information of the binarized image.
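The bounding-rectangle extraction described above can be sketched as follows for the simple single-region case (the function name and sample data are illustrative assumptions; real implementations would handle multiple regions and line-direction estimation):

```python
def extract_string_region(binary):
    # Bounding rectangle (xstart, xend, ystart, yend) of the black pixels
    # (value 1) in the binarized data; a sketch of the single-region case.
    xs = [x for row in binary for x, v in enumerate(row) if v == 1]
    ys = [y for y, row in enumerate(binary) if 1 in row]
    if not xs:
        return None  # no black pixel, hence no character string region
    return (min(xs), max(xs), min(ys), max(ys))

binary = [[0, 0, 0, 0],
          [0, 1, 1, 0],
          [0, 1, 0, 0],
          [0, 0, 0, 0]]
region = extract_string_region(binary)  # (xstart, xend, ystart, yend)
```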
  • The histogram generation unit 4 generates a histogram using the character string region extracted by the character string region extraction unit 3. Specifically, from the binarized data generated by the binarization processing unit 2 and the character string region data generated by the character string region extraction unit 3, it generates a histogram indicating the frequency of black pixels along the line direction estimated when the character string region was extracted.
  • FIG. 3A is an explanatory diagram illustrating an example of a character string region.
  • FIG. 3B is an explanatory diagram illustrating an example of a histogram generated by the histogram generation unit 4 of the character recognition device 1 according to the first embodiment.
  • the histogram generation unit 4 determines the total number of black pixels in the x-axis direction for each coordinate of the y-axis pixel unit, and generates a histogram.
  • An example of an expression for generating this histogram H(y) is:

      H(y) = Σ (x = xstart to xend) f(x, y)

  • Here f(x, y) represents the value of the binarized data at the coordinates (x, y), and is 1 for a black pixel and 0 for a white pixel.
  • In the histogram of FIG. 3B, the black pixel frequency is high at y-coordinates where characters exist, while between the lines it can be seen to fall to a small value of about 20.
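The row-wise histogram H(y) defined above can be sketched directly from the formula (function name and sample data are illustrative assumptions):

```python
def row_histogram(binary, xstart, xend, ystart, yend):
    # H(y) = sum of f(x, y) for x in [xstart, xend], one value per row y
    # of the character string region.
    return [sum(binary[y][xstart:xend + 1]) for y in range(ystart, yend + 1)]

# Two short "lines" of black pixels separated by a blank row: the blank
# row produces the valley that marks the line boundary.
binary = [[1, 1, 1, 0],
          [0, 0, 0, 0],
          [1, 1, 0, 0]]
hist = row_histogram(binary, 0, 3, 0, 2)
```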
  • the threshold value calculation unit 5 calculates a threshold value for determining a line boundary from the histogram in the character string region generated by the histogram generation unit 4.
  • The threshold for this determination is obtained from the histogram of each individual recognition-target image; it is therefore based on the characteristics of each image, in particular on characteristics obtained from the entire line direction rather than on microscopic criteria such as character units.
  • An example of how the threshold calculation unit 5 calculates the line determination threshold th1 is:

      th1 = α × P

  • Here P is the peak (maximum) value of the histogram H(y), and α is a parameter indicating a weighting factor.
  • The weighting factor α is a parameter that is set by the user or automatically.
  • To make line boundaries easier to detect, that is, to raise the threshold so that positions with few black pixels are more readily considered boundaries, the weighting factor α is increased.
  • Conversely, to lower the threshold so that such positions are not considered line boundaries, the weighting factor α is decreased.
  • The weighting factor α may be constant for the entire image, or the image may be divided into regions and α changed for each region.
  • As methods of dividing the image into regions when the weighting factor α is changed per region, there are, for example, regions surrounded by ruled lines and regions surrounded by blank space. Various conventional methods exist to detect such regions automatically, such as ruled line detection, blank detection, and symbol detection; for example, Reference 1 below describes ruled line and blank detection, and Reference 2 below describes symbol detection.
  • Reference 1: Takashi Hirano, Yasuhiro Okada, Fumio Yoda, "Ruled Line Extraction Method from Document Images", IEICE General Conference, March 1998
  • Reference 2: Noboru Yoneyama, Takashi Hirano, Yasuhiro Okada, "Examination of a Symbol Extraction Method for Drawing Images", IEICE General Conference, March 2006
  • FIG. 4 shows an example of an image for which it is useful to change the weighting factor α for each region.
  • The image contains a line-determination-difficult region 5a, where the line spacing is narrow and line boundaries are estimated to be difficult to determine, and a line-determination-easy region 5b, where the line spacing is wide and line boundaries are easy to determine.
  • The user sets the weighting factor α of the difficult region 5a large in advance, and the weighting factor α of the easy region 5b small.
  • By setting the weighting factor α in this way, an appropriate line determination threshold can be set for each region, for example according to how difficult it is to determine line boundaries given the spacing of the character lines.
  • The weighting factor α may also be set automatically based on the frequency and distribution of black pixels in the region. In the example of FIG. 3 described above, when the peak value P of the histogram is 102 and the weighting factor α is 0.22, the line determination threshold th1 is 22.
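The threshold formula th1 = α × P can be sketched as follows, mirroring the worked numbers in the text (P = 102, α = 0.22 giving th1 of about 22); the function name is an illustrative assumption:

```python
def line_threshold(hist, alpha):
    # th1 = alpha * P, where P is the peak (maximum) value of H(y).
    return alpha * max(hist)

# Mirroring the example in the text: a histogram whose peak P is 102
# with alpha = 0.22 yields th1 of roughly 22 (22.44 before truncation).
th1 = line_threshold([0, 30, 102, 40, 5], 0.22)
```

Per-region weighting, as described above, would call `line_threshold` with a different `alpha` for each region's histogram.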
  • the line boundary determination unit 6 determines a boundary between different lines in the character string area based on the line determination threshold th1 calculated by the threshold calculation unit 5.
  • The determined line boundary is the position information of the coordinate determined to be a boundary between lines.
  • The determination condition that the line boundary determination unit 6 uses for a line boundary is the following: when H(y) is smaller than the line determination threshold th1, it is estimated that there is a line boundary at the coordinate y; when H(y) is greater than or equal to th1, it is estimated that there is no line boundary at the coordinate y.
  • If a single coordinate y is estimated to be a line boundary, that coordinate is determined to be the boundary. If a plurality of adjacent coordinates are estimated, the coordinate y located at the center of those coordinates is determined to be the line boundary; the choice is not limited to the center, and any of the adjacent coordinates may be selected. Coordinates y that are not adjacent to any other estimated coordinate are each determined to be a line boundary. A character string region is not limited to one line boundary; there may be a plurality of them.
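The determination rule above (threshold test, then taking the center of each run of adjacent candidate coordinates) can be sketched as follows; the function name and sample histogram are illustrative assumptions:

```python
def find_line_boundaries(hist, th1, ystart=0):
    # Coordinates y where H(y) < th1 are boundary candidates; runs of
    # adjacent candidates are merged and the center of each run is
    # returned as the line boundary, as described in the text.
    candidates = [ystart + i for i, h in enumerate(hist) if h < th1]
    boundaries, run = [], []
    for y in candidates:
        if run and y != run[-1] + 1:        # a run of adjacent candidates ended
            boundaries.append(run[len(run) // 2])
            run = []
        run.append(y)
    if run:
        boundaries.append(run[len(run) // 2])
    return boundaries

hist = [30, 25, 5, 3, 4, 28, 31]   # valley at indices 2-4
bounds = find_line_boundaries(hist, th1=22)
```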
  • The character recognition unit 7 performs character recognition processing on the character string region based on the line boundaries determined by the line boundary determination unit 6 and the character string region extracted by the character string region extraction unit 3.
  • There are various methods for performing character recognition; for example, the following reference describes a technique that improves robustness against image degradation by using run-length correction.
  • Reference 3 Minoru Mori, Minako Sawaki, Norihiro Hamada, Hiroshi Murase, Naoki Takekawa, “Robust Feature Extraction for Image Degradation Using Run-Length Correction”, IEICE Transactions, Vol. J86-D2, No. 7, pp. 1049-1057, July 2003
  • the character recognition unit 7 outputs a character recognition result.
  • the above is the configuration related to the character recognition device 1.
  • FIG. 5 is a flowchart showing the operation of the character recognition device 1 according to the present embodiment.
  • In step S1, the binarization processing unit 2 performs binarization processing on the image data to generate binarized data.
  • the generated binarized data is sent to the character string region extraction unit 3.
  • In step S2, the character string region extraction unit 3 extracts a character string region from the binarized data generated in step S1 and generates character string region data indicating the extracted region.
  • the character string region extraction unit 3 estimates the direction of the line in the extracted character string region from the shape of the extracted character string region or the attribute information of the binarized image.
  • the character string region data generated by the character string region extraction unit 3 is sent to the character recognition unit 7 together with the input binarized data and data indicating the estimated line direction.
  • The number of character string regions extracted from one set of binarized data is not limited to one; a plurality of regions may be extracted. The following steps describe the character recognition process for one character string region extracted in step S2.
  • In step S3, the histogram generation unit 4 generates a histogram using the character string region extracted in step S2.
  • Specifically, the histogram generation unit 4 generates, from the binarized data generated in step S1 and the character string region data extracted in step S2, a histogram indicating the frequency of black pixels along the line direction estimated when the character string region was extracted.
  • FIG. 3A shows an example of an image normalized with the estimated row direction as the x-axis, and an example of a histogram generated based on this image is shown in FIG.
  • Data indicating the histogram generated by the histogram generation unit 4 is sent to the threshold value calculation unit 5 together with the character string region data generated in step S2.
  • In step S4, the threshold calculation unit 5 calculates the line determination threshold th1 for determining line boundaries, using the histogram generated in step S3.
  • the calculated line determination threshold th1 is sent to the line boundary determination unit 6 together with the character string region data generated in step S2 and the histogram data generated in step S3.
  • FIG. 6 is a detailed flowchart illustrating an operation in which the threshold calculation unit 5 calculates a row determination threshold.
  • In step S41, the threshold calculation unit 5 detects the peak value P in the histogram generated by the histogram generation unit 4 in step S3.
  • In step S42, the threshold calculation unit 5 calculates the line determination threshold th1 using the peak value P detected in step S41 and the weighting factor α.
  • The example shown in FIG. 3 is a case where the peak value P of the histogram is 102; setting the weighting factor α to 0.22 gives a line determination threshold th1 of 22.
  • In FIG. 3B, H(y) falls below th1 = 22 in the line boundary area 4a, which can be recognized as corresponding to the boundary between the lines of the character region shown in FIG. 3A.
  • Since the weighting factor α can be adjusted to a value suitable for each type of image, or for each region within an image, more appropriate line determination can be performed.
  • In step S5, the line boundary determination unit 6 determines the boundaries of different lines in the character string region using the line determination threshold th1 calculated in step S4.
  • The line boundary determination unit 6 compares the line determination threshold th1 with the histogram values H(y) and stores one or more coordinates y estimated to be line boundaries.
  • FIG. 7 is a detailed flowchart showing the operation of estimating the line boundary in the line boundary determination unit 6.
  • In step S51-1, ystart included in the character string region data, that is, the coordinate y at which the character string region starts, is set as the initial value of y.
  • In step S51-2, the line determination threshold th1 is compared with the histogram value H(y) at the current coordinate y.
  • If H(y) is smaller than the line determination threshold th1 (H(y) < th1), there is a high possibility that there is a line boundary at this coordinate y, and the process proceeds to step S51-3.
  • In step S51-3, the coordinate y is stored as a coordinate estimated to have a line boundary, and the process proceeds to step S51-4.
  • After step S51-3, or when H(y) is greater than or equal to th1 in step S51-2, the process proceeds to step S51-4, where y is incremented to the next coordinate. The operations from step S51-2 to step S51-4 are repeated until it is determined in step S51-5 that y has reached the coordinate yend at which the character string region ends.
  • By this operation, one or more coordinates y estimated to be line boundaries in the character string region are extracted and stored.
  • In step S51-6, if a single coordinate y is estimated to be a line boundary, the line boundary determination unit 6 determines that coordinate to be the boundary; if a plurality of adjacent coordinates are estimated, it determines the coordinate y located at the center of those coordinates to be the boundary of the lines.
  • the y coordinate determined to be a line boundary is sent to the character recognition unit 7 as line boundary data together with the character string area data generated in step S2.
  • In step S6, the character recognition unit 7 performs character recognition processing based on the character string region data generated in step S2 and the line boundary data determined in step S5.
  • the character recognition unit 7 outputs a character recognition result.
  • The above is how line boundary determination and character recognition are performed by the character recognition device 1 according to the present embodiment.
  • As described above, according to the present embodiment, a histogram indicating the frequency of black pixels in the line direction of the character string region extracted from the input image data by the character string region extraction unit 3 is generated, a line determination threshold is calculated from the generated histogram, and boundaries between different lines in the character string region are determined based on the calculated threshold.
  • The threshold for determining line boundaries is therefore set appropriately based on characteristics obtained from the entire line direction, and the character string in the character string region can be separated into appropriate lines.
  • Embodiment 2. The character recognition device 1 according to Embodiment 2 will now be described.
  • In Embodiment 1, line determination used the line determination threshold th1 on the frequency of black pixels as the criterion of the line boundary determination unit 6; in Embodiment 2, in addition to the frequency of black pixels, the gradient g(y) of the histogram is used as a line determination criterion.
  • the detailed configurations and operations of the threshold value calculation unit 5 and the row boundary determination unit 6 are different from those in the first embodiment, and other parts are the same as those in the first embodiment.
  • the threshold calculation unit 5 calculates a line determination threshold th1 for determining a line boundary from the histogram in the character string region generated by the histogram generation unit 4 as in the first embodiment.
  • the threshold value calculation unit 5 stores in advance a row determination threshold value th2 relating to the histogram inclination g (y).
  • The gradient of the histogram is g(y) = dH(y)/dy. At a line boundary the histogram changes steeply, so the gradient becomes large in magnitude, whereas within an area where characters exist it stays small. Therefore, by setting a threshold on the gradient of the histogram, line boundaries can be determined.
  • This again gives a threshold based on the characteristics of each image, obtained from the entire line direction rather than from microscopic criteria such as character units.
  • the line boundary determination unit 6 determines a boundary between different lines in the character string area based on the line determination threshold th2 in addition to the line determination threshold th1 calculated by the threshold calculation unit 5.
  • Among the coordinates y estimated to have a line boundary based on the line determination threshold th1 calculated by the threshold calculation unit 5, those that further satisfy the following condition are estimated to be line boundaries; otherwise it is estimated that there is no line boundary at the coordinate y.
  • If H(y) is smaller than the line determination threshold th1 and H(y) − H(y−1) is larger than the line determination threshold th2, it is estimated that there is a line boundary at the coordinate y; otherwise, it is estimated that there is none.
  • Here the difference between H(y) and H(y−1) is used as the gradient of the histogram: g(y) ≈ H(y) − H(y−1).
  • If a single coordinate y is estimated to be a line boundary, the line boundary determination unit 6 determines that coordinate to be the boundary. If a plurality of adjacent coordinates are estimated, the coordinate y at the center of those coordinates is determined to be the boundary of the lines; the choice is not limited to the center, and any of the adjacent coordinates may be selected. Coordinates y that are not adjacent to any other estimated coordinate are each determined to be a line boundary.
  • A character string region is not limited to one line boundary; there may be a plurality of them.
  • the determined line boundary indicates position information of a line determined to have a line boundary, and is sent to the character recognition unit 7 together with the character string area data as line boundary data. Other configurations are the same as those in the first embodiment.
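The Embodiment 2 criterion can be sketched as follows. The function name and sample histogram are illustrative assumptions, and the condition H(y) − H(y−1) > th2 is implemented literally as stated in the text:

```python
def find_boundaries_with_slope(hist, th1, th2):
    # y is a boundary candidate when H(y) < th1 and the backward difference
    # H(y) - H(y-1), used as the gradient g(y), exceeds th2. Runs of
    # adjacent candidates are merged to their center coordinate.
    candidates = [y for y in range(1, len(hist))
                  if hist[y] < th1 and hist[y] - hist[y - 1] > th2]
    boundaries, run = [], []
    for y in candidates:
        if run and y != run[-1] + 1:
            boundaries.append(run[len(run) // 2])
            run = []
        run.append(y)
    if run:
        boundaries.append(run[len(run) // 2])
    return boundaries

# The valley bottom (H = 2) fails the slope test (difference -28); the
# rising edge (H = 8, difference +6) satisfies both conditions.
bounds = find_boundaries_with_slope([30, 2, 8, 30], th1=22, th2=3)
```

As the text notes, the two tests can also be applied in the opposite order, or combined into a single cost function.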
  • In this embodiment, the operations of step S4 and step S5 differ from those in Embodiment 1.
  • In step S4, the threshold calculation unit 5 calculates the line determination threshold th1 for determining line boundaries, using the histogram generated in step S3.
  • the method for calculating the row determination threshold th1 is the same as in the first embodiment.
  • the threshold calculation unit 5 has a row determination threshold th2 in addition to the row determination threshold th1.
  • the row determination threshold value th2 is a fixed value set in advance by the user, and is stored in the threshold value calculation unit 5 in advance.
  • the threshold calculation unit 5 sends the row determination thresholds th1 and th2 to the row boundary determination unit 6.
  • FIG. 8 is a detailed flowchart showing the operation of estimating the line boundary in the line boundary determination unit 6.
  • In step S52-1, ystart included in the character string region data, that is, the coordinate y at which the character string region starts, is set as the initial value of y.
  • In step S52-2, the line determination threshold th1 is compared with the histogram value H(y) at the current coordinate y. If H(y) is smaller than th1 (H(y) < th1), there is a high possibility that there is a line boundary at this coordinate y, and the process proceeds to step S52-3.
  • In step S52-3, the line determination threshold th2 is compared with the histogram gradient H(y) − H(y−1) at the current coordinate y.
  • If H(y) − H(y−1) is larger than th2 (H(y) − H(y−1) > th2), the histogram has a steep slope, so it can be estimated that there is a high possibility of a line boundary at this coordinate y, and the process proceeds to step S52-4.
  • In step S52-4, the coordinate y is stored as a coordinate estimated to have a line boundary, and the process proceeds to step S52-5.
  • When H(y) is determined to be greater than or equal to th1 in step S52-2, or H(y) − H(y−1) is determined to be less than or equal to th2 in step S52-3, the process also proceeds to step S52-5, where y is incremented to the next coordinate. The operations from step S52-2 to step S52-5 are repeated until it is determined in step S52-6 that y has reached the coordinate yend at which the character string region ends. By this operation, one or more coordinates y estimated to be line boundaries in the character string region are extracted and stored.
  • Finally, if a single coordinate y is estimated to be a line boundary, the line boundary determination unit 6 determines that coordinate to be the boundary; if a plurality of adjacent coordinates are estimated, it determines the coordinate y located at the center of those coordinates to be the boundary of the lines.
  • The estimation based on the number of black pixels performed in step S52-2 and the estimation based on the histogram gradient performed in step S52-3 may be interchanged.
  • In that case, candidate line-boundary coordinates are first estimated based on the gradient of the histogram, and the candidates are then further screened based on the number of black pixels to determine the line boundaries.
  • Alternatively, the estimation based on the number of black pixels (step S52-2) and the estimation based on the histogram gradient (step S52-3) can be integrated, using a cost function C as the determination expression.
  • As described above, according to the present embodiment as well, a histogram indicating the frequency of black pixels in the line direction of the character string region extracted from the input image data by the character string region extraction unit 3 is generated, a line determination threshold is calculated from the generated histogram, and boundaries between different lines in the character string region are estimated based on the calculated thresholds.
  • The threshold for determining line boundaries is set appropriately based on characteristics obtained from the entire line direction, and the character string in the character string region can be separated into appropriate lines.
  • In the first and second embodiments, line determination is performed using the row determination threshold th1 applied to the frequency of black pixels as the determination criterion of the line boundary determination unit 6. In the third embodiment, in addition to the frequency of black pixels, the peak values P(n) detected from the histogram are used as a line determination criterion.
  • the detailed configurations and operations of the threshold value calculation unit 5 and the row boundary determination unit 6 are different from those in the first embodiment, and other parts are the same as those in the first embodiment.
  • The threshold calculation unit 5 calculates the row determination threshold th1 for determining line boundaries from the histogram of the character string region generated by the histogram generation unit 4, as in the first embodiment. In addition, the threshold calculation unit 5 stores in advance a row determination threshold th3 relating to the difference P(n) − P(n−1) between adjacent peak values of the histogram.
  • FIG. 9 shows an example of an image and the histogram generated for a character string region in which the lengths of two lines differ. When there are a plurality of lines of different lengths, the difference P(n) − P(n−1) between peak values of the histogram becomes large.
  • By using the row determination threshold th1 obtained from the histogram of each individual recognition target image together with the row determination threshold th3 relating to the peak value difference P(n) − P(n−1), the thresholds are based not on microscopic criteria such as individual characters but on features obtained from the entire line direction, so line boundaries can be estimated accurately even when multiple lines have different lengths.
  • The line boundary determination unit 6 determines the boundaries between different lines in the character string region based on the row determination threshold th3 in addition to the row determination threshold th1 calculated by the threshold calculation unit 5. First, the line boundary determination unit 6 estimates that there is a line boundary at coordinate y when H(y) is smaller than th1, and that there is no line boundary at coordinate y when H(y) is equal to or greater than th1. Then, when there is a single coordinate y estimated to be a line boundary, it determines that coordinate to be the line boundary; when there are a plurality of adjacent estimated coordinates y, it determines the central position among them to be the line boundary.
  • The coordinate determined to be the line boundary among a plurality of adjacent candidate coordinates y is not limited to the center; any of the adjacent coordinates y may be selected. If the coordinates y estimated to be line boundaries are not adjacent to each other, each such coordinate y is determined to be a line boundary.
  • The number of line boundaries is not limited to one per character string region; there may be a plurality.
  • Next, the line boundary determination unit 6 calculates the peak difference P(n) − P(n−1). As shown in FIG. 9, the line boundary determination unit 6 detects all peaks of the generated histogram and calculates the difference between each peak value P(n) and the adjacent peak value P(n−1). When the peak value difference P(n) − P(n−1) is larger than the row determination threshold th3, the line boundary determination unit 6 estimates that there is a line boundary between the y coordinates of P(n) and P(n−1); when the difference P(n) − P(n−1) is smaller than th3, it estimates that there is no line boundary between those y coordinates.
  • When the peak value difference P(n) − P(n−1) is larger than the row determination threshold th3, it is estimated that there is a line boundary between P(n) and P(n−1), and the center position between the coordinate of the peak value P(n) and the coordinate of the peak value P(n−1) is determined to be the line boundary.
  • The position of the line boundary between the coordinate of P(n) and the coordinate of P(n−1) is not limited to the center; it only has to be selected from the coordinates between the two peaks.
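As a rough sketch of the peak-based estimation above, the following hypothetical helpers could be used. The names are invented, `detect_peaks` uses a simple local-maximum test, and the comparison follows the text's P(n) − P(n−1) > th3 criterion literally, placing the boundary at the center between the two peak coordinates rather than storing the whole range between them.

```python
def detect_peaks(H):
    """Indices y that are local maxima of the histogram."""
    return [y for y in range(1, len(H) - 1)
            if H[y - 1] < H[y] >= H[y + 1]]

def boundaries_from_peaks(H, th3):
    """Estimate a boundary between adjacent peaks whose height difference
    exceeds th3, placing it at the center between the peak coordinates."""
    peaks = detect_peaks(H)
    boundaries = []
    for prev, cur in zip(peaks, peaks[1:]):
        if H[cur] - H[prev] > th3:
            boundaries.append((prev + cur) // 2)
    return boundaries
```

In a real implementation, peak detection would likely need smoothing or a minimum-prominence test so that noise in the histogram does not produce spurious peaks.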
  • The determined line boundary is the position information of the coordinate determined to be a line boundary, and is sent to the character recognition unit 7 together with the character string region data as line boundary data.
  • Other configurations are the same as those in the first embodiment.
  • In the present embodiment, the operations of steps S4 and S5 differ from those in the first embodiment.
  • In step S4, the threshold calculation unit 5 calculates the row determination threshold th1 for determining line boundaries using the histogram generated in step S3.
  • The method for calculating the row determination threshold th1 is the same as in the first embodiment.
  • the threshold calculation unit 5 has a row determination threshold th3 in addition to the row determination threshold th1.
  • the row determination threshold th3 is a fixed value set in advance by the user, and is stored in the threshold calculation unit 5 in advance.
  • the threshold calculation unit 5 sends the row determination thresholds th1 and th3 to the row boundary determination unit 6.
  • FIG. 10 is a detailed flowchart showing the operation of estimating the line boundary in the line boundary determination unit 6.
  • the line boundary determination using the line boundary threshold th1 is the same as in the first embodiment.
  • In step S53-2, the row determination threshold th3 is compared with the adjacent peak value difference P(n) − P(n−1).
  • When the peak value difference P(n) − P(n−1) is larger than the row determination threshold th3 (P(n) − P(n−1) > th3), there is a high possibility that a line boundary exists between the y coordinates of the peak values P(n) and P(n−1); in this case, the process proceeds to step S53-3.
  • When the peak value difference P(n) − P(n−1) is equal to or less than the row determination threshold th3 (P(n) − P(n−1) ≤ th3), the presence or absence of a line boundary cannot be determined from P(n) − P(n−1); in this case, the process proceeds to step S53-4.
  • In step S53-3, the y coordinates between the peak values P(n) and P(n−1) are stored as coordinates estimated to contain a line boundary, and the process proceeds to step S53-4.
  • After step S53-3 is completed, or when P(n) − P(n−1) is determined in step S53-2 to be equal to or less than the row determination threshold th3, the process proceeds to step S53-4, where n is incremented. The operations from step S53-2 to step S53-5 are repeated until it is determined in step S53-5 that n has reached nend, the index of the last peak value P(nend) of the histogram.
  • In step S53-6, when there is a single coordinate y estimated to be a line boundary, the line boundary determination unit 6 determines that coordinate to be the line boundary; when a range of coordinates between the peak values P(n) and P(n−1) has been stored, the center of that range is determined to be the line boundary.
  • The line boundary data obtained by the determination based on the row determination threshold th1 in the flowchart of FIG. 7 and the line boundary data obtained by the determination based on the row determination threshold th3 in the flowchart of FIG. 10 are both sent to the character recognition unit 7.
  • the character recognition unit 7 performs character recognition using both of these line boundary data as in the first embodiment. Note that the order of the operation of the flowchart of FIG. 7 using the row determination threshold th1 and the operation of the flowchart of FIG. 10 using the row determination threshold th3 may be reversed. Other operations are the same as those in the first embodiment.
  • As described above, in the third embodiment, a histogram indicating the frequency of black pixels in the row direction of the character string region extracted from the input image data by the character string region extraction unit 3 is generated, a row determination threshold is calculated from the generated histogram, and the boundaries between different lines in the character string region are estimated based on the calculated row determination threshold th1.
  • Furthermore, since line boundaries are also estimated based on the differences between adjacent peak values of the histogram, character strings in the character string region can be separated into appropriate lines more reliably when the line lengths differ.
  • In the third embodiment, the line boundary is determined by comparing the row determination threshold th3 with the peak value difference P(n) − P(n−1).
  • The line boundary data obtained from the operation of the flowchart of FIG. 10 using the row determination threshold th3 need not indicate only the coordinates; it may also include data indicating the probability that a line boundary exists at those coordinates.
  • This probability is calculated, for example, by subtracting the peak value difference P(n) − P(n−1) from the row determination threshold th3 and dividing the result by th3. Based on this probability, the character recognition unit 7 can select whether to adopt the coordinates indicated in the line boundary data as a line boundary.
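Read literally, the probability described above might be computed as in the following sketch. This is a hypothetical normalization following the text's wording; the source does not specify whether the peak difference should be signed or absolute, nor how values outside [0, 1] are handled.

```python
def boundary_probability(peak_diff, th3):
    """(th3 - peak_diff) / th3, as described in the text."""
    return (th3 - peak_diff) / th3
```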
  • In the second embodiment, the line boundary determination unit 6 performs line boundary determination using, as determination criteria, the row determination threshold th1 applied to the frequency of black pixels and the gradient g(y) of the histogram.
  • In the fourth embodiment, a line boundary is determined using only the gradient g(y) of the histogram as the line determination criterion.
  • the detailed configurations and operations of the threshold value calculation unit 5 and the row boundary determination unit 6 are different from those in the second embodiment, and other parts are the same as those in the second embodiment.
  • the threshold value calculation unit 5 stores in advance a row determination threshold value th2 related to the gradient g (y) of the histogram.
  • The slope g(y) of the histogram is dH(y)/dy. Since the histogram changes steeply at a line boundary, g(y) takes a large value there, whereas in regions where characters exist the slope is small. Therefore, by setting a threshold on the slope of the histogram, line boundaries can be determined.
  • The line boundary determination unit 6 determines the boundaries between different lines in the character string region based on the row determination threshold th2. First, the line boundary determination unit 6 estimates that there is a line boundary at coordinate y when H(y) − H(y−1) is greater than th2, and that there is no line boundary at coordinate y when H(y) − H(y−1) is smaller than th2.
  • In practice, the slope g(y) of the histogram can be calculated as the difference between H(y) and H(y−1).
  • When there is a single coordinate y estimated to be a line boundary, the line boundary determination unit 6 determines that coordinate to be the line boundary; when there are a plurality of adjacent estimated coordinates y, it determines the central position among them to be the line boundary. The coordinate selected as the line boundary from among a plurality of adjacent candidates is not limited to the center; any of the adjacent coordinates y may be selected. If the coordinates y estimated to be line boundaries are not adjacent to each other, each such coordinate y is determined to be a line boundary.
  • The number of line boundaries is not limited to one per character string region; there may be a plurality.
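The slope-only criterion of the fourth embodiment, together with the selection of the center among adjacent candidates, could be sketched as follows. The function names are hypothetical, and the discrete slope H(y) − H(y−1) stands in for dH(y)/dy as described above.

```python
def slope_boundary_candidates(H, th2, y_start, y_end):
    """Coordinates y where the discrete slope H(y) - H(y-1) exceeds th2."""
    return [y for y in range(y_start + 1, y_end + 1)
            if H[y] - H[y - 1] > th2]

def centers_of_runs(candidates):
    """Collapse each run of adjacent candidate coordinates to its center."""
    runs, run = [], []
    for y in candidates:
        if run and y != run[-1] + 1:
            runs.append(run)
            run = []
        run.append(y)
    if run:
        runs.append(run)
    return [r[len(r) // 2] for r in runs]
```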
  • The determined line boundary is the position information of the coordinate determined to be a line boundary, and is sent to the character recognition unit together with the character string region data as line boundary data. Other configurations are the same as those in the first embodiment.
  • In the present embodiment, the operations of steps S4 and S5 differ from those in the first embodiment.
  • In step S4, the threshold calculation unit 5 sends the row determination threshold th2 to the line boundary determination unit 6.
  • The row determination threshold th2 is a fixed value set in advance by the user and stored in the threshold calculation unit 5.
  • FIG. 11 shows a detailed flowchart showing the operation of estimating the line boundary in the line boundary determination unit 6.
  • In step S54-1, ystart included in the character string region data, that is, the coordinate y at which the character string region starts, is set as the initial value.
  • In step S54-2, the row determination threshold th2 is compared with the slope H(y) − H(y−1) of the histogram at the current coordinate y.
  • When H(y) − H(y−1) is larger than the row determination threshold th2 (H(y) − H(y−1) > th2), the histogram has a steep slope at coordinate y, so it can be estimated that there is a high possibility of a line boundary at this coordinate; in this case, the process proceeds to step S54-3.
  • When H(y) − H(y−1) is equal to or less than the row determination threshold th2 (H(y) − H(y−1) ≤ th2), the slope is not steep enough for a line boundary at coordinate y to be estimated from the row determination threshold th2; in this case, the process proceeds to step S54-4.
  • In step S54-3, the coordinate y is stored as a coordinate estimated to be a line boundary, and the process proceeds to step S54-4.
  • After step S54-3 is completed, or when it is determined in step S54-2 that H(y) − H(y−1) is equal to or less than the row determination threshold th2, the process proceeds to step S54-4, where y is incremented to the next coordinate. The operations from step S54-2 to step S54-5 are repeated until it is determined in step S54-5 that y has reached yend, the end coordinate of the character string region 32.
  • In step S54-6, when there is a single coordinate y estimated to be a line boundary, the line boundary determination unit 6 determines that coordinate to be the line boundary; when there are a plurality of adjacent estimated coordinates, it determines the coordinate y located at their center to be the line boundary.
  • Other operations are the same as those in the first embodiment.
  • As described above, in the fourth embodiment, a histogram indicating the frequency of black pixels in the row direction of the character string region extracted from the input image data by the character string region extraction unit 3 is generated, and line boundaries are estimated from the slope of the histogram.
  • Therefore, the threshold for determining line boundaries is set appropriately based on features obtained from the entire line direction, and the character string in the character string region can be separated into appropriate lines.
  • In the above embodiments, the binarization processing unit 2 generates and uses binarized data; however, the target data is not limited to binarized data. As long as the data allows character portions and line boundaries to be distinguished, it is also possible to use, for example, multi-value data representing pixels with multiple values, or data indicating chromaticity.
  • In the above embodiments, the character string region data generated by the character string region extraction unit includes the binarized data.
  • However, the binarized data may instead be sent directly from the binarization processing unit 2 to each unit that requires it. Not only the binarized data but also other data may be sent directly from unit to unit as required.
  • In the above embodiments, the line boundary determination unit 6 determines the center position of a plurality of coordinates y to be the line boundary only when the coordinates y estimated to be line boundaries are adjacent to each other. However, a certain range may be defined so that coordinates y estimated to be line boundaries that fall within that range are regarded as adjacent, and the center position among them is determined to be the line boundary.
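The range-based variation described above might be sketched as follows, where `gap` is a hypothetical tolerance: candidate coordinates within `gap` of each other are treated as adjacent, and `gap=1` reduces to strict adjacency.

```python
def centers_with_gap(candidates, gap=1):
    """Group candidate coordinates that lie within `gap` of each other and
    take the center of each group as the line boundary."""
    groups, group = [], []
    for y in sorted(candidates):
        if group and y - group[-1] > gap:
            groups.append(group)
            group = []
        group.append(y)
    if group:
        groups.append(group)
    return [g[len(g) // 2] for g in groups]
```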
  • FIG. 12 is a hardware configuration diagram for realizing the character recognition apparatus according to the first embodiment by hardware.
  • An image is input by an image capture device 8 such as a scanner or a camera.
  • the binarization processing unit 2, the character string extraction unit 3, the histogram generation unit 4, the threshold value calculation unit 5, the line boundary determination unit 6, and the character recognition unit 7 are realized by a processing circuit 9.
  • the processing circuit 9 may be realized by, for example, a single circuit, a composite circuit, a programmed processor, a parallel programmed processor, or various electronic circuits combining these.
  • the display 10 displays the progress of the process.
  • Each program is stored in the hard disk 11.
  • By executing the programs, the processor 12 functions as the binarization processing unit 2, the character string extraction unit 3, the histogram generation unit 4, the threshold calculation unit 5, the line boundary determination unit 6, and the character recognition unit 7.
  • In this case, each unit is realized by software, firmware, or a combination of software and firmware. The software and firmware are stored in the hard disk 11 and are loaded from the hard disk 11 into the memory 13 when executed.

Abstract

According to the present invention, a character recognition device generates a histogram representing the number of black pixels in each pixel row in a character string region that is extracted from input image data by a character string region extraction unit, then calculates a row determination threshold value from the generated histogram, and determines the boundaries between adjacent rows of characters in the character string region on the basis of the calculated row determination threshold value. Thus, according to the present invention, the threshold value for determining the boundaries between adjacent rows of characters is appropriately set taking into account a feature of each pixel row, making it possible to appropriately divide the character string region into rows comprising individual character strings.

Description

Character recognition device and character recognition method
 The present invention relates to a character recognition device and a character recognition method.
 There is demand for digitizing and storing documents created on paper as image data, for example by reading them with a scanner. To make use of such digitized documents, there are document retrieval and management systems that allow documents to be searched by keyword. When documents are digitized in such a system, character recognition technology is used to automatically recognize the characters in the documents and turn them into keywords. In conventional character recognition technology, the rough range in which a character string exists in the image data is first identified, it is determined whether different lines exist within that range, and, when there are multiple lines, processing to separate the lines appropriately is performed, thereby improving character recognition accuracy.
 In the character recognition device of Patent Document 1, a histogram indicating the frequency of black pixels in the row direction is created from binarized image data, and positions at valleys of the histogram are estimated as inter-line positions that are candidates for line boundaries. The character recognition device also estimates in advance character rectangular regions, each a rectangle enclosing a character region assumed to be recognized as one character, and calculates a certainty factor C indicating how character-like the image in each character rectangular region is. If an inter-line position estimated as described above does not divide a character rectangle with a high certainty factor C, that inter-line position is determined to be a line boundary, thereby separating the lines when there are multiple lines.
JP 2009-211432 A
 In the line separation method described above, line boundaries are determined using the certainty factor C, which indicates how character-like the image in each estimated character rectangular region is. However, when the line spacing is narrow, if a character rectangular region with a high certainty factor C straddles the space between lines, that inter-line position cannot be determined to be a line boundary, and the lines cannot be divided appropriately.
 The present invention has been made to solve the above problem, and an object of the present invention is to obtain a character recognition device 1 that can separate lines accurately.
 In the character recognition device according to the present invention, a histogram indicating the frequency of black pixels in the row direction of the character string region extracted from the input image data by the character string region extraction unit is generated, a row determination threshold is calculated from the generated histogram, and the boundaries between different lines in the character string region are determined based on the calculated row determination threshold.
 According to the present invention, the character string region is extracted from the image data and the row determination threshold is set from the histogram indicating the frequency of black pixels in the line direction. Therefore, the threshold for determining line boundaries is set appropriately based on features obtained from the entire line direction, and the character string in the character string region can be separated into appropriate lines.
FIG. 1 is a configuration diagram of the character recognition device according to the first embodiment.
FIG. 2 is an explanatory diagram showing an example of the operation in which the character string region extraction unit of the character recognition device according to the first embodiment extracts a character string region.
FIG. 3(a) is an explanatory diagram showing an example of a character string region according to the first embodiment, and FIG. 3(b) is an explanatory diagram showing an example of the histogram generated by the histogram generation unit of the character recognition device according to the first embodiment.
FIG. 4 is an explanatory diagram of an example of an image for which it is useful for the character recognition device according to the first embodiment to change the weight coefficient ρ for each region.
FIG. 5 is a flowchart showing the operation of the character recognition device according to the first embodiment.
FIG. 6 is a detailed flowchart showing the operation in which the threshold calculation unit of the character recognition device according to the first embodiment calculates the row determination threshold.
FIG. 7 is a detailed flowchart showing the operation of estimating line boundaries in the line boundary determination unit of the character recognition device according to the first embodiment.
FIG. 8 is a detailed flowchart showing the operation of estimating line boundaries in the line boundary determination unit of the character recognition device according to the second embodiment.
FIG. 9 is an explanatory diagram of an example of an image and histogram generated for a character string region in which two lines have different lengths, according to the third embodiment.
FIG. 10 is a detailed flowchart showing the operation of estimating line boundaries in the line boundary determination unit of the character recognition device according to the third embodiment.
FIG. 11 is a detailed flowchart showing the operation of estimating line boundaries in the line boundary determination unit of the character recognition device according to the fourth embodiment.
FIG. 12 is a hardware configuration diagram for realizing the character recognition device according to the first embodiment by hardware.
FIG. 13 is a hardware configuration diagram for realizing the character recognition device according to the first embodiment by software.
Embodiment 1.
 FIG. 1 is a configuration diagram of a character recognition device 1 according to the first embodiment. The character recognition device 1 comprises: a binarization processing unit 2 that binarizes image data to generate binarized data; a character string region extraction unit 3 that extracts a character string region based on the binarized data and generates character string region data; a histogram generation unit 4 that generates, from the binarized data and the character string region data, a histogram indicating the frequency of black pixels in the row direction; a threshold calculation unit 5 that calculates a row determination threshold from the histogram; a line boundary determination unit 6 that determines the boundaries between different lines in the character string region using the row determination threshold; and a character recognition unit 7 that recognizes the characters in the character string region based on the line boundaries determined by the line boundary determination unit 6.
 Here, the binarization processing unit 2 binarizes image data sent from an image capture device such as a scanner or camera to generate binarized data. Binarization is a process of converting a grayscale image into a two-tone black-and-white image; for example, if the pixel value at a position in the binarized data is α(x, y), then α(x, y) is 1 for a black pixel and 0 for a white pixel. Specifically, a threshold is set, and each pixel whose value exceeds the threshold is replaced with white, while each pixel below it is replaced with black. The binarization method is not limited to setting a single threshold as described above; for example, the image may be divided into multiple regions according to the luminance range, with a different threshold for each region. For character recognition, there is also a binarization process that converts the margins of the image data, including the background, into white pixels, and converts everything other than the margins, such as characters, ruled lines, and symbols, into black pixels. The image data to be binarized that is input to the binarization processing unit 2 may be represented in any of various formats for images and characters, for example JPEG (Joint Photographic Experts Group), TIFF (Tagged Image File Format), or BMP (Bit MaP).
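A minimal sketch of the single-threshold binarization described above (a hypothetical function, not the patented implementation; it assumes an 8-bit grayscale image given as a list of rows, with the convention that dark pixels below the threshold become 1 = black and the rest become 0 = white):

```python
def binarize(gray, threshold=128):
    """Global-threshold binarization: alpha(x, y) = 1 for black pixels
    (value below threshold), 0 for white pixels."""
    return [[1 if value < threshold else 0 for value in row] for row in gray]
```

A region-wise variant, as mentioned above, would apply a different threshold to each sub-region of the image instead of a single global one.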
 文字列領域抽出部3では、二値化処理部2で生成された二値化データから文字列領域を抽出する。文字列とは、文字があると推定される二値化データの黒画素の集合を指すものである。また、文字列領域を抽出するとは、二値化データの中で黒画素の集合が検出された場合にその辺りに文字が存在すると推定し、検出した黒画素の集合を含む範囲であるx軸上の最初xstartから最後xendまでとy軸上の最初ystartから最後yendまでの位置情報を含む矩形状の領域を文字列領域として抽出することである。文字列領域抽出部3は文字列領域を抽出し、抽出した文字列領域の位置情報を含む矩形状の範囲を示すデータ、すなわち前記xstart、xend、ystart、yendの座標データ及び二値化データを文字列領域データとして生成する。 The character string area extraction unit 3 extracts a character string area from the binarized data generated by the binarization processing unit 2. A character string refers to a set of black pixels of binarized data estimated to have characters. The extraction of the character string region means that when a set of black pixels is detected in the binarized data, it is estimated that there is a character around the x-axis, which is a range including the detected set of black pixels. A rectangular area including position information from the first xstart to the last xend on the upper side and from the first ystart to the last yend on the y-axis is extracted as a character string area. The character string area extraction unit 3 extracts a character string area, and displays data indicating a rectangular range including position information of the extracted character string area, that is, coordinate data and binarized data of the xstart, xend, ystart, and yend. Generated as character string area data.
 FIG. 2 is an explanatory diagram illustrating an example of the operation in which the character string region extraction unit 3 of the character recognition device 1 according to the first embodiment extracts a character string region 3b. When the binarized data generated by the binarization processing unit 2 represents a binarized image 3a as shown in FIG. 2, the character string region extraction unit 3 detects a set of black pixels in the binarized image 3a, extracts a rectangular range containing the detected set of black pixels as the character string region 3b, and generates data indicating the rectangular range of the extracted character string region 3b as character string region data. The character string region data is, for example, data indicating the first coordinate xstart to the last coordinate xend on the x-axis and the first coordinate ystart to the last coordinate yend on the y-axis of the character string region 3b in the binarized image 3a. The extraction method of the character string region 3b described above is one example, and any other method may be used as long as a character string region is extracted.
 Further, the character string region extraction unit 3 estimates the direction of the lines in the extracted character string region from the shape of the extracted character string region 3b, the attribute information of the binarized image, or the like.
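As an illustrative sketch, not part of the disclosed embodiment, the bounding-rectangle extraction described above can be expressed as follows, assuming for simplicity that all black pixels in the data belong to one character string region:

```python
def extract_string_region(alpha):
    """Return (xstart, xend, ystart, yend): the rectangle bounding all
    black pixels (value 1) in the binarized data alpha[y][x].
    Returns None when the data contains no black pixels."""
    ys = [y for y, row in enumerate(alpha) if any(row)]
    xs = [x for row in alpha for x, v in enumerate(row) if v]
    if not ys:
        return None  # no black pixels: no character string region
    return (min(xs), max(xs), min(ys), max(ys))

# Toy binarized image: black pixels at (1,1), (2,1), (2,2), (3,2).
alpha = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 0, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
print(extract_string_region(alpha))  # (1, 3, 1, 2)
```

A real implementation would additionally separate distinct groups of black pixels into multiple regions, as the unit may extract more than one character string region per image.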
 The histogram generation unit 4 generates a histogram using the character string region extracted by the character string region extraction unit 3. Specifically, from the binarized data generated by the binarization processing unit 2 and the character string region data generated by the character string region extraction unit 3, the histogram generation unit 4 generates a histogram indicating the frequency at which black pixels appear along the line direction estimated when the character string region was extracted. FIG. 3(a) is an explanatory diagram illustrating an example of a character string region, and FIG. 3(b) is an explanatory diagram illustrating an example of a histogram generated by the histogram generation unit 4 of the character recognition device 1 according to the first embodiment. In FIG. 3(a), the x-axis of the binarized data and the x-axis of the character string region 3b coincide, but the estimated line direction of the character string region may be aligned with the x-axis of the binarized data. The histogram generation unit 4 obtains the total number of black pixels in the x-axis direction for each pixel coordinate on the y-axis, and thereby generates the histogram. An example of the expression for generating this histogram H(y) is shown below. α(x, y) represents the value of the binarized data at the coordinates (x, y), which is 1 for a black pixel and 0 for a white pixel. H(y), a function of the coordinate y, is the total number of black pixels at each coordinate y in the range from the first coordinate xstart to the last coordinate xend along the line direction (the x-axis direction) of the character string region.
H(y) = Σ_{x = xstart}^{xend} α(x, y)
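As an illustrative sketch, not part of the disclosed embodiment, the computation of H(y) over a character string region can be written as follows (the array layout and function name are assumptions for illustration):

```python
def row_histogram(alpha, xstart, xend, ystart, yend):
    """Compute H(y): the count of black pixels (alpha == 1) along the
    line direction (x-axis) for each y in the character string region.
    alpha is indexed as alpha[y][x], 1 for black and 0 for white."""
    return {
        y: sum(alpha[y][x] for x in range(xstart, xend + 1))
        for y in range(ystart, yend + 1)
    }

# Tiny 4x6 binarized region: two text rows separated by a blank row.
alpha = [
    [1, 1, 0, 1, 1, 1],  # y = 0: characters present
    [1, 1, 1, 1, 0, 1],  # y = 1: characters present
    [0, 0, 0, 0, 0, 0],  # y = 2: line boundary (no black pixels)
    [1, 0, 1, 1, 1, 1],  # y = 3: characters present
]
H = row_histogram(alpha, xstart=0, xend=5, ystart=0, yend=3)
print(H)  # {0: 5, 1: 5, 2: 0, 3: 5}
```

The valley at y = 2 is the kind of feature the line boundary determination described below detects.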
 In the example of FIG. 3, in the portions of the character string region 3b shown in FIG. 3(a) where characters are present, the black pixel frequency is high in the histogram of FIG. 3(b), whereas in the line boundary region 4a enclosed by the dotted lines the black pixel frequency is as small as about 20.
 The threshold calculation unit 5 calculates, from the histogram in the character string region generated by the histogram generation unit 4, a threshold for determining the boundaries between lines. Where characters are present the value of H(y) is large, and at a line boundary H(y) is small, so line boundaries can be determined from the value of H(y). By obtaining the threshold for this determination from the histogram of each individual recognition target image, a threshold that reflects the characteristics of each image can be set, in particular one based on features obtained from the entire line direction rather than on a microscopic criterion such as individual characters. In the first embodiment, the peak value P of the histogram generated by the histogram generation unit 4 is detected, and a line determination threshold th1 is calculated from the peak value P. The expression used by the threshold calculation unit 5 to calculate the line determination threshold th1 is shown below, where ρ is a parameter indicating a weighting factor.
th1 = ρ × P
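As an illustrative sketch of this calculation, not part of the disclosed embodiment: truncating the product to an integer is an assumption made here so that the result matches the value th1 = 22 quoted later in the text for P = 102 and ρ = 0.22.

```python
def line_threshold(hist_values, rho):
    """Detect the peak value P of the histogram and derive the line
    determination threshold th1 = rho * P (truncated to an integer)."""
    peak = max(hist_values)
    return int(rho * peak)

# Values matching the example of FIG. 3: peak P = 102, rho = 0.22.
th1 = line_threshold([3, 40, 102, 97, 15], rho=0.22)
print(th1)  # 22
```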
 The weighting factor ρ is a parameter set by the user or automatically. When the line spacing of the character strings in the image to be processed is narrow, the difference in the value of H(y) between positions where characters exist and positions estimated to be line boundaries is small, and the value of H(y) near a line boundary may not become sufficiently small; for an appropriate determination the threshold therefore needs to be set higher, and for this purpose the weighting factor ρ is increased. Conversely, when the line spacing is wide, even if H(y) becomes somewhat low in places where characters exist because of variations in H(y), such places should not be regarded as line boundaries, so the threshold needs to be set lower, and for this purpose the weighting factor ρ is decreased. The weighting factor ρ may be constant over the entire image, or the image may be divided into regions and ρ changed for each region. Examples of ways to divide the image when ρ is changed per region include regions enclosed by ruled lines and regions enclosed by blank space. To detect such regions automatically, ruled line detection, blank detection, symbol detection, and the like can be performed, and various conventional methods exist for this purpose. For example, the method of Reference 1 below can be used for ruled line detection and blank detection, and the method of Reference 2 below for symbol detection.
Reference 1: Takashi Hirano, Yasuhiro Okada, Fumio Yoda, "Ruled Line Extraction Method from Document Images", IEICE General Conference, March 1998
Reference 2: Shogo Yoneyama, Takashi Hirano, Yasuhiro Okada, "Examination of a Symbol Extraction Method for Drawing Images", IEICE General Conference, March 2006
 FIG. 4 shows an example of an image for which changing the weighting factor ρ for each region is useful. In an image such as that of FIG. 4, there is a line-determination-difficult region 5a, where the line spacing is narrow and determining the line boundaries is estimated to be difficult, and a line-determination-easy region 5b, where the line spacing is wide and determining the line boundaries is estimated to be easy. In this case, the user sets the weighting factor ρ of the line-determination-difficult region 5a to a large value and the weighting factor ρ of the line-determination-easy region 5b to a small value in advance. By setting the weighting factor ρ in this way, an appropriate line determination threshold can be set for each region, for example regions where line boundaries are easy or difficult to determine depending on the spacing between character lines. Note that the weighting factor ρ may also be set automatically from the frequency of black pixels in the region and the tendency of their distribution.
 In the example of FIG. 3 described above, the peak value P of the histogram is 102, and with the weighting factor ρ set to 0.22, the line determination threshold th1 is 22.
 The line boundary determination unit 6 determines the boundaries between different lines in the character string region based on the line determination threshold th1 calculated by the threshold calculation unit 5. A determined line boundary indicates the position information of the line at which a line boundary is determined to exist. The condition that the line boundary determination unit 6 uses to determine a line boundary is the following expression: when H(y) is smaller than the line determination threshold th1, it is estimated that there is a line boundary at the coordinate y, and when H(y) is greater than or equal to th1, it is estimated that there is no line boundary at the coordinate y.
H(y) < th1 : a line boundary is estimated at the coordinate y
H(y) ≥ th1 : no line boundary is estimated at the coordinate y
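As an illustrative sketch of this per-coordinate test, not part of the disclosed embodiment (the function name and data layout are assumptions):

```python
def boundary_candidates(H, th1):
    """Return the coordinates y whose black pixel count H(y) falls
    below th1, i.e. the positions estimated to lie on a line boundary."""
    return [y for y in sorted(H) if H[y] < th1]

# Histogram with a low-valued valley at y = 1..3.
H = {0: 90, 1: 15, 2: 10, 3: 18, 4: 95}
print(boundary_candidates(H, th1=22))  # [1, 2, 3]
```

The run of adjacent candidates [1, 2, 3] is then reduced to a single boundary coordinate as described next.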
 Then, when there is one coordinate y estimated to have a line boundary, that coordinate y is determined to be the line boundary; when a plurality of such estimated coordinates are adjacent to one another, the coordinate y located at the center of those coordinates is determined to be the line boundary. Which of a plurality of adjacent coordinates y determined to have a line boundary is taken as the line boundary is not limited to the center, and may be selected from among the adjacent coordinates y. When coordinates y determined to have a line boundary are not adjacent to one another, each such coordinate y is determined to be a line boundary. The number of line boundaries in one character string region is not limited to one; a plurality of line boundaries may exist.
 The character recognition unit 7 performs character recognition processing within the character string region based on the line boundaries determined by the line boundary determination unit 6 and the character string region extracted by the character string region extraction unit 3. Various conventional methods exist for performing character recognition. For example, there is the method of Reference 3 below, which improves robustness against image degradation by using run-length correction.
Reference 3: Minoru Mori, Minako Sawaki, Norihiro Hagita, Hiroshi Murase, Naoki Mukawa, "Robust Feature Extraction against Image Degradation Using Run-Length Correction", IEICE Transactions, Vol. J86-D2, No. 7, pp. 1049-1057, July 2003
When the character recognition processing is completed, the character recognition unit 7 outputs the character recognition result. The above is the configuration of the character recognition device 1.
 Next, the operation of the character recognition device 1 according to the first embodiment will be described. FIG. 5 is a flowchart showing the operation of the character recognition device 1 according to the present embodiment.
 First, in step S1, the binarization processing unit 2 performs binarization processing on the image data to generate binarized data. The generated binarized data is sent to the character string region extraction unit 3.
 In step S2, the character string region extraction unit 3 extracts a character string region from the binarized data generated in step S1 and generates character string region data indicating the extracted character string region. The character string region extraction unit 3 also estimates the direction of the lines in the extracted character string region from the shape of the extracted character string region or the attribute information of the binarized image. The character string region data generated by the character string region extraction unit 3 is sent to the character recognition unit 7 together with the input binarized data and data indicating the estimated line direction. Note that the number of character string regions extracted from one set of binarized data is not limited to one; a plurality of character string regions may be extracted. The following steps describe the character recognition processing for one of the character string regions extracted in step S2.
 In step S3, the histogram generation unit 4 generates a histogram using the character string region extracted in step S2. From the binarized data generated in step S1 and the character string region data extracted in step S2, the histogram generation unit 4 generates a histogram indicating the frequency at which black pixels appear along the line direction estimated when the character string region was extracted. FIG. 3(a) is an example of an image normalized with the estimated line direction as the x-axis, and FIG. 3(b) shows an example of a histogram generated based on this image.
 The data indicating the histogram generated by the histogram generation unit 4 is sent to the threshold calculation unit 5 together with the character string region data generated in step S2.
 In step S4, the threshold calculation unit 5 calculates the line determination threshold th1 for determining line boundaries using the histogram generated in step S3. The calculated line determination threshold th1 is sent to the line boundary determination unit 6 together with the character string region data generated in step S2 and the histogram data generated in step S3.
 Here, the detailed operation of step S4 for calculating the line determination threshold th1 will be described. FIG. 6 is a detailed flowchart showing the operation in which the threshold calculation unit 5 calculates the line determination threshold.
 In step S41, the threshold calculation unit 5 detects the peak value P in the histogram generated by the histogram generation unit 4 in step S3.
 In step S42, the threshold calculation unit 5 calculates the line determination threshold th1 using the peak value P detected in step S41 and the weighting factor ρ.
 The example shown in FIG. 3 is the case where the peak value P of the histogram is 102 and, as a result of setting the weighting factor ρ to 0.22, the line determination threshold th1 is 22. In FIG. 3(b), corresponding to the line boundary region 4a where H(y) is smaller than 22, the value of th1, it can be seen that a line boundary lies around the line boundary region 4a of the character string region shown in FIG. 3(a). As described above, more appropriate line determination can be performed by adjusting the weighting factor ρ to a value suited to each type of image or to each region within an image.
 In step S5, the line boundary determination unit 6 determines the boundaries between different lines in the character string region using the line determination threshold th1 calculated in step S4. The line boundary determination unit 6 compares the line determination threshold th1 with the histogram value H(y) and stores one or more coordinates y estimated to be line boundaries.
 FIG. 7 is a detailed flowchart showing the operation in which the line boundary determination unit 6 estimates line boundaries.
 First, in step S51-1, ystart included in the character string region data, that is, the coordinate y at which the character string region starts, is set as the initial value.
 Next, in step S51-2, the line determination threshold th1 is compared with the histogram value H(y) corresponding to the current coordinate y. When H(y) is smaller than th1 (H(y) < th1), there is a high possibility that there is a line boundary at this coordinate y, and the process proceeds to step S51-3. On the other hand, when H(y) is greater than or equal to th1 (H(y) ≥ th1), there is a high possibility that a character string exists at this coordinate y, and the process proceeds to step S51-4.
 In step S51-3, the coordinate y is stored as a coordinate at which a line boundary is estimated to exist, and the process proceeds to step S51-4.
 After step S51-3, or when H(y) is greater than or equal to th1 in step S51-2, the process proceeds to step S51-4, where y is incremented to the next coordinate y, and the operations from step S51-2 to step S51-5 are repeated until y is determined in step S51-5 to be the coordinate yend at which the character string region 3b ends.
 Through this operation, one or more coordinates y estimated to be line boundaries in the character string region are extracted and stored.
 Then, in step S51-6, when there is one coordinate y estimated to have a line boundary, the line boundary determination unit 6 determines that coordinate y to be the line boundary; when a plurality of estimated coordinates are adjacent to one another, it determines the coordinate y located at the center of those coordinates to be the line boundary.
 The y coordinates determined to be line boundaries are sent as line boundary data to the character recognition unit 7 together with the character string region data generated in step S2.
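As an illustrative sketch, not part of the disclosed embodiment, the loop of steps S51-1 to S51-5 and the selection of the central coordinate in step S51-6 can be written as follows (the function name and data layout are assumptions):

```python
def determine_line_boundaries(H, th1, ystart, yend):
    """Scan y from ystart to yend, collect coordinates with H(y) < th1
    (steps S51-1 to S51-5), then reduce each run of adjacent candidate
    coordinates to its central coordinate (step S51-6)."""
    candidates = [y for y in range(ystart, yend + 1) if H[y] < th1]

    boundaries = []
    run = []
    for y in candidates:
        if run and y != run[-1] + 1:      # a run of adjacent coordinates ended
            boundaries.append(run[len(run) // 2])
            run = []
        run.append(y)
    if run:
        boundaries.append(run[len(run) // 2])
    return boundaries

# Two text lines separated by a 3-pixel-high gap at y = 3..5.
H = {0: 80, 1: 85, 2: 78, 3: 5, 4: 0, 5: 4, 6: 81, 7: 84}
print(determine_line_boundaries(H, th1=22, ystart=0, yend=7))  # [4]
```

Here the adjacent candidates y = 3, 4, 5 are reduced to the central coordinate y = 4, which becomes the line boundary.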
 In step S6, the character recognition unit 7 performs character recognition processing based on the character string region data generated in step S2 and the line boundary data determined in step S5. When the character recognition processing is completed, the character recognition unit 7 outputs the character recognition result. As described above, line boundary determination and character recognition are performed by the character recognition device 1 according to the present embodiment.
 As described above, according to the character recognition device 1 of the first embodiment, a histogram indicating the frequency of black pixels along the line direction of the character string region extracted by the character string region extraction unit 3 from the input image data is generated, a line determination threshold is calculated from the generated histogram, and the boundaries between different lines in the character string region are determined based on the calculated line determination threshold. As a result, the threshold for determining line boundaries is set appropriately based on features obtained from the entire line direction, and the character strings in the character string region can be separated into appropriate lines.
Embodiment 2.
 Next, the character recognition device 1 according to Embodiment 2 will be described. In Embodiment 1, line determination is performed using the line determination threshold th1 applied to the frequency of black pixels as the determination criterion of the line boundary determination unit 6. In Embodiment 2, in addition to the frequency of black pixels, the gradient g(y) of the histogram is used as a line determination criterion to perform the line boundary determination.
 In Embodiment 2, the detailed configurations and operations of the threshold calculation unit 5 and the line boundary determination unit 6 differ from those of Embodiment 1, and the other parts are the same as in Embodiment 1.
 As in Embodiment 1, the threshold calculation unit 5 calculates the line determination threshold th1 for determining line boundaries from the histogram in the character string region generated by the histogram generation unit 4. In addition, a line determination threshold th2 relating to the gradient g(y) of the histogram is stored in the threshold calculation unit 5 in advance. The gradient of the histogram is g(y) = dH(y)/dy. The gradient g(y) of the histogram becomes steep at a line boundary, so its value becomes large there, and becomes gentle in regions where characters exist, so its value becomes small there; therefore, line boundaries can be determined by setting a threshold on the gradient of the histogram.
 By using the line determination threshold th1 obtained from the histogram of each individual recognition target image together with the line determination threshold th2 relating to the gradient g(y) of the histogram, a threshold that reflects the characteristics of each image is set, in particular one based on features obtained from the entire line direction rather than on a microscopic criterion such as individual characters. In addition, for the coordinates y estimated from the frequency of black pixels to be line boundaries, further applying the line determination threshold th2 to the gradient g(y) of the histogram removes coordinates y that are highly likely not to be line boundaries, so that the line boundaries can be estimated with higher accuracy.
 The line boundary determination unit 6 determines the boundaries between different lines in the character string region based on the line determination threshold th2 in addition to the line determination threshold th1 calculated by the threshold calculation unit 5.
 For each coordinate y estimated to have a line boundary based on the line determination threshold th1 calculated by the threshold calculation unit 5, the line boundary determination unit 6 further performs a determination using the following expression; when this expression holds, it is estimated that there is a line boundary at the coordinate y, and otherwise it is estimated that there is no line boundary at the coordinate y.
H(y) < th1 and H(y) - H(y-1) > th2
 When H(y) is smaller than the line determination threshold th1 and H(y) - H(y-1) is larger than the line determination threshold th2, it is estimated that there is a line boundary at the coordinate y; otherwise, it is estimated that there is no line boundary at the coordinate y. Note that here the difference between H(y) and H(y-1) is used in place of the gradient g(y) of the histogram, as in the following expression.
g(y) = H(y) - H(y-1)
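As an illustrative sketch of the combined Embodiment 2 criterion, not part of the disclosed embodiment; the sample values th1 = 22 and th2 = 5 are arbitrary assumptions chosen only for the example, and the condition is implemented literally as stated above:

```python
def gradient_filtered_candidates(H, th1, th2, ystart, yend):
    """Embodiment 2 test: coordinate y is kept as a line boundary
    candidate only when H(y) < th1 and g(y) = H(y) - H(y-1) > th2."""
    kept = []
    for y in range(ystart + 1, yend + 1):   # g(y) requires H(y-1)
        if H[y] < th1 and (H[y] - H[y - 1]) > th2:
            kept.append(y)
    return kept

# y = 2 has few black pixels but a gentle gradient (inside a character),
# while y = 3 combines a low count with a steep gradient.
H = {0: 80, 1: 2, 2: 3, 3: 15, 4: 85}
print(gradient_filtered_candidates(H, th1=22, th2=5, ystart=0, yend=4))  # [3]
```

Coordinates with low counts but gentle gradients (here y = 1 and y = 2 relative to their predecessors) are removed by the gradient test.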
 Then, when there is one coordinate y estimated to have a line boundary, the line boundary determination unit 6 determines that coordinate y to be the line boundary; when a plurality of coordinates y estimated to have a line boundary are adjacent to one another, it determines the coordinate y located at the center of those coordinates to be the line boundary. Which of a plurality of adjacent coordinates y determined to have a line boundary is taken as the line boundary is not limited to the center, and may be selected from among the adjacent coordinates y. When coordinates y determined to have a line boundary are not adjacent to one another, each such coordinate y is determined to be a line boundary. The number of line boundaries in one character string region is not limited to one; a plurality of line boundaries may exist.
 A determined line boundary indicates the position information of the line at which a line boundary is determined to exist, and is sent as line boundary data to the character recognition unit 7 together with the character string region data.
 The other configurations are the same as in Embodiment 1.
 Next, the operation of the character recognition device 1 according to Embodiment 2 will be described. Regarding the operation, the detailed operations of steps S4 and S5 differ from those of Embodiment 1.
 In step S4, the threshold calculation unit 5 calculates the line determination threshold th1 for determining line boundaries using the histogram generated in step S3. The method for calculating th1 is the same as in Embodiment 1. The threshold calculation unit 5 holds the line determination threshold th2 in addition to th1. The line determination threshold th2 is a fixed value set in advance by the user and stored in the threshold calculation unit 5 in advance. The threshold calculation unit 5 sends the line determination thresholds th1 and th2 to the line boundary determination unit 6.
 この行境界判定部6における行の境目を推定する動作を示す詳細フローチャートを図8に示す。
 まず、ステップS52-1にて文字列領域データに含まれるystartすなわち文字列領域が始まる座標yを初期値として設定する。
 次に、ステップS52-2にて行判定閾値th1と現在の座標yに対応するヒストグラムの値H(y)とを比較する。H(y)が行判定閾値th1よりも小さい場合(H(y)<th1)はこの座標yに行の境目がある可能性が高く、この場合はステップS52-3に進む。一方、H(y)が行判定閾値th1以上(H(y)≧th1)の場合はこの座標yには文字列が存在する可能性が高く、この場合はステップS52-5に進む。
 ステップS52-3にて行判定閾値th2と現在の座標yに対応するヒストグラムの傾きH(y)―H(y―1)とを比較する。H(y)―H(y―1)が行判定閾値th2よりも大きい場合(H(y)―H(y―1)>th2)は、ヒストグラムの傾きが急であることからこの座標yに行の境目がある可能性が高いと推定でき、この場合はステップS52-4に進む。一方、H(y)―H(y―1)が行判定閾値th2以下(H(y)―H(y―1)≦th2)の場合は、この座標yには行判定閾値th1より黒画素数が少ないものの、ヒストグラムの傾きが緩やかであることから行の境目ではない可能性が高く、この場合はステップS52-5に進む。
 ステップS52-4では、座標yを行の境界が存在すると推定される座標として記憶して、ステップS52-5に進む。
 ステップS52-4終了後、あるいはステップS52-2でH(y)が行判定閾値th1以上と判断、あるいはステップS52-3でH(y)―H(y―1)が行判定閾値th2以下と判断された場合は、ステップS52-5に進み、yをインクリメントして次の座標yとし、ステップS52-6でyの文字列領域32が終わる座標yendと判定されるまで、ステップS52-2からステップS52-5までの動作を繰り返す。
 このような動作により、文字列領域内において行の境目と推定される座標yが1つまたは複数抽出され、記憶される。
 そして、行境界判定部6はステップS52-7で、行の境目があると推定された座標yが1つの場合はその座標yを、また推定された座標が複数隣り合って存在する場合は、複数の座標yのうち中央に位置する座標yを行の境界であると判定する。
 なお、ステップS52-2で行った黒画素数による推定と、ステップS52-3で行ったヒストグラムの傾きによる推定は、入れ替えて行っても良い。この場合、まずヒストグラムの傾きにより行の境目の候補の座標を推定し、その推定された候補についてさらに黒画素数による推定を行うことで、行の境目の候補の座標が決まる。
 また、ステップS52-2で行った黒画素数による推定と、ステップS52-3で行ったヒストグラムの傾きによる推定とを統合し、以下のコスト関数Cの式を判定用の式として用いることも可能である。
FIG. 8 is a detailed flowchart showing the operation of estimating the line boundary in the line boundary determination unit 6.
First, in step S52-1, ystart included in the character string area data, that is, the coordinate y at which the character string area starts is set as an initial value.
In step S52-2, the line determination threshold th1 is compared with the histogram value H (y) corresponding to the current coordinate y. If H (y) is smaller than the line determination threshold th1 (H (y) <th1), there is a high possibility that there is a line boundary at this coordinate y. In this case, the process proceeds to step S52-3. On the other hand, if H (y) is greater than or equal to the line determination threshold th1 (H (y) ≧ th1), there is a high possibility that a character string exists at this coordinate y. In this case, the process proceeds to step S52-5.
In step S52-3, the line determination threshold th2 is compared with the slope H (y) -H (y-1) of the histogram corresponding to the current coordinate y. When H (y) -H (y-1) is larger than the row determination threshold th2 (H (y) -H (y-1)> th2), the histogram has a steep slope, so this coordinate y is set. It can be estimated that there is a high possibility that there is a line boundary. In this case, the process proceeds to step S52-4. On the other hand, when H (y) −H (y−1) is equal to or less than the row determination threshold value th2 (H (y) −H (y−1) ≦ th2), a black pixel is detected at the coordinate y from the row determination threshold value th1. Although the number is small, there is a high possibility that it is not a line boundary because the slope of the histogram is gentle. In this case, the process proceeds to step S52-5.
In step S52-4, the coordinate y is stored as a coordinate that is estimated to have a line boundary, and the process proceeds to step S52-5.
After step S52-4 is completed, or when it is determined in step S52-2 that H(y) is equal to or greater than the line determination threshold th1, or in step S52-3 that H(y)−H(y−1) is equal to or less than the line determination threshold th2, the process proceeds to step S52-5, where y is incremented to the next coordinate, and the operations from step S52-2 to step S52-5 are repeated until y is determined in step S52-6 to be the coordinate yend at which the character string region 32 ends.
By such an operation, one or a plurality of coordinates y estimated as a line boundary in the character string region are extracted and stored.
Then, in step S52-7, if there is only one coordinate y estimated to be a line boundary, the line boundary determination unit 6 determines that coordinate y to be the line boundary; if a plurality of estimated coordinates are adjacent to each other, it determines the coordinate y located at the center of those coordinates to be the line boundary.
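The scan of steps S52-1 through S52-7 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: all names (H, th1, th2, y_start, y_end) are invented for the sketch, and the histogram H is modeled as a list indexed by the coordinate y (y_start must be at least 1 so that H[y−1] exists).

```python
def estimate_line_boundaries(H, th1, th2, y_start, y_end):
    """Return the y coordinates judged to be line boundaries.

    A coordinate y is a candidate when H[y] < th1 (few black pixels,
    step S52-2) AND the slope H[y] - H[y-1] exceeds th2 (steep change,
    step S52-3). Runs of adjacent candidates are collapsed to their
    central coordinate (step S52-7).
    """
    candidates = []
    for y in range(y_start, y_end + 1):        # steps S52-2 .. S52-6
        if H[y] < th1 and H[y] - H[y - 1] > th2:
            candidates.append(y)               # step S52-4
    # Step S52-7: collapse each run of adjacent candidates to its center.
    boundaries, run = [], []
    for y in candidates + [None]:              # None flushes the last run
        if run and (y is None or y != run[-1] + 1):
            boundaries.append(run[len(run) // 2])
            run = []
        if y is not None:
            run.append(y)
    return boundaries
```

For example, with H = [5, 0, 2, 5, 5, 0, 2, 5], th1 = 3, and th2 = 1, the scan over y = 1..7 yields boundaries at y = 2 and y = 6, where the histogram is both low and rising steeply.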
Note that the estimation based on the number of black pixels performed in step S52-2 and the estimation based on the slope of the histogram performed in step S52-3 may be performed in the reverse order. In that case, candidate coordinates for line boundaries are first estimated from the slope of the histogram, and the estimation based on the number of black pixels is then applied to those candidates to determine the candidate coordinates for line boundaries.
Alternatively, the estimation based on the number of black pixels performed in step S52-2 and the estimation based on the slope of the histogram performed in step S52-3 can be integrated, and the following expression of the cost function C can be used as the determination expression.
Figure JPOXMLDOC01-appb-M000006
As described above, according to the character recognition device 1 of Embodiment 2, a histogram indicating the frequency of black pixels in the row direction of the character string region extracted from the input image data by the character string region extraction unit 3 is generated, a line determination threshold is calculated from the generated histogram, and boundaries between different lines in the character string region are estimated based on the calculated line determination threshold th1. Furthermore, by also estimating line boundaries based on the slope of the histogram, the threshold for judging line boundaries is set appropriately in light of features obtained from the entire row direction, and the character strings in the character string region can be separated into the appropriate lines.
Embodiment 3.
Next, the character recognition device 1 according to Embodiment 3 will be described. In Embodiment 1, line determination was performed using the line determination threshold th1 for the frequency of black pixels as the determination criterion of the line boundary determination unit 6; in Embodiment 3, in addition to the frequency of black pixels, line boundary determination is performed using the peak values P(n) detected from the histogram as a line determination criterion.
In the third embodiment, the detailed configurations and operations of the threshold value calculation unit 5 and the row boundary determination unit 6 are different from those in the first embodiment, and other parts are the same as those in the first embodiment.
The threshold calculation unit 5 calculates a line determination threshold th1 for determining a line boundary from the histogram in the character string region generated by the histogram generation unit 4 as in the first embodiment. Further, the threshold value calculation unit 5 stores in advance a row determination threshold value th3 relating to the difference P (n) −P (n−1) between the peak values of the histogram.
FIG. 9 shows an example of an image and its histogram for a character string region in which two lines have different lengths. When a plurality of lines exist and their lengths differ, the difference P(n)−P(n−1) between the peak values of the histogram becomes large. By exploiting this property and setting a threshold on the difference between the peak values of the histogram, line boundaries can be determined when a plurality of lines of different lengths exist.
By using the line determination threshold th1 obtained from the histogram of each individual recognition target image, and also using the line determination threshold th3 for the peak value difference P(n)−P(n−1) of the histogram, a threshold is set based on the features of each image, in particular features obtained from the entire row direction rather than microscopic criteria such as individual characters, and line boundaries can be estimated accurately even when a plurality of lines have different lengths.
The line boundary determination unit 6 determines a boundary between different lines in the character string area based on the line determination threshold th3 in addition to the line determination threshold th1 calculated by the threshold calculation unit 5.
First, the line boundary determination unit 6 estimates that there is a line boundary at coordinate y when H(y) is smaller than the line determination threshold th1, and that there is no line boundary at coordinate y when H(y) is equal to or greater than the line determination threshold th1.
Then, if there is only one coordinate y estimated to be a line boundary, the line boundary determination unit 6 determines that coordinate y to be the line boundary; if a plurality of such coordinates are adjacent to each other, it determines the central position among those coordinates y to be the line boundary. The position chosen among a plurality of adjacent coordinates y determined to contain a line boundary is not limited to the center and may be any of those coordinates. If a coordinate y determined to contain a line boundary has no adjacent counterpart, that coordinate y itself is determined to be the line boundary. A character string region is not limited to a single line boundary; a plurality of boundaries may exist.
On the other hand, the line boundary determination unit 6 calculates the peak difference P(n)−P(n−1). As shown in FIG. 9, the line boundary determination unit 6 detects all peaks of the generated histogram and calculates the difference between each peak value P(n) and the adjacent peak value P(n−1). Using the following expression, when the peak value difference P(n)−P(n−1) is larger than the line determination threshold th3, the line boundary determination unit 6 estimates that there is a line boundary between the y coordinates at which P(n) and P(n−1) occur; when the difference P(n)−P(n−1) is smaller than th3, it estimates that there is no line boundary between those y coordinates. When the peak value difference P(n)−P(n−1) is larger than the line determination threshold th3, a line boundary is estimated to exist between P(n) and P(n−1), so the central position between the coordinate yn at which the peak value P(n) occurs and the coordinate yn−1 at which the peak value P(n−1) occurs is determined to be the line boundary. The position chosen between the coordinate yn of P(n) and the coordinate yn−1 of P(n−1) is not limited to the center and may be any of the coordinates between them.
Figure JPOXMLDOC01-appb-M000007
The determined line boundary indicates position information of a line determined to have a line boundary, and is sent to the character recognition unit 7 together with the character string area data as line boundary data.
Other configurations are the same as those in the first embodiment.
Next, the operation of the character recognition device 1 according to Embodiment 3 will be described. Regarding the operation, the detailed operations in step S4 and step S5 differ from those in Embodiment 1.
In step S4, the threshold calculation unit 5 calculates the line determination threshold th1 for judging line boundaries using the histogram generated in step S3. The method for calculating the line determination threshold th1 is the same as in Embodiment 1. The threshold calculation unit 5 has a line determination threshold th3 in addition to the line determination threshold th1. The line determination threshold th3 is a fixed value set in advance by the user and stored in the threshold calculation unit 5 in advance. The threshold calculation unit 5 sends the line determination thresholds th1 and th3 to the line boundary determination unit 6.
FIG. 10 is a detailed flowchart showing the operation of estimating the line boundary in the line boundary determination unit 6.
The line boundary determination using the line determination threshold th1 is the same as in Embodiment 1. After finishing the operation of the flowchart of FIG. 7 described in Embodiment 1, the line boundary determination unit performs the operation of the flowchart shown in FIG. 10.
First, in step S53-1, n = 1 is set as the count initial value of the peak value included in the character string area data.
Next, in step S53-2, the line determination threshold th3 is compared with the difference P(n)−P(n−1) between adjacent peak values. When the peak value difference P(n)−P(n−1) is larger than the line determination threshold th3 (P(n)−P(n−1) > th3), there is a high possibility of a line boundary between the y coordinates at which the peak values P(n) and P(n−1) occur; in this case, the process proceeds to step S53-3. On the other hand, when the peak value difference P(n)−P(n−1) is equal to or less than the line determination threshold th3 (P(n)−P(n−1) ≤ th3), the presence or absence of a line boundary cannot be judged from the peak value difference P(n)−P(n−1); in this case, the process proceeds to step S53-4.
In step S53-3, the y-coordinates between the peak values P (n) and P (n-1) are stored as coordinates estimated to have a line boundary, and the process proceeds to step S53-4. .
After step S53-3 is completed, or when P(n)−P(n−1) is determined in step S53-2 to be equal to or less than the line determination threshold th3, the process proceeds to step S53-4, where n is incremented, and the operations from step S53-2 to step S53-5 are repeated until n is determined in step S53-5 to be nend, the count value of the last peak value P(nend) of the histogram.
By such an operation, one or a plurality of coordinates y estimated as a line boundary in the character string region are extracted and stored.
In step S53-6, if there is only one coordinate y estimated to be a line boundary, the line boundary determination unit 6 determines that coordinate y to be the line boundary; if there are a plurality of coordinates y between the peak values P(n) and P(n−1), it determines the coordinate y located at the center of those coordinates to be the line boundary.
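The peak-difference check of steps S53-1 through S53-6 can be sketched as follows. This is an illustrative sketch only: the names (peaks, th3) are assumptions, and the histogram peaks are modeled as a list of (y coordinate, peak value) pairs ordered by y. The midpoint is used for the boundary position, although the text above allows any position between the two peaks.

```python
def boundaries_from_peak_gaps(peaks, th3):
    """Estimate line boundaries from adjacent histogram peaks.

    When the difference P(n) - P(n-1) between adjacent peak values
    exceeds th3 (step S53-2), a line boundary is assumed between the
    two peaks and is placed at the midpoint of their y coordinates
    (step S53-6).
    """
    boundaries = []
    for n in range(1, len(peaks)):               # steps S53-2 .. S53-5
        (y_prev, p_prev), (y_cur, p_cur) = peaks[n - 1], peaks[n]
        if p_cur - p_prev > th3:                 # step S53-2
            boundaries.append((y_prev + y_cur) // 2)   # step S53-3/S53-6
    return boundaries
```

For example, two peaks (10, 50) and (30, 120) with th3 = 40 give a peak difference of 70 > 40, so a boundary is placed at y = 20.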
As described above, the line boundary data obtained by the determination based on the line determination threshold th1 in the flowchart of FIG. 7 and the line boundary data obtained by the determination based on the line determination threshold th3 in the flowchart of FIG. 10 are sent to the character recognition unit 7. The character recognition unit 7 performs character recognition using both sets of line boundary data, as in Embodiment 1.
Note that the order of the operation of the flowchart of FIG. 7 using the row determination threshold th1 and the operation of the flowchart of FIG. 10 using the row determination threshold th3 may be reversed.
Other operations are the same as those in the first embodiment.
As described above, according to the character recognition device 1 of Embodiment 3, a histogram indicating the frequency of black pixels in the row direction of the character string region extracted from the input image data by the character string region extraction unit 3 is generated, a line determination threshold is calculated from the generated histogram, and boundaries between different lines in the character string region are estimated based on the calculated line determination threshold th1. Furthermore, by also estimating line boundaries based on the difference between adjacent peak values of the histogram, the character strings in the character string region can be separated into the appropriate lines more clearly when the line lengths differ.
In this embodiment, line boundaries were determined by comparing the line determination threshold th3 with the peak value difference P(n)−P(n−1). However, the line boundary data obtained from the operation of the flowchart of FIG. 10 using the line determination threshold th3 need not indicate only coordinates; it may also include information indicating the probability that a line boundary exists at each coordinate. This probability is calculated, for example, by subtracting the peak value difference P(n)−P(n−1) from the line determination threshold th3 and dividing by the line determination threshold th3. Based on this probability, the character recognition unit 7 can select whether or not to adopt the coordinates indicated in the line boundary data as line boundaries.
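The probability calculation described above can be written as a one-line helper. The function name and argument names are assumptions for illustration only; the formula itself, subtracting the peak value difference from th3 and dividing by th3, is the one given in the text.

```python
def boundary_probability(p_n, p_prev, th3):
    # (th3 - (P(n) - P(n-1))) / th3, per the calculation described above
    return (th3 - (p_n - p_prev)) / th3
```

For example, with th3 = 50 and peak values 120 and 100, the peak difference is 20 and the computed value is (50 − 20) / 50 = 0.6.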
Embodiment 4.
Next, the character recognition device 1 according to Embodiment 4 will be described. In Embodiment 2, line boundary determination was performed using both the line determination threshold th1 for the frequency of black pixels and the slope g(y) of the histogram as determination criteria of the line boundary determination unit 6; in this embodiment, line boundaries are determined using only the slope g(y) of the histogram as the line determination criterion.
In the fourth embodiment, the detailed configurations and operations of the threshold value calculation unit 5 and the row boundary determination unit 6 are different from those in the second embodiment, and other parts are the same as those in the second embodiment.
The threshold calculation unit 5 stores in advance a line determination threshold th2 relating to the slope g(y) of the histogram. The slope of the histogram is g(y) = dH(y)/dy. The slope g(y) becomes steep at line boundaries, where its value is large, and gentle in regions where characters exist, where its value is small; therefore, by setting a threshold on the slope of the histogram, line boundaries can be determined.
The line boundary determination unit 6 determines a boundary between different lines in the character string area based on the line determination threshold th2.
First, the line boundary determination unit 6 estimates that there is a line boundary at coordinate y when H(y)−H(y−1) is larger than th2, and that there is no line boundary at coordinate y when H(y)−H(y−1) is smaller than th2.
The line boundary condition based on the slope can be written as:

    g(y) > th2: a line boundary exists at coordinate y
    g(y) ≤ th2: no line boundary exists at coordinate y

The slope g(y) of the histogram can be calculated by approximating it as the difference between H(y) and H(y−1):

    g(y) = H(y) − H(y−1)
Then, if there is only one coordinate y estimated to be a line boundary, the line boundary determination unit 6 determines that coordinate y to be the line boundary; if a plurality of coordinates y estimated to be line boundaries are adjacent to each other, it determines the central position among those coordinates to be the line boundary. The position chosen among a plurality of adjacent coordinates y determined to contain a line boundary is not limited to the center and may be any of those coordinates. If a coordinate y determined to contain a line boundary has no adjacent counterpart, that coordinate y itself is determined to be the line boundary. A character string region is not limited to a single line boundary; a plurality of boundaries may exist.
The determined line boundary indicates the position information of the line determined to have a line boundary, and is sent to the character recognition unit together with the character string area data as line boundary data.
Other configurations are the same as those in the first embodiment.
Next, the operation of the character recognition device 1 according to Embodiment 4 will be described. Regarding the operation, the detailed operations in step S4 and step S5 differ from those in Embodiment 1.
In step S4, the threshold calculation unit 5 sends the line determination threshold th2 to the line boundary determination unit 6. The line determination threshold th2 is a fixed value set in advance by the user and stored in the threshold calculation unit 5 in advance.
FIG. 11 shows a detailed flowchart showing the operation of estimating the line boundary in the line boundary determination unit 6.
First, in step S54-1, ystart included in the character string area data, that is, the coordinate y at which the character string area starts is set as an initial value.
In step S54-2, the line determination threshold th2 is compared with the slope H(y)−H(y−1) of the histogram corresponding to the current coordinate y. When H(y)−H(y−1) is larger than the line determination threshold th2 (H(y)−H(y−1) > th2), the slope of the histogram is steep, so it can be estimated that there is a high possibility of a line boundary at this coordinate y. In this case, the process proceeds to step S54-3. On the other hand, when H(y)−H(y−1) is equal to or less than the line determination threshold th2 (H(y)−H(y−1) ≤ th2), the slope of the histogram at this coordinate y is gentle, so it is unlikely to be a line boundary. In this case, the process proceeds to step S54-4.
In step S54-3, the coordinate y is stored as a coordinate that is estimated to have a line boundary, and the process proceeds to step S54-4.
After step S54-3 is completed, or when it is determined in step S54-2 that H(y)−H(y−1) is equal to or less than the line determination threshold th2, the process proceeds to step S54-4, where y is incremented to the next coordinate, and the operations from step S54-2 to step S54-5 are repeated until y is determined in step S54-5 to be the coordinate yend at which the character string region 32 ends.
By such an operation, one or a plurality of coordinates y estimated as a line boundary in the character string region are extracted and stored.
In step S54-6, if there is only one coordinate y estimated to be a line boundary, the line boundary determination unit 6 determines that coordinate y to be the line boundary; if a plurality of estimated coordinates are adjacent to each other, it determines the coordinate y located at the center of those coordinates to be the line boundary.
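The slope-only scan of steps S54-1 through S54-6 can be sketched as follows. As before, this is an illustrative sketch: the names (H, th2, y_start, y_end) are assumptions, H is a list indexed by y, and y_start must be at least 1 so that H[y−1] exists.

```python
def boundaries_by_slope(H, th2, y_start, y_end):
    """Slope-only line boundary scan.

    A coordinate y is a candidate when the histogram slope
    g(y) = H[y] - H[y-1] exceeds th2 (step S54-2); runs of adjacent
    candidates are collapsed to their central coordinate (step S54-6).
    """
    candidates = [y for y in range(y_start, y_end + 1)
                  if H[y] - H[y - 1] > th2]     # steps S54-2 .. S54-5
    boundaries, run = [], []
    for y in candidates + [None]:               # None flushes the last run
        if run and (y is None or y != run[-1] + 1):
            boundaries.append(run[len(run) // 2])
            run = []
        if y is not None:
            run.append(y)
    return boundaries
```

For example, with H = [0, 5, 6, 0, 7, 8] and th2 = 3, the slope exceeds th2 at y = 1 and y = 4, so those coordinates are returned as boundaries.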
Other operations are the same as those in the first embodiment.
As described above, according to the character recognition device 1 of Embodiment 4, a histogram indicating the frequency of black pixels in the row direction of the character string region extracted from the input image data by the character string region extraction unit 3 is generated, and line boundaries are estimated from the generated histogram based on its slope. The threshold for judging line boundaries is thereby set appropriately in light of features obtained from the entire row direction, and the character strings in the character string region can be separated into the appropriate lines.
In all of the above embodiments, the binarization processing unit 2 generates and uses binarized data; however, the target data is not limited to binarized data. Any data that can distinguish character portions from line boundary portions may be used, for example multi-valued data representing each pixel with multiple values, or data indicating chromaticity.
In all of the above embodiments, the character string region data produced by the character string region extraction unit includes the binarized data. However, the binarized data may instead be sent directly from the binarization processing unit 2 to each unit that requires it. Likewise, not only the binarized data but any other data may be sent directly from the unit that produces it to each unit that requires it.
In all of the above embodiments, the line boundary determination unit 6 determines the central position among a plurality of coordinates y to be the line boundary only when the coordinates y estimated to be line boundaries are directly adjacent. However, a fixed range may be used instead: if the coordinates y estimated to be line boundaries fall within the fixed range of one another, they may be treated as adjacent, and the central position determined to be the line boundary.
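The range-based grouping just described can be sketched as follows. This is a minimal illustration under stated assumptions: the function and parameter names are invented, candidates is a non-empty sorted list of candidate y coordinates, and gap is the fixed range within which candidates are treated as adjacent.

```python
def group_candidates(candidates, gap):
    """Group candidate coordinates whose spacing is at most `gap`
    and return the central coordinate of each group as a boundary."""
    groups, current = [], [candidates[0]]
    for y in candidates[1:]:
        if y - current[-1] <= gap:
            current.append(y)    # within the fixed range: same group
        else:
            groups.append(current)
            current = [y]
    groups.append(current)
    return [g[len(g) // 2] for g in groups]
```

For example, with candidates [10, 12, 13, 40] and gap = 3, the first three coordinates form one group with center 12, and 40 stands alone, giving boundaries [12, 40].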

All of the above embodiments are realized in the configurations shown in FIGS. 12 and 13. FIG. 12 is a hardware configuration diagram for realizing the character recognition device according to Embodiment 1 in hardware. Images are input by an image capture device 8 composed of a scanner or camera. The binarization processing unit 2, the character string extraction unit 3, the histogram generation unit 4, the threshold calculation unit 5, the line boundary determination unit 6, and the character recognition unit 7 are realized by a processing circuit 9. The processing circuit 9 may be realized by, for example, a single circuit, a composite circuit, a programmed processor, a parallel-programmed processor, or various electronic circuits combining these. The display 10 displays the progress of processing and the like. Each program is stored on the hard disk 11.
FIG. 13 is a hardware configuration diagram for the case where the character recognition device according to Embodiment 1 is realized in software. When the character recognition device 1 is configured as a computer in this way, the processor 12 functions as the binarization processing unit 2, the character string extraction unit 3, the histogram generation unit 4, the threshold calculation unit 5, the line boundary determination unit 6, and the character recognition unit 7. These functions are realized by software, firmware, or a combination of software and firmware. The software and firmware are stored on the hard disk 11 and function by being loaded from the hard disk 11 into the memory 13 when executed.

1. Character recognition device
2. Binarization processing unit
3. Character string region extraction unit
3a. Binarized image
3b. Character string region
4. Histogram generation unit
4a. Line boundary region
5. Threshold calculation unit
5a. Difficult line-determination region
5b. Easy line-determination region
6. Line boundary determination unit
7. Character recognition unit
8. Image capture device
9. Processing circuit
10. Display
11. Hard disk

Claims (6)

1. A character recognition device comprising:
a character string region extraction unit that extracts a character string region from input image data;
a histogram generation unit that generates a histogram indicating the frequency of black pixels in the row direction of the character string region extracted by the character string region extraction unit;
a threshold calculation unit that calculates a line determination threshold from the histogram generated by the histogram generation unit;
a line boundary determination unit that determines a boundary between different lines in the character string region based on the line determination threshold calculated by the threshold calculation unit; and
a character recognition unit that recognizes characters in the character string region based on the character string region extracted by the character string region extraction unit and the line boundary determined by the line boundary determination unit.
2. The character recognition device according to claim 1, wherein the line determination threshold calculated from the histogram by the threshold calculation unit is a threshold for the frequency of black pixels.
  3.  The character recognition device according to claim 2, wherein the threshold for the frequency of black pixels is obtained by multiplying a peak value of the histogram by a coefficient.
  4.  The character recognition device according to claim 1, wherein the line determination threshold calculated from the histogram by the threshold calculation unit is a threshold for the slope of the histogram.
  5.  The character recognition device according to claim 1, wherein the line determination threshold calculated from the histogram by the threshold calculation unit is a threshold for the difference between peak values of the black-pixel frequency in the histogram.
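Claims 2 through 5 recite alternative forms of the line determination threshold. The following is a minimal, illustrative sketch of how each variant might be computed from the row-direction black-pixel histogram; the coefficient value, the slope limit, and the function names are assumptions for illustration, not details specified in the claims.

```python
def freq_threshold(hist, coeff=0.2):
    # Claim-3 variant: black-pixel frequency threshold obtained by
    # multiplying the histogram's peak value by a coefficient
    # (coeff=0.2 is an assumed example value).
    return coeff * max(hist)

def flat_rows_by_slope(hist, slope_limit=1):
    # Claim-4 variant (illustrative): flag positions where the histogram's
    # slope stays within slope_limit, i.e. flat valleys between text bands.
    return [abs(hist[i + 1] - hist[i]) <= slope_limit
            for i in range(len(hist) - 1)]

def peak_value_difference(hist):
    # Claim-5 variant (illustrative): difference between the two largest
    # local peak values of the black-pixel frequency histogram.
    peaks = sorted((hist[i] for i in range(1, len(hist) - 1)
                    if hist[i - 1] <= hist[i] >= hist[i + 1]), reverse=True)
    return peaks[0] - peaks[1] if len(peaks) >= 2 else 0
```

In each case the resulting value (or mask) is what the line boundary determination unit would consult to decide whether a histogram valley separates two lines.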
  6.  A character recognition method comprising:
     a character string region extraction step of extracting a character string region from input image data;
     a histogram generation step of generating a histogram indicating the frequency of black pixels in the row direction of the character string region extracted in the character string region extraction step;
     a threshold calculation step of calculating a line determination threshold from the histogram generated in the histogram generation step;
     a line boundary determination step of determining boundaries between different lines in the character string region based on the line determination threshold calculated in the threshold calculation step; and
     a character recognition step of recognizing characters in the character string region based on the character string region extracted in the character string region extraction step and the line boundaries determined in the line boundary determination step.
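The claimed method amounts to a horizontal projection analysis of a binarized character string region. The sketch below assumes the claim-3 threshold variant (histogram peak times a coefficient); the coefficient value, the choice of the gap midpoint as the boundary position, and the function name are illustrative assumptions, not the patent's exact implementation.

```python
def find_line_boundaries(binary_img, coeff=0.2):
    """Illustrative sketch of the claimed line-separation method:
    project black pixels per row, derive a line determination threshold
    from the histogram peak, and report the midpoint of each
    low-frequency gap enclosed by text as a line boundary."""
    # Histogram generation step: black-pixel frequency per image row
    # (black pixels encoded as 1 in the binarized image).
    hist = [sum(row) for row in binary_img]
    # Threshold calculation step: peak value times a coefficient.
    threshold = coeff * max(hist)
    # Line boundary determination step: runs of rows at or below the
    # threshold, counted only when text has appeared above the gap so
    # that a top margin is not mistaken for a line boundary.
    boundaries = []
    in_gap, start, seen_text = False, 0, False
    for y, freq in enumerate(hist):
        if freq <= threshold:
            if not in_gap:
                in_gap, start = True, y
        else:
            if in_gap and seen_text:
                boundaries.append((start + y - 1) // 2)  # gap midpoint
            in_gap, seen_text = False, True
    return boundaries
```

For a two-line region the function returns one row index lying between the lines, which a downstream character recognition step could use to process each line separately.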
PCT/JP2015/083948 2015-12-03 2015-12-03 Character recognition device and character recognition method WO2017094156A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
JP2017553564A JP6493559B2 (en) 2015-12-03 2015-12-03 Character recognition device and character recognition method
PCT/JP2015/083948 WO2017094156A1 (en) 2015-12-03 2015-12-03 Character recognition device and character recognition method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/JP2015/083948 WO2017094156A1 (en) 2015-12-03 2015-12-03 Character recognition device and character recognition method

Publications (1)

Publication Number Publication Date
WO2017094156A1 true WO2017094156A1 (en) 2017-06-08

Family

ID=58796627

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2015/083948 WO2017094156A1 (en) 2015-12-03 2015-12-03 Character recognition device and character recognition method

Country Status (2)

Country Link
JP (1) JP6493559B2 (en)
WO (1) WO2017094156A1 (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH06195462A (en) * 1992-12-22 1994-07-15 Fujitsu Ltd Angle of inclination of image measuring system
JP2004171560A (en) * 2002-11-21 2004-06-17 Nariyuki Motoi Composited image providing system, composited image generation program, information processor and data carrier
JP2004341977A (en) * 2003-05-19 2004-12-02 Mitsubishi Electric Corp Character recognition device and portable information terminal
JP2007165983A (en) * 2005-12-09 2007-06-28 Nippon Telegr & Teleph Corp <Ntt> Metadata automatic generating apparatus, metadata automatic generating method, metadata automatic generating program, and recording medium for recording program

Also Published As

Publication number Publication date
JP6493559B2 (en) 2019-04-03
JPWO2017094156A1 (en) 2018-02-08

Similar Documents

Publication Publication Date Title
JP5844783B2 (en) Method for processing grayscale document image including text region, method for binarizing at least text region of grayscale document image, method and program for extracting table for forming grid in grayscale document image
KR100512831B1 (en) Image processing method, apparatus and program storage medium
CN109543501B (en) Image processing apparatus, image processing method, and storage medium
JP4522468B2 (en) Image discrimination device, image search device, image search program, and recording medium
JP5934762B2 (en) Document modification detection method by character comparison using character shape characteristics, computer program, recording medium, and information processing apparatus
TW201437925A (en) Object identification device, method, and storage medium
WO2014131339A1 (en) Character identification method and character identification apparatus
JP6268023B2 (en) Character recognition device and character cutout method thereof
JP4100885B2 (en) Form recognition apparatus, method, program, and storage medium
US10455163B2 (en) Image processing apparatus that generates a combined image, control method, and storage medium
JP2014131278A (en) Method of authenticating printed document
JP6177541B2 (en) Character recognition device, character recognition method and program
JP2011248702A (en) Image processing device, image processing method, image processing program, and program storage medium
JP6338429B2 (en) Subject detection apparatus, subject detection method, and program
US9167129B1 (en) Method and apparatus for segmenting image into halftone and non-halftone regions
JP6542230B2 (en) Method and system for correcting projected distortion
US6269186B1 (en) Image processing apparatus and method
JP5984880B2 (en) Image processing device
JP2021111228A (en) Learning device, learning method, and program
JP6493559B2 (en) Character recognition device and character recognition method
JP2010225047A (en) Noise component removing device, and medium with noise component removing program recorded thereon
JP6580201B2 (en) Subject detection apparatus, subject detection method, and program
JP2010250387A (en) Image recognition device and program
JP2019021085A (en) Image processing program, image processing method, and image processing device
JP2014127763A (en) Image processing apparatus, image processing method, and program

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 15909786

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2017553564

Country of ref document: JP

Kind code of ref document: A

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 15909786

Country of ref document: EP

Kind code of ref document: A1