WO2016086877A1 - Method and apparatus for text detection - Google Patents

Method and apparatus for text detection

Info

Publication number
WO2016086877A1
Authority
WO
WIPO (PCT)
Prior art keywords
stroke
pixel
esw
calculating
orientations
Prior art date
Application number
PCT/CN2015/096305
Other languages
English (en)
French (fr)
Inventor
江淑红
吴波
Original Assignee
夏普株式会社
江淑红
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 夏普株式会社, 江淑红 filed Critical 夏普株式会社
Priority to JP2017528527A priority Critical patent/JP2017535891A/ja
Publication of WO2016086877A1 publication Critical patent/WO2016086877A1/zh

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/10Image acquisition

Definitions

  • the present invention relates to human-computer interaction techniques, and in particular to text detection or optical character recognition (OCR) techniques.
  • a natural scene contains not only a large amount of graphical information, but also rich text information such as road signs, store names, and the like. This text information is valuable for describing and understanding scene content and is a key clue for scene-image retrieval. An automated tool is therefore urgently needed to obtain the text information in a scene through text recognition in natural scenes, serving the search, query, and browsing of scene-image data and the understanding of scene content, and improving the efficiency of image-data management.
  • Mobile phones, PDAs, desktops, laptops, tablets, and other electronic devices often support text detection or optical character recognition (OCR).
  • Stroke Width Transform (SWT) is a commonly used text detection method in the prior art. "Detecting Text in Natural Scenes with Stroke Width Transform" (IEEE Computer Vision and Pattern Recognition, CVPR 2010) provides a text detection method based on SWT. As described there, SWT is a successful method for text detection in natural scenes, and it detects text regardless of the text's scale, orientation, font, and language. To extract stroke information, SWT first uses the Canny edge detector to compute the edges of the image, and then considers the gradient orientation of each edge pixel to find its stroke width. SWT is a local image operator that computes, for each pixel, the width of the most likely stroke containing that pixel. The output of SWT is an image of the same size as the input, in which each point stores the width of the stroke associated with that pixel.
  • FIG. 1 shows a schematic diagram of implementing an SWT method
  • FIG. 2 shows a flow chart of implementing an SWT method.
  • the SWT method will now be described in conjunction with Figures 1 and 2.
  • Figure 1 (a) is a schematic diagram of a typical stroke in which the pixels of the stroke are darker than the pixels of the background.
  • the edge of the input image is calculated by an edge detector such as a Canny edge detector.
  • In step S110, the value stored in association with every pixel on the stroke edge and inside the stroke is initialized to +∞.
  • In step S120, for each pixel on the stroke edge (for example, point p shown in Figure 1(b)), the tangential direction at pixel p is calculated, and then the gradient (normal) direction, which is perpendicular to the tangential direction, is calculated.
  • In step S130, the pixel q on the opposite stroke edge along the gradient orientation is obtained, and the distance between the two pixels p and q is calculated as the stroke width w at pixel p, as shown in Figure 1(b).
  • In step S140, for each pixel t between the two points p and q (as shown in Figure 1(c)), the value a stored in association with t is obtained.
  • In step S150, it is judged whether the stroke width w at pixel p is smaller than the value a stored in association with pixel t. If it is, the stored value a is replaced with the stroke width w as the new associated stored value (step S160). The above operation is then repeated for the other pixels along the gradient direction (step S170), and finally for the other pixels on the stroke edge (step S180).
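As a concrete illustration of steps S110-S180, here is a minimal Python sketch of the SWT update loop. It is not the patent's or the CVPR paper's implementation: the edge map and the per-edge-pixel unit gradient directions (`edges`, `grads`, names chosen here for illustration) are assumed to be given, e.g., by a Canny detector, and the ray is marched on a pixel grid.

```python
import math

def swt_sketch(img, edges, grads):
    """Minimal sketch of the SWT update loop (steps S110-S180).
    img:   2-D list of 0/1 values (1 = stroke foreground).
    edges: set of (x, y) stroke-edge pixels.
    grads: dict (x, y) -> unit gradient (gx, gy) pointing into the stroke.
    Returns a dict (x, y) -> stroke width for every pixel some ray covered;
    missing pixels correspond to the +inf initialization of step S110.
    """
    h, w = len(img), len(img[0])
    width = {}
    for (px, py) in edges:                     # S180: every edge pixel
        gx, gy = grads[(px, py)]
        ray = [(px, py)]
        x, y = float(px), float(py)
        while True:                            # S130: march along the gradient
            x += gx
            y += gy
            ix, iy = int(round(x)), int(round(y))
            if not (0 <= ix < w and 0 <= iy < h) or img[iy][ix] == 0:
                break                          # left the stroke: no facing edge
            if (ix, iy) in edges:              # reached q on the opposite edge
                sw = math.hypot(ix - px, iy - py)
                for t in ray + [(ix, iy)]:     # S140-S170: keep the minimum
                    if sw < width.get(t, float("inf")):
                        width[t] = sw
                break
            ray.append((ix, iy))
    return width
```

For a vertical stroke three pixels wide, every covered pixel ends up with width 2.0 (the distance between the two facing edges).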
  • However, an analysis of the above SWT algorithm reveals the following problems. Because the stroke edge has an irregular shape, calculating the tangential direction at pixel p in step S120 is a very complicated process; its computational complexity is high, and it consumes a large amount of processor resources and calculation time. In step S150, the stroke width w is compared with the value a stored for pixel t, but because the edge points are numerous and the edge shape is irregular, multiple normals may pass through a single pixel inside the stroke, which causes an excessive number of comparisons and makes the processing very cumbersome.
  • To solve these problems, the present invention proposes a new simplified estimated stroke width (ESW) text detection method.
  • The ESW measures the distances of edge pixels along multiple predetermined orientations as stroke widths, which reduces computational complexity and saves processor resources and computation time.
  • Specifically, unlike SWT, which calculates a tangential direction and a gradient (normal) direction for each edge pixel and takes the distance to the pixel on the opposite stroke edge along the gradient direction as the stroke width, in the present invention the ESW takes as the stroke width at each edge pixel the minimum of the distances from that edge pixel to the pixels on the opposite edge along a plurality of predetermined directions.
  • The ESW does not need to calculate the tangential direction at each stroke-edge pixel; it instead adopts a plurality of predetermined fixed orientations. Because the orientations are fixed, the number of comparisons at each pixel inside the stroke is relatively constant, which reduces computational complexity and saves processor resources and computation time.
  • According to one aspect, a method for calculating an estimated stroke width ESW comprises the steps of: acquiring stroke edge information from a binarized image; calculating, for each stroke-edge pixel, the stroke width in no less than four orientations, the stroke width of a stroke-edge pixel in an orientation being the distance from that pixel to another stroke-edge pixel located on the line determined by the pixel and the orientation; associating each calculated stroke width with every in-stroke pixel that lies on the line through the stroke-edge pixel along the corresponding orientation; and, for each in-stroke pixel, selecting the minimum of the stroke widths associated with it as the estimated stroke width ESW of that pixel.
  • In one embodiment, the calculating step calculates the stroke width in no less than four orientations for each stroke-edge pixel; the associating step associates the calculated stroke widths with the in-stroke pixels along the corresponding orientations; and the selecting step selects, for each in-stroke pixel, the minimum of the stroke widths associated with it as the estimated stroke width ESW of that pixel.
  • In another embodiment, the calculating step calculates, for each of the no less than four orientations, the stroke width at every stroke-edge pixel; the associating step comprises: for in-stroke pixels along the orientation that have no associated stored value yet, storing the calculated stroke width in association with them; and, for in-stroke pixels along the orientation that already have an associated stored value, comparing the calculated stroke width with the stored value and, if the stroke width is smaller, overwriting the stored value with the stroke width.
  • In one embodiment, the number of orientations is four.
  • The no less than four orientations include a horizontal orientation and a vertical orientation.
  • The angle between any of the four orientations and its adjacent orientation is 45 degrees.
  • The four orientations are horizontal, vertical, inclined 45 degrees to the upper right, and inclined 45 degrees to the lower right.
  • According to another aspect, a non-text removal method utilizes connected-domain features regarding text characteristics and connected-domain features regarding the association between a connected domain and its surrounding connected domains, characterized in that:
  • the connected-domain features regarding text characteristics include the ESW calculated for each pixel using the ESW calculation method described above, and the variance of the ESW within the connected domain; the connected-domain features regarding the association between a connected domain and its surrounding connected domains include the average ESW of the connected domain, which is the average of the ESWs calculated for each pixel in the connected domain using the ESW calculation method described above.
  • The connected-domain features regarding text characteristics may further include one or more of the following: the aspect ratio of the circumscribed rectangle and the proportion of foreground pixel area within the region.
  • The connected-domain features regarding the association between a connected domain and its surrounding connected domains may further include one or more of the following: the distance between the circumscribed rectangles of adjacent domains, the average area, and the average gray level of the region.
  • According to another aspect, an OCR method comprises a pre-processing step that performs non-text removal using the method described above.
  • According to another aspect, an apparatus for calculating an estimated stroke width ESW comprises: an acquisition unit configured to acquire stroke edge information from a binarized image; a calculation unit configured to calculate, for each stroke-edge pixel, the stroke width in no less than four orientations, the stroke width of a stroke-edge pixel in an orientation being the distance from that pixel to another stroke-edge pixel located on the line determined by the pixel and the orientation; an association unit configured to associate each calculated stroke width with every in-stroke pixel that lies on the line through the stroke-edge pixel along the corresponding orientation; and a selection unit configured to select, for each in-stroke pixel, the minimum of the stroke widths associated with it as the estimated stroke width ESW of that pixel.
  • In one embodiment, the calculation unit calculates the stroke width in no less than four orientations for each stroke-edge pixel; the association unit associates the calculated stroke widths with the in-stroke pixels along the corresponding orientations; and the selection unit selects, for each in-stroke pixel, the minimum of the stroke widths associated with it as the estimated stroke width ESW of that pixel.
  • In another embodiment, the calculation unit calculates, for each of the no less than four orientations, the stroke width at every stroke-edge pixel; the association unit is configured to: for in-stroke pixels along the orientation that have no associated stored value yet, store the calculated stroke width in association with them; and, for in-stroke pixels along the orientation that already have an associated stored value, compare the calculated stroke width with the stored value and, if the stroke width is smaller, overwrite the stored value with the stroke width.
  • In one embodiment, the number of orientations is four.
  • The no less than four orientations include a horizontal orientation and a vertical orientation.
  • The angle between any of the four orientations and its adjacent orientation is 45 degrees.
  • The four orientations are horizontal, vertical, inclined 45 degrees to the upper right, and inclined 45 degrees to the lower right.
  • According to another aspect, a non-text remover apparatus comprises the apparatus for calculating the ESW described above and is configured to utilize connected-domain features regarding text characteristics and connected-domain features regarding the association between a connected domain and its surrounding connected domains, wherein the connected-domain features regarding text characteristics include the ESW calculated for each pixel by the ESW calculation apparatus, and the variance of the ESW within the connected domain;
  • the connected-domain features regarding the association between a connected domain and its surrounding connected domains include the average ESW of the connected domain, which is the average of the ESWs calculated by the ESW calculation apparatus for each pixel in the connected domain.
  • The connected-domain features regarding text characteristics may further include one or more of the following: the aspect ratio of the circumscribed rectangle and the proportion of foreground pixel area within the region.
  • The connected-domain features regarding the association between a connected domain and its surrounding connected domains may further include one or more of the following: the distance between the circumscribed rectangles of adjacent domains, the average area, and the average gray level of the region.
  • an OCR system comprising a pre-processing device comprising a non-text remover device as described above.
  • With the above solutions, computational complexity can be reduced and processor resources and computation time can be saved, thereby meeting the real-time requirements of an OCR system for natural scenes.
  • FIG. 1 shows a schematic diagram of implementing the SWT method in the prior art
  • FIG. 3 shows a flow chart of an ESW calculation method in accordance with the present invention
  • FIG. 4 shows a flow chart of an embodiment of an ESW calculation method in accordance with the present invention
  • FIG. 5 illustrates three different orientation schemes of the ESW calculation method in accordance with the present invention
  • FIG. 6 shows a flow chart of another embodiment of an ESW calculation method in accordance with the present invention.
  • FIG. 7 shows a schematic diagram of an implementation of another embodiment of an ESW calculation method in accordance with the present invention.
  • FIG. 8 shows a flow chart of an OCR method in accordance with the present invention
  • Figure 9 is a diagram showing the effect of an image processed by each step of the OCR method according to the present invention.
  • FIG. 10 is a block diagram showing an estimated stroke width ESW computing device in accordance with the present invention.
  • Figure 11 shows a block diagram of a non-text remover device in accordance with the present invention.
  • Figure 12 shows a block diagram of an OCR system in accordance with the present invention.
  • Below, the stroke width of each in-stroke pixel is calculated using a plurality of specific orientations as an example, and various embodiments in accordance with the present invention are described in detail. It should be noted, however, that the present invention is not limited to the following embodiments but is applicable to many other text detection or optical character recognition (OCR) methods and systems.
  • Figure 3 shows a flow chart of the ESW calculation method.
  • stroke edge information is acquired based on the binarized image (step S310).
  • In step S320, the stroke width of each stroke-edge pixel in no less than four orientations is calculated.
  • In step S330, each calculated stroke width is associated with every in-stroke pixel that lies on the line through the stroke-edge pixel along the corresponding orientation.
  • In step S340, for each in-stroke pixel, the minimum of the stroke widths associated with it is selected as the estimated stroke width ESW of that pixel.
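The flow of steps S310-S340 can be sketched as follows. This is an illustrative reading of the patent, not its reference implementation: for each edge pixel, the contiguous foreground run along each fixed orientation is taken as the measured line, the distance between the run's endpoints is the stroke width in that orientation, and each in-stroke pixel keeps the minimum width seen; runs containing no second pixel are skipped.

```python
import math

# Four fixed orientations: horizontal, vertical, and the two 45-degree diagonals.
ORIENTATIONS = [(1, 0), (0, 1), (1, 1), (1, -1)]

def esw(img, edges):
    """Sketch of steps S310-S340 on a binarized image.
    img: 2-D list of 0/1 values (1 = foreground); edges: set of (x, y).
    Returns a dict (x, y) -> estimated stroke width ESW.
    """
    h, w = len(img), len(img[0])

    def run_through(px, py, dx, dy):
        # All contiguous foreground pixels on the line through (px, py).
        run = [(px, py)]
        for s in (1, -1):
            x, y = px, py
            while True:
                x += s * dx
                y += s * dy
                if not (0 <= x < w and 0 <= y < h) or img[y][x] == 0:
                    break
                run.append((x, y))
        return run

    best = {}
    for (px, py) in edges:                        # S320: per edge pixel
        for dx, dy in ORIENTATIONS:
            run = run_through(px, py, dx, dy)
            if len(run) < 2:                      # no second edge pixel: skip
                continue
            xs = [p[0] for p in run]
            ys = [p[1] for p in run]
            sw = math.hypot(max(xs) - min(xs), max(ys) - min(ys))
            for t in run:                         # S330/S340: keep the minimum
                if sw < best.get(t, float("inf")):
                    best[t] = sw
    return best
```

For a vertical stroke three pixels wide, the horizontal width (2.0) beats the vertical and diagonal measurements, so every stroke pixel gets ESW 2.0.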
  • Figure 4 shows a flow chart of one embodiment of an ESW calculation method.
  • Figure 5 illustrates the implementation of the ESW calculation method with three different orientation schemes.
  • a commonly used Japanese character is shown in Figure 5(a).
  • An enlarged view of one edge point of the Japanese character in Fig. 5(a) is shown in Fig. 5(b).
  • In this embodiment, four orientations are selected as the measurement orientations: horizontal (Ox), vertical (Oy), inclined 45 degrees to the upper right (Os), and inclined 45 degrees to the lower right (Ot).
  • In step S410, in order to reduce time consumption, the present invention does not use an edge detector to calculate edges but uses only the edges obtained from the binarization step.
  • In step S420, the stroke width in no less than four orientations is calculated for each stroke-edge pixel. That is, as shown in Figure 5(b), for the stroke-edge pixel O, the distances along Ox, Oy, Os, and Ot are calculated.
  • the stroke width in each orientation is the distance from the stroke edge pixel point to another stroke edge pixel point on the line determined by the stroke edge pixel point and the orientation.
  • Next, the calculated stroke widths in the no less than four orientations are respectively associated with the in-stroke pixels along the corresponding orientations. In the present embodiment, assuming that the stroke width in the Os orientation is 10, the value 10 is stored in association with each in-stroke pixel along that orientation (including the point m in the Os orientation).
  • In step S440, for each in-stroke pixel, the minimum of the stroke widths stored in association with it is selected as the estimated stroke width ESW of that pixel. For example, for the in-stroke pixel shown in Figure 5(c), the ESW is the minimum of the stroke widths in the Qx, Qy, Qt, and Qs orientations, that is, the stroke width in the Qt orientation.
  • The ESW algorithm in this embodiment needs only 3 comparisons per pixel across the four orientations, and the coordinate calculations for adjacent pixels are very simple (the x and y coordinates of an adjacent pixel are either the same as those of the previous pixel or differ from them by 1).
  • Figure 5(d) shows another orientation scheme related to the four orientations in Figure 5(b). Specifically, the four orientations in Figure 5(d) are obtained by rotating the four orientations in Figure 5(c) by a certain angle in (0°, 90°). Since the angle between adjacent orientations is still 45°, the four orientations are evenly distributed over the full 360°; however, because these four orientations do not include the horizontal and vertical orientations, the amount of calculation for the ESW shown in Figure 5(d) is larger than that for the ESW of Figure 5(b).
  • Figure 5 (e) shows an embodiment in which the four orientations are not uniformly distributed.
  • The scheme in this embodiment is one in which the angular coverage of the orientations is unevenly distributed. If two of the orientations of the scheme are vertical and horizontal, they are exactly the tangential orientations of horizontal and vertical strokes; the scheme is then also well suited to Chinese, Japanese, Korean, and other scripts with many horizontal and vertical strokes.
  • Figure 6 shows a flow chart of another embodiment of an ESW calculation method.
  • Figure 7 depicts an implementation corresponding to this alternative embodiment of the ESW calculation method.
  • Unlike the previous embodiment, the present embodiment scans the image in one fixed orientation at a time.
  • In step S610, the edges obtained from the binarization step are used, similarly to step S410; the details are not repeated here.
  • In step S620, for each of the no less than four orientations, the stroke width at every stroke-edge pixel is calculated. That is, all pixels on the stroke edges are scanned along a specified fixed orientation, and the stroke width of each pixel in that fixed orientation is calculated.
  • In step S630, for in-stroke pixels along the orientation that have no associated stored value yet, the calculated stroke width is stored in association with them; for in-stroke pixels along the orientation that already have an associated stored value, the calculated stroke width is compared with the stored value, and if the stroke width is smaller than the stored value, the stored value is overwritten with the stroke width.
  • In step S635, it is judged whether a scan in the next fixed orientation is still to be performed; if so, the flow returns to step S620 and the above process is repeated.
  • In step S640, for each in-stroke pixel, the value stored in association with it is the minimum of the stroke widths in the four orientations; this minimum is taken as the estimated stroke width ESW of that pixel.
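The scan-per-orientation embodiment (steps S610-S640) can be sketched as below. Only the horizontal and vertical scans are shown for brevity (the two diagonal scans of the no-less-than-four orientations follow the same run/flush pattern), and the run-length-minus-one width convention is an illustrative choice, not taken from the patent.

```python
def esw_scan(img):
    """Sketch of the scan-per-orientation embodiment (steps S610-S640).
    img: 2-D list of 0/1 values (1 = foreground).
    Returns a 2-D list where foreground pixels hold their minimum width.
    """
    h, w = len(img), len(img[0])
    best = [[float("inf")] * w for _ in range(h)]   # +inf initialization

    def flush(run):
        # S630: store the run's width, overwriting only if smaller.
        if len(run) < 2:
            return
        sw = len(run) - 1
        for x, y in run:
            if sw < best[y][x]:
                best[y][x] = sw

    for y in range(h):                              # S620: horizontal scan
        run = []
        for x in range(w):
            if img[y][x]:
                run.append((x, y))
            else:
                flush(run)
                run = []
        flush(run)

    for x in range(w):                              # S620/S635: next orientation
        run = []
        for y in range(h):
            if img[y][x]:
                run.append((x, y))
            else:
                flush(run)
                run = []
        flush(run)

    return best                                     # S640: stored minimum = ESW
```

Because each orientation is one full pass over the image, the number of per-pixel comparisons stays fixed, which is the point the embodiment makes about computational cost.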
  • FIG. 8 shows a flow chart of an OCR method to which the ESW scheme of the present invention can be applied, and FIG. 9 shows the effect of an image processed by each step of the OCR method. The principle of the OCR method is described in detail below with reference to FIGS. 8 and 9.
  • the OCR method is divided into two main steps: pre-processing S810 and OCR engine S820.
  • First, the natural scene image (shown as image 901 in Fig. 9) is subjected to image pre-processing S810.
  • In the image pre-processing step S810, in order to be suitable for practical use, the present invention uses a local binarization threshold based on the image contrast in each sub-image region. The contrast of the image is enhanced when the difference between the foreground and the background of the input image is small. If the foreground color is brighter than the background color, the grayscale image of the sub-image region is inverted before binarization.
  • In step S811, the obtained grayscale image is binarized: each pixel in the grayscale image is divided into one of two classes, 0 and 1, according to a certain rule.
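A hedged sketch of the local-threshold binarization of steps S810/S811. The block size and the per-block mean threshold are illustrative assumptions (the patent only states that the local threshold is based on the contrast within each sub-image region); the dark-foreground case is shown, a brighter foreground being inverted before binarization as the text describes.

```python
def binarize_local(gray, block=16):
    """Per-block mean-threshold binarization (illustrative sketch).
    gray: 2-D list of 0-255 intensities. Returns a 2-D list of 0/1.
    """
    h, w = len(gray), len(gray[0])
    out = [[0] * w for _ in range(h)]
    for by in range(0, h, block):
        for bx in range(0, w, block):
            ys = range(by, min(by + block, h))
            xs = range(bx, min(bx + block, w))
            vals = [gray[y][x] for y in ys for x in xs]
            thr = sum(vals) / len(vals)        # local threshold per sub-image
            for y in ys:
                for x in xs:
                    out[y][x] = 1 if gray[y][x] < thr else 0   # dark = text
    return out
```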
  • step S812 the present invention uses two different connected domain features to remove non-text regions.
  • One kind of connected-domain feature concerns text characteristics; the other concerns the association between a connected domain and its surrounding connected domains.
  • a character always includes strokes of similar width, and the stroke boundaries are nearly smooth.
  • Such connected-domain (CC) features can be used to identify text-like connected domains.
  • Connected domain features for text features include, but are not limited to, one or more of the following:
  • ESW Estimated Stroke Width
  • Text is considered to appear in groups, and a group of text often shows similarities in stroke width, character width, height, and character spacing; in particular, the spacing between successive characters is nearly equal.
  • a set of text can be identified using a connected domain feature of a connected domain to its peripheral connected domain.
  • Connected domain features relating to the relationship of a connected domain to its peripheral connected domains include, but are not limited to, one or more of the following:
  • The average ESW of the connected domain: the characters in a group of text always consist of strokes of similar width, so the mean ESW of each connected domain is approximately equal to the mean ESW of the connected domains around it.
  • the ratio of ESW mean values of adjacent connected domains is less than 2.0;
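The two ESW-based filters of step S812 (small in-domain ESW variance, neighbour-mean ratio below 2.0) might be combined as in this sketch. The 2.0 ratio comes from the text; `max_rel_var` and the function name are illustrative assumptions.

```python
def keep_as_text(esws, neighbor_mean_esw, max_ratio=2.0, max_rel_var=0.5):
    """Keep a connected domain when its per-pixel ESW variance is small
    (character strokes have near-constant width) and its mean ESW is close
    to the mean ESW of its surrounding connected domains.
    esws: list of per-pixel ESW values inside the connected domain.
    """
    mean = sum(esws) / len(esws)
    var = sum((e - mean) ** 2 for e in esws) / len(esws)
    ratio = max(mean, neighbor_mean_esw) / min(mean, neighbor_mean_esw)
    return var <= max_rel_var * mean * mean and ratio < max_ratio
```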
  • In step S813, a number of adjacent connected domains that may be text and were retained in step S812 are combined into candidate characters: for Latin scripts, one letter is one character; for Chinese characters, the character elements can be combined into one character according to the up-down, left-right, and surrounding structures. One or more of the following rules are then considered:
  • the distance between the circumscribed rectangles of adjacent candidate characters should be nearly equal.
  • the interval between the circumscribed rectangles of adjacent candidate characters is no more than three times the width of the wider character.
  • The connected-domain characters satisfying these conditions can be clustered to form a text row (or column).
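The clustering rule above (gap between circumscribed rectangles at most three times the wider character's width) can be sketched as a greedy left-to-right chaining. The `(x, y, w, h)` box representation and the function name are assumptions for illustration.

```python
def chain_characters(boxes, max_gap_ratio=3.0):
    """Greedily chain candidate-character boxes (x, y, w, h) into text rows
    while the horizontal gap between adjacent circumscribed rectangles is
    at most max_gap_ratio times the wider character's width (step S813).
    """
    boxes = sorted(boxes)                       # left-to-right order
    rows, row = [], [boxes[0]]
    for prev, cur in zip(boxes, boxes[1:]):
        gap = cur[0] - (prev[0] + prev[2])      # gap between rectangles
        if gap <= max_gap_ratio * max(prev[2], cur[2]):
            row.append(cur)
        else:
            rows.append(row)
            row = [cur]
    rows.append(row)
    return rows
```

The nearly-equal-spacing rule could be added as a second pass over each chained row; it is omitted here to keep the sketch short.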
  • Compared with SWT, the method of the present invention can calculate the ESW feature at a higher speed, and it is therefore better suited as a pre-processing stage before a real-time OCR system for natural scenes.
  • the text area (shown as image 902 in FIG. 9) detected in step S813 is output to the OCR engine processing S820.
  • the character recognition result is finally obtained from the OCR engine step S820 and output to the user interface of the application (as shown by image 903 in Fig. 9).
  • Figure 10 shows a block diagram of an estimated stroke width ESW computing device 1000 in accordance with the present invention.
  • The apparatus 1000 for calculating the estimated stroke width ESW includes: an acquisition unit 1010 configured to acquire stroke edge information from a binarized image; a calculation unit 1020 configured to calculate, for each stroke-edge pixel, the stroke width in no less than four orientations, the stroke width of a stroke-edge pixel in an orientation being the distance from that pixel to another stroke-edge pixel located on the line determined by the pixel and the orientation; an association unit 1030 configured to associate each calculated stroke width with every in-stroke pixel that lies on the line through the stroke-edge pixel along the corresponding orientation; and a selection unit 1040 configured to select, for each in-stroke pixel, the minimum of the stroke widths associated with it as the estimated stroke width ESW of that pixel.
  • FIG. 11 shows a block diagram of a non-text remover device in accordance with the present invention.
  • The non-text remover device 1100 includes the device 1110 for calculating the ESW as described above, and is configured to utilize connected-domain features regarding text characteristics and connected-domain features regarding the association between a connected domain and its surrounding connected domains, wherein the connected-domain features regarding text characteristics include the ESW calculated for each pixel by the ESW calculation device, and the connected-domain features regarding the association between a connected domain and its surrounding connected domains include the average ESW of the connected domain, which is the average of the ESWs calculated by the ESW calculation device for each pixel in the connected domain.
  • Figure 12 shows a block diagram of an OCR system in accordance with the present invention.
  • the system includes two main devices: a pre-processing device 1210 and an OCR engine device 1220.
  • a natural scene image shown as image 901 in FIG. 9 is subjected to image preprocessing by the image preprocessing apparatus 1210.
  • For real-time applications, the image pre-processing device 1210 uses a local binarization threshold based on the image contrast in each sub-image region. The contrast of the image is enhanced when the difference between the foreground and the background of the input image is small. If the foreground color is brighter than the background color, the grayscale image of the sub-image region is inverted before binarization.
  • The obtained grayscale image is binarized, and each pixel in the grayscale image is divided into one of two classes, 0 and 1, according to a certain rule.
  • the present invention uses two different aspects of connected domain features to remove non-text regions.
  • One connected domain feature is a text feature; the other connected domain feature is a feature of the connected domain and its peripheral relationship.
  • the simplified estimated stroke width (ESW) feature described above is used as one of the text features.
  • A number of adjacent connected domains that may be text and were retained by the device 1212 are combined into candidate characters; the connected-domain characters are clustered according to the rules mentioned in step S813 to form text lines (or columns), and the result (shown as image 902 in FIG. 9) is output to the OCR engine device 1220.
  • the character recognition result is finally obtained from the OCR engine device 1220 and output to the user interface of the application (as shown by image 903 in FIG. 9).
  • the solution of the present invention can be implemented on mobile phones, PDAs, desktop computers, notebook computers, tablets, and other electronic devices that typically support text detection or optical character recognition (OCR).
  • the solution of the present invention can be implemented by software, hardware or a combination of both software and hardware.
  • The various components of the devices in the above embodiments may be implemented by various means including, but not limited to, analog circuit devices, digital circuit devices, digital signal processing (DSP) circuits, programmable processors, application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), complex programmable logic devices (CPLDs), and so on.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Character Input (AREA)

Abstract

A method and apparatus for calculating an estimated stroke width (ESW). The method comprises the steps of: acquiring stroke edge information from a binarized image (S310); calculating, for each stroke-edge pixel, the stroke width in no less than four orientations (S320), the stroke width of a stroke-edge pixel in an orientation being the distance from that pixel to another stroke-edge pixel located on the line determined by the pixel and the orientation; associating each calculated stroke width with every in-stroke pixel that lies on the line through the stroke-edge pixel along the corresponding orientation (S330); and, for each in-stroke pixel, selecting the minimum of the stroke widths associated with it as the estimated stroke width ESW of that pixel (S340).

Description

Method and Device for Text Detection

Technical Field

The present invention relates to human-computer interaction technology, and in particular to text detection and optical character recognition (OCR) technology.

Background Art

Natural scenes contain not only a large amount of graphic information but also rich textual information, such as road signs and store names. This textual information is of great value for describing and understanding scene content, and it is a key clue for scene-image retrieval. There is therefore an urgent need for an automated tool that obtains the textual information in a scene through text recognition in natural scenes, serving the retrieval, querying, and browsing of scene-image data and the understanding of scene content, and improving the efficiency of image-data management. Mobile phones, PDAs, desktop computers, notebook computers, tablet computers, and other electronic devices commonly support text detection or optical character recognition (OCR).

The stroke width transform (SWT) is a text detection method commonly used in the prior art. "Detecting Text in Natural Scenes with Stroke Width Transform" (IEEE Conference on Computer Vision and Pattern Recognition, CVPR, 2010) provides an SWT-based text detection method. As described there, the stroke width transform (SWT) is a successful method for text detection in natural scenes. It can detect text regardless of scale, orientation, font, and language. To extract stroke information, SWT first uses a Canny edge detector to compute the edges of the image, and then considers the gradient orientation of each edge pixel to find its stroke width. SWT is a local image operator that computes, for each pixel, the width of the most likely stroke containing that pixel. The output of SWT is an image of the same size as the input image, in which each point stores the width of the stroke associated with that pixel.

FIG. 1 is a schematic diagram of the SWT method, and FIG. 2 is a flowchart of the SWT method. The SWT method is now described with reference to FIGS. 1 and 2. FIG. 1(a) is a schematic diagram of a typical stroke, in which the pixels of the stroke are darker than those of the background. First, in step S100 of FIG. 2, the edges of the input image are computed with an edge detector (for example, a Canny edge detector). Then, in step S110, the value stored in association with every pixel on the stroke edge and inside the stroke is initialized to +∞. For each pixel on the stroke edge (for example, point p shown in FIG. 1(b)), the tangent direction at pixel p is computed, and then the gradient (normal) direction is computed (the gradient direction is perpendicular to the tangent direction) (step S120). Next, in step S130, the pixel q on the opposite edge of the stroke along the gradient orientation is obtained, and the distance between the two pixels p and q is computed as the stroke width w at pixel p, as shown in FIG. 1(b). In step S140, for every pixel t between p and q (as shown in FIG. 1(c)), the value a stored in association with t is obtained, and it is determined whether the stroke width w at pixel p is smaller than the value a (step S150). If so, the stroke width w replaces the stored value a as the new stored value (step S160). The above operations are then repeated for the other pixels along the gradient direction (step S170) and, finally, for the other pixels on the stroke edge (step S180).
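The ray marching that SWT performs from an edge pixel can be illustrated with a small sketch. This is only a toy on a synthetic binary stroke: the real method computes Canny edges and per-pixel gradient directions, which are assumed here as a given unit direction vector.

```python
import math

def swt_ray(stroke, p, direction):
    """March from edge pixel p along `direction` (a unit step vector) through
    the stroke interior until the opposite edge is passed; return the crossed
    pixels and the distance between the two edge pixels (the stroke width)."""
    (x, y), (dx, dy) = p, direction
    path = [p]
    while True:
        x, y = x + dx, y + dy
        q = (round(x), round(y))
        if q not in stroke:            # stepped outside: the opposite edge was the last pixel
            qx, qy = path[-1]
            return path, math.hypot(qx - p[0], qy - p[1])
        path.append(q)

# A 3-pixel-tall horizontal bar: the gradient at the top edge points straight down.
stroke = {(x, y) for x in range(10) for y in range(3)}
path, width = swt_ray(stroke, (5, 0), (0, 1))
print(width)   # 2.0: distance from (5, 0) to the opposite edge pixel (5, 2)
```

In the full algorithm this width would then be written into every pixel of `path` whose stored value is larger, which is exactly the comparison loop of steps S140 to S160.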
Analyzing the above SWT algorithm, however, the following problems become readily apparent. Because stroke edges have irregular shapes, computing the tangent direction at pixel p in step S120 is a very complicated process with high computational complexity that consumes considerable processor resources and computing time. In step S150, the stroke width w is compared with the value a stored in association with pixel t; yet because there are many edge pixels and the edge shapes are irregular, several normals may pass through a single interior pixel, resulting in an excessive number of comparisons and very cumbersome processing.

Text detection with SWT is therefore too complex and time-consuming: the prior art reports a detection time of 0.94 seconds. For a natural-scene OCR system, the OCR processing after text detection also takes time, as do further applications such as translation or retrieval, so this speed of SWT as a preprocessing step in an OCR system is far too slow to meet the real-time requirements of a natural-scene OCR system.
Summary of the Invention

To solve the above technical problems, the present invention proposes a new, simplified estimated stroke width (ESW) text detection method. ESW measures the distances from edge pixels along several predetermined orientations as stroke widths, which reduces computational complexity and saves processor resources and computing time.

Specifically, unlike SWT, which computes for each edge pixel the tangent and gradient (normal) directions and takes the distance to the pixel on the opposite edge along the gradient direction as the stroke width, in the present invention ESW takes, for each stroke edge pixel, the minimum of the distances measured along several predetermined directions to pixels on the opposite edge as the stroke width at that edge pixel. ESW does not compute the tangent direction at each stroke edge pixel but instead uses several predetermined fixed orientations; and because fixed orientations are used, the number of comparisons at each in-stroke pixel is relatively fixed, which reduces computational complexity and saves processor resources and computing time.
Specifically, according to one aspect of the present invention, there is provided a method for computing an estimated stroke width ESW, comprising the following steps: obtaining stroke edge information from a binarized image; computing the stroke width of each stroke edge pixel in no fewer than four orientations, where the stroke width of a stroke edge pixel in an orientation is the distance from that stroke edge pixel to another stroke edge pixel lying on the straight line determined by the stroke edge pixel and the orientation; associating each computed stroke width of each stroke edge pixel in the no fewer than four orientations with every in-stroke pixel lying along that orientation through that stroke edge pixel; and, for each in-stroke pixel, selecting the minimum of the stroke widths associated with the in-stroke pixel as the estimated stroke width ESW of that in-stroke pixel.

In one embodiment, the computing step comprises computing, for each stroke edge pixel, the stroke widths in the no fewer than four orientations; the associating step comprises storing each of the computed stroke widths in the no fewer than four orientations in association with every in-stroke pixel along the corresponding orientation; and the selecting step comprises, for each in-stroke pixel, selecting the minimum of the stroke widths stored in association with the in-stroke pixel as its estimated stroke width ESW.

In one embodiment, the computing step comprises computing, for each of the no fewer than four orientations, the stroke width at every stroke edge pixel, and the associating step comprises: for an in-stroke pixel along that orientation with which no value has yet been stored, storing the computed stroke width in association with that in-stroke pixel; and for an in-stroke pixel along that orientation with which a value has already been stored, comparing the computed stroke width with the value already stored in association with that in-stroke pixel and, if the stroke width is smaller than the stored value, overwriting the stored value with the stroke width.

In one embodiment, the number of the no fewer than four orientations is four.

In one embodiment, the no fewer than four orientations include one horizontal orientation and one vertical orientation.

In one embodiment, the angle between any one of the four orientations and its neighboring orientation is 45 degrees.

In one embodiment, the four orientations are horizontal, vertical, inclined 45 degrees up to the right, and inclined 45 degrees down to the right.
According to another aspect of the present invention, there is provided a non-text removal method that uses connected-component features concerning text characteristics and connected-component features concerning the association between a connected component and its surrounding connected components, wherein the connected-component features concerning text characteristics include the ESW computed for each pixel using the ESW computing method described above, and the variance of the ESW within a connected component; and the connected-component features concerning the association between a connected component and its surrounding connected components include the average ESW of a connected component, which is the average of the ESW values computed for every pixel in the connected component using the ESW computing method described above.

In one embodiment, the connected-component features concerning text characteristics further include one or more of the following: the aspect ratio of the bounding rectangle and the proportion of the region occupied by foreground pixel area.

In one embodiment, the connected-component features concerning the association between a connected component and its surrounding connected components further include one or more of the following: the distance between the bounding rectangles of neighboring components, the average area of the regions, and the average gray level of the regions.

According to another aspect of the present invention, there is provided an OCR method comprising a preprocessing step, the preprocessing step comprising performing non-text removal using the method described above.
According to another aspect of the present invention, there is provided a device for computing an estimated stroke width ESW, comprising: an obtaining unit configured to obtain stroke edge information from a binarized image; a computing unit configured to compute the stroke width of each stroke edge pixel in no fewer than four orientations, where the stroke width of a stroke edge pixel in an orientation is the distance from that stroke edge pixel to another stroke edge pixel lying on the straight line determined by the stroke edge pixel and the orientation; an associating unit configured to associate each computed stroke width of each stroke edge pixel in the no fewer than four orientations with every in-stroke pixel lying along that orientation through that stroke edge pixel; and a selecting unit configured to select, for each in-stroke pixel, the minimum of the stroke widths associated with the in-stroke pixel as its estimated stroke width ESW.

In one embodiment, the computing unit computes, for each stroke edge pixel, the stroke widths in the no fewer than four orientations; the associating unit stores each of the computed stroke widths in the no fewer than four orientations in association with every in-stroke pixel along the corresponding orientation; and the selecting unit selects, for each in-stroke pixel, the minimum of the stroke widths stored in association with the in-stroke pixel as its estimated stroke width ESW.

In one embodiment, the computing unit computes, for each of the no fewer than four orientations, the stroke width at every stroke edge pixel, and the associating unit is configured to: for an in-stroke pixel along that orientation with which no value has yet been stored, store the computed stroke width in association with that in-stroke pixel; and for an in-stroke pixel along that orientation with which a value has already been stored, compare the computed stroke width with the value already stored in association with that in-stroke pixel and, if the stroke width is smaller than the stored value, overwrite the stored value with the stroke width.

In one embodiment, the number of the no fewer than four orientations is four.

In one embodiment, the no fewer than four orientations include one horizontal orientation and one vertical orientation.

In one embodiment, the angle between any one of the four orientations and its neighboring orientation is 45 degrees.

In one embodiment, the four orientations are horizontal, vertical, inclined 45 degrees up to the right, and inclined 45 degrees down to the right.
According to another aspect of the present invention, there is provided a non-text remover device comprising the ESW computing device described above, the non-text remover device being configured to use connected-component features concerning text characteristics and connected-component features concerning the association between a connected component and its surrounding connected components, wherein the connected-component features concerning text characteristics include the ESW computed for each pixel using the ESW computing device, and the variance of the ESW within a connected component; and the connected-component features concerning the association between a connected component and its surrounding connected components include the average ESW of a connected component, which is the average of the ESW values computed for every pixel in the connected component using the ESW computing device.

In one embodiment, the connected-component features concerning text characteristics further include one or more of the following: the aspect ratio of the bounding rectangle and the proportion of the region occupied by foreground pixel area.

In one embodiment, the connected-component features concerning the association between a connected component and its surrounding connected components further include one or more of the following: the distance between the bounding rectangles of neighboring components, the average area of the regions, and the average gray level of the regions.

According to another aspect of the present invention, there is provided an OCR system comprising a preprocessing apparatus, the preprocessing apparatus comprising the non-text remover device described above.

With the present invention, computational complexity can be reduced and processor resources and computing time can be saved, thereby meeting the requirements of a real-time OCR system for natural scenes.
Brief Description of the Drawings

The above and other features of the present invention will become more apparent from the following detailed description taken in conjunction with the accompanying drawings, in which:

FIG. 1 is a schematic diagram of the prior-art SWT method;

FIG. 2 is a flowchart of the prior-art SWT method;

FIG. 3 is a flowchart of the ESW computing method according to the present invention;

FIG. 4 is a flowchart of one embodiment of the ESW computing method according to the present invention;

FIG. 5 shows three different orientation schemes for the ESW computing method according to the present invention;

FIG. 6 is a flowchart of another embodiment of the ESW computing method according to the present invention;

FIG. 7 is a schematic diagram of an implementation of the other embodiment of the ESW computing method according to the present invention;

FIG. 8 is a flowchart of the OCR method according to the present invention;

FIG. 9 shows the images produced after the respective steps of the OCR method according to the present invention;

FIG. 10 is a block diagram of the estimated stroke width ESW computing device according to the present invention;

FIG. 11 is a block diagram of the non-text remover device according to the present invention; and

FIG. 12 is a block diagram of the OCR system according to the present invention.
Detailed Description

The simplified estimated stroke width (ESW) text detection method proposed by the present invention is described in detail below with reference to the accompanying drawings and specific embodiments. It should be noted that the present invention is not limited to the specific embodiments described below. In addition, for brevity, detailed descriptions of well-known techniques not directly related to the present invention are omitted to avoid obscuring its understanding.

Several embodiments of the present invention are described in detail below, taking the use of several specific orientations to compute the stroke width of each in-stroke pixel as an example. It should be pointed out, however, that the present invention is not limited to the following embodiments but is applicable to many other text detection or optical character recognition (OCR) methods and systems.

The ESW computing method is described in detail below with reference to FIGS. 3 to 7.

FIG. 3 is a flowchart of the ESW computing method. First, stroke edge information is obtained from the binarized image (step S310). In step S320, the stroke width of each stroke edge pixel is computed in no fewer than four orientations. Then, in step S330, each computed stroke width of each stroke edge pixel in the no fewer than four orientations is associated with every in-stroke pixel lying along that orientation through that stroke edge pixel. Finally, for each in-stroke pixel, the minimum of the stroke widths associated with the in-stroke pixel is selected as its estimated stroke width ESW (step S340).
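The four steps S310 to S340 can be sketched in a few lines on a toy binary stroke. This is only an illustrative reading of the method: the edge test (a foreground pixel with a 4-connected background neighbour) and the set-of-pixels representation are assumptions, and only the forward direction of each orientation is marched, which suffices because every interior run is started from one of its two edge endpoints.

```python
import math

ORIENTATIONS = [(1, 0), (0, 1), (1, -1), (1, 1)]  # horizontal, vertical, two 45° diagonals

def esw(stroke):
    """Estimated stroke width per in-stroke pixel, for a stroke given as a set
    of (x, y) foreground pixels of a binarized image (steps S310-S340)."""
    # S310: edge pixels are foreground pixels with a 4-connected background neighbour.
    edges = {p for p in stroke
             if any((p[0] + dx, p[1] + dy) not in stroke
                    for dx, dy in [(1, 0), (-1, 0), (0, 1), (0, -1)])}
    width = {}                                    # pixel -> list of associated widths
    for p in edges:
        for dx, dy in ORIENTATIONS:               # S320: width per orientation
            ray, (x, y) = [p], p
            while (x + dx, y + dy) in stroke:
                x, y = x + dx, y + dy
                ray.append((x, y))
            if (x, y) == p:                       # no extent in this orientation
                continue
            w = math.hypot(x - p[0], y - p[1])
            for t in ray:                         # S330: associate w with pixels on the ray
                width.setdefault(t, []).append(w)
    # S340: the minimum associated width is the ESW of each in-stroke pixel.
    return {t: min(ws) for t, ws in width.items()}

bar = {(x, y) for x in range(10) for y in range(3)}   # a 3-pixel-tall horizontal bar
print(esw(bar)[(5, 1)])   # 2.0 — the short vertical crossing wins over the long horizontal one
```

For the interior pixel (5, 1), the horizontal ray contributes width 9, the diagonals about 2.83, and the vertical crossing 2.0, so the minimum correctly reports the bar's thickness.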
FIG. 4 is a flowchart of one embodiment of the ESW computing method. FIG. 5 illustrates the implementation of the ESW computing method with three different orientation schemes. FIG. 5(a) shows a common Japanese character, and FIG. 5(b) shows an enlarged view at one edge point of the character in FIG. 5(a). As shown in FIG. 5(b), four orientations are chosen as the measurement orientations in this embodiment: horizontal (Ox), vertical (Oy), inclined 45 degrees up to the right (Os), and inclined 45 degrees down to the right (Ot). In step S410, to reduce time consumption, the present invention does not use any edge detector to compute the edges but simply uses the edges obtained from the binarization step. In step S420, for each stroke edge pixel, the stroke widths in the no fewer than four orientations are computed; that is, as shown in FIG. 5(b), for stroke edge pixel O, the distances Ox, Oy, Os, and Ot are computed. In the present invention, the stroke width in each orientation is the distance from the stroke edge pixel to another stroke edge pixel lying on the straight line determined by that edge pixel and that orientation. Then, in step S430, each of the computed stroke widths in the no fewer than four orientations is stored in association with every in-stroke pixel along the corresponding orientation. In this embodiment, assuming the stroke width along orientation Os is 10, the value 10 is stored in association with every in-stroke pixel along that orientation (including point m on Os). If other stroke edge pixels exist, the stroke widths in the no fewer than four orientations are computed for them as well. If no other stroke edge pixel exists, then in step S440, for each in-stroke pixel, the minimum of the stroke widths stored in association with that pixel is selected as its estimated stroke width ESW. For example, in the stroke of FIG. 5(c), four stroke widths in the four orientations Qx, Qy, Qt, and Qs are stored in association with point Q inside the stroke; the estimated stroke width ESW of point Q is then the minimum of the stroke widths along Qx, Qy, Qt, and Qs, namely the stroke width along Qt.

Compared with traditional SWT, which assigns to each pixel along the gradient orientation the minimum of its current values while computing the stroke width, the ESW algorithm of this embodiment needs only 3 comparisons per pixel, and the coordinate computation for neighboring pixels in the four orientations is very simple (the x and y coordinates of a neighboring pixel either equal those of the previous pixel or differ from them by 1). Moreover, scripts such as Chinese, Japanese kanji, and Korean contain many horizontal and vertical strokes, and the vertical and horizontal orientations in this scheme are exactly their tangent orientations, which makes the computation most accurate.

FIG. 5(d) shows another orientation scheme related to the four orientations in FIG. 5(b). Specifically, the four orientations in FIG. 5(d) are obtained by rotating the four orientations in FIG. 5(b) by some angle in (0°, 90°). Because the pairwise angles between these four orientations are all 45°, they are uniformly distributed over the full 360°; but because the horizontal and vertical orientations are not among them, the ESW computation shown in FIG. 5(d) is more expensive than that of FIG. 5(b).

FIG. 5(e) shows an embodiment in which the four orientations are not uniformly distributed, i.e., a scheme whose coverage of the orientations is non-uniform. If two of its orientations are taken as vertical and horizontal, they are exactly the tangent orientations of horizontal and vertical strokes, and the scheme is then likewise suitable for Chinese, Japanese kanji, Korean, and other scripts with many horizontal and vertical strokes.

The simplified estimated stroke width (ESW) text detection scheme proposed by the present invention has been described above with several different measurement orientation schemes. It should be understood that the above embodiments show only four-orientation ESW schemes, but the present invention is equally applicable to more than four orientations. Furthermore, because ESW measures the distances from edge pixels along several predetermined orientations as stroke widths, it reduces computational complexity and saves processor resources and computing time.
FIG. 6 is a flowchart of another embodiment of the ESW computing method, and FIG. 7 shows the corresponding implementation. Unlike the embodiment described in FIG. 4, which scans the stroke edge pixels first, this embodiment scans specific fixed orientations first. In step S610, the edges obtained from the binarization step are used, as in step S410, and the details are not repeated here. In step S620, for each of the no fewer than four orientations, the stroke width at every stroke edge pixel is computed; that is, all pixels on the stroke edge are scanned in the specified fixed orientation, and the stroke width of each pixel in that fixed orientation is computed. As shown in FIG. 7, for a specified fixed orientation (for example, inclined 45 degrees up to the right), all pixels on the stroke edge are scanned and the stroke width of each pixel in that orientation (i.e., the distance Os) is computed. Then, in step S630, for an in-stroke pixel along that orientation with which no value has yet been stored, the computed stroke width is stored in association with that pixel; for an in-stroke pixel along that orientation with which a value has already been stored, the computed stroke width is compared with the stored value, and if the stroke width is smaller than the stored value, the stored value is overwritten with the stroke width. In this embodiment, assuming the stroke width along orientation Os is 10: on the first scan, this stroke width is stored in association with the pixel; otherwise it is compared with the value already stored in association with the pixel (for example, point m on Os), and if the stroke width (for example, 10) is smaller than the stored value, the stored value is changed to 10. In step S635, it is determined whether another fixed orientation remains to be scanned; if so, the process returns to step S620 and the above procedure is repeated. If no other fixed orientation needs to be scanned, then in step S640, for each in-stroke pixel, the value stored in association with the in-stroke pixel is the minimum of the stroke widths in the four orientations, and this minimum is taken as the estimated stroke width ESW of that in-stroke pixel.
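The orientation-first variant of FIG. 6 can be sketched with a running minimum instead of a list of associated widths, which is the point of steps S630 and S640. This is again only a reading of the method on a toy pixel-set stroke; the run-detection trick (start a run only where the previous pixel along the orientation is background) is an implementation assumption.

```python
def esw_orientation_scan(stroke, orientations=((1, 0), (0, 1), (1, -1), (1, 1))):
    """Second embodiment (FIG. 6): scan one fixed orientation at a time and
    keep a running minimum per in-stroke pixel (steps S620-S640)."""
    best = {}                                     # pixel -> smallest width stored so far
    for dx, dy in orientations:                   # S620: one orientation per pass
        for p in stroke:
            if (p[0] - dx, p[1] - dy) in stroke:  # not the first pixel of its run
                continue
            run, (x, y) = [p], p
            while (x + dx, y + dy) in stroke:
                x, y = x + dx, y + dy
                run.append((x, y))
            w = ((x - p[0]) ** 2 + (y - p[1]) ** 2) ** 0.5
            if w == 0:                            # isolated pixel in this orientation
                continue
            for t in run:                         # S630: overwrite only if smaller
                if w < best.get(t, float("inf")):
                    best[t] = w
    return best                                   # S640: the stored value is the ESW

bar = {(x, y) for x in range(10) for y in range(3)}
print(esw_orientation_scan(bar)[(5, 1)])   # 2.0
```

Because each pass touches every run of the stroke exactly once, the number of comparisons per interior pixel equals the number of orientations, which is the fixed comparison count the embodiment aims for.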
FIG. 8 is a flowchart of an OCR method to which the ESW scheme of the present invention can be applied, and FIG. 9 shows the images produced after the respective steps of the OCR method. The principle of the OCR method is described in detail below with reference to FIGS. 8 and 9.

As can be seen from FIG. 8, the OCR method is divided into two main steps: preprocessing S810 and the OCR engine S820. First, a natural-scene image (shown as image 901 in FIG. 9) undergoes image preprocessing S810. In the image preprocessing step S810, to suit practical applications, the present invention uses a local binarization threshold according to the image contrast in each sub-image region. When there is little difference between the foreground and background of the input image, the contrast of the image is enhanced. If the foreground color is brighter than the background color, the grayscale image of that sub-image region is inverted before binarization.

In step S811, the obtained grayscale image is binarized, and each pixel of the grayscale image is classified into one of two classes, 0 and 1, according to certain rules.
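The patent does not spell out the local-threshold rule of step S811, so the following is only an illustrative stand-in: each pixel is compared with the mean of its square neighbourhood, with dark pixels (strokes) mapped to 1. The neighbourhood size and the mean-based rule are assumptions, not the claimed method.

```python
def binarize_local(gray, block=3, bias=0):
    """Toy local-threshold binarization: compare each pixel with the mean of
    its (2*block+1)^2 neighbourhood, clipped at the image border. Pixels darker
    than the local mean minus `bias` become foreground (1)."""
    h, w = len(gray), len(gray[0])
    out = [[0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            ys = range(max(0, i - block), min(h, i + block + 1))
            xs = range(max(0, j - block), min(w, j + block + 1))
            vals = [gray[y][x] for y in ys for x in xs]
            mean = sum(vals) / len(vals)
            out[i][j] = 1 if gray[i][j] < mean - bias else 0   # dark strokes -> 1
    return out

gray = [[200, 200, 200, 200],
        [200,  40,  40, 200],
        [200, 200, 200, 200]]
print(binarize_local(gray))   # the two dark pixels map to 1, the rest to 0
```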
In step S812, the present invention uses two different kinds of connected-component features to remove non-text regions. One kind concerns text characteristics; the other concerns the association between a connected component and its surrounding connected components. To match natural scenes, the simplified estimated stroke width (ESW) feature is adopted as one of the text-characteristic features.

A character always consists of strokes of similar width, and stroke boundaries are nearly smooth. Connected-component (CC) features concerning text characteristics can be used to identify a single connected component. These features include (but are not limited to) one or more of the following:

(1) Excluding connected components that are too large or too small.

(2) The aspect ratio of the bounding rectangle: width (w) to height (h). This feature excludes unsuitable connected components such as elongated ones (e.g., regions with large aspect ratios such as utility poles).

(3) The proportion of the region occupied by foreground pixel area. In general, the area of the text in a region is smaller than the background area of the region.

(4) The estimated stroke width (ESW) of each pixel. The ESW values within a text connected component fluctuate little, i.e., their variance is relatively small. The variance of the stroke widths of each candidate connected region is computed, and regions whose variance is too large are excluded. This feature excludes regions such as windows and foliage. Preferably, the threshold for the maximum deviation can be set to half of the average estimated stroke width of the connected region.
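Feature (4) above can be sketched as a simple acceptance test. The text leaves the exact "maximum difference" rule open, so this sketch interprets it as requiring every per-pixel ESW to lie within half of the component's mean ESW; that interpretation is an assumption.

```python
def esw_spread_ok(esw_values, ratio=0.5):
    """Accept a connected component only if no per-pixel ESW deviates from the
    component's mean ESW by more than `ratio` times that mean (feature (4);
    the 0.5 default follows the 'half of the average' preference in the text)."""
    mean = sum(esw_values) / len(esw_values)
    return max(abs(v - mean) for v in esw_values) <= ratio * mean

print(esw_spread_ok([4, 4, 5, 4, 5]))     # True: stroke-like, nearly constant widths
print(esw_spread_ok([2, 20, 3, 18, 1]))   # False: foliage-like width fluctuation
```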
Text is considered to appear in groups, and a group of text usually shares similarities such as stroke width, character width, height, and character spacing, with nearly equal spacing between successive characters. Connected-component features concerning the relationship between a connected component and its surrounding connected components can be used to identify a group of text. These features include (but are not limited to) one or more of the following:

(1) The average ESW of a connected component. The characters in a group of text always consist of strokes of similar width, so the mean ESW of each connected component is approximately equal to that of its surrounding connected components. Preferably, the ratio of the mean ESW values of neighboring connected components is less than 2.0.

(2) The average gray level of the regions. The characters in a group of text always have similar gray-level distributions.

(3) Connected components of isolated letters (or Chinese characters or radicals) are removed from the image as noise, because isolated letters (or Chinese characters or radicals) rarely appear alone in an image; they usually appear in the form of words or Chinese phrases.

(4) The average area of the bounding rectangles of the candidate characters. The areas of the character elements in a group of text (which may be letters or radicals of Chinese characters) cannot differ greatly.

In step S813, the adjacent connected components retained from step S812 that may be text are combined into candidate characters: for Latin scripts, one letter is one character, while for Chinese, the character elements can be combined into one Chinese character according to top-bottom, left-right, enclosing, and other structures. One or more of the following rules are then considered:

(1) The distance between the bounding rectangles of adjacent candidate characters. The gaps between the bounding rectangles of adjacent candidate characters should be nearly equal. Preferably, the gap between the bounding rectangles of adjacent candidate characters does not exceed three times the width of the wider character.

(2) The average area of the bounding rectangles of the candidate characters. The area of each character in a group of text is roughly equal.

(3) The average height of the bounding rectangles of the candidate characters. Each character in a group of text has roughly equal height.

Based on the above features, the connected-component characters satisfying the conditions can be clustered to form text lines (or columns).
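The clustering of step S813 can be sketched as a greedy left-to-right chaining of candidate-character bounding boxes. Only rules (1) and (3) are encoded here; the concrete tolerances (the height-ratio bound, the non-negative gap) are illustrative assumptions rather than values taken from the patent.

```python
def chain_into_lines(boxes):
    """Greedily chain candidate-character boxes (x, y, w, h), sorted by x, into
    text lines: the gap to the previous box must not exceed 3x the wider
    character (rule (1)) and heights must be roughly equal (rule (3))."""
    def compatible(a, b):
        ax, ay, aw, ah = a
        bx, by, bw, bh = b
        gap = bx - (ax + aw)
        return (0 <= gap <= 3 * max(aw, bw)            # rule (1): spacing bound
                and max(ah, bh) <= 2 * min(ah, bh))    # rule (3): similar height
    boxes = sorted(boxes)
    lines, current = [], [boxes[0]]
    for box in boxes[1:]:
        if compatible(current[-1], box):
            current.append(box)
        else:
            lines.append(current)
            current = [box]
    lines.append(current)
    return lines

boxes = [(0, 0, 10, 12), (14, 0, 9, 11), (27, 0, 10, 12), (120, 0, 10, 12)]
print(len(chain_into_lines(boxes)))   # 2: three close characters form one line, one box is isolated
```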
Compared with the stroke width transform (SWT) feature, the method of the present invention computes the ESW feature at higher speed, so it is better suited as a preprocessing system placed before a real-time OCR system for natural scenes.

The text regions detected in step S813 (shown as image 902 in FIG. 9) are output to the OCR engine processing S820. The character recognition result is finally obtained from the OCR engine step S820 and output to the user interface of the application (shown as image 903 in FIG. 9).
FIG. 10 is a block diagram of the estimated stroke width ESW computing device 1000 according to the present invention. The device 1000 for computing the estimated stroke width ESW comprises: an obtaining unit 1010 configured to obtain stroke edge information from a binarized image; a computing unit 1020 configured to compute the stroke width of each stroke edge pixel in no fewer than four orientations, where the stroke width of a stroke edge pixel in an orientation is the distance from that stroke edge pixel to another stroke edge pixel lying on the straight line determined by the stroke edge pixel and the orientation; an associating unit 1030 configured to associate each computed stroke width of each stroke edge pixel in the no fewer than four orientations with every in-stroke pixel lying along that orientation through that stroke edge pixel; and a selecting unit 1040 configured to select, for each in-stroke pixel, the minimum of the stroke widths associated with the in-stroke pixel as its estimated stroke width ESW.

FIG. 11 is a block diagram of the non-text remover device according to the present invention. The non-text remover device 1100 comprises the ESW computing device 1110 described above and is configured to use connected-component features concerning text characteristics and connected-component features concerning the association between a connected component and its surrounding connected components, wherein the connected-component features concerning text characteristics include the ESW computed for each pixel using the ESW computing device, and the connected-component features concerning the association between a connected component and its surrounding connected components include the average ESW of a connected component, which is the average of the ESW values computed for every pixel in the connected component using the ESW computing device.
FIG. 12 is a block diagram of the OCR system according to the present invention. As can be seen from FIG. 12, the system comprises two main pieces of equipment: a preprocessing device 1210 and an OCR engine device 1220. First, a natural-scene image (shown as image 901 in FIG. 9) undergoes image preprocessing in the image preprocessing device 1210. In the image preprocessing device 1210, for real-time applications, embodiments of the present invention use a local binarization threshold according to the image contrast in each sub-image region. When there is little difference between the foreground and background of the input image, the contrast of the image is enhanced. If the foreground color is brighter than the background color, the grayscale image of that sub-image region is inverted before binarization.

In the binarization device 1211, the obtained grayscale image is binarized, and each pixel of the grayscale image is classified into one of two classes, 0 and 1, according to certain rules.

In the non-text remover device 1212, the present invention uses connected-component features of two different kinds to remove non-text regions. One kind concerns text characteristics; the other concerns the relationship between a connected component and its surroundings. To match natural scenes, the simplified estimated stroke width (ESW) feature described above is adopted as one of the text-characteristic features.

In the text detection device 1213, the adjacent connected components retained from the device 1212 that may be text are combined into candidate characters; the connected-component characters are clustered according to the rules mentioned in step S813 to form text lines (or columns), and the result (shown as image 902 in FIG. 9) is output to the OCR engine device 1220. The character recognition result is finally obtained from the OCR engine device 1220 and output to the user interface of the application (shown as image 903 in FIG. 9).

The present application implements a new, simplified estimated stroke width (ESW) text detection scheme. ESW measures the distances from edge pixels along several predetermined orientations as stroke widths, which reduces computational complexity and saves processor resources and computing time, making it better suited as a preprocessing system placed before a real-time OCR system for natural scenes.

It should be understood that the above embodiments of the present invention show only four-orientation ESW schemes, but the present invention is equally applicable to more than four orientations. The scheme of the present invention can be implemented on mobile phones, PDAs, desktop computers, notebook computers, tablet computers, and other electronic devices that commonly support text detection or optical character recognition (OCR). The scheme of the present invention can be implemented by software, hardware, or a combination of the two. For example, the various components within the devices of the above embodiments may be implemented by a variety of devices, including but not limited to analog circuit devices, digital circuit devices, digital signal processing (DSP) circuits, programmable processors, application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), complex programmable logic devices (CPLDs), and so on.

Although the present invention has been shown above in conjunction with its preferred embodiments, those skilled in the art will understand that various modifications, substitutions, and changes can be made to the present invention without departing from its spirit and scope. The present invention should therefore be defined not by the above embodiments but by the appended claims and their equivalents.

Claims (22)

  1. A method for computing an estimated stroke width ESW, comprising the following steps:
    obtaining stroke edge information from a binarized image;
    computing the stroke width of each stroke edge pixel in no fewer than four orientations, where the stroke width of a stroke edge pixel in an orientation is the distance from that stroke edge pixel to another stroke edge pixel lying on the straight line determined by the stroke edge pixel and the orientation;
    associating each computed stroke width of each stroke edge pixel in the no fewer than four orientations with every in-stroke pixel lying along that orientation through that stroke edge pixel; and
    for each in-stroke pixel, selecting the minimum of the stroke widths associated with the in-stroke pixel as the estimated stroke width ESW of that in-stroke pixel.
  2. The method for computing ESW according to claim 1, wherein the computing step comprises computing, for each stroke edge pixel, the stroke widths in the no fewer than four orientations; the associating step comprises storing each of the computed stroke widths in the no fewer than four orientations in association with every in-stroke pixel along the corresponding orientation; and the selecting step comprises, for each in-stroke pixel, selecting the minimum of the stroke widths stored in association with the in-stroke pixel as its estimated stroke width ESW.
  3. The method for computing ESW according to claim 1, wherein the computing step comprises computing, for each of the no fewer than four orientations, the stroke width at every stroke edge pixel, and the associating step comprises: for an in-stroke pixel along that orientation with which no value has yet been stored, storing the computed stroke width in association with that in-stroke pixel; and for an in-stroke pixel along that orientation with which a value has already been stored, comparing the computed stroke width with the value already stored in association with that in-stroke pixel and, if the stroke width is smaller than the stored value, overwriting the stored value with the stroke width.
  4. The method for computing ESW according to any one of claims 1 to 3, wherein the number of the no fewer than four orientations is four.
  5. The method for computing ESW according to any one of claims 1 to 3, wherein the no fewer than four orientations include one horizontal orientation and one vertical orientation.
  6. The method for computing ESW according to claim 4, wherein the angle between any one of the four orientations and its neighboring orientation is 45 degrees.
  7. The method for computing ESW according to claim 4, wherein the four orientations are horizontal, vertical, inclined 45 degrees up to the right, and inclined 45 degrees down to the right.
  8. A non-text removal method that uses connected-component features concerning text characteristics and connected-component features concerning the association between a connected component and its surrounding connected components, wherein the connected-component features concerning text characteristics include the ESW computed for each pixel using the method for computing ESW according to any one of claims 1 to 7, and the variance of the ESW within a connected component; and the connected-component features concerning the association between a connected component and its surrounding connected components include the average ESW of a connected component, which is the average of the ESW values computed for every pixel in the connected component using the method for computing ESW according to any one of claims 1 to 7.
  9. The non-text removal method according to claim 8, wherein the connected-component features concerning text characteristics further include one or more of the following: the aspect ratio of the bounding rectangle and the proportion of the region occupied by foreground pixel area.
  10. The non-text removal method according to claim 8 or 9, wherein the connected-component features concerning the association between a connected component and its surrounding connected components further include one or more of the following: the distance between the bounding rectangles of neighboring components, the average area of the regions, and the average gray level of the regions.
  11. An OCR method comprising a preprocessing step, the preprocessing step comprising performing non-text removal using the method according to any one of claims 8 to 10.
  12. A device for computing an estimated stroke width ESW, comprising:
    an obtaining unit configured to obtain stroke edge information from a binarized image;
    a computing unit configured to compute the stroke width of each stroke edge pixel in no fewer than four orientations, where the stroke width of a stroke edge pixel in an orientation is the distance from that stroke edge pixel to another stroke edge pixel lying on the straight line determined by the stroke edge pixel and the orientation;
    an associating unit configured to associate each computed stroke width of each stroke edge pixel in the no fewer than four orientations with every in-stroke pixel lying along that orientation through that stroke edge pixel; and
    a selecting unit configured to select, for each in-stroke pixel, the minimum of the stroke widths associated with the in-stroke pixel as the estimated stroke width ESW of that in-stroke pixel.
  13. The device for computing ESW according to claim 12, wherein the computing unit computes, for each stroke edge pixel, the stroke widths in the no fewer than four orientations; the associating unit stores each of the computed stroke widths in the no fewer than four orientations in association with every in-stroke pixel along the corresponding orientation; and the selecting unit selects, for each in-stroke pixel, the minimum of the stroke widths stored in association with the in-stroke pixel as its estimated stroke width ESW.
  14. The device for computing ESW according to claim 12, wherein the computing unit computes, for each of the no fewer than four orientations, the stroke width at every stroke edge pixel, and the associating unit is configured to: for an in-stroke pixel along that orientation with which no value has yet been stored, store the computed stroke width in association with that in-stroke pixel; and for an in-stroke pixel along that orientation with which a value has already been stored, compare the computed stroke width with the value already stored in association with that in-stroke pixel and, if the stroke width is smaller than the stored value, overwrite the stored value with the stroke width.
  15. The device for computing ESW according to any one of claims 12 to 14, wherein the number of the no fewer than four orientations is four.
  16. The device for computing ESW according to any one of claims 12 to 14, wherein the no fewer than four orientations include one horizontal orientation and one vertical orientation.
  17. The device for computing ESW according to claim 15, wherein the angle between any one of the four orientations and its neighboring orientation is 45 degrees.
  18. The device for computing ESW according to claim 15, wherein the four orientations are horizontal, vertical, inclined 45 degrees up to the right, and inclined 45 degrees down to the right.
  19. A non-text remover device comprising the device for computing ESW according to any one of claims 12 to 18, the non-text remover device being configured to use connected-component features concerning text characteristics and connected-component features concerning the association between a connected component and its surrounding connected components, wherein the connected-component features concerning text characteristics include the ESW computed for each pixel using the device for computing ESW, and the variance of the ESW within a connected component; and the connected-component features concerning the association between a connected component and its surrounding connected components include the average ESW of a connected component, which is the average of the ESW values computed for every pixel in the connected component using the device for computing ESW.
  20. The non-text remover device according to claim 19, wherein the connected-component features concerning text characteristics further include one or more of the following: the aspect ratio of the bounding rectangle and the proportion of the region occupied by foreground pixel area.
  21. The non-text remover device according to claim 19 or 20, wherein the connected-component features concerning the association between a connected component and its surrounding connected components further include one or more of the following: the distance between the bounding rectangles of neighboring components, the average area of the regions, and the average gray level of the regions.
  22. An OCR system comprising a preprocessing apparatus, the preprocessing apparatus comprising the non-text remover device according to any one of claims 19 to 21.
PCT/CN2015/096305 2014-12-03 2015-12-03 一种文本检测的方法和装置 WO2016086877A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP2017528527A JP2017535891A (ja) 2014-12-03 2015-12-03 テキストを検出する方法およびその装置

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201410724574.3A CN105718926A (zh) 2014-12-03 2014-12-03 一种文本检测的方法和装置
CN201410724574.3 2014-12-03

Publications (1)

Publication Number Publication Date
WO2016086877A1 true WO2016086877A1 (zh) 2016-06-09

Family

ID=56091036

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2015/096305 WO2016086877A1 (zh) 2014-12-03 2015-12-03 一种文本检测的方法和装置

Country Status (3)

Country Link
JP (1) JP2017535891A (zh)
CN (1) CN105718926A (zh)
WO (1) WO2016086877A1 (zh)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107563384A (zh) * 2017-08-31 2018-01-09 江苏大学 基于广义Hough聚类的粘连猪的头尾识别方法
CN111325199A (zh) * 2018-12-14 2020-06-23 中移(杭州)信息技术有限公司 一种文字倾斜角度检测方法及装置
CN111709419A (zh) * 2020-06-10 2020-09-25 中国工商银行股份有限公司 一种纸币冠字号的定位方法、系统、设备及可读存储介质
CN115497109A (zh) * 2022-11-17 2022-12-20 山东思玛特教育科技有限公司 基于智能翻译的文字图像预处理方法

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108345883B (zh) * 2017-01-23 2023-11-28 利得技术公司 用于确定文本的旋转角度的装置、方法和计算机可读存储介质
CN108573251B (zh) * 2017-03-15 2021-09-07 北京京东尚科信息技术有限公司 文字区域定位方法和装置
CN116343242B (zh) * 2023-05-30 2023-08-11 山东一品文化传媒有限公司 基于图像数据的试题实时批阅方法及系统

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101256630A (zh) * 2007-02-26 2008-09-03 富士通株式会社 用于改善文档图像二值化性能的去噪声装置和方法
CN102663383A (zh) * 2012-04-26 2012-09-12 北京科技大学 一种定位自然场景图像中文本的方法
CN102782706A (zh) * 2010-03-10 2012-11-14 微软公司 经历光学字符识别的文本图像的文本增强
CN103077389A (zh) * 2013-01-07 2013-05-01 华中科技大学 一种结合字符级分类和字符串级分类的文本检测和识别方法
WO2014014686A1 (en) * 2012-07-19 2014-01-23 Qualcomm Incorporated Parameter selection and coarse localization of regions of interest for mser|processing

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH0968962A (ja) * 1995-08-30 1997-03-11 Toshiba Corp 文字パターン描画方法及び文字出力装置
US8917935B2 (en) * 2008-05-19 2014-12-23 Microsoft Corporation Detecting text using stroke width based text detection

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101256630A (zh) * 2007-02-26 2008-09-03 富士通株式会社 用于改善文档图像二值化性能的去噪声装置和方法
CN102782706A (zh) * 2010-03-10 2012-11-14 微软公司 经历光学字符识别的文本图像的文本增强
CN102663383A (zh) * 2012-04-26 2012-09-12 北京科技大学 一种定位自然场景图像中文本的方法
WO2014014686A1 (en) * 2012-07-19 2014-01-23 Qualcomm Incorporated Parameter selection and coarse localization of regions of interest for mser|processing
CN103077389A (zh) * 2013-01-07 2013-05-01 华中科技大学 一种结合字符级分类和字符串级分类的文本检测和识别方法

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
EPSHTEIN, B. ET AL.: "Detecting Text In Natural Scenes with Stroke Width Transform", 2010 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR, 18 June 2010 (2010-06-18), pages 2965 - 2967, ISSN: 1063-6919 *

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107563384A (zh) * 2017-08-31 2018-01-09 江苏大学 基于广义Hough聚类的粘连猪的头尾识别方法
CN107563384B (zh) * 2017-08-31 2020-02-21 江苏大学 基于广义Hough聚类的粘连猪的头尾识别方法
CN111325199A (zh) * 2018-12-14 2020-06-23 中移(杭州)信息技术有限公司 一种文字倾斜角度检测方法及装置
CN111325199B (zh) * 2018-12-14 2023-10-27 中移(杭州)信息技术有限公司 一种文字倾斜角度检测方法及装置
CN111709419A (zh) * 2020-06-10 2020-09-25 中国工商银行股份有限公司 一种纸币冠字号的定位方法、系统、设备及可读存储介质
CN115497109A (zh) * 2022-11-17 2022-12-20 山东思玛特教育科技有限公司 基于智能翻译的文字图像预处理方法
CN115497109B (zh) * 2022-11-17 2023-03-24 山东思玛特教育科技有限公司 基于智能翻译的文字图像预处理方法

Also Published As

Publication number Publication date
JP2017535891A (ja) 2017-11-30
CN105718926A (zh) 2016-06-29

Similar Documents

Publication Publication Date Title
WO2016086877A1 (zh) 一种文本检测的方法和装置
CN110717489B (zh) Osd的文字区域的识别方法、装置及存储介质
KR101690981B1 (ko) 형태 인식 방법 및 디바이스
Zhang et al. Image segmentation based on 2D Otsu method with histogram analysis
Liu et al. Robustly extracting captions in videos based on stroke-like edges and spatio-temporal analysis
Fabrizio et al. Text segmentation in natural scenes using toggle-mapping
Moradi et al. Farsi/Arabic text extraction from video images by corner detection
JP2014525626A (ja) 画像領域を使用するテキスト検出
US9245194B2 (en) Efficient line detection method
Huang et al. Automatic detection and localization of natural scene text in video
CN109948521B (zh) 图像纠偏方法和装置、设备及存储介质
Liang et al. A new wavelet-Laplacian method for arbitrarily-oriented character segmentation in video text lines
Kavitha et al. A new watershed model based system for character segmentation in degraded text lines
Dwaich et al. Signature texture features extraction using GLCM approach in android studio
CN100397400C (zh) 图形检索的方法
CN108764343B (zh) 一种跟踪算法中的跟踪目标框的定位方法
Ali et al. A novel approach to correction of a skew at document level using an Arabic script
Wu et al. Text detection using delaunay triangulation in video sequence
JP2017211976A (ja) 画像処理装置及び画像処理プログラム
Kumar et al. An efficient algorithm for text localization and extraction in complex video text images
Aghajari et al. A text localization algorithm in color image via new projection profile
Vasilopoulos et al. Unified layout analysis and text localization framework
RU2697737C2 (ru) Способ обнаружения и локализации текстовых форм на изображениях
Shahzad et al. Oriental-script text detection and extraction in videos
JPH05174182A (ja) 文書傾き角検出方法および文書傾き角検出装置

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 15866128

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2017528527

Country of ref document: JP

Kind code of ref document: A

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 15866128

Country of ref document: EP

Kind code of ref document: A1