WO2017148282A1 - Text detection method and device - Google Patents

Text detection method and device

Info

Publication number
WO2017148282A1
WO2017148282A1 (PCT/CN2017/073939)
Authority
WO
WIPO (PCT)
Prior art keywords
text
vertical
horizontal
candidate
row
Prior art date
Application number
PCT/CN2017/073939
Other languages
English (en)
French (fr)
Inventor
张庆久
乐宁
吴波
江淑红
Original Assignee
夏普株式会社
张庆久
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 夏普株式会社, 张庆久 filed Critical 夏普株式会社
Publication of WO2017148282A1 publication Critical patent/WO2017148282A1/zh

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/22Image preprocessing by selection of a specific region containing or referencing a pattern; Locating or processing of specific regions to guide the detection or recognition
    • G06V10/225Image preprocessing by selection of a specific region containing or referencing a pattern; Locating or processing of specific regions to guide the detection or recognition based on a marking or identifier characterising the area
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/10Image acquisition
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/28Quantising the image, e.g. histogram thresholding for discrimination between background and foreground patterns
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/44Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G06V10/443Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components by matching or filtering

Definitions

  • The present invention relates to text detection techniques and, more particularly, to a method and device for detecting text in natural scene images that can support multiple languages and detect both horizontal and vertical text lines.
  • Chinese patent application 201410334436.4 proposes a Chinese text localization device that can extract text from natural scene images. The image is binarized with the maximally stable extremal regions (MSER) method, and text is detected according to the features of Chinese characters. However, the extracted text is limited to Chinese characters.
  • the present disclosure proposes a text detection method and apparatus capable of supporting multiple languages and capable of detecting horizontal lines and vertical lines.
  • A text detection method comprising: binarizing an image to be detected to obtain a binarized image, extracting connected domains, and obtaining features of the connected domains; combining the extracted connected domains to detect horizontal and vertical lines; and filtering the detected results to eliminate noise.
  • The image to be detected is binarized with the maximally stable extremal regions (MSER) method.
  • The features of a connected domain include at least one of: the bounding rectangle; the foreground area; the ratio of the foreground area to the area of the bounding rectangle; the stroke thickness; and the color of the connected domain.
  • After extracting the connected domains, the method further comprises: removing from the extracted connected domains those whose features clearly do not belong to text.
  • Detecting the horizontal and vertical lines comprises: detecting horizontal lines first and then detecting vertical lines.
  • Detecting horizontal lines comprises: according to the features of the connected domains, combining adjacent connected domains whose horizontal distance is smaller than a first threshold into one candidate horizontal sub-line; combining adjacent candidate horizontal sub-lines whose horizontal distance is smaller than a second threshold into one candidate horizontal line; and taking the candidate horizontal lines containing more than 2 connected domains as horizontal lines, with the remaining lines treated as vertical line candidates.
  • Detecting vertical lines comprises: combining adjacent vertical line candidates whose vertical distance is smaller than a third threshold into one candidate vertical sub-line; combining adjacent candidate vertical sub-lines whose vertical distance is smaller than a fourth threshold into one candidate vertical line; and taking the candidate vertical lines containing 3 or more connected domains as vertical lines.
  • Filtering the detected results to eliminate noise comprises: according to preset noise features, identifying lines in the detected results that have the preset noise features, and removing the identified lines from the results.
  • A text detection device comprising: a text extraction module configured to binarize an image to be detected to obtain a binarized image, extract connected domains, and obtain features of the connected domains; a line detection module configured to combine the extracted connected domains to detect horizontal and vertical lines; and a post-processing module configured to filter the detected results to eliminate noise.
  • The text detection method and device according to embodiments of the present invention improve text detection performance in several respects, including at least:
  • they are not limited to one or several specific languages and can recognize text lines in any language; they can detect both horizontal and vertical lines; and they can locate text lines with high precision.
  • FIG. 1 is a schematic block diagram showing a text detecting apparatus according to an embodiment of the present invention.
  • Figure 2 shows an example image to be detected.
  • FIG. 3 shows the binarization result and connected domains of the image to be detected shown in FIG. 2.
  • FIG. 4 shows the result of removing, from the binarization result shown in FIG. 3, the connected domains whose features clearly do not belong to text.
  • FIG. 5 shows the line detection result of the image to be detected shown in FIG. 2.
  • Fig. 6 shows another example of an image to be detected and a line detection result.
  • FIG. 7 shows a flow chart of a text detection method in accordance with an embodiment of the present invention.
  • FIG. 8 shows an application example using a text detecting method according to an embodiment of the present invention.
  • FIG. 1 is a schematic block diagram showing a text detecting apparatus 100 according to an embodiment of the present invention.
  • The text detection device 100 includes: an input module 110 configured to input an image to be detected; a text extraction module 120 configured to binarize the image to be detected to obtain a binarized image, extract connected domains, and obtain features of the connected domains; a line detection module 130 configured to combine the extracted connected domains to detect horizontal and vertical lines; and a post-processing module 140 configured to filter the detected results to eliminate noise.
  • the text detecting device 100 can be implemented on, for example, a smart phone, a tablet, a notebook, or other handheld electronic device.
  • the input module 110 is used to input an image to be detected.
  • the input module 110 may be a camera on a smart phone for taking a natural scene image as an image to be detected.
  • the input module 110 may be a communication module on a notebook for receiving an image to be detected from the outside.
  • Figure 2 shows an example image to be detected.
  • the text extraction module 120 is configured to perform text extraction by extracting connected domains from the image to be detected.
  • The text extraction module 120 is configured to binarize the image to be detected with the maximally stable extremal regions (MSER) method to obtain a binarized image.
  • Connected domains are then extracted from the image, and their features are obtained.
  • FIG. 3 shows the binarization result and connected domains of the image to be detected shown in FIG. 2.
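As a rough illustration of this binarize-and-extract stage, the sketch below substitutes a fixed global threshold for MSER (which is considerably more involved; OpenCV, for instance, ships an implementation) and labels 4-connected foreground components. All names and values here are illustrative assumptions, not the patent's implementation.

```python
# Simplified stand-in for the binarization + connected-domain step.
# A fixed global threshold replaces MSER, and a 4-connected flood fill
# extracts connected domains from a tiny grayscale image given as a
# list of rows of pixel values in [0, 255].

def binarize(gray, thresh=128):
    # foreground = 1 where the pixel is darker than the threshold
    return [[1 if px < thresh else 0 for px in row] for row in gray]

def connected_domains(binary):
    # label 4-connected foreground components; return a list of
    # pixel-coordinate lists, one per connected domain
    h, w = len(binary), len(binary[0])
    seen = [[False] * w for _ in range(h)]
    domains = []
    for y in range(h):
        for x in range(w):
            if binary[y][x] and not seen[y][x]:
                stack, pixels = [(y, x)], []
                seen[y][x] = True
                while stack:
                    cy, cx = stack.pop()
                    pixels.append((cy, cx))
                    for ny, nx in ((cy-1, cx), (cy+1, cx), (cy, cx-1), (cy, cx+1)):
                        if 0 <= ny < h and 0 <= nx < w and binary[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            stack.append((ny, nx))
                domains.append(pixels)
    return domains
```

Each returned pixel list corresponds to one candidate character component, which the later grouping stages combine into lines.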
  • The features of a connected domain include at least one of the following: the bounding rectangle; the foreground area; the ratio of the foreground area to the area of the bounding rectangle; the stroke thickness; and the color of the connected domain.
  • The bounding rectangle is the smallest rectangle that can enclose a connected domain.
  • the foreground area is the area of a connected domain.
  • the stroke thickness represents the thickness of the strokes in the connected domain.
  • the color of the connected domain indicates the color of the connected domain in the original image.
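Given the foreground pixel list of one connected domain, the first three features above can be computed directly, as in the sketch below. Stroke thickness (typically estimated with a stroke-width transform) and color (a lookup into the source image) are left out; the function and key names are my own, not the patent's.

```python
# Sketch of per-connected-domain features, computed from a list of
# (y, x) foreground pixel coordinates.

def domain_features(pixels):
    ys = [y for y, _ in pixels]
    xs = [x for _, x in pixels]
    top, left = min(ys), min(xs)
    height = max(ys) - top + 1          # bounding rectangle extents
    width = max(xs) - left + 1
    foreground_area = len(pixels)       # number of foreground pixels
    fill_ratio = foreground_area / (width * height)
    return {
        "bbox": (left, top, width, height),  # bounding rectangle
        "area": foreground_area,             # foreground area
        "fill_ratio": fill_ratio,            # foreground / bbox area
    }
```

Characters tend to have fill ratios well away from 0 and 1, which is what makes the ratio useful for separating text from lines and blobs.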
  • The text extraction module 120 is further configured to remove from the extracted connected domains those whose features clearly do not belong to text.
  • FIG. 4 shows the result of removing, from the binarization result shown in FIG. 3, the connected domains whose features clearly do not belong to text.
  • The features of text may be preset so that the extracted connected domains can be filtered after extraction.
  • A camera serving as the input module can supply information about the captured image, and the text extraction module can filter the extracted connected domains according to that information.
  • The image information includes, for example, the number of pixels of the image and the image's width and height.
  • The line detection module 130 is configured to combine the extracted connected domains to detect horizontal and vertical lines. Horizontal and vertical lines can be processed separately. In the real world, horizontal lines occur far more often than vertical lines, so horizontal lines can be detected first to give horizontal line detection higher priority than vertical line detection.
  • For horizontal line detection, the algorithm is as follows: according to the features of the connected domains, combine adjacent connected domains whose horizontal distance is smaller than a first threshold into one candidate horizontal sub-line; combine adjacent candidate horizontal sub-lines whose horizontal distance is smaller than a second threshold into one candidate horizontal line; take the candidate horizontal lines containing more than 2 connected domains as horizontal lines, and treat the remaining lines as vertical line candidates.
  • All connected domains can be combined into groups.
  • The grouping is based on the horizontal positional relationship of the connected domains and other features, such as stroke thickness and stroke color. Only adjacent connected domains that are close in horizontal distance are combined into the same group. For example, according to the features of the connected domains, if the average size of the bounding rectangles is 10*10, the first threshold may be set to 5, and adjacent connected domains whose horizontal distance is smaller than the first threshold are combined into the same group as one candidate horizontal sub-line. Suppose CHgroup1 is the combined result, where a group may contain one or more connected domains. CHgroup1 is then combined again using a larger horizontal distance.
  • For example, the second threshold may be set to 10, and adjacent candidate horizontal sub-lines whose horizontal distance is smaller than the second threshold are combined into one candidate horizontal line.
  • Suppose the result is CHgroup2. Since some characters of Asian languages have left and right parts, only those CHgroup2 whose number of connected domains is greater than 2 are selected as horizontal lines. CHgroup2 is thus divided into two parts, Lhor and Crest: Lhor is the set of detected horizontal lines, and Crest serves as the vertical line candidates, which participate in vertical line detection.
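The two-stage horizontal grouping just described can be sketched as follows. Each connected domain is reduced to its bounding box (left, top, width, height); the vertical-alignment, stroke-thickness, and stroke-color checks are omitted for brevity, the thresholds 5 and 10 follow the 10*10 illustration above, and all function names are illustrative, not the patent's.

```python
# Two-stage horizontal grouping over bounding boxes, assuming
# non-overlapping boxes processed left-to-right.

def group_by_horizontal_gap(boxes, max_gap):
    # start a new group whenever the horizontal gap to the previous
    # box (next left edge minus previous right edge) is not < max_gap
    boxes = sorted(boxes, key=lambda b: b[0])
    groups = []
    for box in boxes:
        if groups and box[0] - (groups[-1][-1][0] + groups[-1][-1][2]) < max_gap:
            groups[-1].append(box)
        else:
            groups.append([box])
    return groups

def detect_horizontal_lines(boxes, t1=5, t2=10):
    # stage 1: candidate horizontal sub-lines (tight gap t1)
    sublines = group_by_horizontal_gap(boxes, t1)
    # stage 2: merge sub-lines separated by less than the looser gap t2
    merged = []
    for sub in sublines:
        if merged and sub[0][0] - (merged[-1][-1][0] + merged[-1][-1][2]) < t2:
            merged[-1].extend(sub)
        else:
            merged.append(list(sub))
    # lines with more than 2 domains are horizontal (L_hor); the rest
    # become vertical-line candidates (C_rest)
    l_hor = [g for g in merged if len(g) > 2]
    c_rest = [g for g in merged if len(g) <= 2]
    return l_hor, c_rest
```

The two thresholds implement the sub-line/line distinction: t1 joins characters within a word-like cluster, while t2 bridges the larger gaps between clusters on the same line.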
  • For vertical line detection, the algorithm is as follows: combine adjacent vertical line candidates whose vertical distance is smaller than a third threshold into one candidate vertical sub-line; combine adjacent candidate vertical sub-lines whose vertical distance is smaller than a fourth threshold into one candidate vertical line; take the candidate vertical lines containing 3 or more connected domains as vertical lines.
  • Crest is combined according to the vertical positional relationship. Only vertical line candidates that are close in vertical distance are combined into the same group.
  • For example, the third threshold may be set to 5, and adjacent vertical line candidates whose vertical distance is smaller than the third threshold are combined into the same group as one candidate vertical sub-line.
  • Suppose CVgroup1 is the combined result.
  • CVgroup1 is then combined again using a larger vertical distance.
  • For example, the fourth threshold may be set to 10, and adjacent candidate vertical sub-lines whose vertical distance is smaller than the fourth threshold are combined into one candidate vertical line. Suppose the final result is CVgroup2; only those CVgroup2 containing more than 3 connected domains are selected as vertical lines Lver. Lhor and Lver are the detected horizontal and vertical lines.
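The vertical stage mirrors the horizontal one. A sketch under the same simplifications (bounding boxes only, no alignment or stroke checks, illustrative names and thresholds):

```python
# Two-stage vertical grouping over bounding boxes (left, top, width,
# height), run on the candidates left over from horizontal detection.

def group_by_vertical_gap(boxes, max_gap):
    # start a new group whenever the vertical gap to the previous box
    # (next top edge minus previous bottom edge) is not < max_gap
    boxes = sorted(boxes, key=lambda b: b[1])
    groups = []
    for box in boxes:
        if groups and box[1] - (groups[-1][-1][1] + groups[-1][-1][3]) < max_gap:
            groups[-1].append(box)
        else:
            groups.append([box])
    return groups

def detect_vertical_lines(candidate_boxes, t3=5, t4=10):
    # stage 1: candidate vertical sub-lines (tight gap t3)
    sublines = group_by_vertical_gap(candidate_boxes, t3)
    # stage 2: merge sub-lines separated by less than the looser gap t4
    merged = []
    for sub in sublines:
        if merged and sub[0][1] - (merged[-1][-1][1] + merged[-1][-1][3]) < t4:
            merged[-1].extend(sub)
        else:
            merged.append(list(sub))
    # keep candidate vertical lines with at least 3 connected domains
    return [g for g in merged if len(g) >= 3]
```

The stricter count (3 or more, versus more than 2 horizontally) reflects that isolated pairs left over from horizontal grouping are weak evidence of a vertical line.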
  • FIG. 5 shows the line detection result of the image to be detected shown in FIG. 2.
  • Fig. 6 shows another example of an image to be detected and a line detection result, in which Fig. 6(a) shows an image to be detected, and Fig. 6(b) shows a line detection result.
  • The post-processing module 140 is configured to filter the detected results to improve detection accuracy.
  • In practice, some noise lines may be extracted, because the text detection device according to an embodiment of the present invention is not limited to a specific language type.
  • For example, bricks on a wall may be recognized as lines of text.
  • Noise can be filtered out by the following steps: 1) extract the features of each line, including the average character size, the average fill ratio of the foreground area to the area of the bounding rectangle, and so on; 2) identify noise based on the line features, then remove the noise from the results.
  • The noise features can be preset.
  • For example, the features of noise objects that are likely to be detected, such as windows, walls, and book pages, may be preset.
  • According to the preset noise features, lines in the detected results that have the preset noise features are identified, and the identified lines are removed from the results.
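A toy version of this post-processing filter might look as follows. The specific criteria (fill-ratio bounds and height uniformity) are illustrative assumptions standing in for the preset noise features, not the patent's exact rules; each member box carries its precomputed fill ratio as a fifth tuple element.

```python
# Sketch of a noise filter over detected lines. A "line" is a list of
# member boxes (left, top, width, height, fill_ratio).

def is_noise_line(boxes, min_fill=0.1, max_fill=0.95, max_size_spread=3.0):
    # grids of identical cells (windows, brick walls) tend to be either
    # nearly empty or nearly solid, while character fill ratios sit in
    # between and character heights vary only mildly within a line
    fills = [b[4] for b in boxes]
    avg_fill = sum(fills) / len(fills)
    if not (min_fill < avg_fill < max_fill):
        return True
    heights = [b[3] for b in boxes]
    return max(heights) / min(heights) > max_size_spread

def filter_noise(lines):
    # keep only the lines that do not match the preset noise profile
    return [line for line in lines if not is_noise_line(line)]
```

In a fuller implementation each noise object class (window, wall, book page) would contribute its own preset feature profile rather than a single threshold pair.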
  • FIG. 1 also shows that the text detecting apparatus 100 according to an embodiment of the present invention further includes a display 150 for displaying a text detection result.
  • FIG. 7 shows a flow diagram of a text detection method 700 in accordance with an embodiment of the present invention.
  • The text detection method according to an embodiment of the present invention is applied to an electronic device and can perform text detection on an image to be detected on that device.
  • The text detection method according to an embodiment of the present invention is started when text lines need to be recognized.
  • the image to be detected is binarized to obtain a binarized image and the connected domain is extracted to obtain a feature of the connected domain.
  • the extracted connected domains are combined to detect horizontal lines and vertical lines.
  • filtering is performed on the detected result to eliminate noise.
  • The features of the connected domains obtained in step S710 include at least one of the following: the bounding rectangle; the foreground area; the ratio of the foreground area to the area of the bounding rectangle; the stroke thickness; and the color of the connected domain.
  • After extracting the connected domains in step S710, the method further includes: removing from the extracted connected domains those whose features clearly do not belong to text.
  • In step S720, horizontal lines are detected first, and vertical lines are detected afterwards.
  • Detecting horizontal lines includes: according to the features of the connected domains, combining adjacent connected domains whose horizontal distance is smaller than a first threshold into one candidate horizontal sub-line; combining adjacent candidate horizontal sub-lines whose horizontal distance is smaller than a second threshold into one candidate horizontal line; and taking the candidate horizontal lines containing more than 2 connected domains as horizontal lines, with the remaining lines treated as vertical line candidates.
  • Detecting vertical lines includes: combining adjacent vertical line candidates whose vertical distance is smaller than a third threshold into one candidate vertical sub-line; combining adjacent candidate vertical sub-lines whose vertical distance is smaller than a fourth threshold into one candidate vertical line; and taking the candidate vertical lines containing 3 or more connected domains as vertical lines.
  • In step S730, lines in the detected results that have preset noise features may be identified according to the preset noise features, and the identified lines are removed from the results.
  • The text detection method and device can be applied to various electronic devices, including smartphones, tablets, notebooks, and other handheld electronic devices.
  • the user can input an image to be detected on such an electronic device.
  • Such an electronic device can identify the text lines in an image efficiently and accurately.
  • Because line detection is performed by extracting connected domains and using the features of the extracted connected domains, there is no restriction on the language of the text, and many languages can be supported.
  • Because vertical line detection is performed after horizontal line detection, both horizontal and vertical lines can be detected. Because noise removal is performed after the lines are detected, line detection can be carried out with high precision.
  • FIG. 8 shows an application example using a text detecting method according to an embodiment of the present invention.
  • a text detecting method according to an embodiment of the present invention is run on a smartphone.
  • the smartphone has a camera.
  • the camera captures images of the real world to obtain images to be detected.
  • the text detecting method according to the embodiment of the present invention performs text recognition on the image to be detected, and obtains one horizontal line and one vertical line.
  • An optical character recognition (OCR) method can then be run on the smartphone to recognize the text in the horizontal and vertical lines.
  • A translation program can be run on the smartphone to translate the recognized text into the language the user wants, so that the user can easily understand the text content he sees.
  • More specifically, a computer program product is an embodiment having a computer-readable medium encoded with computer program logic that, when executed on a computing device, provides the relevant operations to implement the above technical solution.
  • When executed on at least one processor of a computing system, the computer program logic causes the processor to perform the operations (methods) described in the embodiments of the present invention.
  • Such an arrangement of the present invention is typically provided as software, code and/or other data structures arranged or encoded on a computer-readable medium such as an optical medium (e.g., CD-ROM), floppy disk, or hard disk, or as firmware or microcode on one or more ROM, RAM, or PROM chips, or as an application-specific integrated circuit (ASIC), or as downloadable software images, shared databases, etc. in one or more modules.
  • The software or firmware or such a configuration may be installed on a computing device so that one or more processors in the computing device perform the techniques described in the embodiments of the present invention.
  • A software process operating in conjunction with computing devices, such as in a group of data communication devices or other entities, may also provide a device according to the present invention.
  • A device according to the invention may also be distributed among multiple software processes on multiple data communication devices, all software processes running on a small set of dedicated computers, or all software processes running on a single computer.
  • Embodiments of the invention may be implemented as software programs, as software and hardware on a computer device, or as separate software and/or separate circuits.

Abstract

The present invention relates to a text detection method and device that can support multiple languages and recognize text with high precision. The text detection method according to the present invention includes: binarizing an image to be detected to obtain a binarized image, extracting connected domains, and obtaining features of the connected domains; combining the extracted connected domains to detect horizontal and vertical lines; and filtering the detected results to eliminate noise.

Description

Text detection method and device
Technical Field
The present invention relates to text detection technology and, more particularly, to a method and device for detecting text in natural scene images that can support multiple languages and detect both horizontal and vertical lines.
Background Art
With the development of information technology, electronic devices (e.g., personal digital assistants, handheld computers, mobile phones) have become more and more common in daily life, as have electronic devices equipped with cameras. When people photograph natural scene images with a camera, they may need to recognize the text lines in the captured images.
Chinese patent application 201410334436.4 proposes a Chinese text localization device that can extract text from natural scene images. The image is binarized with the maximally stable extremal regions (MSER) method, and text is detected according to the features of Chinese characters. However, the extracted text is limited to Chinese characters.
Existing text detection methods are limited to one or several specific languages and cannot generalize to all languages. When an unknown language appears in an image, the results are very poor.
Moreover, existing text detection methods can usually handle only horizontal lines, and cannot handle horizontal and vertical lines at the same time.
Detecting text in natural scene images with high precision is very difficult. On the one hand, an image may contain a great deal of non-text content, which can introduce heavy noise and reduce detection precision. On the other hand, real-world text comes in a wide variety of layouts and sizes, and such complex situations are very hard to handle.
Therefore, a text detection mechanism is needed that can support multiple languages and detect both horizontal and vertical lines.
Summary of the Invention
The present disclosure proposes a text detection method and device that can support multiple languages and detect horizontal and vertical lines.
According to one aspect of the present invention, a text detection method is proposed, comprising: binarizing an image to be detected to obtain a binarized image, extracting connected domains, and obtaining features of the connected domains; combining the extracted connected domains to detect horizontal and vertical lines; and filtering the detected results to eliminate noise.
Preferably, the image to be detected is binarized with the maximally stable extremal regions (MSER) method.
Preferably, the features of a connected domain include at least one of: the bounding rectangle; the foreground area; the ratio of the foreground area to the area of the bounding rectangle; the stroke thickness; and the color of the connected domain.
Preferably, after extracting the connected domains, the method further comprises: removing from the extracted connected domains those whose features clearly do not belong to text.
Preferably, detecting horizontal and vertical lines comprises: detecting horizontal lines first and then detecting vertical lines.
Preferably, detecting horizontal lines comprises: according to the features of the connected domains, combining adjacent connected domains whose horizontal distance is smaller than a first threshold into one candidate horizontal sub-line; combining adjacent candidate horizontal sub-lines whose horizontal distance is smaller than a second threshold into one candidate horizontal line; and taking the candidate horizontal lines containing more than 2 connected domains as horizontal lines, with the remaining lines treated as vertical line candidates.
Preferably, detecting vertical lines comprises: combining adjacent vertical line candidates whose vertical distance is smaller than a third threshold into one candidate vertical sub-line; combining adjacent candidate vertical sub-lines whose vertical distance is smaller than a fourth threshold into one candidate vertical line; and taking the candidate vertical lines containing 3 or more connected domains as vertical lines.
Preferably, filtering the detected results to eliminate noise comprises: according to preset noise features, identifying lines in the detected results that have the preset noise features, and removing the identified lines from the results.
According to another aspect of the present invention, a text detection device is proposed, comprising: a text extraction module configured to binarize an image to be detected to obtain a binarized image, extract connected domains, and obtain features of the connected domains; a line detection module configured to combine the extracted connected domains to detect horizontal and vertical lines; and a post-processing module configured to filter the detected results to eliminate noise.
Unlike the prior art, the text detection method and device according to embodiments of the present invention improve text detection performance in several respects, including at least:
1. They are not limited to one or several specific languages, and can recognize text lines in any language;
2. They can detect both the horizontal and the vertical lines that are present;
3. They can locate text lines with high precision.
Brief Description of the Drawings
The above and other objects, features, and advantages of the present invention will become clearer from the following description of preferred embodiments in conjunction with the accompanying drawings, in which:
FIG. 1 is a schematic block diagram of a text detection device according to an embodiment of the present invention.
FIG. 2 shows an example image to be detected.
FIG. 3 shows the binarization result and connected domains of the image to be detected shown in FIG. 2.
FIG. 4 shows the result of removing, from the binarization result shown in FIG. 3, the connected domains whose features clearly do not belong to text.
FIG. 5 shows the line detection result for the image to be detected shown in FIG. 2.
FIG. 6 shows another example image to be detected and its line detection result.
FIG. 7 shows a flowchart of a text detection method according to an embodiment of the present invention.
FIG. 8 shows an application example of a text detection method according to an embodiment of the present invention.
Detailed Description
Example embodiments of the present invention are described in detail below with reference to the accompanying drawings. In the following description, some specific embodiments are for illustrative purposes only and should not be understood as limiting the present invention in any way; they are merely examples of the present invention. Conventional structures or configurations are omitted where they might obscure the understanding of the present invention.
FIG. 1 is a schematic block diagram of a text detection device 100 according to an embodiment of the present invention. The text detection device 100 includes: an input module 110 configured to input an image to be detected; a text extraction module 120 configured to binarize the image to be detected to obtain a binarized image, extract connected domains, and obtain features of the connected domains; a line detection module 130 configured to combine the extracted connected domains to detect horizontal and vertical lines; and a post-processing module 140 configured to filter the detected results to eliminate noise.
The text detection device 100 according to this embodiment can be implemented on, for example, a smartphone, tablet, notebook, or other handheld electronic device.
The input module 110 is used to input the image to be detected. For example, the input module 110 may be a camera on a smartphone for capturing natural scene images as images to be detected. As another example, the input module 110 may be a communication module on a notebook for receiving images to be detected from outside. FIG. 2 shows an example image to be detected.
The text extraction module 120 is configured to perform text extraction by extracting connected domains from the image to be detected. According to one embodiment, the text extraction module 120 is configured to binarize the image to be detected with the maximally stable extremal regions (MSER) method to obtain a binarized image. Connected domains are then extracted from the image, and their features are obtained. FIG. 3 shows the binarization result and connected domains of the image to be detected shown in FIG. 2. The features of a connected domain include at least one of: the bounding rectangle; the foreground area; the ratio of the foreground area to the area of the bounding rectangle; the stroke thickness; and the color of the connected domain.
The bounding rectangle is the smallest rectangle that can enclose a connected domain. The foreground area is the area of a connected domain. The stroke thickness is the thickness of the strokes in the connected domain. The color of a connected domain is its color in the original image. These features are not tied to any specific language type, so the text detection device 100 can be applied to many languages.
These connected domain features can be used for line detection and noise elimination.
The text extraction module 120 is further configured to remove from the extracted connected domains those whose features clearly do not belong to text.
For example, when a straight line is detected, its aspect ratio clearly differs from that of the other connected domains, so it can be removed from the extracted connected domains. As another example, when a noise point is detected, the number of pixels it occupies is clearly smaller than that of the other connected domains, so it can also be removed. Removing these connected domains improves detection precision. FIG. 4 shows the result of removing, from the binarization result shown in FIG. 3, the connected domains whose features clearly do not belong to text.
The features of text can be preset so that the extracted connected domains can be filtered after extraction. Alternatively, a camera serving as the input module can, for example, supply information about the captured image, and the text extraction module can filter the extracted connected domains according to that information. The image information includes, for example, the number of pixels of the image and the image's width and height.
The line detection module 130 is configured to combine the extracted connected domains to detect horizontal and vertical lines. Horizontal and vertical lines can be processed separately. In the real world, horizontal lines occur far more often than vertical lines, so horizontal lines can be detected first to give horizontal line detection higher priority than vertical line detection.
For horizontal line detection, the algorithm is as follows. According to the features of the connected domains, combine adjacent connected domains whose horizontal distance is smaller than a first threshold into one candidate horizontal sub-line; combine adjacent candidate horizontal sub-lines whose horizontal distance is smaller than a second threshold into one candidate horizontal line; take the candidate horizontal lines containing more than 2 connected domains as horizontal lines, and treat the remaining lines as vertical line candidates.
Suppose the extracted connected domains are denoted Call. All the connected domains can be combined into groups. The grouping is based on the horizontal positional relationship of the connected domains and other features, such as stroke thickness and stroke color. Only adjacent connected domains that are close in horizontal distance are combined into the same group. For example, according to the features of the connected domains, if the average size of the bounding rectangles of the connected domains is 10*10, the first threshold may be set to 5, and adjacent connected domains whose horizontal distance is smaller than the first threshold are combined into the same group as one candidate horizontal sub-line. Suppose CHgroup1 is the combined result, where a group may contain one or more connected domains. CHgroup1 is then combined again using a larger horizontal distance. For example, if the average size of the bounding rectangles is 10*10, the second threshold may be set to 10, and adjacent candidate horizontal sub-lines whose horizontal distance is smaller than the second threshold are combined into one candidate horizontal line. Suppose the result is CHgroup2. Since some characters of Asian languages have left and right parts, only those CHgroup2 whose number of connected domains is greater than 2 are selected as horizontal lines. CHgroup2 is thus divided into two parts, Lhor and Crest: Lhor is the set of detected horizontal lines, and Crest serves as the vertical line candidates, which participate in vertical line detection.
For vertical line detection, the algorithm is as follows. Combine adjacent vertical line candidates whose vertical distance is smaller than a third threshold into one candidate vertical sub-line; combine adjacent candidate vertical sub-lines whose vertical distance is smaller than a fourth threshold into one candidate vertical line; take the candidate vertical lines containing 3 or more connected domains as vertical lines.
For example, Crest is combined according to the vertical positional relationship. Only vertical line candidates that are close in vertical distance are combined into the same group. For example, according to the features of the connected domains, if the average size of the bounding rectangles is 10*10, the third threshold may be set to 5, and adjacent vertical line candidates whose vertical distance is smaller than the third threshold are combined into the same group as one candidate vertical sub-line. Suppose CVgroup1 is the combined result. CVgroup1 is then combined again using a larger vertical distance. For example, if the average size of the bounding rectangles is 10*10, the fourth threshold may be set to 10, and adjacent candidate vertical sub-lines whose vertical distance is smaller than the fourth threshold are combined into one candidate vertical line. Suppose the final combined result is CVgroup2. Only those CVgroup2 containing more than 3 connected domains are selected as vertical lines Lver. Lhor and Lver are the detected horizontal and vertical lines. FIG. 5 shows the line detection result for the image to be detected shown in FIG. 2. FIG. 6 shows another example image to be detected and its line detection result, where FIG. 6(a) shows the image to be detected and FIG. 6(b) shows the line detection result.
The post-processing module 140 is configured to filter the detected results to improve detection precision. In practice, some noise lines may be extracted, because the text detection device according to an embodiment of the present invention is not limited to a specific language type. For example, bricks on a wall may be recognized as text lines. According to an embodiment of the present invention, noise can be filtered out by the following steps: 1) extract the features of each line, including the average character size, the average fill ratio of the foreground area to the area of the bounding rectangle, and so on; 2) identify noise according to the line features, then remove the noise from the results. For example, noise features can be preset; for instance, the features of noise objects that are likely to be recognized, such as windows, walls, and book pages, can be preset. According to the preset noise features, lines in the detected results that have the preset noise features are identified and removed from the results.
FIG. 1 also shows that the text detection device 100 according to an embodiment of the present invention further includes a display 150 for displaying the text detection result.
FIG. 7 shows a flowchart of a text detection method 700 according to an embodiment of the present invention. The text detection method according to an embodiment of the present invention is applied to an electronic device and can perform text detection on an image to be detected on the electronic device. The method is started when text lines need to be recognized. First, at step S710, the image to be detected is binarized to obtain a binarized image, connected domains are extracted, and the features of the connected domains are obtained. Then, at step S720, the extracted connected domains are combined to detect horizontal and vertical lines. At step S730, the detected results are filtered to eliminate noise.
The features of the connected domains obtained in step S710 include at least one of: the bounding rectangle; the foreground area; the ratio of the foreground area to the area of the bounding rectangle; the stroke thickness; and the color of the connected domain.
Step S710 further includes, after extracting the connected domains: removing from the extracted connected domains those whose features clearly do not belong to text.
In step S720, horizontal lines are detected first, and vertical lines are detected afterwards. Specifically, detecting horizontal lines includes: according to the features of the connected domains, combining adjacent connected domains whose horizontal distance is smaller than a first threshold into one candidate horizontal sub-line; combining adjacent candidate horizontal sub-lines whose horizontal distance is smaller than a second threshold into one candidate horizontal line; and taking the candidate horizontal lines containing more than 2 connected domains as horizontal lines, with the remaining lines treated as vertical line candidates. Detecting vertical lines includes: combining adjacent vertical line candidates whose vertical distance is smaller than a third threshold into one candidate vertical sub-line; combining adjacent candidate vertical sub-lines whose vertical distance is smaller than a fourth threshold into one candidate vertical line; and taking the candidate vertical lines containing 3 or more connected domains as vertical lines.
In step S730, lines in the detected results that have preset noise features may be identified according to the preset noise features, and the identified lines are removed from the results.
The text detection method and device according to embodiments of the present invention can be applied to various electronic devices, including smartphones, tablets, notebooks, and other handheld electronic devices. A user can input an image to be detected on such a device, and the device can recognize the text lines in the image efficiently and accurately. Because line detection is performed by extracting connected domains and using the features of the extracted connected domains, there is no restriction on the language of the text, and many languages can be supported. Moreover, because vertical line detection is performed after horizontal line detection, both horizontal and vertical lines can be detected. Because noise removal is performed after the lines are detected, line detection can be carried out with high precision.
FIG. 8 shows an application example of a text detection method according to an embodiment of the present invention. As shown in FIG. 8, the text detection method according to an embodiment of the present invention runs on a smartphone. When a user travels, he needs to recognize the text he sees. Suppose the smartphone has a camera. First, the camera captures an image of the real world as the image to be detected. Then the text detection method according to an embodiment of the present invention performs text detection on the image and obtains one horizontal line and one vertical line. An optical character recognition (OCR) method can then be run on the smartphone to recognize the text in the horizontal and vertical lines. A translation program can be run on the smartphone to translate the recognized text into the language the user wants, so that the user can easily understand the text he sees.
Other arrangements of the embodiments of the present invention disclosed here include software programs that perform the steps and operations of the method embodiments outlined above. More specifically, a computer program product is an embodiment having a computer-readable medium encoded with computer program logic that, when executed on a computing device, provides the relevant operations to implement the above technical solution. When executed on at least one processor of a computing system, the computer program logic causes the processor to perform the operations (methods) described in the embodiments of the present invention. Such an arrangement of the present invention is typically provided as software, code and/or other data structures arranged or encoded on a computer-readable medium such as an optical medium (e.g., CD-ROM), floppy disk, or hard disk, or as firmware or microcode on one or more ROM, RAM, or PROM chips, or as an application-specific integrated circuit (ASIC), or as downloadable software images, shared databases, etc. in one or more modules. The software or firmware or such a configuration may be installed on a computing device so that one or more processors in the computing device perform the techniques described in the embodiments of the present invention. A software process operating in conjunction with computing devices, such as in a group of data communication devices or other entities, may also provide a device according to the present invention. A device according to the present invention may also be distributed among multiple software processes on multiple data communication devices, all software processes running on a small set of dedicated computers, or all software processes running on a single computer.
It should be understood that, strictly speaking, embodiments of the present invention may be implemented as a software program on a computer device, as software and hardware, or as separate software and/or separate circuits.
It should be noted that the above description shows the technical solution of the present invention by way of example only, and does not mean that the present invention is limited to the steps and unit structures described above. Where possible, the steps and unit structures may be adjusted and selected as needed. Therefore, certain steps and units are not elements required for implementing the general inventive concept of the present invention. The technical features necessary to the present invention are limited only by the minimum requirements for realizing its general inventive concept, and are not limited by the specific examples above.
The present invention has been described above with reference to preferred embodiments. It should be understood that those skilled in the art can make various other changes, substitutions, and additions without departing from the spirit and scope of the present invention. Therefore, the scope of the present invention is not limited to the specific embodiments described above, but is defined by the appended claims.

Claims (17)

  1. A text detection method, comprising:
    binarizing an image to be detected to obtain a binarized image, extracting connected domains, and obtaining features of the connected domains;
    combining the extracted connected domains to detect horizontal and vertical lines; and
    filtering the detected results to eliminate noise.
  2. The text detection method according to claim 1, wherein the image to be detected is binarized with the maximally stable extremal regions (MSER) method.
  3. The text detection method according to claim 1, wherein the features of a connected domain include at least one of:
    the bounding rectangle;
    the foreground area;
    the ratio of the foreground area to the area of the bounding rectangle;
    the stroke thickness; and
    the color of the connected domain.
  4. The text detection method according to claim 1, wherein, after extracting the connected domains, the method further comprises:
    removing from the extracted connected domains those whose features clearly do not belong to text.
  5. The text detection method according to claim 1, wherein detecting horizontal and vertical lines comprises:
    detecting horizontal lines first and then detecting vertical lines.
  6. The text detection method according to claim 5, wherein detecting horizontal lines comprises:
    according to the features of the connected domains, combining adjacent connected domains whose horizontal distance is smaller than a first threshold into one candidate horizontal sub-line;
    combining adjacent candidate horizontal sub-lines whose horizontal distance is smaller than a second threshold into one candidate horizontal line; and
    taking the candidate horizontal lines containing more than 2 connected domains as horizontal lines, and treating the remaining lines as vertical line candidates.
  7. The text detection method according to claim 6, wherein detecting vertical lines comprises:
    combining adjacent vertical line candidates whose vertical distance is smaller than a third threshold into one candidate vertical sub-line;
    combining adjacent candidate vertical sub-lines whose vertical distance is smaller than a fourth threshold into one candidate vertical line; and
    taking the candidate vertical lines containing 3 or more connected domains as vertical lines.
  8. The text detection method according to claim 1, wherein filtering the detected results to eliminate noise comprises:
    according to preset noise features, identifying lines in the detected results that have the preset noise features, and removing the identified lines from the results.
  9. A text detection device, comprising:
    a text extraction module configured to binarize an image to be detected to obtain a binarized image, extract connected domains, and obtain features of the connected domains;
    a line detection module configured to combine the extracted connected domains to detect horizontal and vertical lines; and
    a post-processing module configured to filter the detected results to eliminate noise.
  10. The text detection device according to claim 9, wherein the text extraction module is configured to binarize the image to be detected with the maximally stable extremal regions (MSER) method.
  11. The text detection device according to claim 9, wherein the features of a connected domain include at least one of:
    the bounding rectangle;
    the foreground area;
    the ratio of the foreground area to the area of the bounding rectangle;
    the stroke thickness; and
    the color of the connected domain.
  12. The text detection device according to claim 9, wherein the text extraction module is further configured to:
    remove from the extracted connected domains those whose features clearly do not belong to text.
  13. The text detection device according to claim 9, wherein the line detection module is configured to:
    detect horizontal lines first and then detect vertical lines.
  14. The text detection device according to claim 13, wherein the line detection module is configured to:
    according to the features of the connected domains, combine adjacent connected domains whose horizontal distance is smaller than a first threshold into one candidate horizontal sub-line;
    combine adjacent candidate horizontal sub-lines whose horizontal distance is smaller than a second threshold into one candidate horizontal line; and
    take the candidate horizontal lines containing more than 2 connected domains as horizontal lines, and treat the remaining lines as vertical line candidates.
  15. The text detection device according to claim 14, wherein the line detection module is configured to:
    combine adjacent vertical line candidates whose vertical distance is smaller than a third threshold into one candidate vertical sub-line;
    combine adjacent candidate vertical sub-lines whose vertical distance is smaller than a fourth threshold into one candidate vertical line; and
    take the candidate vertical lines containing 3 or more connected domains as vertical lines.
  16. The text detection device according to claim 9, wherein the post-processing module is configured to:
    according to preset noise features, identify lines in the detected results that have the preset noise features, and remove the identified lines from the results.
  17. The text detection device according to claim 9, wherein the text detection device is implemented on a smartphone, tablet, notebook, or other handheld electronic device.
PCT/CN2017/073939 2016-03-01 2017-02-17 Text detection method and device WO2017148282A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201610115229.9A CN107145883A (zh) 2016-03-01 2016-03-01 Text detection method and device
CN201610115229.9 2016-03-01

Publications (1)

Publication Number Publication Date
WO2017148282A1 true WO2017148282A1 (zh) 2017-09-08

Family

ID=59742558

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2017/073939 WO2017148282A1 (zh) 2016-03-01 2017-02-17 Text detection method and device

Country Status (2)

Country Link
CN (1) CN107145883A (zh)
WO (1) WO2017148282A1 (zh)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109874313A (zh) * 2017-10-13 2019-06-11 众安信息技术服务有限公司 Text line detection method and text line detection device
CN110020655A (zh) * 2019-04-19 2019-07-16 厦门商集网络科技有限责任公司 Binarization-based character denoising method and terminal

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112200202A (zh) * 2020-10-29 2021-01-08 上海商汤智能科技有限公司 Text detection method and apparatus, electronic device, and storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20020037100A1 (en) * 2000-08-25 2002-03-28 Yukari Toda Image processing apparatus and method
US20110200250A1 (en) * 2010-02-17 2011-08-18 Samsung Electronics Co., Ltd. Apparatus and method for generating image for character region extraction
CN102163284A (zh) * 2011-04-11 2011-08-24 西安电子科技大学 Complex-scene text localization method oriented to Chinese environments
CN104182750A (zh) * 2014-07-14 2014-12-03 上海交通大学 Chinese text detection method based on extremal connected domains in natural scene images

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20020037100A1 (en) * 2000-08-25 2002-03-28 Yukari Toda Image processing apparatus and method
US20110200250A1 (en) * 2010-02-17 2011-08-18 Samsung Electronics Co., Ltd. Apparatus and method for generating image for character region extraction
CN102163284A (zh) * 2011-04-11 2011-08-24 西安电子科技大学 Complex-scene text localization method oriented to Chinese environments
CN104182750A (zh) * 2014-07-14 2014-12-03 上海交通大学 Chinese text detection method based on extremal connected domains in natural scene images

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109874313A (zh) * 2017-10-13 2019-06-11 众安信息技术服务有限公司 Text line detection method and text line detection device
CN110020655A (zh) * 2019-04-19 2019-07-16 厦门商集网络科技有限责任公司 Binarization-based character denoising method and terminal

Also Published As

Publication number Publication date
CN107145883A (zh) 2017-09-08

Similar Documents

Publication Publication Date Title
US10445569B1 (en) Combination of heterogeneous recognizer for image-based character recognition
US10127471B2 (en) Method, device, and computer-readable storage medium for area extraction
US9043349B1 (en) Image-based character recognition
US9098888B1 (en) Collaborative text detection and recognition
US8768062B2 (en) Online script independent recognition of handwritten sub-word units and words
US11244144B2 (en) Age recognition method, computer storage medium and electronic device
US9298365B2 (en) Storage medium, information processing apparatus and character recognition method
CN104182750A (zh) Chinese text detection method based on extremal connected domains in natural scene images
CN106297755B (zh) Electronic device and recognition method for music score image recognition
WO2017148282A1 (zh) Text detection method and device
US10262202B2 (en) Form recognition method, form recognition device, and non-transitory computer-readable medium
WO2015031702A1 (en) Multiple hypothesis testing for word detection
Liang et al. A new wavelet-Laplacian method for arbitrarily-oriented character segmentation in video text lines
JP2019016350A (ja) 電子文書における強調テキストの識別
US20160283786A1 (en) Image processor, image processing method, and non-transitory recording medium
US10452943B2 (en) Information processing apparatus, control method of information processing apparatus, and storage medium
US20160110597A1 (en) Method and System for Imaging Documents, Such As Passports, Border Crossing Cards, Visas, and Other Travel Documents, In Mobile Applications
CN111435407A (zh) Method, apparatus, device, and storage medium for correcting typos
CN104899588B (zh) Method and device for recognizing characters in an image
Jindal et al. A new method for segmentation of pre-detected Devanagari words from the scene images: Pihu method
WO2016192664A1 (zh) Handwritten table recognition method and device
Pavithra et al. A comprehensive of transforms, Gabor filter and k-means clustering for text detection in images and video
CN110134924B (zh) Method and device for extracting overlapping text components, text recognition system, and storage medium
CN111209865A (zh) Method, apparatus, electronic device, and storage medium for extracting document content
JP5857634B2 (ja) Inter-word space detection device, inter-word space detection method, and computer program for inter-word space detection

Legal Events

Date Code Title Description
NENP Non-entry into the national phase

Ref country code: DE

121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 17759133

Country of ref document: EP

Kind code of ref document: A1

122 Ep: pct application non-entry in european phase

Ref document number: 17759133

Country of ref document: EP

Kind code of ref document: A1