WO2018145470A1 - Image detection method and apparatus (一种图像检测方法和装置) - Google Patents

Image detection method and apparatus (一种图像检测方法和装置)

Info

Publication number
WO2018145470A1
Authority
WO
WIPO (PCT)
Prior art keywords
region
image
mser
detected
area
Prior art date
Application number
PCT/CN2017/103283
Other languages
English (en)
French (fr)
Inventor
李红匣
Original Assignee
广州视源电子科技股份有限公司
Priority date
Filing date
Publication date
Application filed by 广州视源电子科技股份有限公司 filed Critical 广州视源电子科技股份有限公司
Publication of WO2018145470A1 publication Critical patent/WO2018145470A1/zh

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/10 Segmentation; Edge detection
    • G06T 7/11 Region-based segmentation
    • G06T 7/187 Segmentation; Edge detection involving region growing; involving region merging; involving connected component labelling
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/20 Image preprocessing

Definitions

  • the present invention relates to the field of image processing technologies, and in particular, to an image detection method and apparatus.
  • Natural scene text detection is one of the important research topics of computer vision and pattern recognition technology in the field of object detection and recognition. The purpose of this technology is to accurately detect text information in captured natural scene images, and it has broad application prospects in natural scene understanding and analysis, robot-assisted navigation, video retrieval, assisted reading for the blind and text translation.
  • the sliding-window-based method slides multi-scale windows across the image from left to right and top to bottom and classifies the image inside each window to determine whether it is a text region; in order to detect all text regions, this method usually requires a large number of sliding windows, which increases computational complexity and cannot meet real-time requirements.
  • the connected-region-based method clusters pixels by similarity according to attributes inherent to text, such as color, texture and stroke width, generating a large number of connected regions; features of the connected regions (such as text height, width and spacing) are then extracted and non-text regions are filtered out to complete text detection.
  • compared with the sliding-window-based method, the amount of computation is reduced, but the connected-region extraction is demanding: the extracted connected regions must include all text regions, and it is difficult to cope effectively with complex backgrounds.
  • the present invention provides an image detecting method and apparatus, which can quickly and accurately detect a text area in a complex natural scene.
  • the embodiment of the present invention adopts the following technical solutions:
  • an embodiment of the present invention provides an image detection method, including: acquiring an image to be detected; extracting a maximally stable extremal (MSER) region from the image to be detected, where the MSER region is a connected region; and filtering the MSER region to obtain a text region in the image to be detected.
  • the embodiment of the present invention further provides an image detecting apparatus, including:
  • An image-to-be-detected acquiring module, configured to acquire an image to be detected;
  • An MSER region extraction module, configured to extract a maximally stable extremal (MSER) region from the image to be detected, where the MSER region is a connected region;
  • An MSER region filtering module, configured to filter the MSER region to obtain a text region in the image to be detected.
  • in this technical solution, the image to be detected is acquired, and the maximally stable extremal (MSER) region is extracted from the image to be detected, wherein the maximally stable extremal region is a connected region; the MSER region is filtered to obtain the text region in the image to be detected.
  • by extracting MSER regions from the image to be detected, candidate regions are obtained by partitioning the image into connected regions, and the extracted MSER regions are then filtered and screened to finally obtain the text regions in the image to be detected.
  • region partitioning helps reduce the amount of computation and improve detection efficiency, and extracting MSER regions reduces interference from the image background, which improves accuracy when detecting images to be detected with complex backgrounds.
  • FIG. 1 is a schematic flow chart of an image detecting method according to Embodiment 1 of the present invention.
  • FIG. 2A is a schematic flowchart of an image detecting method according to Embodiment 2 of the present invention.
  • FIG. 2B is a schematic flow chart of an alternative embodiment of S250 of FIG. 2A;
  • FIG. 2C is a schematic structural diagram of a convolutional neural network model used in Embodiment 2 of the present invention.
  • FIG. 3 is a schematic structural diagram of an image detecting apparatus according to Embodiment 3 of the present invention.
  • FIG. 4A is a schematic structural diagram of an image detecting apparatus according to Embodiment 4 of the present invention.
  • FIG. 4B is a block diagram of an alternative embodiment of the MSER region filtering module 450 of FIG. 4A.
  • FIG. 1 is a schematic flowchart of an image detecting method according to Embodiment 1 of the present invention.
  • the method of this embodiment can be performed by a mobile device equipped with a camera, such as a smart phone, tablet computer or notebook computer, and is applicable to detecting and recognizing text regions in natural scene images.
  • the image to be detected may be an original image, or may be an image obtained by preprocessing the original image.
  • the original image is preferably pre-processed to obtain an image to be detected.
  • a Maximally Stable Extremal Region (MSER) is a connected region that remains stable while a threshold applied to the image to be detected is varied; a plurality of MSER regions can be extracted from the image to be detected, and the minimum circumscribed rectangle of the connected region is used to represent the MSER region. Within the same connected region, color, texture and character stroke width are basically the same.
  • Each rectangular frame displayed in the image to be detected represents an MSER region, and a plurality of MSER regions may be extracted from the image to be detected, or an MSER region may not be extracted, that is, there is no text region in the image to be detected.
  • there are many ways to filter the MSER regions, for example, filtering according to the region features of the MSER regions.
  • An optional implementation of filtering the MSER region is provided in Embodiment 2 of the present invention, and details are not described herein.
  • in summary, in this technical solution, the image to be detected is acquired, the maximally stable extremal (MSER) region, which is a connected region, is extracted from the image to be detected, and the MSER region is filtered to obtain the text region in the image to be detected.
  • by extracting MSER regions from the image to be detected, candidate regions are obtained by partitioning the image into connected regions, and the extracted MSER regions are then filtered and screened to finally obtain the text regions; region partitioning helps reduce the amount of computation and improve detection efficiency, and extracting MSER regions reduces interference from the image background, which improves accuracy when detecting images with complex backgrounds.
  • FIG. 2A is a schematic flowchart of an image detecting method according to Embodiment 2 of the present invention
  • FIG. 2B is a schematic flowchart of an alternative embodiment of S250 in FIG. 2A
  • FIG. 2C is a schematic structural diagram of a convolutional neural network model used in the second embodiment of the present invention.
  • the main difference between this embodiment and the first embodiment is that the contents of S210, S220, S260 and S270 are added on the basis of the first embodiment, and an alternative embodiment of S250 is further provided.
  • the initial image may be an image obtained by capturing a natural scene through a camera, typically an RGB image.
  • S220 Perform color space conversion on the initial image to obtain an image to be detected.
  • by performing color space conversion on the initial image, seven single-channel images (R, G, B, grayscale, H, S and V) are obtained as the images to be detected, and the subsequent steps operate on these seven images.
  • the MSER region can be extracted from the image to be detected by the MSER algorithm; the main process is: binarize the image to be detected and adjust the binarization threshold to vary within the range [0, 255], and when the area change amplitude V(i) of a connected region is smaller than a set change amplitude value, determine that connected region to be an MSER region.
  • for example, when binarizing the grayscale image of the image to be detected, pixels whose value is smaller than the binarization threshold are set to 0 and pixels whose value is not smaller than the threshold are set to 255, so the corresponding binarized image goes through a process from all black to all white (like a bird's-eye view of a steadily rising water level); during this process, the area of some connected regions changes very little as the binarization threshold changes, that is, V(i) is smaller than the set change amplitude value (such as 0.25), and such connected regions are the MSER regions.
  • where V(i) = |Q_{i+Δ} - Q_{i-Δ}| / Q_i; Q_i represents the area of the connected region when the binarization threshold is i; Δ represents a small change in the binarization threshold; the area change amplitude V(i) represents the degree to which the area of the connected region changes when the binarization threshold i changes slightly.
  • S250 Filter the MSER area to obtain a text area in the image to be detected.
  • filtering the MSER region may include four steps S251, S252, S253, and S254, where:
  • S251 Count the pixel value or the area aspect ratio of the MSER area.
  • in practical applications, captured natural scene images almost never contain text images of fewer than 30 pixels, and the aspect ratio of a typical text region also falls within a certain range, for example between 0.3 and 3, so non-text regions among the MSER regions can be initially filtered according to the pixel value or aspect ratio determined within the rectangular frame of each MSER region.
  • S252 Filter out the MSER regions whose pixel value is smaller than the preset pixel threshold or whose region aspect ratio is not within the preset range.
  • for example, MSER regions with fewer than 30 pixels or with a region aspect ratio outside the range 0.3-3 are filtered out.
  • in addition, when one text region has several rectangular frames, one of the rectangular frames may be selected to represent that text region in order to reduce the amount of computation.
  • for example, for any rectangular frame A, when the ratio of the area of the overlap between another rectangular frame B and frame A to the total area of the union of frames A and B is greater than 0.8, frames A and B are considered to be at the same position and to represent the same text region; frames A and B are merged, all remaining rectangular frames are traversed, and those meeting the above merging condition are merged with frame A. Similar operations are performed on the other rectangular frames in the image to be detected, which minimizes subsequent computation.
  • S253 Perform convolution and downsampling on the remaining MSER regions after filtering to obtain a feature map.
  • in this embodiment, the convolutional neural network model is trained using the binarized images extracted from the MSER regions.
  • a 32*32 image is first input and convolved with six 5*5 kernel matrices to obtain six 28*28 feature maps of the C1 layer; the C1 feature maps are downsampled, taking one value for every 4 pixels (2*2), to obtain six 14*14 feature maps of the S2 layer; the S2 feature maps are then convolved with 5*5 kernel matrices to obtain sixteen 10*10 feature maps of the C3 layer; similarly to S2, the C3 feature maps are downsampled to obtain sixteen 5*5 feature maps of the S4 layer; the S4 feature maps are convolved with 5*5 kernel matrices to obtain 120 1*1 feature maps of the C5 layer; likewise, the C5 feature maps are downsampled to obtain 84 1*1 feature maps of the F6 layer.
  • S254 Input the feature map into the classifier, and determine the MSER region as the text region according to the output result of the classifier.
  • for example, the F6-layer feature maps obtained in S253 above are input into the softmax classifier, and according to the output result of the softmax classifier it is determined whether the input image is a text image; if so, the corresponding MSER region is a text region.
  • in other embodiments, other classifiers such as SVM may also be employed.
  • after the MSER regions are classified by the convolutional neural network model, the region of a single character or piece of text in the image to be detected can basically be determined; the rectangular frames of non-text regions are largely filtered out, while the rectangular frames of text regions are retained.
  • the MSER region described in the embodiment of the present invention also represents an area image corresponding to the MSER region.
  • in summary, in this technical solution, the initial image is received and subjected to color space conversion, the image to be detected is acquired, and the maximally stable extremal (MSER) region, which is a connected region, is extracted from the image to be detected.
  • the MSER region is filtered to obtain the text regions in the image to be detected, and the text regions are further merged between regions and segmented into words within regions.
  • by extracting MSER regions from the image to be detected, candidate regions are obtained by partitioning the image into connected regions, and the extracted MSER regions are then filtered and screened to finally obtain the text regions; region partitioning helps reduce the amount of computation and improve detection efficiency, and extracting MSER regions reduces interference from the image background, which improves accuracy when detecting images with complex backgrounds.
  • the following is an embodiment of an image detecting apparatus provided by an embodiment of the present invention.
  • the image detecting apparatus and the image detecting method belong to the same inventive concept; for details not described in the apparatus embodiments, reference may be made to the method embodiments above.
  • FIG. 3 is a schematic structural diagram of an image detecting apparatus according to Embodiment 3 of the present invention.
  • An image detecting apparatus 300 provided in this embodiment may include the following contents:
  • the image-to-be-detected acquiring module 310 is configured to acquire an image to be detected.
  • the MSER region extraction module 320 is configured to extract a maximum stable extreme value MSER region from the image to be detected, where the MSER region is a connected region.
  • the MSER area filtering module 330 is configured to filter the MSER area to obtain a text area in the image to be detected.
  • in summary, in this technical solution, the image to be detected is acquired, the maximally stable extremal (MSER) region, which is a connected region, is extracted from the image to be detected, and the MSER region is filtered to obtain the text region in the image to be detected.
  • by extracting MSER regions as candidate regions through connected-region partitioning and then filtering and screening them, the text regions in the image to be detected are finally obtained; region partitioning helps reduce the amount of computation and improve detection efficiency, and extracting MSER regions reduces interference from the image background, which improves accuracy when detecting images with complex backgrounds.
  • FIG. 4A is a schematic structural diagram of an image detecting apparatus according to Embodiment 4 of the present invention
  • FIG. 4B is a schematic structural diagram of an alternative embodiment of the MSER area filtering module 450 of FIG. 4A.
  • the main difference between this embodiment and Embodiment 3 is that the initial image receiving module 410, the color space conversion module 420, the text region merging module 460 and the word segmentation module 470 are added on the basis of Embodiment 3, and an alternative implementation of the MSER region filtering module 450 is further provided.
  • An image detecting apparatus 400 provided in this embodiment may include the following contents:
  • the initial image receiving module 410 is configured to receive an initial image.
  • the color space conversion module 420 is configured to perform color space conversion on the initial image to obtain an image to be detected.
  • the image-to-be-detected acquiring module 430 is configured to acquire an image to be detected.
  • the MSER region extraction module 440 is configured to extract a maximum stable extreme value MSER region from the image to be detected, where the MSER region is a connected region.
  • the MSER region extraction module 440 is specifically configured to:
  • the image to be detected is binarized, and the binarization threshold is adjusted to vary within the range [0, 255]; when the area change amplitude V(i) of a connected region is smaller than the set change amplitude value, that connected region is determined to be an MSER region;
  • where V(i) = |Q_{i+Δ} - Q_{i-Δ}| / Q_i;
  • Q_i represents the area of the connected region when the binarization threshold is i, and Δ represents a small change in the binarization threshold.
  • the MSER area filtering module 450 is configured to filter the MSER area to obtain a text area in the image to be detected.
  • the MSER region filtering module 450 may include a statistics unit 451, a filtering unit 452, a feature map obtaining unit 453, and a text region determining unit 454, where:
  • the statistical unit 451 is configured to count the pixel value or the area aspect ratio of the MSER region.
  • the filtering unit 452 is configured to filter the MSER region whose pixel value is smaller than the preset pixel threshold or the area aspect ratio is not within the preset range.
  • the feature map obtaining unit 453 is configured to continuously perform convolution and downsampling processing on the remaining MSER regions after filtering to obtain a feature map.
  • the text area determining unit 454 is configured to input the feature map into the classifier, and determine the MSER area as the text area according to the output result of the classifier.
  • the text area merge module 460 is configured to merge adjacent text areas in the horizontal direction.
  • the word segmentation module 470 is configured to perform intra-region word segmentation on the merged text region.
  • in summary, in this technical solution, the initial image is received and subjected to color space conversion, the image to be detected is acquired, and the maximally stable extremal (MSER) region, which is a connected region, is extracted from the image to be detected.
  • the MSER region is filtered to obtain the text regions in the image to be detected, and the text regions are further merged between regions and segmented into words within regions.
  • by extracting MSER regions as candidate regions through connected-region partitioning and then filtering and screening them, the text regions in the image to be detected are finally obtained; region partitioning helps reduce the amount of computation and improve detection efficiency, and extracting MSER regions reduces interference from the image background, which improves accuracy when detecting images with complex backgrounds.

Abstract

Embodiments of the present invention disclose an image detection method and apparatus. The image detection method includes: acquiring an image to be detected; extracting a maximally stable extremal (MSER) region from the image to be detected, wherein the MSER region is a connected region; and filtering the MSER region to obtain a text region in the image to be detected. By extracting MSER regions from the image to be detected, candidate regions are obtained by partitioning the image into connected regions, and the extracted MSER regions are then filtered and screened to finally obtain the text regions in the image to be detected. Region partitioning helps reduce the amount of computation and improve detection efficiency, and extracting MSER regions reduces interference from the image background, which improves accuracy when detecting images with complex backgrounds.

Description

Image detection method and apparatus
Technical Field
The present invention relates to the field of image processing technologies, and in particular, to an image detection method and apparatus.
Background Art
With the maturity and popularization of digital imaging devices, people can now conveniently and quickly record all aspects of the real world from different viewpoints. Text, as the visual form of human language, occupies a special and irreplaceable position in human activities. Natural scene text detection is one of the important research topics of computer vision and pattern recognition technology in the field of object detection and recognition. The purpose of this technology is to accurately detect text information in captured natural scene images, and it has broad application prospects in natural scene understanding and analysis, robot-assisted navigation, video retrieval, assisted reading for the blind and text translation.
At present, natural scene text detection methods fall into two categories: sliding-window-based methods and connected-region-based methods.
The sliding-window-based method slides multi-scale windows across the image from left to right and top to bottom and classifies the image inside each sliding window to determine whether it is a text region. To detect all text regions, this method usually requires a large number of sliding windows, which increases computational complexity and cannot meet real-time requirements.
The connected-region-based method clusters pixels by similarity according to attributes inherent to text, such as color, texture and stroke width, generating a large number of connected regions; features of the connected regions (such as text height, width and spacing) are then extracted and non-text regions are filtered out to complete text detection. Compared with the sliding-window-based method, the amount of computation is reduced, but the connected-region extraction is demanding: the extracted connected regions must include all text regions, and it is difficult to cope effectively with complex backgrounds.
Summary of the Invention
In order to solve the problems in the related art, the present invention provides an image detection method and apparatus that can quickly and accurately detect text regions in complex natural scenes.
To achieve the above object, the embodiments of the present invention adopt the following technical solutions:
In a first aspect, an embodiment of the present invention provides an image detection method, including:
acquiring an image to be detected;
extracting a maximally stable extremal (MSER) region from the image to be detected, wherein the MSER region is a connected region;
filtering the MSER region to obtain a text region in the image to be detected.
In a second aspect, an embodiment of the present invention correspondingly provides an image detection apparatus, including:
an image-to-be-detected acquiring module, configured to acquire an image to be detected;
an MSER region extraction module, configured to extract a maximally stable extremal (MSER) region from the image to be detected, wherein the MSER region is a connected region;
an MSER region filtering module, configured to filter the MSER region to obtain a text region in the image to be detected.
The technical solutions provided by the embodiments of the present invention have the following beneficial effects:
In this technical solution, the image to be detected is acquired, and the maximally stable extremal (MSER) region, which is a connected region, is extracted from the image to be detected; the MSER region is filtered to obtain the text region in the image to be detected. By extracting MSER regions from the image to be detected, candidate regions are obtained by partitioning the image into connected regions, and the extracted MSER regions are then filtered and screened to finally obtain the text regions in the image to be detected. Region partitioning helps reduce the amount of computation and improve detection efficiency, and extracting MSER regions reduces interference from the image background, which improves accuracy when detecting images to be detected with complex backgrounds.
Brief Description of the Drawings
In order to describe the technical solutions in the embodiments of the present invention more clearly, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings described below illustrate only some embodiments of the present invention; for those of ordinary skill in the art, other drawings can be obtained from the content of the embodiments of the present invention and these drawings without creative effort.
FIG. 1 is a schematic flowchart of an image detection method according to Embodiment 1 of the present invention;
FIG. 2A is a schematic flowchart of an image detection method according to Embodiment 2 of the present invention;
FIG. 2B is a schematic flowchart of an optional implementation of S250 in FIG. 2A;
FIG. 2C is a schematic structural diagram of a convolutional neural network model used in Embodiment 2 of the present invention;
FIG. 3 is a schematic structural diagram of an image detection apparatus according to Embodiment 3 of the present invention;
FIG. 4A is a schematic structural diagram of an image detection apparatus according to Embodiment 4 of the present invention;
FIG. 4B is a schematic structural diagram of an optional implementation of the MSER region filtering module 450 in FIG. 4A.
Detailed Description of the Embodiments
To make the technical problems solved, the technical solutions adopted and the technical effects achieved by the present invention clearer, the technical solutions of the embodiments of the present invention are further described in detail below with reference to the drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort fall within the protection scope of the present invention.
Embodiment 1
Please refer to FIG. 1, which is a schematic flowchart of an image detection method according to Embodiment 1 of the present invention. The method of this embodiment can be performed by a mobile device equipped with a camera, such as a smartphone, tablet computer or notebook computer, and is applicable to detecting and recognizing text regions in natural scene images.
The image detection method provided in this embodiment may include the following steps:
S110: Acquire an image to be detected.
Exemplarily, in this embodiment of the present invention, the image to be detected may be an original image or an image obtained by preprocessing the original image. In one embodiment of the present invention, the original image is preferably preprocessed to obtain the image to be detected.
S120: Extract the maximally stable extremal (MSER) region from the image to be detected.
Exemplarily, a Maximally Stable Extremal Region (MSER) is a connected region that remains stable while a threshold applied to the image to be detected is varied. A plurality of MSER regions can be extracted from the image to be detected, and the minimum circumscribed rectangle of the connected region can be used to represent the MSER region. Within the same connected region, features such as color, texture and character stroke width are basically the same.
Each rectangular frame displayed in the image to be detected represents an MSER region. A plurality of MSER regions may be extracted from the image to be detected, or no MSER region may be extracted, which means that there is no text region in the image to be detected.
S130: Filter the MSER regions to obtain the text regions in the image to be detected.
Exemplarily, there are many ways to filter the MSER regions, for example, filtering according to region features of the MSER regions. Embodiment 2 of the present invention provides an optional implementation of filtering the MSER regions, which is not described in detail here.
In summary, in this technical solution, the image to be detected is acquired, the maximally stable extremal (MSER) region, which is a connected region, is extracted from the image to be detected, and the MSER region is filtered to obtain the text region in the image to be detected. By extracting MSER regions from the image to be detected, candidate regions are obtained by partitioning the image into connected regions, and the extracted MSER regions are then filtered and screened to finally obtain the text regions in the image to be detected. Region partitioning helps reduce the amount of computation and improve detection efficiency, and extracting MSER regions reduces interference from the image background, which improves accuracy when detecting images with complex backgrounds.
Embodiment 2
Please refer to FIG. 2A, FIG. 2B and FIG. 2C. FIG. 2A is a schematic flowchart of an image detection method according to Embodiment 2 of the present invention, FIG. 2B is a schematic flowchart of an optional implementation of S250 in FIG. 2A, and FIG. 2C is a schematic structural diagram of the convolutional neural network model used in Embodiment 2 of the present invention. The main difference between this embodiment and Embodiment 1 is that S210, S220, S260 and S270 are added on the basis of Embodiment 1, and an optional implementation of S250 is further provided.
The image detection method provided in this embodiment may include the following steps:
S210: Receive an initial image.
Exemplarily, the initial image may be an image obtained by capturing a natural scene with a camera, and is usually an RGB image.
S220: Perform color space conversion on the initial image to obtain the image to be detected.
Exemplarily, by performing color space conversion on the initial image, seven single-channel images (R, G, B, grayscale, H, S and V) are obtained as the images to be detected, and the subsequent steps operate on these seven images.
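As an illustration of this step only, the following is a minimal Python sketch (assuming OpenCV is available; the function name and variable names are illustrative and not taken from the patent) of how an initial BGR image could be split into the seven single-channel images described above:

```python
import cv2

def to_seven_channels(initial_bgr):
    """Split an initial BGR image into the 7 single-channel images
    (R, G, B, grayscale, H, S, V) used as images to be detected."""
    b, g, r = cv2.split(initial_bgr)                      # B, G, R channels
    gray = cv2.cvtColor(initial_bgr, cv2.COLOR_BGR2GRAY)  # grayscale channel
    h, s, v = cv2.split(cv2.cvtColor(initial_bgr, cv2.COLOR_BGR2HSV))  # H, S, V channels
    return [r, g, b, gray, h, s, v]

# Usage: each of the 7 channel images is processed independently by the later steps.
# channels = to_seven_channels(cv2.imread("scene.jpg"))
```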
S230: Acquire the image to be detected.
S240: Extract the maximally stable extremal (MSER) region from the image to be detected.
Exemplarily, the MSER regions can be extracted from the image to be detected by the MSER algorithm. The main process is: binarize the image to be detected and adjust the binarization threshold to vary within the range [0, 255]; when the area change amplitude V(i) of a connected region is smaller than a set change amplitude value, determine that connected region to be an MSER region. For example, when binarizing the grayscale image of the image to be detected, pixels whose value is smaller than the binarization threshold are set to 0 and pixels whose value is not smaller than the binarization threshold are set to 255, so the corresponding binarized image goes through a process from all black to all white (like a bird's-eye view of a steadily rising water level). During this process, the area of some connected regions changes very little as the binarization threshold changes, that is, V(i) is smaller than the set change amplitude value (such as 0.25); such connected regions are the MSER regions.
where
V(i) = |Q_{i+Δ} - Q_{i-Δ}| / Q_i
Q_i represents the area of the connected region when the binarization threshold is i; Δ represents a small change in the binarization threshold; the area change amplitude V(i) represents the degree to which the area of the connected region changes when the binarization threshold i changes slightly.
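A minimal sketch of this extraction step follows, using OpenCV's built-in MSER detector rather than re-implementing the threshold sweep described above; the delta parameter plays the role of Δ and the maximum variation corresponds to the set change amplitude value (0.25 in the example), while the function name and parameter values are illustrative assumptions:

```python
import cv2

def extract_mser_boxes(gray):
    """Extract MSER regions from a single-channel image and return the
    minimum circumscribed rectangles (x, y, w, h) representing them."""
    mser = cv2.MSER_create()
    mser.setDelta(5)            # small threshold change, the delta in V(i)
    mser.setMaxVariation(0.25)  # set change amplitude value for V(i)
    regions, bboxes = mser.detectRegions(gray)  # connected regions and their boxes
    return [tuple(box) for box in bboxes]

# Usage: run on each of the 7 channel images obtained by color space conversion.
# boxes = extract_mser_boxes(gray)
```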
S250: Filter the MSER regions to obtain the text regions in the image to be detected.
Optionally, as shown in FIG. 2B, filtering the MSER regions may include four steps, S251, S252, S253 and S254, where:
S251: Collect statistics on the pixel value or region aspect ratio of each MSER region.
Exemplarily, in practical applications, captured natural scene images almost never contain text images of fewer than 30 pixels, and the aspect ratio of a typical text region also falls within a certain range, for example between 0.3 and 3. Therefore, non-text regions among the MSER regions can be initially filtered according to the pixel value or aspect ratio determined within the rectangular frame of each MSER region.
S252: Filter out the MSER regions whose pixel value is smaller than a preset pixel threshold or whose region aspect ratio is not within a preset range.
Exemplarily, MSER regions with fewer than 30 pixels or with a region aspect ratio outside the range 0.3-3 are filtered out.
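A sketch of this initial filtering on the bounding rectangles follows; the thresholds (30 pixels, aspect ratio 0.3 to 3) come from the example above, while the function name is an assumption and the rectangle area is used here as a stand-in for the region's pixel count:

```python
def filter_by_size_and_ratio(boxes, min_pixels=30, min_ratio=0.3, max_ratio=3.0):
    """Discard MSER rectangles that are too small or whose width/height
    ratio is implausible for text."""
    kept = []
    for (x, y, w, h) in boxes:
        if w * h < min_pixels:                     # too few pixels to be text
            continue
        ratio = w / float(h)
        if not (min_ratio <= ratio <= max_ratio):  # aspect ratio out of range
            continue
        kept.append((x, y, w, h))
    return kept
```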
In addition, when one text region has several rectangular frames, one of the rectangular frames may be selected to represent that text region in order to reduce the amount of computation. For example, for any rectangular frame A, when the ratio of the area of the overlap between another rectangular frame B and frame A to the total area of the union of frames A and B is greater than 0.8, frames A and B are considered to be at the same position and to represent the same text region; frames A and B are merged, all remaining rectangular frames are traversed, and those meeting the above merging condition are merged with frame A. Similar operations are performed on the other rectangular frames in the image to be detected, which minimizes subsequent computation.
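The merging rule can be sketched as follows, using the overlap-to-union ratio of 0.8 described above; merging two frames is read here as keeping their enclosing rectangle, which is one plausible interpretation rather than the only one:

```python
def overlap_over_union(a, b):
    """Ratio of the intersection area of rectangles a and b to their union area."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    iw = max(0, min(ax + aw, bx + bw) - max(ax, bx))
    ih = max(0, min(ay + ah, by + bh) - max(ay, by))
    inter = iw * ih
    union = aw * ah + bw * bh - inter
    return inter / float(union) if union else 0.0

def merge_overlapping(boxes, threshold=0.8):
    """Greedily merge rectangles whose overlap/union ratio exceeds the threshold."""
    merged = []
    for box in boxes:
        for i, kept in enumerate(merged):
            if overlap_over_union(box, kept) > threshold:
                x1 = min(box[0], kept[0])
                y1 = min(box[1], kept[1])
                x2 = max(box[0] + box[2], kept[0] + kept[2])
                y2 = max(box[1] + box[3], kept[1] + kept[3])
                merged[i] = (x1, y1, x2 - x1, y2 - y1)  # replace with enclosing box
                break
        else:
            merged.append(box)
    return merged
```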
S253: Successively perform convolution and downsampling on the MSER regions remaining after filtering to obtain feature maps.
Exemplarily, in this embodiment, the convolutional neural network model is trained using the binarized images extracted from the MSER regions. As shown in FIG. 2C, a 32*32 image is first input and convolved with six 5*5 kernel matrices to obtain six 28*28 feature maps of the C1 layer; the C1 feature maps are downsampled, taking one value for every 4 pixels (2*2), to obtain six 14*14 feature maps of the S2 layer; the S2 feature maps are then convolved with 5*5 kernel matrices to obtain sixteen 10*10 feature maps of the C3 layer; similarly to S2, the C3 feature maps are downsampled to obtain sixteen 5*5 feature maps of the S4 layer; the S4 feature maps are convolved with 5*5 kernel matrices to obtain 120 1*1 feature maps of the C5 layer; likewise, the C5 feature maps are downsampled to obtain 84 1*1 feature maps of the F6 layer.
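The layer sizes described above follow a LeNet-5-style network. The following PyTorch sketch reproduces those shapes (32*32 input; C1: 6@28*28; S2: 6@14*14; C3: 16@10*10; S4: 16@5*5; C5: 120@1*1; F6: 84); the use of average pooling, the ReLU activations and the final fully connected F6 stage are assumptions, since the patent does not fix these details:

```python
import torch
import torch.nn as nn

class TextRegionNet(nn.Module):
    """LeNet-5-style network matching the feature map sizes in the description."""
    def __init__(self, num_classes=2):
        super().__init__()
        self.c1 = nn.Conv2d(1, 6, kernel_size=5)     # 1 @ 32*32 -> 6 @ 28*28
        self.s2 = nn.AvgPool2d(2)                    # -> 6 @ 14*14
        self.c3 = nn.Conv2d(6, 16, kernel_size=5)    # -> 16 @ 10*10
        self.s4 = nn.AvgPool2d(2)                    # -> 16 @ 5*5
        self.c5 = nn.Conv2d(16, 120, kernel_size=5)  # -> 120 @ 1*1
        self.f6 = nn.Linear(120, 84)                 # -> 84 features
        self.out = nn.Linear(84, num_classes)        # text / non-text scores

    def forward(self, x):
        x = torch.relu(self.c1(x))
        x = self.s2(x)
        x = torch.relu(self.c3(x))
        x = self.s4(x)
        x = torch.relu(self.c5(x))
        x = x.flatten(1)                             # 120-dimensional vector per sample
        x = torch.relu(self.f6(x))
        return self.out(x)                           # fed to softmax / cross-entropy loss

# Usage: scores = TextRegionNet()(torch.zeros(1, 1, 32, 32))  # shape (1, 2)
```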
S254: Input the feature maps into a classifier, and determine whether an MSER region is a text region according to the output result of the classifier.
Exemplarily, the F6-layer feature maps obtained in S253 above are input into a softmax classifier, and according to the output result of the softmax classifier it is determined whether the input image is a text image; if so, the corresponding MSER region is a text region. In other embodiments, other classifiers such as SVM may also be used.
After the MSER regions are classified by the convolutional neural network model, the region of a single character or piece of text in the image to be detected can basically be determined; the rectangular frames of non-text regions are largely filtered out, while the rectangular frames of text regions are retained.
S260: Merge adjacent text regions in the horizontal direction.
Exemplarily, for an image to be detected that contains English words, the character regions further need to be combined into words. The distances between all adjacent character regions are calculated, and the average distance is computed; the leftmost unprocessed character region is found, and the character regions closest to it are then searched one by one in the horizontal direction. When the height ratio of two adjacent character regions is within a preset height ratio range, for example between 0.5 and 2, the two character regions are merged; when the distance between two adjacent character regions is greater than a set distance (such as 3 times the above average distance), the iteration stops. In this way, the text regions on the same line can be delineated.
S270: Perform within-region word segmentation on the merged text regions.
Exemplarily, within each group of text regions merged in S260 above, if the distance between two adjacent character regions is greater than the above average distance, the two adjacent character regions are split apart; in this way, different words on the same line can be separated.
S260 and S270 are repeated until all text regions have been processed.
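A simplified sketch of S260 and S270 on a single text line follows; it assumes the character boxes (x, y, w, h) have already been grouped onto the same line, uses the height ratio range 0.5 to 2 and the average-gap word break from the examples above, and omits the 3-times-average-distance stopping rule used when grouping lines:

```python
def group_line_into_words(char_boxes, height_ratio=(0.5, 2.0)):
    """Merge left-to-right character boxes on one text line (S260), then split
    the line into words at gaps larger than the average gap (S270)."""
    if not char_boxes:
        return []
    boxes = sorted(char_boxes, key=lambda b: b[0])
    gaps = [boxes[i + 1][0] - (boxes[i][0] + boxes[i][2]) for i in range(len(boxes) - 1)]
    avg_gap = sum(gaps) / len(gaps) if gaps else 0.0

    words, current = [], [boxes[0]]
    for prev, cur in zip(boxes, boxes[1:]):
        gap = cur[0] - (prev[0] + prev[2])
        similar_height = height_ratio[0] <= cur[3] / float(prev[3]) <= height_ratio[1]
        if similar_height and gap <= avg_gap:   # close and similar in height: same word
            current.append(cur)
        else:                                   # gap above average: start a new word
            words.append(current)
            current = [cur]
    words.append(current)
    return words                                # each entry is one word's character boxes
```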
It should be noted that the MSER region described in the embodiments of the present invention also represents the region image corresponding to the MSER region.
In summary, in this technical solution, the initial image is received and subjected to color space conversion, the image to be detected is acquired, and the maximally stable extremal (MSER) region, which is a connected region, is extracted from the image to be detected; the MSER region is filtered to obtain the text regions in the image to be detected, and the text regions are further merged between regions and segmented into words within regions. By extracting MSER regions from the image to be detected, candidate regions are obtained by partitioning the image into connected regions, and the extracted MSER regions are then filtered and screened to finally obtain the text regions in the image to be detected. Region partitioning helps reduce the amount of computation and improve detection efficiency, and extracting MSER regions reduces interference from the image background, which improves accuracy when detecting images with complex backgrounds.
The following is an embodiment of an image detection apparatus provided by an embodiment of the present invention. The image detection apparatus and the image detection method above belong to the same inventive concept; for details not described in detail in the apparatus embodiments, reference may be made to the method embodiments above.
Embodiment 3
Please refer to FIG. 3, which is a schematic structural diagram of an image detection apparatus according to Embodiment 3 of the present invention.
The image detection apparatus 300 provided in this embodiment may include the following:
an image-to-be-detected acquiring module 310, configured to acquire an image to be detected;
an MSER region extraction module 320, configured to extract a maximally stable extremal (MSER) region from the image to be detected, wherein the MSER region is a connected region;
an MSER region filtering module 330, configured to filter the MSER region to obtain a text region in the image to be detected.
In summary, in this technical solution, the image to be detected is acquired, the maximally stable extremal (MSER) region, which is a connected region, is extracted from the image to be detected, and the MSER region is filtered to obtain the text region in the image to be detected. By extracting MSER regions from the image to be detected, candidate regions are obtained by partitioning the image into connected regions, and the extracted MSER regions are then filtered and screened to finally obtain the text regions in the image to be detected. Region partitioning helps reduce the amount of computation and improve detection efficiency, and extracting MSER regions reduces interference from the image background, which improves accuracy when detecting images with complex backgrounds.
Embodiment 4
Please refer to FIG. 4A and FIG. 4B. FIG. 4A is a schematic structural diagram of an image detection apparatus according to Embodiment 4 of the present invention, and FIG. 4B is a schematic structural diagram of an optional implementation of the MSER region filtering module 450 in FIG. 4A. The main difference between this embodiment and Embodiment 3 is that the initial image receiving module 410, the color space conversion module 420, the text region merging module 460 and the word segmentation module 470 are added on the basis of Embodiment 3, and an optional implementation of the MSER region filtering module 450 is further provided.
The image detection apparatus 400 provided in this embodiment may include the following:
an initial image receiving module 410, configured to receive an initial image;
a color space conversion module 420, configured to perform color space conversion on the initial image to obtain the image to be detected;
an image-to-be-detected acquiring module 430, configured to acquire the image to be detected;
an MSER region extraction module 440, configured to extract a maximally stable extremal (MSER) region from the image to be detected, wherein the MSER region is a connected region.
Preferably, the MSER region extraction module 440 is specifically configured to:
binarize the image to be detected and adjust the binarization threshold to vary within the range [0, 255], and when the area change amplitude V(i) of a connected region is smaller than the set change amplitude value, determine that connected region to be an MSER region;
where
V(i) = |Q_{i+Δ} - Q_{i-Δ}| / Q_i
Q_i represents the area of the connected region when the binarization threshold is i, and Δ represents a small change in the binarization threshold.
The MSER region filtering module 450 is configured to filter the MSER regions to obtain the text regions in the image to be detected.
Optionally, as shown in FIG. 4B, the MSER region filtering module 450 may include a statistics unit 451, a filtering unit 452, a feature map obtaining unit 453 and a text region determining unit 454, where:
the statistics unit 451 is configured to collect statistics on the pixel value or region aspect ratio of each MSER region;
the filtering unit 452 is configured to filter out the MSER regions whose pixel value is smaller than a preset pixel threshold or whose region aspect ratio is not within a preset range;
the feature map obtaining unit 453 is configured to successively perform convolution and downsampling on the MSER regions remaining after filtering to obtain feature maps;
the text region determining unit 454 is configured to input the feature maps into a classifier and determine that an MSER region is a text region according to the output result of the classifier;
the text region merging module 460 is configured to merge adjacent text regions in the horizontal direction;
the word segmentation module 470 is configured to perform within-region word segmentation on the merged text regions.
In summary, in this technical solution, the initial image is received and subjected to color space conversion, the image to be detected is acquired, and the maximally stable extremal (MSER) region, which is a connected region, is extracted from the image to be detected; the MSER region is filtered to obtain the text regions in the image to be detected, and the text regions are further merged between regions and segmented into words within regions. By extracting MSER regions from the image to be detected, candidate regions are obtained by partitioning the image into connected regions, and the extracted MSER regions are then filtered and screened to finally obtain the text regions in the image to be detected. Region partitioning helps reduce the amount of computation and improve detection efficiency, and extracting MSER regions reduces interference from the image background, which improves accuracy when detecting images with complex backgrounds.
Note that the above are only preferred embodiments of the present invention and the technical principles applied. Those skilled in the art will understand that the present invention is not limited to the specific embodiments described herein, and that various obvious changes, readjustments and substitutions can be made without departing from the protection scope of the present invention. Therefore, although the present invention has been described in some detail through the above embodiments, the present invention is not limited to the above embodiments and may include more other equivalent embodiments without departing from the concept of the present invention; the scope of the present invention is determined by the scope of the appended claims.

Claims (10)

  1. An image detection method, characterized by comprising:
    acquiring an image to be detected;
    extracting a maximally stable extremal (MSER) region from the image to be detected, wherein the MSER region is a connected region;
    filtering the MSER region to obtain a text region in the image to be detected.
  2. The method according to claim 1, characterized in that, before the acquiring of the image to be detected, the method further comprises:
    receiving an initial image;
    performing color space conversion on the initial image to obtain the image to be detected.
  3. The method according to claim 2, characterized in that the extracting a maximally stable extremal (MSER) region from the image to be detected comprises:
    binarizing the image to be detected and adjusting a binarization threshold to vary within the range [0, 255], and when an area change amplitude V(i) of the connected region is smaller than a set change amplitude value, determining that the connected region is an MSER region;
    wherein
    V(i) = |Q_{i+Δ} - Q_{i-Δ}| / Q_i
    Q_i represents the area of the connected region when the binarization threshold is i, and Δ represents a small change in the binarization threshold.
  4. The method according to claim 3, characterized in that the filtering the MSER region to obtain a text region in the image to be detected comprises:
    collecting statistics on the pixel value or region aspect ratio of the MSER region;
    filtering out MSER regions whose pixel value is smaller than a preset pixel threshold or whose region aspect ratio is not within a preset range.
  5. The method according to claim 4, characterized in that, after the filtering out of MSER regions whose pixel value is smaller than a preset pixel threshold or whose region aspect ratio is not within a preset range, the method further comprises:
    successively performing convolution and downsampling on the MSER regions remaining after filtering to obtain a feature map;
    inputting the feature map into a classifier, and determining that an MSER region is a text region according to an output result of the classifier.
  6. The method according to any one of claims 1 to 5, characterized in that, after the filtering the MSER region to obtain a text region in the image to be detected, the method further comprises:
    merging adjacent text regions in the horizontal direction;
    performing within-region word segmentation on the merged text regions.
  7. An image detection apparatus, characterized by comprising:
    an image-to-be-detected acquiring module, configured to acquire an image to be detected;
    an MSER region extraction module, configured to extract a maximally stable extremal (MSER) region from the image to be detected, wherein the MSER region is a connected region;
    an MSER region filtering module, configured to filter the MSER region to obtain a text region in the image to be detected.
  8. The apparatus according to claim 7, characterized in that the apparatus further comprises:
    an initial image receiving module, configured to receive an initial image;
    a color space conversion module, configured to perform color space conversion on the initial image to obtain the image to be detected;
    a text region merging module, configured to merge adjacent text regions in the horizontal direction;
    a word segmentation module, configured to perform within-region word segmentation on the merged text regions.
  9. The apparatus according to claim 8, characterized in that the MSER region extraction module is specifically configured to:
    binarize the image to be detected and adjust a binarization threshold to vary within the range [0, 255], and when an area change amplitude V(i) of the connected region is smaller than a set change amplitude value, determine that the connected region is an MSER region;
    wherein
    V(i) = |Q_{i+Δ} - Q_{i-Δ}| / Q_i
    Q_i represents the area of the connected region when the binarization threshold is i, and Δ represents a small change in the binarization threshold.
  10. The apparatus according to claim 9, characterized in that the MSER region filtering module comprises:
    a statistics unit, configured to collect statistics on the pixel value or region aspect ratio of the MSER region;
    a filtering unit, configured to filter out MSER regions whose pixel value is smaller than a preset pixel threshold or whose region aspect ratio is not within a preset range;
    a feature map obtaining unit, configured to successively perform convolution and downsampling on the MSER regions remaining after filtering to obtain a feature map;
    a text region determining unit, configured to input the feature map into a classifier and determine that an MSER region is a text region according to an output result of the classifier.
PCT/CN2017/103283 2017-02-13 2017-09-25 一种图像检测方法和装置 WO2018145470A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201710076259.8 2017-02-13
CN201710076259.8A CN106846339A (zh) 2017-02-13 2017-02-13 一种图像检测方法和装置

Publications (1)

Publication Number Publication Date
WO2018145470A1 true WO2018145470A1 (zh) 2018-08-16

Family

ID=59127874

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2017/103283 WO2018145470A1 (zh) 2017-02-13 2017-09-25 一种图像检测方法和装置

Country Status (2)

Country Link
CN (1) CN106846339A (zh)
WO (1) WO2018145470A1 (zh)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110889843A (zh) * 2019-11-29 2020-03-17 西安电子科技大学 基于最大稳定极值区域的sar图像舰船目标检测方法
CN111027544A (zh) * 2019-11-29 2020-04-17 武汉虹信技术服务有限责任公司 一种基于视觉显著性检测的mser车牌定位方法及系统
CN111325199A (zh) * 2018-12-14 2020-06-23 中移(杭州)信息技术有限公司 一种文字倾斜角度检测方法及装置
CN111932581A (zh) * 2020-08-11 2020-11-13 沈阳帝信人工智能产业研究院有限公司 安全绳检测方法、装置、电子设备和可读存储介质

Families Citing this family (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106846339A (zh) * 2017-02-13 2017-06-13 广州视源电子科技股份有限公司 一种图像检测方法和装置
CN110334706B (zh) * 2017-06-30 2021-06-01 清华大学深圳研究生院 一种图像目标识别方法及装置
CN108268868B (zh) * 2017-07-28 2020-07-10 平安科技(深圳)有限公司 身份证图像的倾斜值获取方法及装置、终端、存储介质
CN107680108B (zh) * 2017-07-28 2019-06-21 平安科技(深圳)有限公司 倾斜图像的倾斜值获取方法、装置、终端及存储介质
CN108304835B (zh) 2018-01-30 2019-12-06 百度在线网络技术(北京)有限公司 文字检测方法和装置
CN108564084A (zh) * 2018-05-08 2018-09-21 北京市商汤科技开发有限公司 文字检测方法、装置、终端及存储介质
CN110058233B (zh) * 2019-04-28 2021-09-14 电子科技大学 一种多基地合成孔径雷达系统的抗欺骗性干扰方法
CN110379178B (zh) * 2019-07-25 2021-11-02 电子科技大学 基于毫米波雷达成像的无人驾驶汽车智能泊车方法
CN111368842A (zh) * 2020-02-29 2020-07-03 贵州电网有限责任公司 一种基于多层次最大稳定极值区域的自然场景文本检测方法
CN112036294B (zh) * 2020-08-28 2023-08-25 山谷网安科技股份有限公司 一种纸质表格结构自动识别的方法及装置
CN113793316B (zh) * 2021-09-13 2023-09-12 合肥合滨智能机器人有限公司 一种超声扫查区域提取方法、装置、设备和存储介质
CN114743025B (zh) * 2022-03-18 2023-03-24 北京理工大学 基于灰度稳定性的提高抗干扰性能的目标显著性检测方法

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102750540A (zh) * 2012-06-12 2012-10-24 大连理工大学 基于形态滤波增强的最稳定极值区视频文本检测方法
CN103886319A (zh) * 2014-03-24 2014-06-25 北京大学深圳研究生院 一种基于机器视觉的举牌智能识别方法
CN104751142A (zh) * 2015-04-01 2015-07-01 电子科技大学 一种基于笔划特征的自然场景文本检测算法
CN105005764A (zh) * 2015-06-29 2015-10-28 东南大学 自然场景多方向文本检测方法
CN105868758A (zh) * 2015-01-21 2016-08-17 阿里巴巴集团控股有限公司 图像中文本区域检测方法、装置及电子设备
CN106156711A (zh) * 2015-04-21 2016-11-23 华中科技大学 文本行的定位方法及装置
CN106156777A (zh) * 2015-04-23 2016-11-23 华中科技大学 文本图片检测方法及装置
CN106296682A (zh) * 2016-08-09 2017-01-04 北京好运到信息科技有限公司 用于医学图像中文本区域检测的方法及装置
CN106846339A (zh) * 2017-02-13 2017-06-13 广州视源电子科技股份有限公司 一种图像检测方法和装置

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105447839A (zh) * 2015-11-20 2016-03-30 上海华力创通半导体有限公司 矩形框的合并方法及合并系统
CN105825216A (zh) * 2016-03-17 2016-08-03 中国科学院信息工程研究所 一种复杂背景图像中的文本定位方法

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102750540A (zh) * 2012-06-12 2012-10-24 大连理工大学 基于形态滤波增强的最稳定极值区视频文本检测方法
CN103886319A (zh) * 2014-03-24 2014-06-25 北京大学深圳研究生院 一种基于机器视觉的举牌智能识别方法
CN105868758A (zh) * 2015-01-21 2016-08-17 阿里巴巴集团控股有限公司 图像中文本区域检测方法、装置及电子设备
CN104751142A (zh) * 2015-04-01 2015-07-01 电子科技大学 一种基于笔划特征的自然场景文本检测算法
CN106156711A (zh) * 2015-04-21 2016-11-23 华中科技大学 文本行的定位方法及装置
CN106156777A (zh) * 2015-04-23 2016-11-23 华中科技大学 文本图片检测方法及装置
CN105005764A (zh) * 2015-06-29 2015-10-28 东南大学 自然场景多方向文本检测方法
CN106296682A (zh) * 2016-08-09 2017-01-04 北京好运到信息科技有限公司 用于医学图像中文本区域检测的方法及装置
CN106846339A (zh) * 2017-02-13 2017-06-13 广州视源电子科技股份有限公司 一种图像检测方法和装置

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111325199A (zh) * 2018-12-14 2020-06-23 中移(杭州)信息技术有限公司 一种文字倾斜角度检测方法及装置
CN111325199B (zh) * 2018-12-14 2023-10-27 中移(杭州)信息技术有限公司 一种文字倾斜角度检测方法及装置
CN110889843A (zh) * 2019-11-29 2020-03-17 西安电子科技大学 基于最大稳定极值区域的sar图像舰船目标检测方法
CN111027544A (zh) * 2019-11-29 2020-04-17 武汉虹信技术服务有限责任公司 一种基于视觉显著性检测的mser车牌定位方法及系统
CN110889843B (zh) * 2019-11-29 2023-04-18 西安电子科技大学 基于最大稳定极值区域的sar图像舰船目标检测方法
CN111027544B (zh) * 2019-11-29 2023-09-29 武汉虹信技术服务有限责任公司 一种基于视觉显著性检测的mser车牌定位方法及系统
CN111932581A (zh) * 2020-08-11 2020-11-13 沈阳帝信人工智能产业研究院有限公司 安全绳检测方法、装置、电子设备和可读存储介质
CN111932581B (zh) * 2020-08-11 2023-09-26 沈阳帝信人工智能产业研究院有限公司 安全绳检测方法、装置、电子设备和可读存储介质

Also Published As

Publication number Publication date
CN106846339A (zh) 2017-06-13

Similar Documents

Publication Publication Date Title
WO2018145470A1 (zh) 一种图像检测方法和装置
CN109961049B (zh) 一种复杂场景下香烟品牌识别方法
CN110334706B (zh) 一种图像目标识别方法及装置
CN108171104B (zh) 一种文字检测方法及装置
Lu et al. Salient object detection using concavity context
CN104751142B (zh) 一种基于笔划特征的自然场景文本检测方法
WO2018018788A1 (zh) 一种基于图像识别的计量表抄表装置及其方法
CN104050471B (zh) 一种自然场景文字检测方法及系统
KR101403876B1 (ko) 차량 번호판 인식 방법과 그 장치
Sharma et al. Recent advances in video based document processing: a review
CN110929593A (zh) 一种基于细节辨别区别的实时显著性行人检测方法
CN110751154B (zh) 一种基于像素级分割的复杂环境多形状文本检测方法
CN104463134B (zh) 一种车牌检测方法和系统
CN111914698A (zh) 图像中人体的分割方法、分割系统、电子设备及存储介质
Zhu et al. Detecting natural scenes text via auto image partition, two-stage grouping and two-layer classification
CN113191421A (zh) 一种基于Faster-RCNN的手势识别系统及方法
Giri Text information extraction and analysis from images using digital image processing techniques
CN108564020B (zh) 基于全景3d图像的微手势识别方法
JP6377214B2 (ja) テキスト検出方法および装置
CN110276260B (zh) 一种基于深度摄像头的商品检测方法
CN110147755B (zh) 基于上下文级联cnn的人头检测方法
CN109784176B (zh) 车载热成像行人检测RoIs提取方法和装置
Sharma Extraction of text regions in natural images
Zhang et al. Salient object detection based on background model
Zhao et al. Comparative analysis of several vehicle detection methods in urban traffic scenes

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 17896068

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 16.12.2019)

122 Ep: pct application non-entry in european phase

Ref document number: 17896068

Country of ref document: EP

Kind code of ref document: A1