CN108038481A

CN108038481A - Text localization method combining maximally stable extremal regions and stroke width variation

Info

Publication number: CN108038481A
Application number: CN201711310281.0A
Authority: CN
Inventors: 张再跃; 潘立; 刘亮亮; 刘嘎琼; 武子毅
Original assignee: Jiangsu University of Science and Technology; Marine Equipment and Technology Institute Jiangsu University of Science and Technology
Current assignee: Jiangsu University of Science and Technology; Marine Equipment and Technology Institute Jiangsu University of Science and Technology
Priority date: 2017-12-11
Filing date: 2017-12-11
Publication date: 2018-05-15

Abstract

The invention relates to a text localization method that combines maximally stable extremal regions (MSER) with stroke width variation. The method detects text in an image with MSER; applies edge detection to the image; computes stroke width values along the gradient direction of each edge pixel; uses morphological operations to remove noise, fill gaps, and compute connected domains; and finally filters out non-text domains and merges the remaining connected domains according to rules. The advantage of the invention is that it obtains rough text domains through MSER detection and combines edge detection, stroke width variation features, and morphological operations to localize text in natural scene images. Experiments show that the invention achieves high accuracy, which benefits subsequent text segmentation and text recognition. It has clear practical significance in the field of natural scene text localization and can be widely applied.

Description

A text localization method combining maximally stable extremal regions and stroke width variation

Technical Field

The invention relates to image processing in the field of artificial intelligence and computing, and in particular to a method that uses image processing to localize text in natural scenes.

Background Art

Text localization in natural scenes faces a basic and unavoidable problem: for an image with a complex natural background, how to obtain the text position accurately despite factors such as text layout, font type, illumination intensity, and shooting angle.

Text localization is a crucial part of text detection: its quality directly determines the accuracy of the subsequent text segmentation and text recognition. Text localization is applied ever more widely in natural scenes, but the complexity of natural scenes poses many challenges. Unlike traditional text localization settings, natural scenes contain many distracting objects, and factors such as shooting angle and font deform the text, which makes localization harder. It is therefore necessary to find text features that keep the localization process unaffected by these factors.

There are many methods for text localization in natural scenes. They fall mainly into two categories: sliding window methods and connected domain analysis methods. Sliding window methods move a window over every position of the image to detect text. Connected domain analysis methods select image features to obtain candidate connected domains, then filter and merge them to localize text.

Natural scene text localization commonly has to solve several difficulties:

1) Text feature extraction is one step of text localization in natural scenes. The image must be preprocessed before features are extracted; the required text features are then extracted from the image, and candidate connected domains are generated from them.

2) How to distinguish text domains from non-text domains. Natural scenes contain many distracting objects whose features closely resemble those of text, such as plants, road signs, and railings. After the candidate connected domains are obtained, the non-text domains among them must therefore be identified and filtered out.

3) Text in natural scenes takes many forms, covering different fonts and languages. How to make the localization method compatible with various languages and fonts is therefore a problem to be solved.

To localize text in natural scenes with high accuracy, the following problems must therefore be addressed:

Technical problem 1: extraction of text features after natural scene image preprocessing. How to choose the text features to extract so that the localization method effectively overcomes natural scene interference and the multi-font compatibility problem;

Technical problem 2: candidate connected domain filtering. How to design rules that generate candidate connected domains and filter out non-text domains;

Technical problem 3: merging of single-character connected domains. How to screen single-character connected domains and merge them into text domains.

In view of the above difficulties and problems, the invention proposes and implements a natural scene text localization method that combines maximally stable extremal regions with stroke width variation features.

Summary of the Invention

The technical problem to be solved by the invention is to provide a text localization method combining maximally stable extremal regions and stroke width variation, so as to achieve accurate text localization in natural scene images.

To solve the above technical problem, the technical solution of the invention is a text localization method combining maximally stable extremal regions and stroke width variation, whose innovation is that the text localization method comprises the following steps:

(1) Detecting text domains with MSER: convert the original image to grayscale, representing the gray value of each pixel as an integer from 0 to 255. Take any threshold within the gray value range; pixels whose gray value is below the threshold are defined as black and pixels above it as white. When the threshold is 0 the whole image is white. As the threshold varies from 0 to 255, a black region that remains stable and whose region gradient is minimal is a maximally stable extremal region;

(2) Edge detection with the Canny operator: smooth the image with a Gaussian filter; compute the gradient magnitude and gradient direction of the filtered image; apply non-maximum suppression to the gradient magnitude, finding the local maxima of the image gradient and setting all other points to zero so as to thin the edges; then detect and link edges with a double-threshold algorithm;

(3) Obtaining stroke width features: for each edge pixel, define a ray along the gradient direction, which is perpendicular to the edge, and search along the ray for a corresponding edge pixel. If another edge pixel is found and its gradient direction is approximately opposite to the original gradient direction, the distance between the two edge pixels is taken as the stroke width. If no corresponding pixel is found, or its gradient direction is not approximately opposite, the ray is discarded. In more complex stroke environments, compute the median stroke width m of all pixels along each non-discarded ray, and set the stroke width of every pixel on the ray whose value exceeds m to m;

(4) Morphological processing: apply opening and closing to the image. Opening first erodes the image to remove edge burrs, then dilates it to fill small gaps and holes. Closing first dilates the image to fill broken regions and contour gaps, then erodes it to smooth the edges;

(5) Candidate text domain generation: aggregate text pixels into candidate text domains according to rules. Adjacent pixels whose stroke width values lie within the threshold range are assigned to the same connected domain. Compute the aspect ratio and area ratio of each connected domain, and filter out as non-text domains those that fall outside the threshold range;

(6) Text domain merging: further filter the single-character text domains. Where the mean stroke width ratio, the height ratio, or the mean pixel color ratio of adjacent single-character text domains exceeds its threshold, the connected domains with the larger deviation are filtered out as noise; the remaining connected domains are aggregated into chains, forming continuous text domains.

Further, in the step of detecting text domains with MSER, the maximally stable extremal region algorithm relies on the relationship between the pixels inside a region and those on its boundary, and obtains maximally stable extremal regions according to a stability criterion. The input image is converted to grayscale and a threshold is taken anywhere within the gray value range 0-255. Let $Q_1, \ldots, Q_i, \ldots$ be a sequence of nested extremal regions. If $q(i) = |Q_{i+\Delta} \setminus Q_{i-\Delta}| / |Q_i|$ has a local minimum at $i^*$, then $Q_{i^*}$ is a maximally stable extremal region (MSER).

Further, Canny edge detection is an edge detection operator based on the idea of optimization. The algorithm smooths and denoises the image with a suitable two-dimensional Gaussian function applied along rows and columns, computes the magnitude and direction of the image gradient, and applies non-maximum suppression to the gradient magnitude: it finds the local maxima of the image gradient and sets all other points to zero so that the edges are thinned. Detection uses a double-threshold algorithm with thresholds $T_1$ and $T_2$: $T_1$ is used to obtain each line segment, and $T_2$ is used to find breaks on both sides of a segment and link the edges. The two-dimensional Gaussian function and the smoothed image are:

$$G(x,y) = \frac{1}{2\pi\sigma^2}\exp\left(-\frac{x^2+y^2}{2\sigma^2}\right);$$

$$I(x,y) = G(x,y) * f(x,y);$$

The gradient magnitude and gradient direction are computed as:

$$M(x,y) = \sqrt{g_x^2 + g_y^2};$$

$$\theta(x,y) = \arctan(g_y / g_x);$$

where $\sigma$ is the standard deviation of the Gaussian curve, $f$ is the input image, $*$ denotes convolution, and $(g_x, g_y)$ is the gradient.

Further, in the stroke width computation step, the stroke width value is denoted $d_{swt}$ and is computed as follows. The gradient direction of each edge pixel $p$ is denoted $d_p$; it is perpendicular to the edge direction. Define a ray $r = p + n \cdot d_p$, $n > 0$, and search along it for another edge pixel $q$. If the gradient direction $d_q$ of $q$ is approximately opposite to $d_p$ ($d_q = -d_p \pm \pi/6$), the stroke width value of the pixel is

$$d_{swt} = \sqrt{(x_p - x_q)^2 + (y_p - y_q)^2},$$

where $x_p$, $y_p$ are the horizontal and vertical coordinates of pixel $p$, and $x_q$, $y_q$ are those of pixel $q$. In more complex stroke environments the stroke width obtained this way is inaccurate, so the median stroke width $m$ of all pixels along each non-discarded ray is computed, and the stroke width of every pixel on the ray whose value exceeds $m$ is set to $m$.

Further, the step of processing the image with morphological operations mainly comprises opening and closing. Opening first erodes the image to remove edge burrs, then dilates it to fill small gaps and holes. Closing first dilates the image to fill broken regions and contour gaps, then erodes it to smooth the edges. The opening is written $A \circ B$ and defined as

$$A \circ B = (A \ominus B) \oplus B;$$

the closing is written $A \cdot B$ and defined as

$$A \cdot B = (A \oplus B) \ominus B;$$

where $A$ is the image, $B$ is the structuring element, $\ominus$ denotes erosion, and $\oplus$ denotes dilation.

Further, in the candidate text domain generation step, non-text domains are filtered out mainly by computing attributes of the connected domains and applying rules and thresholds. The rules are stroke width variance, aspect ratio, and area ratio. The stroke width variance is used to decide whether pixels belong to the same connected domain: pixels with similar stroke width values are assigned to the same connected domain. The mean $\mu_{swt}$ and variance $\sigma_{swt}^2$ of the stroke width values are

$$\mu_{swt} = \frac{1}{N}\sum_{i=1}^{N} d_{swt}^{(i)}, \qquad \sigma_{swt}^2 = \frac{1}{N}\sum_{i=1}^{N}\left(d_{swt}^{(i)} - \mu_{swt}\right)^2,$$

where $N$ is the total number of pixels in the connected domain and $d_{swt}^{(i)}$ is the stroke width value of the $i$-th pixel. The aspect ratio is used to filter out small, narrow connected domains produced by noise; the aspect ratio of a connected domain is $r = d_{height}/d_{width}$, with threshold 2. The area ratio is used to filter out connected domains whose area is too large or too small, with threshold 2.

Further, in the text domain merging step, the single-character candidate domains are screened further and the remaining single-character connected domains are aggregated into chains, forming continuous text domains. The screening conditions are stroke width ratio, height ratio, and mean color difference. The stroke width ratio is used to decide whether adjacent single-character text domains belong to the same text domain, with threshold 2. The height ratio is used to decide whether adjacent single-character text domains belong to the same horizontal text domain, with threshold 2. The mean color is used to decide whether adjacent single-character text domains belong to the same text domain, with a mean color difference threshold of 40.

The advantages of the invention are as follows. The invention exploits the affine invariance of maximally stable extremal regions to run MSER text detection on the image and obtain multiple candidate text domains. On that basis it applies Canny edge detection to the image, extracts stroke width features for all edge pixels to obtain connected domains, further filters out non-text domains, and merges single-character text domains, thereby localizing text in natural scenes. Experiments show that the invention achieves high accuracy and can be widely applied. Combined with subsequent text segmentation and text recognition, it serves text detection in natural scenes well and has clear practical significance in the field of image processing.

The invention was tested on the test data of the ICDAR2003 text locating competition data set. The results show that the method combining maximally stable extremal regions and stroke width features localizes text in natural scenes effectively. Statistical analysis gives a localization accuracy of 74.1%.

Brief Description of the Drawings

The invention is described in further detail below with reference to the accompanying drawing and a specific embodiment.

Fig. 1 is a flow chart of the text localization method of the invention combining maximally stable extremal regions and stroke width variation.

Detailed Description

The following embodiment enables those skilled in the art to understand the invention more fully, but does not limit the invention to the scope of the described embodiment.

Embodiment

As shown in Fig. 1, the text localization method combining maximally stable extremal regions and stroke width features provided by this embodiment comprises the following steps:

1. Text detection on the input image with MSER:

The extraction of stroke width features depends on the quality of the image's edge features. The invention uses MSER to detect text in the image and obtain rough text positions, which improves the accuracy of the subsequent edge detection and stroke width feature extraction.

The maximally stable extremal region algorithm relies on the relationship between the pixels inside a region and those on its boundary. For a grayscale image $I: D \subset \mathbb{Z}^2 \to S$, maximally stable extremal regions are defined as follows. $S$ is totally ordered, $S = \{0, 1, \ldots, 255\}$, and the order satisfies antisymmetry, transitivity, and totality. A 4-neighborhood adjacency relation $A \subset D \times D$ is defined: $p, q \in D$ are adjacent, written $pAq$, if $\sum_{i=1}^{2} |p_i - q_i| \le 1$. A region $Q$ is a contiguous subset of $D$: for any $p, q \in Q$ there is a sequence $p, a_1, a_2, \ldots, a_n, q$ such that $pAa_1, \ldots, a_iAa_{i+1}, \ldots, a_nAq$. The boundary of the region is $\partial Q = \{q \in D \setminus Q : \exists p \in Q,\ qAp\}$, that is, the set of pixels that are adjacent to at least one pixel of $Q$ but do not belong to $Q$. An extremal region is a region $Q$ such that, for all $p \in Q$ and $q \in \partial Q$, either $I(p) > I(q)$ (a maximum region) or $I(p) < I(q)$ (a minimum region). Let $Q_1, \ldots, Q_{i-1}, Q_i, \ldots$ be a sequence of nested extremal regions with $Q_i \subset Q_{i+1}$. If $q(i) = |Q_{i+\Delta} \setminus Q_{i-\Delta}| / |Q_i|$ has a local minimum at $i^*$, then $Q_{i^*}$ is a maximally stable extremal region (MSER).
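The stability criterion can be illustrated with a minimal sketch. It assumes the areas of one chain of nested extremal regions are already known, one per threshold level; the function names and the sample areas are hypothetical and are not part of the patent. Because the regions are nested, the size of the set difference equals the difference of the two areas.

```python
def stability(areas, delta):
    """q(i) = (|Q_{i+delta}| - |Q_{i-delta}|) / |Q_i| for nested regions.

    Q_{i-delta} is contained in Q_{i+delta}, so the size of the set
    difference is simply the difference of the two areas.
    """
    return {i: (areas[i + delta] - areas[i - delta]) / areas[i]
            for i in range(delta, len(areas) - delta)}


def maximally_stable_indices(areas, delta):
    """Return the indices i* at which q(i) has a strict local minimum."""
    q = stability(areas, delta)
    idx = sorted(q)
    return [i for prev, i, nxt in zip(idx, idx[1:], idx[2:])
            if q[i] < q[prev] and q[i] < q[nxt]]


# Hypothetical pixel counts of one nested region chain, one per threshold level.
areas = [10, 30, 60, 62, 63, 64, 66, 120, 200, 400]
stable = maximally_stable_indices(areas, delta=1)
print(stable)  # the level where the area changes least relative to its size
```

In this sample the region barely grows around level 4, so that level is the one reported as maximally stable.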

2. Edge detection with the Canny operator:

The image is smoothed and denoised with a suitable two-dimensional Gaussian function applied along rows and columns, and the magnitude and direction of the image gradient are computed;

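The two-dimensional Gaussian function and the smoothed image are:

$$G(x,y) = \frac{1}{2\pi\sigma^2}\exp\left(-\frac{x^2+y^2}{2\sigma^2}\right) \quad (1);$$

$$I(x,y) = G(x,y) * f(x,y) \quad (2);$$

The gradient magnitude and gradient direction are:

$$M(x,y) = \sqrt{g_x^2 + g_y^2} \quad (3);$$

$$\theta(x,y) = \arctan(g_y / g_x) \quad (4);$$

where $\sigma$ is the standard deviation of the Gaussian curve, $f$ is the input image, $*$ denotes convolution, and $(g_x, g_y)$ is the gradient;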
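As a rough illustration of the gradient magnitude and direction, the sketch below evaluates them at one interior pixel. The central-difference gradient is an assumption made for brevity (the patent does not state which derivative filter is used), and the helper names are hypothetical.

```python
import math


def gradient(img, y, x):
    """Central-difference gradient (g_x, g_y) at an interior pixel."""
    gx = (img[y][x + 1] - img[y][x - 1]) / 2.0
    gy = (img[y + 1][x] - img[y - 1][x]) / 2.0
    return gx, gy


def magnitude_and_direction(img, y, x):
    gx, gy = gradient(img, y, x)
    # atan2 is used instead of arctan(g_y / g_x) so that g_x == 0 is handled.
    return math.hypot(gx, gy), math.atan2(gy, gx)


# A vertical step edge: dark on the left, bright on the right.
img = [
    [0, 0, 10, 10],
    [0, 0, 10, 10],
    [0, 0, 10, 10],
]
m, theta = magnitude_and_direction(img, 1, 1)
print(m, theta)
```

For the step edge above the gradient points along the positive x axis, perpendicular to the edge, which is the property the stroke width step relies on.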

Non-maximum suppression is applied to the gradient image: within the 8-neighborhood of each pixel, the gradient magnitudes are compared along the gradient direction. If the magnitudes of both neighboring pixels along the gradient direction are smaller than that of the pixel itself, the pixel may be an edge pixel; otherwise its gradient magnitude is set to 0. A low threshold $t_1$ and a high threshold $t_2$ are computed from the gradient histogram, and the image is thresholded twice, once with $t_1$ and once with $t_2$; where the gradient is smaller than the threshold, the gray value is set to 0.
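One common way to combine the two thresholded images is hysteresis linking, sketched below. This is the standard Canny formulation rather than a procedure spelled out in the patent: pixels at or above the high threshold are kept, and pixels at or above the low threshold are kept only when they are 8-connected to a kept pixel. The function name and the sample magnitudes are hypothetical.

```python
from collections import deque


def hysteresis(mag, t_low, t_high):
    """Keep pixels >= t_high, plus pixels >= t_low 8-connected to them."""
    h, w = len(mag), len(mag[0])
    edge = [[False] * w for _ in range(h)]
    queue = deque()
    for y in range(h):
        for x in range(w):
            if mag[y][x] >= t_high:
                edge[y][x] = True
                queue.append((y, x))
    # Grow outwards from the strong pixels through weak ones.
    while queue:
        y, x = queue.popleft()
        for dy in (-1, 0, 1):
            for dx in (-1, 0, 1):
                ny, nx = y + dy, x + dx
                if (0 <= ny < h and 0 <= nx < w and not edge[ny][nx]
                        and mag[ny][nx] >= t_low):
                    edge[ny][nx] = True
                    queue.append((ny, nx))
    return edge


# Hypothetical gradient magnitudes after non-maximum suppression.
mag = [
    [0, 0, 0, 0, 0],
    [9, 5, 5, 0, 5],
    [0, 0, 0, 0, 0],
]
edges = hysteresis(mag, t_low=4, t_high=8)
print(edges[1])
```

The isolated weak pixel at the right end of the middle row is dropped because no strong pixel is connected to it.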

3. Extraction of stroke width features:

The initial stroke width value of every element is set to infinity. After the edge information has been obtained with the Canny operator, the gradient direction of each edge pixel $p$ is denoted $d_p$; because $p$ lies on an edge, $d_p$ is necessarily perpendicular to the edge direction. Define a ray $r = p + n \cdot d_p$, $n > 0$, and search along it for another edge pixel $q$. If the gradient direction $d_q$ of $q$ is approximately opposite to $d_p$ ($d_q = -d_p \pm \pi/6$), the stroke width value $d_{swt}$ of the pixel is:

$$d_{swt} = \sqrt{(x_p - x_q)^2 + (y_p - y_q)^2}$$

where $(x_p, y_p)$ and $(x_q, y_q)$ are the coordinates of $p$ and $q$.

If no corresponding edge pixel $q$ is found, or the gradient direction $d_q$ of $q$ is not approximately opposite to $d_p$, the ray $r$ is discarded;

However, in more complex stroke environments such as stroke corners, the stroke width obtained by the above procedure is inaccurate. A second pass is therefore made along every non-discarded ray: the median stroke width $m$ of all its pixels is computed, and the stroke width of every pixel on the ray whose value exceeds $m$ is set to $m$.
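The ray search and the median correction can be sketched as follows. The representation is an assumption made for illustration: edge pixels are stored in a dictionary that maps integer coordinates to unit gradient vectors, the "approximately opposite" test is read as an angle of at most $\pi/6$ between $d_q$ and $-d_p$, and `cast_ray`, `clamp_to_median`, and `max_steps` are hypothetical names.

```python
import math
from statistics import median


def cast_ray(p, edges, max_steps=50):
    """Walk from edge pixel p along its gradient; return (q, width) or None.

    edges maps (x, y) -> unit gradient vector (dx, dy).
    """
    (px, py), (dx, dy) = p, edges[p]
    for n in range(1, max_steps + 1):
        q = (round(px + n * dx), round(py + n * dy))
        if q == p or q not in edges:
            continue
        gqx, gqy = edges[q]
        # d_q counts as approximately opposite when it is within pi/6 of -d_p.
        if dx * gqx + dy * gqy <= -math.cos(math.pi / 6):
            return q, math.hypot(q[0] - px, q[1] - py)
        return None  # hit an edge whose gradient does not oppose d_p: discard
    return None  # no matching edge pixel found: discard


def clamp_to_median(widths):
    """Second pass on one ray: values above the ray median m are set to m."""
    m = median(widths)
    return [min(w, m) for w in widths]


# Hypothetical vertical stroke: left edge at x=2 (gradient +x),
# right edge at x=6 (gradient -x).
edges = {(2, 0): (1.0, 0.0), (6, 0): (-1.0, 0.0)}
hit = cast_ray((2, 0), edges)
print(hit)
```

The clamping step matters at corners, where a diagonal ray can be much longer than the true stroke width; limiting each pixel to the ray median pulls such outliers back.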

4. Morphological processing of the image:

Opening smooths the edges of the image, removing jagged burrs on the edges and eliminating narrow regions. Closing does the opposite: it removes noise inside regions and fills narrow breaks and gaps in the edges. Let image $A$ and set $B$ be defined in the integer space $Z$. The opening of $A$ by $B$ is written $A \circ B$ and defined as:

$$A \circ B = (A \ominus B) \oplus B$$

Correspondingly, the closing of image $A$ by structuring element $B$ is written $A \cdot B$ and defined as:

$$A \cdot B = (A \oplus B) \ominus B$$

where $A$ is the image, $B$ is the structuring element, $\ominus$ denotes erosion, and $\oplus$ denotes dilation.
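A minimal sketch of the two operations on binary images follows, with the foreground stored as a set of pixel coordinates. It assumes a symmetric structuring element that contains the origin; the element `B` and the two sample shapes are hypothetical.

```python
def erode(a, b):
    """Erosion: positions z where B translated to z fits entirely inside A.

    Iterating over A alone is enough because B is assumed to contain the origin.
    """
    return {(x, y) for (x, y) in a
            if all((x + bx, y + by) in a for (bx, by) in b)}


def dilate(a, b):
    """Dilation: union of B translated to every pixel of A."""
    return {(x + bx, y + by) for (x, y) in a for (bx, by) in b}


def opening(a, b):
    return dilate(erode(a, b), b)


def closing(a, b):
    return erode(dilate(a, b), b)


# Hypothetical 3x1 horizontal structuring element centred on the origin.
B = {(-1, 0), (0, 0), (1, 0)}
bar = {(x, 0) for x in range(5)}
bar_with_spur = bar | {(2, 1)}   # one stray pixel above the bar
bar_with_gap = bar - {(2, 0)}    # one missing pixel inside the bar

print(sorted(opening(bar_with_spur, B)))
print(sorted(closing(bar_with_gap, B)))
```

Opening removes the stray pixel because the structuring element does not fit inside it, while closing restores the missing pixel because the gap is narrower than the element.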

5. Candidate text domain generation:

In this step, non-text domains are filtered out mainly by computing attributes of the connected domains and applying rules and thresholds. The rules are stroke width variance, aspect ratio, and area ratio;

The stroke width variance is used to decide whether pixels belong to the same connected domain: pixels with similar stroke width values are assigned to the same connected domain. The mean $\mu_{swt}$ and variance $\sigma_{swt}^2$ of the stroke width values are:

$$\mu_{swt} = \frac{1}{N}\sum_{i=1}^{N} d_{swt}^{(i)}, \qquad \sigma_{swt}^2 = \frac{1}{N}\sum_{i=1}^{N}\left(d_{swt}^{(i)} - \mu_{swt}\right)^2$$

where $N$ is the total number of pixels in the connected domain and $d_{swt}^{(i)}$ is the stroke width value of the $i$-th pixel;

The aspect ratio is used to filter out small, narrow connected domains produced by noise; the aspect ratio of a connected domain is $r = d_{height}/d_{width}$, with threshold 2;

The area ratio is used to filter out connected domains whose area is too large or too small, with threshold 2.
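The filtering rules might be applied along the following lines. Several points are assumptions, since the text gives only the threshold values: the aspect ratio threshold of 2 is read as $1/2 \le r \le 2$, the variance limit (`max_var_over_mean`) is a hypothetical parameter because no numeric limit is stated, and the area ratio test is omitted because its exact definition is not given.

```python
def swt_mean_variance(widths):
    """Population mean and variance of the stroke widths in one component."""
    n = len(widths)
    mu = sum(widths) / n
    var = sum((w - mu) ** 2 for w in widths) / n
    return mu, var


def keep_component(widths, height, width,
                   ratio_limit=2.0, max_var_over_mean=0.5):
    mu, var = swt_mean_variance(widths)
    r = height / width
    # Assumption: "threshold 2" is read as 1/2 <= r <= 2.
    aspect_ok = 1.0 / ratio_limit <= r <= ratio_limit
    # Hypothetical variance rule; the source gives no numeric limit.
    variance_ok = var <= max_var_over_mean * mu
    return aspect_ok and variance_ok


# A compact component with nearly constant stroke width is kept;
# a tall, thin one with erratic stroke widths is rejected.
print(keep_component([4, 4, 5, 3], height=20, width=12))
print(keep_component([1, 9, 2, 12], height=40, width=5))
```

Text strokes tend to have a nearly constant width within one character, which is why a low variance relative to the mean is a useful signal for separating text from clutter such as foliage.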

6. Text domain merging:

The single-character candidate domains are screened further and the remaining single-character connected domains are aggregated into chains, forming continuous text domains. The screening conditions are stroke width ratio, height ratio, and mean color difference;

The stroke width ratio is used to decide whether adjacent single-character text domains belong to the same text domain, with threshold 2;

The height ratio is used to decide whether adjacent single-character text domains belong to the same horizontal text domain, with threshold 2;

The mean color is used to decide whether adjacent single-character text domains belong to the same text domain, with a mean color difference threshold of 40.
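The three screening conditions and the chaining can be sketched as below. The thresholds 2, 2, and 40 come from the text; everything else is an assumption for illustration: the `Char` record and its fields are hypothetical, neighbors are taken to be consecutive components after sorting by horizontal position, and an incompatible component starts a new chain instead of being deleted outright.

```python
from dataclasses import dataclass


@dataclass
class Char:
    x: float       # left edge of the bounding box
    height: float
    stroke: float  # mean stroke width
    color: float   # mean gray level


def ratio(a, b):
    return max(a, b) / min(a, b)


def compatible(a, b, max_ratio=2.0, max_color_diff=40.0):
    return (ratio(a.stroke, b.stroke) <= max_ratio
            and ratio(a.height, b.height) <= max_ratio
            and abs(a.color - b.color) <= max_color_diff)


def chain(chars):
    """Group left-to-right neighbours that pass all three tests."""
    lines = []
    for c in sorted(chars, key=lambda ch: ch.x):
        if lines and compatible(lines[-1][-1], c):
            lines[-1].append(c)
        else:
            lines.append([c])
    return lines


chars = [
    Char(x=0, height=20, stroke=4, color=100),
    Char(x=15, height=22, stroke=5, color=110),
    Char(x=30, height=21, stroke=4, color=105),
    Char(x=45, height=80, stroke=20, color=240),  # likely not part of the word
]
lengths = [len(line) for line in chain(chars)]
print(lengths)
```

Chains of length one could then be discarded as noise, which corresponds to filtering out the components with large deviation.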

Experiment: the invention was tested on the test data of the ICDAR2003 text locating competition data set. The results show that the method combining maximally stable extremal regions and stroke width features localizes text in natural scenes effectively. Statistical analysis gives a localization accuracy of 74.1%. The experimental results show that the invention localizes text in natural scenes effectively, with high accuracy and broad applicability.

The above shows and describes the basic principles, main features, and advantages of the invention. Those skilled in the art should understand that the invention is not limited by the above embodiment; the embodiment and the description merely illustrate the principles of the invention. Various changes and improvements may be made without departing from the spirit and scope of the invention, and all of them fall within the scope of the claimed invention. The scope of protection is defined by the appended claims and their equivalents.

Claims

1. A text localization method combining maximally stable extremal regions and stroke width variation, characterized in that the text localization method comprises the following steps:

(1) Use MSER to detect the text field: grayscale the original image, and use an integer of 0-255 to represent the grayscale value of each pixel in the image; choose a threshold within the grayscale value range of the image, Pixels whose gray value is less than the threshold are defined as black, and pixels greater than the threshold are white. When the threshold is 0, the entire image is white. When the threshold changes from 0 to 255, the black area is stable and the area gradient is the smallest. , then this area is the maximum stable extremum area;

(2) Canny operator marginalizes the image: use the Gaussian filter to smooth the image, calculate the gradient magnitude and gradient direction of the filtered image, suppress the non-maximum value of the gradient magnitude, and find the local maximum in the image gradient Value points, and set the non-local maximum points to zero, so as to refine the image edge, and use the double threshold algorithm to detect and connect the edges;

(3) Obtain image stroke width features: for each edge pixel point, define a ray in the gradient direction perpendicular to the edge, find another corresponding edge pixel point along the ray direction, and find another edge pixel point in the gradient direction , and the gradient direction of this point is approximately opposite to the original gradient direction, then the distance between the two edge pixels is considered as the stroke width; if no corresponding pixel is found or the gradient direction of the corresponding pixel is not approximately opposite, the ray is discarded, and in In a more complex stroke environment, calculate the median stroke width m of all pixels along the undiscarded ray, and set the median stroke width of all pixels on the ray whose stroke width is greater than m to m;

(4) Process the image with morphological operations: apply opening and closing operations to the image; the opening operation first erodes the image to remove burrs on its edges and then dilates it to fill small gaps and holes; the closing operation first dilates the image to fill broken areas and contour gaps and then erodes it to smooth the image edges;

(5) Generate candidate text regions: group text pixels into candidate text regions according to rules, assigning adjacent pixels whose stroke width values lie within a threshold range to the same connected component; compute the aspect ratio and area ratio of each connected component; and filter out as non-text regions the connected components that exceed the threshold ranges;

(6) Merge text regions: further screen the single-character text regions, filtering out as noise any strongly deviating connected component whose mean stroke width ratio, height ratio, or mean pixel color ratio relative to an adjacent single-character text region exceeds its threshold; and aggregate the remaining connected components into chains to form continuous text regions.

2. The text localization method combining maximally stable extremal regions (MSER) and stroke width variation according to claim 1, characterized in that: in the step of detecting text regions with MSER, the MSER algorithm relies on the relationship between the pixels inside a region and the pixels on its boundary, and obtains the maximally stable extremal regions according to a stability criterion; for the grayscale input image, a threshold is selected at random within the gray value range 0-255; $Q_1, \ldots, Q_i, \ldots$ is a sequence of nested extremal regions with $Q_i \subset Q_{i+1}$, and if $q(i) = |Q_{i+\Delta} \setminus Q_{i-\Delta}| / |Q_i|$ has a local minimum at $i^*$, then $Q_{i^*}$ is a maximally stable extremal region (MSER).
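The stability criterion of claim 2 can be illustrated with a minimal Python sketch. It assumes the nested regions are already available as sets of pixel coordinates, one per threshold level; the function names `stability` and `local_minima` and the toy areas are hypothetical, and a strict inequality is assumed for the local minimum.

```python
def stability(regions, delta):
    """Return {i: q(i)} with q(i) = |Q_{i+delta} \\ Q_{i-delta}| / |Q_i|.

    `regions` is a list of non-empty pixel sets, nested so that
    regions[i] is a subset of regions[i + 1].
    """
    q = {}
    for i in range(delta, len(regions) - delta):
        grown = regions[i + delta] - regions[i - delta]  # set difference
        q[i] = len(grown) / len(regions[i])
    return q


def local_minima(q):
    """Indices i* where q(i*) is strictly lower than both neighbours."""
    return [i for i in sorted(q)
            if i - 1 in q and i + 1 in q
            and q[i] < q[i - 1] and q[i] < q[i + 1]]


# Toy example: region areas grow slowly around index 4, then jump.
areas = [2, 4, 10, 11, 12, 13, 30, 60]
regions = [set(range(a)) for a in areas]  # nested by construction
q = stability(regions, delta=1)
print(local_minima(q))  # [4]
```

The region at index 4 is reported because its area changes least relative to its size between the neighbouring threshold levels.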

3. The text localization method combining maximally stable extremal regions (MSER) and stroke width variation according to claim 1, characterized in that: the Canny edge detection operator is an edge detection operator based on an optimization approach; the algorithm smooths and denoises the image with a suitable two-dimensional Gaussian function applied along rows and columns separately, computes the magnitude and direction of the image gradient, finds the local maxima of the image gradient by non-maximum suppression of the gradient magnitude, and sets the non-local-maximum points to zero so as to thin the edges; a double-threshold algorithm with thresholds T₁ and T₂ is used for detection, T₁ being used to obtain the line segments and T₂ being used to find the breaks at both ends of each line segment and link the edges; wherein the two-dimensional Gaussian function is:

$$G(x,y)=\frac{1}{2\pi\sigma^{2}}\exp\left(-\frac{x^{2}+y^{2}}{2\sigma^{2}}\right);$$

$$I(x,y)=G(x,y)*f(x,y);$$

The gradient magnitude and gradient direction are computed as:

$$M(x,y)=\sqrt{g_x^{2}+g_y^{2}};$$

$$\theta(x,y)=\arctan(g_y/g_x);$$

where σ is the standard deviation of the Gaussian curve, f(x, y) is the input image, I(x, y) is the smoothed image, and (g_x, g_y) is the gradient of the smoothed image.
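As an illustration of the quantities in claim 3, the following Python sketch samples the two-dimensional Gaussian on a small grid and computes the gradient magnitude (the Euclidean norm of (g_x, g_y)) and direction at one pixel. Several choices here are assumptions rather than part of the claim: the kernel is normalised to sum to 1, the gradient uses central differences (Sobel filters are a common alternative), and `atan2` replaces arctan(g_y/g_x) to avoid division by zero. The function names are hypothetical.

```python
import math


def gaussian_kernel(radius, sigma):
    """Sample G(x, y) on a (2*radius+1) x (2*radius+1) grid, normalised to sum 1."""
    k = [[math.exp(-(x * x + y * y) / (2 * sigma * sigma))
          / (2 * math.pi * sigma * sigma)
          for x in range(-radius, radius + 1)]
         for y in range(-radius, radius + 1)]
    total = sum(map(sum, k))
    return [[v / total for v in row] for row in k]


def gradient(img, x, y):
    """Central-difference gradient at interior pixel (x, y).

    `img` is a list of rows (img[y][x]). Returns (magnitude, direction),
    with the direction in radians.
    """
    gx = (img[y][x + 1] - img[y][x - 1]) / 2.0
    gy = (img[y + 1][x] - img[y - 1][x]) / 2.0
    return math.hypot(gx, gy), math.atan2(gy, gx)


# A vertical step edge: dark on the left, bright on the right.
img = [[0, 0, 10, 10] for _ in range(3)]
print(gradient(img, 1, 1))  # (5.0, 0.0): gradient points along +x
```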

4. The text localization method combining maximally stable extremal regions (MSER) and stroke width variation according to claim 1, characterized in that: in the stroke width calculation step, the stroke width value is denoted $d_{swt}$ and is calculated as follows: let $d_p$ denote the gradient direction of each edge pixel p, the gradient direction $d_p$ being perpendicular to the edge direction; define a ray $r = p + n \cdot d_p$, $n > 0$, and search along the ray for another edge pixel q; if the gradient direction $d_q$ of q is approximately opposite to $d_p$ ($d_q = -d_p \pm \pi/6$), the stroke width $d_{swt}$ of the pixel is:

$$d_{swt}=\sqrt{(x_p-x_q)^{2}+(y_p-y_q)^{2}};$$

where $x_p$ and $y_p$ are the horizontal and vertical coordinates of pixel p, and $x_q$ and $y_q$ are the horizontal and vertical coordinates of pixel q; in more complex stroke configurations the stroke width obtained by the above calculation is not accurate, so the median stroke width m of all pixels along each undiscarded ray is computed, and the stroke width of every pixel on that ray whose stroke width is greater than m is set to m.
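The ray search of claim 4 can be sketched in Python as follows. This is a simplified illustration under stated assumptions: edge pixels are given as a set of integer coordinates, gradient directions are stored as angles in radians, the ray is sampled at integer steps n and rounded to the nearest pixel, and `max_steps` is a hypothetical search limit. The names `stroke_width` and `clamp_to_median` are not from the source.

```python
import math
import statistics


def stroke_width(p, edges, grad, max_steps=50, tol=math.pi / 6):
    """Cast r = p + n*d_p and return the distance to the opposing edge pixel.

    edges: set of (x, y) edge pixels.
    grad:  dict mapping each edge pixel to its gradient angle in radians.
    Returns None when the ray is discarded.
    """
    dx, dy = math.cos(grad[p]), math.sin(grad[p])
    for n in range(1, max_steps + 1):
        q = (round(p[0] + n * dx), round(p[1] + n * dy))
        if q == p or q not in edges:
            continue
        # Angle between d_q and -d_p, wrapped into [0, pi].
        d = (grad[q] - grad[p] - math.pi) % (2 * math.pi)
        if d > math.pi:
            d = 2 * math.pi - d
        if d <= tol:
            return math.hypot(q[0] - p[0], q[1] - p[1])
        return None  # first edge hit is not roughly opposite: discard the ray
    return None      # no edge pixel found within max_steps


def clamp_to_median(widths):
    """Replace every width above the ray's median m by m."""
    m = statistics.median(widths)
    return [min(w, m) for w in widths]


# A vertical stroke whose left edge is at x=2 and right edge at x=6.
edges = {(2, y) for y in range(5)} | {(6, y) for y in range(5)}
grad = {(x, y): (0.0 if x == 2 else math.pi) for (x, y) in edges}
print(stroke_width((2, 2), edges, grad))  # 4.0
print(clamp_to_median([2, 4, 4, 5, 9]))   # [2, 4, 4, 4, 4]
```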

5. The text localization method combining maximally stable extremal regions (MSER) and stroke width variation according to claim 1, characterized in that: the step of processing the image with morphological operations mainly comprises opening and closing operations; the opening operation first erodes the image to remove burrs on its edges and then dilates it to fill small gaps and holes in the image; the closing operation first dilates the image to fill broken areas and contour gaps and then erodes it to smooth the image edges; the opening operation is denoted $A \circ B$ and defined as $A \circ B = (A \ominus B) \oplus B$; the closing operation is denoted $A \cdot B$ and defined as $A \cdot B = (A \oplus B) \ominus B$; where A is the image, B is the structuring element, $\ominus$ denotes erosion, and $\oplus$ denotes dilation.
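A minimal Python sketch of binary opening (erosion followed by dilation) and closing (dilation followed by erosion) is given below. It assumes the image A is represented as a set of foreground pixel coordinates on an unbounded grid, and it uses a 3x3 square structuring element B, which is an illustrative choice and not specified by the claim.

```python
# 3x3 square structuring element, as offsets from its origin (assumed).
B = [(dx, dy) for dx in (-1, 0, 1) for dy in (-1, 0, 1)]


def dilate(A, B):
    """Dilation: every foreground pixel is stamped with B."""
    return {(x + dx, y + dy) for (x, y) in A for (dx, dy) in B}


def erode(A, B):
    """Erosion: keep pixels where B, placed at the pixel, fits inside A.

    Iterating over A is sufficient because B contains the origin.
    """
    return {(x, y) for (x, y) in A
            if all((x + dx, y + dy) in A for (dx, dy) in B)}


def opening(A, B):
    return dilate(erode(A, B), B)   # erosion, then dilation


def closing(A, B):
    return erode(dilate(A, B), B)   # dilation, then erosion


square = {(x, y) for x in range(5) for y in range(5)}
print(opening(square | {(10, 10)}, B) == square)  # True: isolated pixel removed
print(closing(square - {(2, 2)}, B) == square)    # True: one-pixel hole filled
```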

6. The text localization method combining maximally stable extremal regions (MSER) and stroke width variation according to claim 1, characterized in that: in the step of generating candidate text regions, non-text regions are filtered out mainly by computing attributes of the connected components and applying rules and thresholds, the rules comprising stroke width variance, aspect ratio, and area ratio; the stroke width variance is used to judge whether pixels belong to the same connected component: if their stroke width values are similar, the pixels are assigned to the same connected component; the mean $\mu_{swt}$ and variance $\sigma_{swt}^{2}$ of the stroke width values are computed as:

$$\mu_{swt}=\frac{1}{N}\sum_{i=1}^{N}d_{swt}^{(i)},\qquad \sigma_{swt}^{2}=\frac{1}{N}\sum_{i=1}^{N}\left(d_{swt}^{(i)}-\mu_{swt}\right)^{2};$$

where N is the total number of pixels in the connected component and $d_{swt}^{(i)}$ is the stroke width value of the i-th pixel; the aspect ratio is used to filter out small, narrow connected components caused by noise, the aspect ratio of a connected component being $r = d_{height}/d_{width}$ with an aspect ratio threshold of 2; the area ratio is used to filter out connected components whose area is too large or too small, the area ratio threshold of a connected component being 2.
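The component attributes named in claim 6 can be sketched in Python as below. The sketch makes several assumptions that the claim does not fix: the variance divides by N (population variance), the width and height are taken from the bounding box of the component, and the aspect ratio test keeps components with 1/2 <= r <= 2, which is only one possible reading of "threshold 2". The function names are hypothetical.

```python
def stroke_stats(widths):
    """Mean and population variance of the stroke widths in one component."""
    n = len(widths)
    mean = sum(widths) / n
    var = sum((w - mean) ** 2 for w in widths) / n
    return mean, var


def aspect_ratio(pixels):
    """r = height / width of the component's bounding box."""
    xs = [x for x, _ in pixels]
    ys = [y for _, y in pixels]
    height = max(ys) - min(ys) + 1
    width = max(xs) - min(xs) + 1
    return height / width


def passes_aspect(pixels, threshold=2.0):
    """Assumed reading of the rule: keep if 1/threshold <= r <= threshold."""
    return 1.0 / threshold <= aspect_ratio(pixels) <= threshold


block = {(x, y) for x in range(4) for y in range(6)}   # 4 wide, 6 tall
sliver = {(0, y) for y in range(10)}                   # 1 wide, 10 tall
print(stroke_stats([2, 2, 4, 4]))   # (3.0, 1.0)
print(passes_aspect(block))         # True  (r = 1.5)
print(passes_aspect(sliver))        # False (r = 10.0)
```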

7. The text localization method combining maximally stable extremal regions (MSER) and stroke width variation according to claim 1, characterized in that: in the step of merging text regions, the single-character candidate regions are further screened, and the remaining single-character connected components are aggregated into chains to form continuous text regions; the screening conditions for single-character connected components comprise the stroke width ratio, the height ratio, and the color mean difference; the stroke width ratio is used to judge whether adjacent single-character text regions belong to the same text region, the stroke width ratio threshold of adjacent single-character text regions being 2; the height ratio is used to judge whether adjacent single-character text regions belong to the same horizontal text region, the height ratio threshold of adjacent single-character text regions being 2; the color mean difference is used to judge whether adjacent single-character text regions belong to the same text region, the color mean difference threshold of adjacent single-character text regions being 40.
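The pairwise screening and chaining of claim 7 can be sketched in Python as follows, using the thresholds stated in the claim (stroke width ratio 2, height ratio 2, color mean difference 40). The rest is assumed for illustration: the `Char` record and its fields are hypothetical, ratios are taken as larger over smaller, color is a single mean gray value, and "adjacent" is simplified to "consecutive in a list already sorted left to right".

```python
from dataclasses import dataclass


@dataclass
class Char:
    stroke: float   # mean stroke width of the character region
    height: float   # bounding-box height in pixels
    color: float    # mean gray value, 0-255 (assumed single channel)


def ratio(a, b):
    """Larger value over smaller value, so the result is always >= 1."""
    return max(a, b) / min(a, b)


def compatible(a, b, max_stroke=2.0, max_height=2.0, max_color=40.0):
    """True when two neighbouring characters satisfy all three conditions."""
    return (ratio(a.stroke, b.stroke) <= max_stroke
            and ratio(a.height, b.height) <= max_height
            and abs(a.color - b.color) <= max_color)


def chain(chars):
    """Group consecutive compatible characters into chains."""
    chains = []
    for c in chars:
        if chains and compatible(chains[-1][-1], c):
            chains[-1].append(c)
        else:
            chains.append([c])
    return chains


chars = [Char(3, 20, 50), Char(4, 22, 60), Char(12, 21, 55), Char(3, 19, 200)]
print([len(c) for c in chain(chars)])  # [2, 1, 1]
```

In this toy input the third region has a stroke width three times that of its neighbour and the fourth differs strongly in color, so neither joins the first chain; whether such single-element chains are then discarded as noise is left open here.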