CN108038481A - Text positioning method combining maximally stable extremal regions and stroke width variation

Info

Publication number
CN108038481A
Authority
CN
China
Prior art keywords
image
stroke width
text
gradient
edge
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201711310281.0A
Other languages
Chinese (zh)
Inventor
张再跃
潘立
刘亮亮
刘嘎琼
武子毅
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jiangsu University of Science and Technology
Marine Equipment and Technology Institute Jiangsu University of Science and Technology
Original Assignee
Jiangsu University of Science and Technology
Marine Equipment and Technology Institute Jiangsu University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jiangsu University of Science and Technology and Marine Equipment and Technology Institute Jiangsu University of Science and Technology
Priority to CN201711310281.0A
Publication of CN108038481A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/22 Image preprocessing by selection of a specific region containing or referencing a pattern; Locating or processing of specific regions to guide the detection or recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/30 Noise filtering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/34 Smoothing or thinning of the pattern; Morphological operations; Skeletonisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/44 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to a text positioning method combining maximally stable extremal regions (MSER) and stroke width variation. Text detection is first performed on the image using MSER; edge detection is then applied to the image; stroke width values are computed along the gradient direction of the edge pixels; morphological operations remove noise and fill gaps, and connected domains are computed; finally, non-text domains are filtered out according to rules and the connected domains are merged. The advantages of the invention are as follows: coarse text fields are obtained by MSER detection and, combined with edge processing, stroke width variation features, and morphological operations, text positioning in natural scene images is achieved. Experiments show that the method has high accuracy, benefits subsequent text segmentation and text recognition, has clear practical significance in the field of natural scene text positioning, and can be widely adopted.

Description

Text positioning method combining maximum extremum stable region and stroke width change
Technical Field
The invention relates to image processing in the field of artificial intelligence and computing, and in particular to a method for text positioning in natural scenes by means of image processing.
Background
In the process of natural scene text localization, there is a fundamental and inevitable problem: for an image with a complex natural background, how to avoid the influence of factors such as text layout, font type, illumination intensity, shooting angle and the like and accurately acquire the text position.
Text positioning is crucial to the text detection process: the accuracy of text segmentation and text recognition is directly determined by the quality of text positioning. Text positioning is increasingly applied to natural scenes, but the complexity of natural scene environments poses many challenges to the technology. Unlike traditional text positioning, natural scenes contain a large number of distractors, and factors such as shooting angle and font can deform the text, which makes text positioning more difficult. It is therefore necessary to find text features that keep the positioning process from being affected by these factors.
There are many methods for positioning text in natural scenes; the main ones are the sliding window method and the connected domain analysis method. The sliding window method uses a moving window to detect text at every position of the image. The connected domain analysis method obtains candidate connected domains by selecting image features, and achieves text positioning after screening and merging them.
In natural scene text positioning, several problems are commonly encountered and need to be solved:
1) Text feature extraction. Feature extraction is a step of text positioning in natural scenes: the image must be preprocessed first, the required text features are then extracted from the image, and candidate connected domains are generated from the extracted features.
2) Distinguishing text fields from non-text fields. Natural scenes contain many distractors whose characteristics closely resemble text, such as plants, road signs, and railings. Therefore, after the candidate connected domains are obtained, the non-text domains must be identified and filtered out.
3) Text in natural scenes takes many forms, including different fonts and languages. How to make the positioning method compatible with various languages and fonts is therefore a problem to be solved.
Therefore, to achieve text positioning in natural scenes with high accuracy, the following technical problems need to be solved:
Technical problem 1: extracting text features after natural scene image preprocessing. How to select the text features to extract so that the positioning method effectively copes with natural scene interference factors and multi-font compatibility;
Technical problem 2: filtering candidate connected domains. How to design rules to generate candidate connected domains and to identify and filter out non-text domains;
Technical problem 3: merging single-character connected domains. How to screen single-character connected domains and combine them into a text field.
To address the above problems, the invention proposes and implements a natural scene text positioning method combining maximally stable extremal regions and stroke width variation features.
Disclosure of Invention
The invention aims to provide a text positioning method combining a maximum extremum stable region and stroke width change so as to realize accurate text positioning in a natural scene image.
In order to solve the technical problems, the technical scheme of the invention is as follows: a text positioning method combining a maximum extremum stable region and stroke width variation is characterized in that: the text positioning method comprises the following steps:
(1) Detecting text fields using MSER: converting the original image to grayscale, so that the gray value of each pixel is an integer in the range 0-255; arbitrarily selecting a threshold within the gray-value range of the image, and defining pixels whose gray value is smaller than the threshold as black and pixels whose gray value is larger than the threshold as white, so that the whole image is white when the threshold is 0; as the threshold changes from 0 to 255, a black region that remains stable and whose rate of area change is minimal is a maximally stable extremal region;
(2) Performing edge detection on the image using the Canny operator: smoothing the image with a Gaussian filter, computing the gradient magnitude and gradient direction of the filtered image, applying non-maximum suppression to the gradient magnitude to find the local maximum points of the image gradient, setting non-local-maximum points to zero so as to thin the image edges, and detecting and connecting edges with a dual-threshold algorithm;
(3) Acquiring the stroke width features of the image: for each edge pixel, defining a ray along the gradient direction, which is perpendicular to the edge, and searching along the ray for the corresponding edge pixel on the other side; if the gradient direction at that pixel is approximately opposite to the original gradient direction, the distance between the two edge pixels is taken as the stroke width; if no corresponding pixel is found, or the gradient directions are not approximately opposite, the ray is discarded; in more complex stroke environments, the median stroke width m of all pixels along each non-discarded ray is computed, and every pixel on the ray whose stroke width value is larger than m is set to m;
(4) Processing the image with morphological operations: applying opening and closing operations to the image, wherein the opening operation first erodes the image to remove edge burrs and then dilates it to fill small gaps and small holes, and the closing operation first dilates the image to fill broken regions and contour gaps and then erodes it to smooth the image edges;
(5) Generating candidate text fields: aggregating text pixels into candidate text fields according to rules, assigning adjacent pixels to the same connected domain if their stroke width values are within the threshold range, computing the aspect ratio and the area ratio of each connected domain, and filtering out connected domains that exceed the threshold range as non-text domains;
(6) Merging text fields: further filtering the single-character text fields, wherein, if the ratio of mean stroke widths, the height ratio, or the mean color difference between adjacent single-character text fields exceeds its threshold, the connected domains with the larger deviation are filtered out as noise, and the remaining connected domains are aggregated into a continuous text field.
Further, in the step of detecting text fields using MSER, the maximally stable extremal region algorithm depends on the relationship between the interior of a region and its boundary pixels, and obtains the maximally stable extremal regions according to a stability criterion: the input image is converted to grayscale and thresholds are selected within the gray-value range 0-255; let $Q_1, \ldots, Q_i, \ldots$ be a sequence of nested extremal regions, $Q_i \subset Q_{i+1}$; if $q(i) = |Q_{i+\Delta} \setminus Q_{i-\Delta}| / |Q_i|$ has a local minimum at $i^*$, then $Q_{i^*}$ is a maximally stable extremal region (MSER).
Furthermore, Canny edge detection is an edge detection operator based on an optimization approach. The algorithm smooths and denoises the image along rows and columns with a suitable two-dimensional Gaussian function, computes the magnitude and direction of the image gradient, finds the local maximum points of the image gradient by non-maximum suppression of the gradient magnitude, sets non-local-maximum points to zero to thin the edges, and detects edges with a dual-threshold algorithm using thresholds $T_1$ and $T_2$: line segments are obtained with $T_1$, and $T_2$ is used to search for breaks on both sides of each line segment and to connect the edges; wherein the two-dimensional Gaussian function and the smoothed image are:
$G(x,y) = \frac{1}{2\pi\sigma^2} \exp\left(-\frac{x^2 + y^2}{2\sigma^2}\right)$;
$I(x,y) = G(x,y) * f(x,y)$;
the gradient magnitude and gradient direction are computed as:
$M(x,y) = \sqrt{g_x^2 + g_y^2}$;
$\theta(x,y) = \arctan(g_y / g_x)$;
where $\sigma$ is the standard deviation of the Gaussian curve, $f(x,y)$ is the input image, and $(g_x, g_y)$ denotes the gradient.
Further, in the step of calculating the stroke width, the stroke width value is $d_{swt}$, and the calculation comprises: let the gradient direction of each edge pixel $p$ be $d_p$, which is perpendicular to the edge direction; define the ray $r = p + n \cdot d_p$, $n > 0$, and search along the ray for another edge pixel $q$; if the gradient direction $d_q$ of $q$ is approximately opposite to $d_p$ ($d_q = -d_p \pm \pi/6$), the stroke width value $d_{swt}$ of the pixel is:
$d_{swt} = \sqrt{(x_p - x_q)^2 + (y_p - y_q)^2}$,
where $x_p$, $y_p$ are the abscissa and ordinate of pixel $p$, and $x_q$, $y_q$ are the abscissa and ordinate of pixel $q$. In more complex stroke environments the stroke width values obtained by this calculation are inaccurate; therefore the median stroke width $m$ of all pixels along each non-discarded ray is computed, and every pixel on the ray whose stroke width value is larger than $m$ is set to $m$.
Further, the step of processing the image with morphological operations mainly comprises an opening operation and a closing operation, wherein the opening operation first erodes the image to remove edge burrs and then dilates it to fill small gaps and small holes, and the closing operation first dilates the image to fill broken regions and contour gaps and then erodes it to smooth the image edges. The opening operation is denoted $A \circ B$ and defined as:
$A \circ B = (A \ominus B) \oplus B$;
the closing operation is denoted $A \bullet B$ and defined as:
$A \bullet B = (A \oplus B) \ominus B$;
where $A$ is the image, $B$ is the structuring element, $\ominus$ denotes erosion, and $\oplus$ denotes dilation.
Further, in the step of generating candidate text fields, non-text fields are filtered out mainly by computing properties of the connected domains and setting rules and thresholds, wherein the rules include: stroke width variance, aspect ratio, and area ratio. The stroke width variance is used to judge whether pixels belong to the same connected domain: pixels with similar stroke width values are assigned to the same connected domain. The stroke width mean $\mu_{swt}$ and variance $\sigma_{swt}^2$ are computed as:
$\mu_{swt} = \frac{1}{N} \sum_{i=1}^{N} d_{swt}^{(i)}$, $\sigma_{swt}^2 = \frac{1}{N} \sum_{i=1}^{N} \left(d_{swt}^{(i)} - \mu_{swt}\right)^2$,
where $N$ is the total number of pixels in the connected domain and $d_{swt}^{(i)}$ is the stroke width value of the $i$-th pixel. The aspect ratio is used to filter out thin, elongated connected domains produced by noise; the aspect ratio of a connected domain is $r = d_{height} / d_{width}$, and the aspect ratio threshold is 2. The area ratio is used to filter out connected domains whose area is too large or too small; the area ratio threshold of a connected domain is 2.
Further, in the text field merging step, the single character candidate fields are further screened, and the remaining single character connected fields are gathered into a chain to form a continuous text field, wherein the screening conditions of the single character connected fields comprise stroke width ratio, height ratio and color average value difference; the stroke width ratio is used for judging whether the adjacent single character text fields belong to the same text field, and the stroke width ratio threshold of the adjacent single character text fields is 2; the height ratio is used for judging whether the adjacent single character text fields belong to the same horizontal direction text field, and the height ratio threshold of the adjacent single character text fields is 2; the color mean value is used for judging whether the adjacent single character text fields belong to the same text field, and the color mean value difference threshold value of the adjacent single character text fields is 40.
The advantages of the invention are as follows: using the affine invariance of maximally stable extremal regions, MSER text detection is performed on the image to obtain a number of candidate text fields; on this basis, edge detection is applied to the image with the Canny operator; stroke width features are extracted for all edge pixels to obtain connected domains; non-text fields are then further filtered out and the single-character text fields are merged, thereby achieving text positioning in natural scenes. Combined with text segmentation and text recognition, for example, the method serves the goal of text detection in natural scenes well and has clear practical significance in the field of image processing.
The invention adopts ICDAR2003 text positioning competition data set test data to carry out experiments, and the experimental results show that: the method combining the maximum extremum stable region and the stroke width characteristics can effectively position the text in a natural scene. After statistical analysis, the natural scene text positioning method combining the maximum extremum stable region and the stroke width features provided by the invention has the positioning accuracy rate of 74.1%.
Drawings
The invention is described in further detail below with reference to the drawings and the detailed description.
FIG. 1 is a flow chart of a text localization method incorporating the maximum extremum stable region and stroke width variation according to the present invention.
Detailed Description
The following examples are presented to enable one of ordinary skill in the art to more fully understand the present invention and are not intended to limit the scope of the embodiments described herein.
Examples
As shown in fig. 1, the text positioning method combining the maximum extremum stable region and the stroke width feature provided in this embodiment includes the following steps:
1. The step of detecting text in the input image using MSER comprises:
The extraction of stroke width features depends on the quality of the edge features of the image. The invention therefore first applies MSER to detect text in the image and obtain coarse text positions, which improves the accuracy of the subsequent edge detection and stroke width feature extraction.
The maximally stable extremal region algorithm depends on the relationship between the interior of a region and its boundary pixels. For a grayscale image $I: D \subset \mathbb{Z}^2 \to S$, the maximally stable extremal region is defined as follows:
$S$ is totally ordered, $S = \{0, 1, \ldots, 255\}$, and the order satisfies antisymmetry, transitivity, and totality;
an adjacency relation $A \subset D \times D$ is defined on 4-neighborhoods: $p, q \in D$ are adjacent, written $pAq$, if $\sum_{i=1}^{2} |p_i - q_i| \le 1$;
a region $Q$ is a contiguous subset of $D$: for any $p, q \in Q$ there is a sequence $p, a_1, a_2, \ldots, a_n, q$ such that $pAa_1, \ldots, a_iAa_{i+1}, \ldots, a_nAq$;
the region boundary $\partial Q = \{q \in D \setminus Q : \exists p \in Q,\ qAp\}$ is the set of pixels that are adjacent to at least one pixel of $Q$ but do not belong to $Q$;
an extremal region $Q \subset D$ is a region such that, for all $p \in Q$ and $q \in \partial Q$, either $I(p) > I(q)$, in which case $Q$ is a maximum region, or $I(p) < I(q)$, in which case $Q$ is a minimum region;
let $Q_1, \ldots, Q_{i-1}, Q_i, \ldots$ be a sequence of nested extremal regions, $Q_i \subset Q_{i+1}$; if $q(i) = |Q_{i+\Delta} \setminus Q_{i-\Delta}| / |Q_i|$ has a local minimum at $i^*$, then $Q_{i^*}$ is a maximally stable extremal region (MSER).
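The stability criterion can be illustrated with a minimal sketch, assuming the areas of one nested family of regions have already been measured at successive thresholds. The function names and the sample area sequence are hypothetical and are not taken from the patent; because the regions are nested, the size of the set difference equals the difference of the two areas.

```python
def stability(areas, delta):
    """Return {i: q(i)} with q(i) = (|Q_{i+delta}| - |Q_{i-delta}|) / |Q_i|.

    areas[i] is the pixel count of the nested region Q_i. For nested regions
    the size of the set difference is the difference of the two areas.
    """
    return {
        i: (areas[i + delta] - areas[i - delta]) / areas[i]
        for i in range(delta, len(areas) - delta)
    }


def local_minima(q):
    """Indices i* where q(i*) is strictly smaller than both neighbours."""
    return [
        i for i in q
        if i - 1 in q and i + 1 in q and q[i] < q[i - 1] and q[i] < q[i + 1]
    ]


# Hypothetical area sequence: the region barely grows around index 3..5.
areas = [10, 30, 60, 62, 63, 64, 90, 150, 260]
q = stability(areas, delta=1)
stable = local_minima(q)  # indices of maximally stable regions
```

In this sample the region at index 4 is the only local minimum of q, so it would be reported as the stable region.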
2. The step of edge detection with the Canny operator comprises:
A suitable two-dimensional Gaussian function is used to smooth and denoise the image along rows and columns, and the magnitude and direction of the image gradient are computed;
the two-dimensional Gaussian function and the smoothed image are:
$G(x,y) = \frac{1}{2\pi\sigma^2} \exp\left(-\frac{x^2 + y^2}{2\sigma^2}\right)$ (1);
$I(x,y) = G(x,y) * f(x,y)$ (2);
the gradient magnitude and gradient direction are computed as:
$M(x,y) = \sqrt{g_x^2 + g_y^2}$ (3);
$\theta(x,y) = \arctan(g_y / g_x)$ (4);
where $\sigma$ is the standard deviation of the Gaussian curve, $f(x,y)$ is the input image, and $(g_x, g_y)$ denotes the gradient;
Non-maximum suppression is then applied to the gradient image: within the 8-neighborhood of each pixel, the gradient magnitude of the pixel is compared with those of its neighbors along the gradient direction. If the magnitudes of the two neighboring pixels along the gradient direction are both smaller than the magnitude of the pixel, the pixel may be an edge pixel; otherwise its gradient magnitude is set to 0. A low threshold $t_1$ and a high threshold $t_2$ are computed from the gradient histogram, and the image is thresholded twice, once with $t_1$ and once with $t_2$; wherever the gradient is smaller than the threshold, the gray value is set to 0.
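The dual-threshold step can be sketched as follows, assuming the gradient magnitudes after non-maximum suppression are given as a small 2D list. The linking rule used here (a weak pixel is kept only if it is 8-connected to a strong pixel) is a common choice and an assumption; the text does not spell out the connection rule, and the function name is hypothetical.

```python
def double_threshold(mag, t1, t2):
    """Dual-threshold edge selection with low threshold t1 and high threshold t2.

    Pixels with magnitude >= t2 are strong edges. Pixels with magnitude in
    [t1, t2) are kept only if 8-connected (possibly through other weak
    pixels) to a strong edge. All other pixels are set to 0.
    """
    h, w = len(mag), len(mag[0])
    edge = [[0] * w for _ in range(h)]
    stack = [(y, x) for y in range(h) for x in range(w) if mag[y][x] >= t2]
    for y, x in stack:
        edge[y][x] = 1
    while stack:
        y, x = stack.pop()
        for dy in (-1, 0, 1):
            for dx in (-1, 0, 1):
                ny, nx = y + dy, x + dx
                inside = 0 <= ny < h and 0 <= nx < w
                if inside and not edge[ny][nx] and mag[ny][nx] >= t1:
                    edge[ny][nx] = 1
                    stack.append((ny, nx))
    return edge


# Hypothetical magnitudes: one strong pixel, two linked weak pixels,
# and one isolated weak pixel that should be dropped.
mag = [
    [0, 0, 0, 0, 0],
    [9, 5, 5, 0, 5],
    [0, 0, 0, 0, 0],
]
edges = double_threshold(mag, t1=4, t2=8)
```

The isolated weak pixel in the last column is discarded because no path of pixels above the low threshold connects it to a strong edge.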
3. The stroke width characteristic extraction step comprises the following steps:
The stroke width value of every pixel is initialized to infinity. After edge information has been obtained with the Canny operator, let the gradient direction of each edge pixel $p$ be $d_p$; since $p$ lies on an edge, $d_p$ must be perpendicular to the edge direction. Define the ray $r = p + n \cdot d_p$, $n > 0$, and search along the ray for another edge pixel $q$; if the gradient direction $d_q$ of $q$ is approximately opposite to $d_p$ ($d_q = -d_p \pm \pi/6$), the stroke width value $d_{swt}$ of the pixel is:
$d_{swt} = \sqrt{(x_p - x_q)^2 + (y_p - y_q)^2}$,
where $(x_p, y_p)$ and $(x_q, y_q)$ are the coordinates of $p$ and $q$;
if no corresponding edge pixel $q$ is found, or the gradient direction $d_q$ of $q$ is not approximately opposite to $d_p$, the ray $r$ is discarded;
however, in more complex stroke environments such as stroke corners, the stroke width values obtained by the above procedure are inaccurate. Therefore, a second pass is made over all non-discarded rays: the median stroke width $m$ of all pixels on each ray is computed, and every pixel on the ray whose stroke width value is larger than $m$ is set to $m$.
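The three ingredients of this step (the opposite-direction test, the width formula, and the median correction) can be sketched in a few lines. This is a minimal illustration, assuming gradient directions are given as unit vectors and that the stroke width values along one ray have already been collected; the function names are hypothetical.

```python
import math
from statistics import median


def approx_opposite(d_p, d_q, tol=math.pi / 6):
    """True if unit vectors d_p and d_q are opposite to within tol radians."""
    dot = -(d_p[0] * d_q[0] + d_p[1] * d_q[1])
    return math.acos(max(-1.0, min(1.0, dot))) <= tol


def stroke_width(p, q):
    """Euclidean distance between matched edge pixels p=(x_p, y_p), q=(x_q, y_q)."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


def clamp_to_median(ray_values):
    """Second pass: cap every stroke width on one ray at the ray's median m."""
    m = median(ray_values)
    return [min(v, m) for v in ray_values]


opposite = approx_opposite((1.0, 0.0), (-1.0, 0.0))      # exactly opposite
perpendicular = approx_opposite((1.0, 0.0), (0.0, 1.0))  # 90 degrees apart
width = stroke_width((2, 3), (5, 7))
clamped = clamp_to_median([3, 4, 4, 5, 9])
```

The median correction only lowers values: widths at or below the median are left unchanged, which is why the oversized value produced at a stroke corner is pulled down while the rest of the ray is preserved.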
4. A step of morphological operations processing the image, comprising:
The opening operation smooths the image edges, removes ragged burrs on the edges, and removes narrow regions. The closing operation does the opposite: it removes noise inside regions and fills narrow breaks and edge gaps. Let the image $A$ and the set $B$ lie in the integer space $Z$; the opening of $A$ by $B$ is denoted $A \circ B$ and defined as:
$A \circ B = (A \ominus B) \oplus B$;
correspondingly, the closing of the image $A$ by the structuring element $B$ is denoted $A \bullet B$ and defined as:
$A \bullet B = (A \oplus B) \ominus B$;
where $A$ is the image, $B$ is the structuring element, $\ominus$ denotes erosion, and $\oplus$ denotes dilation.
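The two definitions can be checked on a toy example. The sketch below assumes a binary image represented as a set of foreground pixel coordinates and a structuring element represented as a set of offsets; this set representation and the 3x3 element are illustrative choices, not details from the patent.

```python
def dilate(A, B):
    """Dilation: every foreground pixel shifted by every offset in B."""
    return {(ax + bx, ay + by) for ax, ay in A for bx, by in B}


def erode(A, B):
    """Erosion: positions z such that B translated to z fits inside A."""
    b0x, b0y = next(iter(B))
    candidates = {(ax - b0x, ay - b0y) for ax, ay in A}
    return {
        (zx, zy) for zx, zy in candidates
        if all((zx + bx, zy + by) in A for bx, by in B)
    }


def opening(A, B):
    """Erosion followed by dilation."""
    return dilate(erode(A, B), B)


def closing(A, B):
    """Dilation followed by erosion."""
    return erode(dilate(A, B), B)


B = {(dx, dy) for dx in (-1, 0, 1) for dy in (-1, 0, 1)}  # 3x3 element
block = {(x, y) for x in range(5) for y in range(5)}      # 5x5 square

opened = opening(block | {(5, 2)}, B)  # square with a one-pixel burr
closed = closing(block - {(2, 2)}, B)  # square with a one-pixel hole
```

Opening removes the burr and closing fills the hole, so both results equal the clean 5x5 square.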
5. A step of candidate text field generation, comprising:
in the step of generating the candidate text field, the non-text field is filtered out mainly by calculating the property of the connected field and setting rules and threshold values, wherein the rules comprise: stroke width variance, aspect ratio, area ratio;
The stroke width variance is used to judge whether pixels belong to the same connected domain: pixels with similar stroke width values are assigned to the same connected domain. The stroke width mean $\mu_{swt}$ and variance $\sigma_{swt}^2$ are computed as:
$\mu_{swt} = \frac{1}{N} \sum_{i=1}^{N} d_{swt}^{(i)}$, $\sigma_{swt}^2 = \frac{1}{N} \sum_{i=1}^{N} \left(d_{swt}^{(i)} - \mu_{swt}\right)^2$,
where $N$ is the total number of pixels in the connected domain and $d_{swt}^{(i)}$ is the stroke width value of the $i$-th pixel;
the aspect ratio is used for filtering a fine and long connected domain generated by noise interference, and the aspect ratio r = d of the connected domain height /d width The aspect ratio threshold is 2;
the area ratio is used for filtering a connected domain with an overlarge or undersize area, and the area ratio threshold of the connected domain is 2.
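A minimal sketch of the per-domain statistics and the aspect ratio rule follows. The threshold value 2 comes from the text; treating the rule symmetrically (rejecting both very tall and very wide domains) and using the population variance are assumptions, as are the function names. The area ratio rule is omitted because the text does not state which two areas are compared.

```python
from statistics import fmean, pvariance

ASPECT_RATIO_THRESHOLD = 2.0  # value stated in the text


def stroke_width_stats(widths):
    """Mean and population variance of the stroke widths in one connected domain."""
    mu = fmean(widths)
    return mu, pvariance(widths, mu)


def passes_aspect_ratio(height, width, threshold=ASPECT_RATIO_THRESHOLD):
    """Assumed rule: keep the domain if neither side exceeds threshold x the other."""
    r = height / width
    return max(r, 1.0 / r) <= threshold


mu, var = stroke_width_stats([4, 4, 6, 6])
keep_letter = passes_aspect_ratio(height=30, width=20)  # r = 1.5
keep_railing = passes_aspect_ratio(height=100, width=5)  # r = 20
```

A domain with low stroke width variance and a moderate aspect ratio, like the first example, would survive this stage; a thin railing-like domain would not.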
6. The text field merging step comprises the following steps:
further screening the single character candidate domain, and gathering the residual single character connected domains into a chain to form a continuous text domain, wherein the screening conditions of the single character connected domains comprise stroke width ratio, height ratio and color average value difference;
the stroke width ratio is used for judging whether the adjacent single character text fields belong to the same text field, and the stroke width ratio threshold of the adjacent single character text fields is 2;
the height ratio is used for judging whether the adjacent single character text fields belong to the same horizontal direction text field, and the height ratio threshold of the adjacent single character text fields is 2;
the color mean value is used for judging whether the adjacent single character text fields belong to the same text field, and the color mean value difference threshold value of the adjacent single character text fields is 40.
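The merging rules can be sketched as a pairwise compatibility test followed by chaining. The thresholds (2, 2, and 40) are the values given above; the `Char` record, the left-to-right definition of adjacency, and the use of a single mean gray level as the color are hypothetical simplifications.

```python
from dataclasses import dataclass


@dataclass
class Char:
    x: float             # left edge, used only to order characters
    stroke_width: float  # mean stroke width of the connected domain
    height: float        # bounding-box height
    color: float         # mean gray level (assumed single channel)


def ratio(u, v):
    """Symmetric ratio, always >= 1."""
    return max(u, v) / min(u, v)


def compatible(a, b, sw_ratio=2.0, h_ratio=2.0, color_diff=40.0):
    """True if two adjacent single-character domains may share a text field."""
    return (
        ratio(a.stroke_width, b.stroke_width) <= sw_ratio
        and ratio(a.height, b.height) <= h_ratio
        and abs(a.color - b.color) <= color_diff
    )


def chain(chars):
    """Group left-to-right neighbours that are pairwise compatible."""
    chains = []
    for c in sorted(chars, key=lambda c: c.x):
        if chains and compatible(chains[-1][-1], c):
            chains[-1].append(c)
        else:
            chains.append([c])
    return chains


chars = [
    Char(0, 4.0, 20, 100),
    Char(25, 5.0, 22, 110),
    Char(50, 4.5, 21, 105),
    Char(80, 15.0, 60, 220),  # much thicker, taller, and brighter
]
chain_lengths = [len(c) for c in chain(chars)]
```

The first three characters form one chain, and the fourth starts a new one because it fails all three tests against its neighbor.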
Experiment: the ICDAR2003 text positioning competition data set test data is adopted to carry out experiments. The experimental results show that: the method combining the maximum extremum stable region and the stroke width characteristics can effectively position the text in a natural scene. After statistical analysis, the natural scene text positioning method combining the maximum extremum stable region and the stroke width features provided by the invention has the positioning accuracy rate of 74.1%. According to experimental results, the method can effectively realize text positioning in a natural scene, has high accuracy and has very wide use value.
The foregoing shows and describes the general principles and features of the present invention, together with the advantages thereof. It will be understood by those skilled in the art that the present invention is not limited to the embodiments described above, which are described in the specification and illustrated only to illustrate the principle of the present invention, but that various changes and modifications may be made therein without departing from the spirit and scope of the present invention, which fall within the scope of the invention as claimed. The scope of the invention is defined by the appended claims and equivalents thereof.

Claims (7)

1. A text positioning method combining a maximum extremum stable region and stroke width variation is characterized in that: the text positioning method comprises the following steps:
(1) Detecting text fields using MSER: converting the original image to grayscale, so that the gray value of each pixel is an integer in the range 0-255; arbitrarily selecting a threshold within the gray-value range of the image, and defining pixels whose gray value is smaller than the threshold as black and pixels whose gray value is larger than the threshold as white, so that the whole image is white when the threshold is 0; as the threshold changes from 0 to 255, a black region that remains stable and whose rate of area change is minimal is a maximally stable extremal region;
(2) Performing edge detection on the image using the Canny operator: smoothing the image with a Gaussian filter, computing the gradient magnitude and gradient direction of the filtered image, applying non-maximum suppression to the gradient magnitude to find the local maximum points of the image gradient, setting non-local-maximum points to zero so as to thin the image edges, and detecting and connecting edges with a dual-threshold algorithm;
(3) Acquiring the stroke width features of the image: for each edge pixel, defining a ray along the gradient direction, which is perpendicular to the edge, and searching along the ray for the corresponding edge pixel on the other side; if the gradient direction at that pixel is approximately opposite to the original gradient direction, the distance between the two edge pixels is taken as the stroke width; if no corresponding pixel is found, or the gradient directions are not approximately opposite, the ray is discarded; in more complex stroke environments, the median stroke width m of all pixels along each non-discarded ray is computed, and every pixel on the ray whose stroke width value is larger than m is set to m;
(4) Processing the image with morphological operations: applying opening and closing operations to the image, wherein the opening operation first erodes the image to remove edge burrs and then dilates it to fill small gaps and small holes, and the closing operation first dilates the image to fill broken regions and contour gaps and then erodes it to smooth the image edges;
(5) Generating candidate text fields: aggregating text pixels into candidate text fields according to rules, assigning adjacent pixels to the same connected domain if their stroke width values are within the threshold range, computing the aspect ratio and the area ratio of each connected domain, and filtering out connected domains that exceed the threshold range as non-text domains;
(6) Merging text fields: further filtering the single-character text fields, wherein, if the ratio of mean stroke widths, the height ratio, or the mean color difference between adjacent single-character text fields exceeds its threshold, the connected domains with the larger deviation are filtered out as noise, and the remaining connected domains are aggregated into a continuous text field.
2. The text positioning method combining a maximally stable extremal region and stroke width variation according to claim 1, characterized in that: in the step of detecting text fields using MSER, the maximally stable extremal region algorithm depends on the relationship between the interior of a region and its boundary pixels, and obtains the maximally stable extremal regions according to a stability criterion; the input image is converted to grayscale and thresholds are selected within the gray-value range 0-255; let $Q_1, \ldots, Q_i, \ldots$ be a sequence of nested extremal regions, $Q_i \subset Q_{i+1}$; if $q(i) = |Q_{i+\Delta} \setminus Q_{i-\Delta}| / |Q_i|$ has a local minimum at $i^*$, then $Q_{i^*}$ is a maximally stable extremal region (MSER).
3. The text positioning method combining maximally stable extremal regions and stroke width variation according to claim 1, wherein: Canny edge detection is an edge detection operator based on an optimization approach; the algorithm smooths and denoises the image along rows and columns with a suitable two-dimensional Gaussian function $G(x,y)$, calculates the magnitude and direction of the image gradient, finds the local maxima of the gradient magnitude by non-maximum suppression, sets the non-maximum points to zero so as to thin the edges, and performs detection with a double-threshold algorithm using $T_1$ and $T_2$, where $T_1$ is used to obtain the line segments and $T_2$ is used to search for breaks at both ends of each line segment and to connect the edges; wherein the two-dimensional Gaussian function and the smoothed image are:
$G(x,y) = \frac{1}{2\pi\sigma^2}\exp\left(-\frac{x^2+y^2}{2\sigma^2}\right)$;
$I(x,y) = G(x,y) * f(x,y)$;
the gradient magnitude and the gradient direction are calculated as:
$M(x,y) = \sqrt{g_x^2 + g_y^2}$;
$\theta(x,y) = \arctan(g_y / g_x)$; where $\sigma$ is the standard deviation of the Gaussian curve, $f(x,y)$ is the input image, $*$ denotes convolution, and $(g_x, g_y)$ represents the gradient.
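The smoothing kernel and the gradient computation used in claim 3 can be sketched as follows. This is only an illustration of those two steps, not a full Canny implementation: non-maximum suppression and double-threshold linking are omitted. The function names, the central-difference gradient, and the use of `atan2` (a quadrant-aware variant of $\arctan(g_y/g_x)$ that also avoids division by zero) are assumptions made for the example.

```python
import math


def gaussian_kernel(size, sigma):
    """Sampled 2-D Gaussian G(x, y), normalised so the weights sum to 1."""
    half = size // 2
    kernel = [[math.exp(-(x * x + y * y) / (2 * sigma * sigma))
               / (2 * math.pi * sigma * sigma)
               for x in range(-half, half + 1)]
              for y in range(-half, half + 1)]
    total = sum(map(sum, kernel))
    return [[v / total for v in row] for row in kernel]


def gradient(img, x, y):
    """Gradient magnitude and direction at (x, y) by central differences.

    `img` is a list of rows, indexed as img[y][x]; (x, y) must not lie on
    the image border.
    """
    gx = (img[y][x + 1] - img[y][x - 1]) / 2
    gy = (img[y + 1][x] - img[y - 1][x]) / 2
    return math.hypot(gx, gy), math.atan2(gy, gx)


# A vertical step edge: the gradient points along +x.
img = [[0, 0, 10, 10] for _ in range(3)]
print(gradient(img, 1, 1))  # -> (5.0, 0.0)
```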
4. The text positioning method combining maximally stable extremal regions and stroke width variation according to claim 1, wherein: in the step of calculating the stroke width, the stroke width value is denoted $d_{swt}$ and is calculated as follows: let the gradient direction of each edge pixel $p$ be $d_p$, which is perpendicular to the edge direction; define a ray $r = p + n \cdot d_p$, $n > 0$, and search along the ray for another edge pixel $q$; if the gradient direction $d_q$ of $q$ is approximately opposite to $d_p$ ($d_q = -d_p \pm \pi/6$), the stroke width value $d_{swt}$ of the pixel is: $d_{swt} = \sqrt{(x_p - x_q)^2 + (y_p - y_q)^2}$, where $x_p$, $y_p$ are the abscissa and ordinate of pixel $p$, and $x_q$, $y_q$ are the abscissa and ordinate of pixel $q$; otherwise the ray is discarded; because the stroke width values obtained by this calculation are inaccurate in more complex stroke configurations, for each ray that was not discarded the median $m$ of the stroke width values of all pixels on the ray is calculated, and the stroke width value of every pixel on the ray whose value is greater than $m$ is set to $m$.
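The ray search and the median correction of claim 4 can be sketched in a few lines of Python. The sketch assumes that edge pixels are given as a set of integer coordinates, that gradient directions are stored as angles in radians, and that the ray is sampled at unit steps and rounded to the nearest pixel; the names `cast_ray` and `clamp_to_median` and the `max_steps` limit are hypothetical.

```python
import math
from statistics import median


def cast_ray(p, angle, edges, directions, max_steps=100, tol=math.pi / 6):
    """Follow r = p + n * d_p (n > 0) until another edge pixel q is met.

    Returns (pixels_on_ray, stroke_width), or None if the ray is discarded
    because no edge is found or d_q is not roughly opposite to d_p.
    """
    px, py = p
    dx, dy = math.cos(angle), math.sin(angle)
    ray = [p]
    for n in range(1, max_steps + 1):
        q = (round(px + n * dx), round(py + n * dy))
        if q == ray[-1]:
            continue  # this step stayed inside the previous pixel
        ray.append(q)
        if q in edges:
            # Deviation of d_q from the exactly opposite direction -d_p.
            diff = abs((directions[q] - angle) % (2 * math.pi) - math.pi)
            if diff <= tol:
                return ray, math.hypot(q[0] - px, q[1] - py)
            return None
    return None


def clamp_to_median(swt, ray):
    """Set every stroke width on the ray that exceeds the median m to m."""
    m = median(swt[p] for p in ray)
    for p in ray:
        swt[p] = min(swt[p], m)
    return m


# Two opposite edges of a vertical stroke, four pixels apart.
edges = {(2, 0), (6, 0)}
directions = {(2, 0): 0.0, (6, 0): math.pi}
ray, width = cast_ray((2, 0), 0.0, edges, directions)
print(width)  # -> 4.0
```

The median step matters at stroke corners and junctions, where a ray can cross the stroke diagonally and report a width larger than the true one.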
5. The text positioning method combining maximally stable extremal regions and stroke width variation according to claim 1, wherein: the step of processing the image with morphological operations mainly comprises opening and closing operations; the opening operation first erodes the image to remove burrs on its edges and then dilates it to fill small gaps and small holes; the closing operation first dilates the image to fill broken areas and contour gaps and then erodes it to smooth the edges of the image; the opening operation is denoted $A \circ B$ and defined as $A \circ B = (A \ominus B) \oplus B$; the closing operation is denoted $A \bullet B$ and defined as $A \bullet B = (A \oplus B) \ominus B$; where $A$ is the image, $B$ is the structuring element, $\ominus$ denotes erosion and $\oplus$ denotes dilation.
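Opening and closing as defined in claim 5 can be written directly from the set definitions. The sketch below assumes a binary image represented as a set of foreground pixel coordinates and a structuring element $B$ that contains the origin (so every eroded pixel is already a foreground pixel); the function names are illustrative.

```python
def dilate(a, b):
    """Dilation: every foreground pixel is stamped with the element B."""
    return {(x + dx, y + dy) for (x, y) in a for (dx, dy) in b}


def erode(a, b):
    """Erosion: keep a pixel only if B, placed on it, fits inside A.

    Iterating over A alone is sufficient because B is assumed to contain
    the origin (0, 0).
    """
    return {(x, y) for (x, y) in a
            if all((x + dx, y + dy) in a for (dx, dy) in b)}


def opening(a, b):
    return dilate(erode(a, b), b)   # erosion first, then dilation


def closing(a, b):
    return erode(dilate(a, b), b)   # dilation first, then erosion


B = {(dx, dy) for dx in (-1, 0, 1) for dy in (-1, 0, 1)}  # 3x3 square

block = {(x, y) for x in range(3) for y in range(3)}
print(opening(block | {(10, 10)}, B) == block)  # isolated speck removed -> True

full = {(x, y) for x in range(5) for y in range(5)}
print(closing(full - {(2, 2)}, B) == full)      # one-pixel hole filled -> True
```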
6. The text positioning method combining maximally stable extremal regions and stroke width variation according to claim 1, wherein: in the step of generating candidate text fields, non-text fields are filtered out mainly by calculating properties of the connected domains and applying rules and thresholds, the rules comprising: stroke width variance, aspect ratio, and area ratio; the stroke width variance is used to judge whether pixels belong to the same connected domain, and pixels with similar stroke width values are assigned to the same connected domain; the stroke width mean $\mu_{swt}$ and variance $\sigma_{swt}^2$ are calculated as: $\mu_{swt} = \frac{1}{N}\sum_{i=1}^{N} d_{swt}^{(i)}$, $\sigma_{swt}^2 = \frac{1}{N}\sum_{i=1}^{N}\left(d_{swt}^{(i)} - \mu_{swt}\right)^2$, where $N$ is the total number of pixels in the connected domain and $d_{swt}^{(i)}$ is the stroke width value of the $i$-th pixel; the aspect ratio is used to filter out thin, elongated connected domains produced by noise interference, the aspect ratio of a connected domain being $r = d_{height}/d_{width}$ with an aspect ratio threshold of 2; the area ratio is used to filter out connected domains whose area is too large or too small, the area ratio threshold of the connected domain being 2.
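The statistics named in claim 6 are simple to compute; a minimal sketch follows. Two points are assumptions rather than statements of the claim: the variance is taken as the population variance (divisor $N$), and the aspect ratio threshold of 2 is applied symmetrically, so that a domain passes when $1/2 \le r \le 2$. The area ratio test is left out because this section does not define how the area ratio is formed. The function names are hypothetical.

```python
def swt_stats(widths):
    """Mean and population variance of the stroke widths in one domain."""
    n = len(widths)
    mu = sum(widths) / n
    var = sum((w - mu) ** 2 for w in widths) / n
    return mu, var


def aspect_ratio_ok(height, width, threshold=2.0):
    """Assumed symmetric test on r = d_height / d_width."""
    r = height / width
    return 1 / threshold <= r <= threshold


print(swt_stats([2, 4, 4, 6]))   # -> (4.0, 2.0)
print(aspect_ratio_ok(30, 20))   # r = 1.5 -> True
print(aspect_ratio_ok(100, 10))  # r = 10, thin vertical streak -> False
```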
7. The text positioning method combining maximally stable extremal regions and stroke width variation according to claim 1, wherein: in the text field merging step, the single-character candidate fields are further screened, and the remaining single-character connected domains are linked into chains to form continuous text fields; the screening conditions for the single-character connected domains comprise the stroke width ratio, the height ratio, and the color mean difference; the stroke width ratio is used to judge whether adjacent single-character text fields belong to the same text field, and its threshold is 2; the height ratio is used to judge whether adjacent single-character text fields belong to the same horizontal text field, and its threshold is 2; the color mean difference is used to judge whether adjacent single-character text fields belong to the same text field, and its threshold is 40.
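The three pairwise tests of claim 7 can be sketched as below, using the thresholds stated in the claim (2, 2 and 40). Several details are assumptions made for the example: each ratio is taken as larger value over smaller value, the color mean is a single gray level, "adjacent" means consecutive in left-to-right order, and a candidate that fails against its left neighbor starts a new chain. The `Char` record and the function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Char:
    x: float             # horizontal position of the candidate
    stroke_width: float  # mean stroke width of the candidate
    height: float        # bounding-box height
    color_mean: float    # mean gray level of its pixels


def ratio(a, b):
    """Ratio of the larger value to the smaller one (always >= 1)."""
    return max(a, b) / min(a, b)


def compatible(a, b, sw_max=2.0, h_max=2.0, color_max=40.0):
    """True if two neighboring candidates pass all three tests."""
    return (ratio(a.stroke_width, b.stroke_width) <= sw_max
            and ratio(a.height, b.height) <= h_max
            and abs(a.color_mean - b.color_mean) <= color_max)


def chain(chars):
    """Link left-to-right neighbors that are compatible into chains."""
    chains = []
    for c in sorted(chars, key=lambda ch: ch.x):
        if chains and compatible(chains[-1][-1], c):
            chains[-1].append(c)
        else:
            chains.append([c])
    return chains


chars = [Char(0, 3.0, 20, 100), Char(10, 3.5, 22, 110),
         Char(20, 12.0, 80, 250), Char(30, 3.0, 21, 105)]
print([len(group) for group in chain(chars)])  # -> [2, 1, 1]
```

In this simplified chaining, single-element chains are natural candidates for removal as noise; the third candidate in the sample data fails all three tests against its neighbors.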
CN201711310281.0A 2017-12-11 2017-12-11 A kind of combination maximum extreme value stability region and the text positioning method of stroke width change Pending CN108038481A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711310281.0A CN108038481A (en) 2017-12-11 2017-12-11 A kind of combination maximum extreme value stability region and the text positioning method of stroke width change

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201711310281.0A CN108038481A (en) 2017-12-11 2017-12-11 A kind of combination maximum extreme value stability region and the text positioning method of stroke width change

Publications (1)

Publication Number Publication Date
CN108038481A true CN108038481A (en) 2018-05-15

Family

ID=62102252

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711310281.0A Pending CN108038481A (en) 2017-12-11 2017-12-11 A kind of combination maximum extreme value stability region and the text positioning method of stroke width change

Country Status (1)

Country Link
CN (1) CN108038481A (en)

Cited By (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109344824A (en) * 2018-09-21 2019-02-15 泰康保险集团股份有限公司 A kind of line of text method for detecting area, device, medium and electronic equipment
CN109448000A (en) * 2018-10-10 2019-03-08 中北大学 A kind of dividing method of road sign image
CN109472221A (en) * 2018-10-25 2019-03-15 辽宁工业大学 A kind of image text detection method based on stroke width transformation
CN109670500A (en) * 2018-11-30 2019-04-23 平安科技(深圳)有限公司 A kind of character area acquisition methods, device, storage medium and terminal device
CN109978781A (en) * 2019-03-14 2019-07-05 北京工业大学 A kind of intravascular ultrasound image segmentation method based on extremal region detection
CN109993742A (en) * 2019-04-04 2019-07-09 哈尔滨工业大学 Bridge Crack method for quickly identifying based on diagonal operator reciprocal
CN110032997A (en) * 2019-01-07 2019-07-19 武汉大学 A kind of natural scene text positioning method based on image segmentation
CN110245600A (en) * 2019-06-11 2019-09-17 长安大学 Adaptively originate quick stroke width unmanned plane Approach for road detection
CN110944237A (en) * 2019-12-12 2020-03-31 成都极米科技股份有限公司 Subtitle area positioning method and device and electronic equipment
CN110991448A (en) * 2019-11-27 2020-04-10 云南电网有限责任公司电力科学研究院 Text detection method and device for nameplate image of power equipment
CN112488107A (en) * 2020-12-04 2021-03-12 北京华录新媒信息技术有限公司 Video subtitle processing method and processing device
CN112633197A (en) * 2020-12-28 2021-04-09 宁波江丰生物信息技术有限公司 Method and system for tissue region identification of fluorescence section
CN113298054A (en) * 2021-07-27 2021-08-24 国际关系学院 Text region detection method based on embedded spatial pixel clustering
CN115546232A (en) * 2022-10-12 2022-12-30 什维新智医疗科技(上海)有限公司 Liver ultrasonic image working area extraction method and system and electronic equipment
CN118135578A (en) * 2024-05-10 2024-06-04 沈阳出版社有限公司 Text learning and proofreading system based on text recognition

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6526170B1 (en) * 1993-12-14 2003-02-25 Nec Corporation Character recognition system
CN104751142A (en) * 2015-04-01 2015-07-01 电子科技大学 Natural scene text detection algorithm based on stroke features
CN104794479A (en) * 2014-01-20 2015-07-22 北京大学 Method for detecting text in natural scene picture based on local width change of strokes
CN106127118A (en) * 2016-06-15 2016-11-16 珠海迈科智能科技股份有限公司 A kind of English word recognition methods and device
CN106446920A (en) * 2016-09-05 2017-02-22 电子科技大学 Stroke width transformation method based on gradient amplitude constraint
CN107045634A (en) * 2017-05-02 2017-08-15 电子科技大学 A kind of text positioning method based on maximum stable extremal region and stroke width

Cited By (21)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109344824A (en) * 2018-09-21 2019-02-15 泰康保险集团股份有限公司 A kind of line of text method for detecting area, device, medium and electronic equipment
CN109344824B (en) * 2018-09-21 2022-06-10 泰康保险集团股份有限公司 Text line region detection method, device, medium and electronic equipment
CN109448000A (en) * 2018-10-10 2019-03-08 中北大学 A kind of dividing method of road sign image
CN109448000B (en) * 2018-10-10 2021-07-30 中北大学 Segmentation method of traffic direction sign image
CN109472221A (en) * 2018-10-25 2019-03-15 辽宁工业大学 A kind of image text detection method based on stroke width transformation
CN109670500A (en) * 2018-11-30 2019-04-23 平安科技(深圳)有限公司 A kind of character area acquisition methods, device, storage medium and terminal device
CN109670500B (en) * 2018-11-30 2024-06-28 平安科技(深圳)有限公司 Text region acquisition method and device, storage medium and terminal equipment
CN110032997A (en) * 2019-01-07 2019-07-19 武汉大学 A kind of natural scene text positioning method based on image segmentation
CN109978781B (en) * 2019-03-14 2021-03-16 北京工业大学 Intravascular ultrasound image segmentation method based on extremum region detection
CN109978781A (en) * 2019-03-14 2019-07-05 北京工业大学 A kind of intravascular ultrasound image segmentation method based on extremal region detection
CN109993742A (en) * 2019-04-04 2019-07-09 哈尔滨工业大学 Bridge Crack method for quickly identifying based on diagonal operator reciprocal
CN110245600A (en) * 2019-06-11 2019-09-17 长安大学 Adaptively originate quick stroke width unmanned plane Approach for road detection
CN110991448A (en) * 2019-11-27 2020-04-10 云南电网有限责任公司电力科学研究院 Text detection method and device for nameplate image of power equipment
CN110944237A (en) * 2019-12-12 2020-03-31 成都极米科技股份有限公司 Subtitle area positioning method and device and electronic equipment
CN110944237B (en) * 2019-12-12 2022-02-01 成都极米科技股份有限公司 Subtitle area positioning method and device and electronic equipment
CN112488107A (en) * 2020-12-04 2021-03-12 北京华录新媒信息技术有限公司 Video subtitle processing method and processing device
CN112633197A (en) * 2020-12-28 2021-04-09 宁波江丰生物信息技术有限公司 Method and system for tissue region identification of fluorescence section
CN113298054B (en) * 2021-07-27 2021-10-08 国际关系学院 Text region detection method based on embedded spatial pixel clustering
CN113298054A (en) * 2021-07-27 2021-08-24 国际关系学院 Text region detection method based on embedded spatial pixel clustering
CN115546232A (en) * 2022-10-12 2022-12-30 什维新智医疗科技(上海)有限公司 Liver ultrasonic image working area extraction method and system and electronic equipment
CN118135578A (en) * 2024-05-10 2024-06-04 沈阳出版社有限公司 Text learning and proofreading system based on text recognition

Similar Documents

Publication Publication Date Title
CN108038481A (en) A kind of combination maximum extreme value stability region and the text positioning method of stroke width change
CN104751142B (en) A kind of natural scene Method for text detection based on stroke feature
Gatos et al. ICFHR 2010 handwriting segmentation contest
US9235755B2 (en) Removal of underlines and table lines in document images while preserving intersecting character strokes
CN106295648B (en) A kind of low quality file and picture binary coding method based on multi-optical spectrum imaging technology
CN104361336A (en) Character recognition method for underwater video images
Paunwala et al. A novel multiple license plate extraction technique for complex background in Indian traffic conditions
WO2019204577A1 (en) System and method for multimedia analytic processing and display
CN112818952A (en) Coal rock boundary recognition method and device and electronic equipment
Gilly et al. A survey on license plate recognition systems
CN105809673A (en) SURF (Speeded-Up Robust Features) algorithm and maximal similarity region merging based video foreground segmentation method
CN105447489A (en) Character and background adhesion noise elimination method for image OCR system
CN108256518B (en) Character area detection method and device
Wu et al. Contour restoration of text components for recognition in video/scene images
Kumar An efficient text extraction algorithm in complex images
Feild et al. Scene text recognition with bilateral regression
Dhar et al. Bangladeshi license plate recognition using adaboost classifier
Giri Text information extraction and analysis from images using digital image processing techniques
Jin et al. A color image segmentation method based on improved K-means clustering algorithm
Mol et al. Text recognition using poisson filtering and edge enhanced maximally stable extremal regions
CN111191534B (en) Road extraction method in fuzzy aviation image
Sushma et al. Text detection in color images
Liao et al. An integrated approach for multilingual scene text detection
CN107153823B (en) Lane line feature extraction method based on visual correlation double spaces
Shekar et al. Text localization in video/scene images using Kirsch Directional Masks

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20180515