CN106874905B - A method for natural scene text detection based on self-learning color clustering - Google Patents

A method for natural scene text detection based on self-learning color clustering

Info

Publication number
CN106874905B
CN106874905B (application CN201710021572.1A)
Authority
CN
China
Prior art keywords
character
hierarchical clustering
text
color
sample
Prior art date
Legal status
Expired - Fee Related
Application number
CN201710021572.1A
Other languages
Chinese (zh)
Other versions
CN106874905A (en)
Inventor
郭建京
邹北骥
吴慧
杨文君
徐子雯
Current Assignee
Central South University
Original Assignee
Central South University
Priority date
Filing date
Publication date
Application filed by Central South University
Priority to CN201710021572.1A
Publication of CN106874905A
Application granted
Publication of CN106874905B
Expired - Fee Related (current legal status)
Anticipated expiration

Links

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/60 Type of objects
    • G06V20/62 Text, e.g. of license plates, overlay texts or captions on TV images
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G06F18/231 Hierarchical techniques, i.e. dividing or merging pattern sets so as to obtain a dendrogram

Abstract

The present invention provides a method for natural scene text detection based on self-learning color clustering. First, hierarchical clustering is combined with a parameter self-learning strategy to design an adaptive color clustering method that extracts candidate characters from an image; because the method automatically learns the weights and threshold for each image, it achieves a good character recall rate. Then, an Adaboost classifier is trained to build a character verification model that removes non-text components. Finally, the remaining characters are merged into text lines, and the text detection result is obtained after post-processing. Compared with traditional methods, this method achieves a higher text detection recall rate and more accurate detection results.

Description

A method for natural scene text detection based on self-learning color clustering
Technical field
The invention belongs to the field of pattern recognition and relates to a natural scene text detection method based on self-learning color clustering.
Background technique
Text in natural scene images carries a large amount of useful information, and extracting it is an important prerequisite for image content analysis and understanding; it can be widely applied to fields such as license plate detection, autonomous driving, content-based image retrieval, text recognition on mobile phones and robot navigation. However, natural scene text detection is affected by many factors that increase its difficulty. These factors fall into three main categories:
Complex image background: images may be acquired in arbitrary scenes, their colors vary from image to image, and they contain many interfering objects such as leaves, bricks, railings and tiles, which easily cause detection errors.
Diverse text: the size and style of text in natural scene images vary widely, and characters may be distorted and tilted to different degrees.
Varying degrees of interference: natural scene images are usually captured outdoors and are therefore affected by different degrees of illumination, shadow, resolution and shooting angle.
To overcome these factors and improve the accuracy of text detection, researchers have proposed a large number of natural scene text detection methods, which fall into two main categories: sliding-window-based methods and connected-component-based methods.
Sliding-window-based methods usually scan the original image with multi-scale sliding windows to extract candidate regions, and then verify the candidate regions with machine learning using color, gradient and texture features to obtain the detection result. Because text sizes vary widely, these methods must scan the image with windows at many scales, which is time-consuming and produces an excessive number of candidate regions, increasing the difficulty of the subsequent text verification.
Connected-component-based methods are currently the more popular text detection methods. They are further divided into three sub-tasks: (1) candidate character extraction, (2) candidate character verification, and (3) text region analysis.
Candidate character extraction usually assumes that the pixels of a text character share features such as gray-level consistency, color consistency and stroke width uniformity; pixels with similar features are extracted and grouped into candidate character connected components.
Candidate character verification usually analyzes the characters and the background regions, extracts features that easily distinguish text from background, and applies machine learning to verify the candidate character connected components and remove non-text components.
Text region analysis post-processes the characters retained after verification. Typically, the spatial position, color and texture features of the character connected components are analyzed, characters that are close in position, color and texture are merged into text lines, and the text lines are then segmented and verified with heuristic rules and machine learning to obtain the final detection result.
Because natural scene image backgrounds are complex and changeable, text color, font and size vary widely, and images are affected by different degrees of illumination, shadow and shooting angle, effectively extracting candidate characters from backgrounds of varying complexity is the key problem of connected-component-based text detection methods.
Summary of the invention
The present invention provides a text detection method based on self-learning color clustering to overcome the problems of the existing text detection methods described above. The method combines hierarchical clustering with a parameter self-learning strategy to realize an adaptive color clustering algorithm, constructs color layers, extracts the connected components in the color layers as candidate characters, and thereby locates the text in the image.
A natural scene text detection method based on self-learning color clustering comprises the following steps:
Step 1: project the R, G, B color values of every pixel of the image I to be detected into a three-dimensional color space and divide the color space into equally sized cubes; each cube serves as a basic unit of hierarchical clustering.
Take the color mean of all pixels in each cube as the feature c of the corresponding hierarchical clustering basic unit:
c = (μ(r), μ(g), μ(b)), where μ(r), μ(g) and μ(b) are the mean R, G and B values of all pixels in the basic unit.
Step 2: initialize the feature weight vector w of the hierarchical clustering basic units, w = (w_r, w_g, w_b, w_θ);
where w_r, w_g and w_b are the color distance weights of the R, G and B components of the basic unit pixels, and w_θ is the clustering threshold.
Step 3: using the feature weight vector w, compute the color distance between every pair of hierarchical clustering basic units:
d_ij = w_r·|μ_i(r) - μ_j(r)| + w_g·|μ_i(g) - μ_j(g)| + w_b·|μ_i(b) - μ_j(b)|
where μ_i and μ_j denote the features of the i-th and j-th hierarchical clustering basic units.
Step 4: merge the two hierarchical clustering basic units with the smallest color distance into a new basic unit, compute the feature c of the new unit, and record the merge to build the corresponding hierarchical clustering tree; return to step 3 until only one basic unit remains.
Step 5: construct the feature vectors of the positive and negative samples.
According to the clustering threshold w_θ, split the hierarchical clustering tree built in step 4 to obtain a hierarchical clustering forest. The color distance between any two initial hierarchical clustering basic units under the same subtree of the forest is taken as the feature vector of a positive sample, and the color distance between any two initial basic units under different subtrees is taken as the feature vector of a negative sample.
Step 6: with the current value of the feature weight vector w, apply an activation function to the positive and negative sample feature vectors built in step 5 to predict the sample classes. Using the predicted values and the true class labels of the samples, construct the likelihood function of the weight vector w and obtain a new feature weight vector w by maximizing the likelihood. If the updated w makes the maximum of the likelihood function converge, rebuild the hierarchical clustering forest with the new feature weight vector w; otherwise return to step 3.
To simplify solving the likelihood function, take the logarithm of both sides to obtain the log-likelihood function:
l(w) = Σ_{i=1..n} [ y^(i)·log h_w(x^(i)) + (1 - y^(i))·log(1 - h_w(x^(i))) ]
The log-likelihood l(w) is maximized with stochastic gradient ascent to solve for the weight vector w.
Step 7: for each subtree of the hierarchical clustering forest obtained in step 6, merge the pixels of all the initial hierarchical clustering units it contains to construct the corresponding color layer.
Step 8: extract the connected components from each color layer as candidate characters, screen the candidate characters with a classifier, merge the screened characters into text lines, and split the text lines into words to obtain the text detection result.
Further, the activation function used in step 6 is the logistic regression function:
h_w(x) = 1 / (1 + e^(-w·x))
where h_w(x) is the prediction for the input vector x; x is composed of the feature vector of a positive or negative sample and the intercept term -1, x = (|μ_i(r) - μ_j(r)|, |μ_i(g) - μ_j(g)|, |μ_i(b) - μ_j(b)|, -1).
Further, the likelihood function of the weight vector w constructed in step 6 is as follows:
L(w) = Π_{i=1..n} p(y^(i) | x^(i); w) = Π_{i=1..n} h_w(x^(i))^(y^(i)) · (1 - h_w(x^(i)))^(1 - y^(i))
where p(y^(i) | x^(i); w) is the probability of the label y^(i) given the input x^(i) under the parameter w, x^(i) and y^(i) are the input vector and the true class label of the i-th sample, and n is the total number of samples; y^(i) takes the value 0 or 1, where 0 denotes a negative sample and 1 denotes a positive sample.
Further, the classifier used in step 8 to screen the candidate characters is an Adaboost classifier, trained as follows:
First, steps 1-7 are executed on every image of the training set of the ICDAR2013 database, and candidate characters are extracted from the resulting color layers;
Then, the candidate characters are matched pixel-wise against the ground-truth characters to build the positive and negative training sample sets;
Next, 30000 positive training samples and 30000 negative training samples are randomly selected from these sets as the training set of the Adaboost classifier;
Finally, the geometric features and HOG features of every sample in the training set are extracted and the Adaboost classifier is trained, yielding the Adaboost classifier used to verify candidate characters.
Further, screening the candidate characters with the classifier means extracting the geometric features and HOG features of each candidate character, feeding them into the trained Adaboost classifier for candidate character verification, removing non-text components and retaining the text characters.
Further, merging the screened candidate characters into text lines proceeds as follows:
the verified characters are combined in pairs; a pair whose aspect ratios, horizontal distance and color distance satisfy the following conditions is regarded as a text character pair, and text character pairs that share a connected component are merged to construct the text lines:
|mean(R_1) - mean(R_2)| < 80
where w(·) and h(·) denote the width and height of a character; h_d and v_d denote the horizontal and vertical distances between the centers of the character regions R_1 and R_2; and mean(R) denotes the mean color of the pixels in character region R.
Further, splitting a text line into words means judging the horizontal gap d_h between every two adjacent characters; whenever d_h satisfies the splitting condition defined by α, d̄ and β, the line is split at that gap, yielding the words after division;
where d_h is the horizontal gap between adjacent characters, d̄ is the mean of all character gaps, α is the scaling factor of the average character spacing with a value between 1 and 2, and β is the median of all character gaps.
The median of all character gaps is the middle value obtained after sorting the horizontal gaps of all characters.
Further, words in the divided result that contain only a single character are verified again with the Adaboost classifier, and only words whose score exceeds 0.8 are kept, giving the final text detection result.
Beneficial effects
The present invention provides a method for natural scene text detection based on self-learning color clustering. First, hierarchical clustering is combined with a parameter self-learning strategy to design an adaptive color clustering method that extracts candidate characters from an image; because the method automatically learns the weights and threshold for each image, it achieves a good character recall rate. Then, an Adaboost classifier is trained to build a character verification model that removes non-text components. Finally, the remaining characters are merged into text lines, and the text detection result is obtained after post-processing. Compared with traditional methods, this method achieves a higher text detection recall rate and more accurate detection results.
Brief description of the drawings
Fig. 1 is a flow diagram of the method of the invention;
Fig. 2 shows the text detection process of the method, where (a) is the image to be detected; (b) shows the color layers extracted by the adaptive color clustering method, pixels of the same color layer being shown in the same color; (c) shows the candidate characters extracted from the color layers, each candidate character marked with its own color; (d) shows the result after candidate character verification; (e) shows the text lines constructed by character merging; (f) shows the final text detection result obtained after word division of the text lines;
Fig. 3 is a schematic diagram of the hierarchical clustering and of the construction of positive and negative samples.
Specific embodiment
The present invention is described in more detail below with reference to a specific embodiment.
A method for natural scene text detection based on self-learning color clustering, whose flow is shown in Fig. 1, comprises the following steps, illustrated here by detecting the text in Fig. 2(a):
Step 1: input the image to be detected, denoted image I, as shown in Fig. 2(a);
Step 2: extract the R, G, B values of every pixel of image I and project the pixels into a three-dimensional color space according to their R, G, B values. Divide the color space with a step of 32 into 512 small cubes of size 32 × 32 × 32.
Step 3: take the small cubes that contain pixels as the basic units of hierarchical clustering. For each such cube, compute the mean R, G, B values of its pixels and use c = (μ(r), μ(g), μ(b)) as the feature of the clustering basic unit.
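As an illustration of steps 2-3, the following Python sketch quantizes the pixels of an RGB image into the 512 color cubes and returns the mean color of every occupied cube together with the pixel indices it contains; the function name and the return layout are assumptions made for this example, not part of the patent.

import numpy as np

def color_cube_units(image_rgb, step=32):
    # Flatten the image to an N x 3 array of R, G, B values.
    pixels = image_rgb.reshape(-1, 3).astype(np.float64)
    # Cube index of every pixel along R, G and B (values 0..7 per channel).
    bins = (pixels // step).astype(np.int64)
    keys = bins[:, 0] * 64 + bins[:, 1] * 8 + bins[:, 2]   # 8 x 8 x 8 = 512 cubes
    features, members = [], []
    for key in np.unique(keys):                            # only cubes that contain pixels
        mask = keys == key
        features.append(pixels[mask].mean(axis=0))         # feature c = (mu(r), mu(g), mu(b))
        members.append(np.flatnonzero(mask))               # pixel indices of this basic unit
    return np.array(features), members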
Step 4: initialize the weight vector w = (w_r, w_g, w_b, w_θ), where w_r, w_g and w_b are the color distance weights of μ(r), μ(g) and μ(b) and w_θ is the clustering threshold; w is initialized to (1, 1, 1, 50).
Step 5: build the hierarchical clustering tree;
Step 5.1: using the feature weight vector w of step 4, compute the pairwise color distances between the clustering units of step 3 with formula (1):
d_ij = w_r·|μ_i(r) - μ_j(r)| + w_g·|μ_i(g) - μ_j(g)| + w_b·|μ_i(b) - μ_j(b)|   (1)
where μ(r), μ(g) and μ(b) are the mean R, G, B values of the pixels contained in a hierarchical clustering unit, and μ_i and μ_j denote the features of the i-th and j-th units;
Step 5.2: merge the two clustering units with the smallest color distance into a new clustering unit;
Step 5.3: compute the color feature of the new unit and update its color distances to the other units;
Step 5.4: repeat steps 5.2-5.3 until a single cluster remains, thereby building the hierarchical clustering tree.
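The agglomerative loop of steps 5.1-5.4 could look like the following sketch, which uses the weighted distance of formula (1) and records every merge so that the tree can be cut later; the pixel-count-weighted mean used for the merged feature and the naive pairwise search are simplifications assumed for this example.

import numpy as np

def weighted_distance(ci, cj, w):
    # Formula (1): d_ij = w_r*|dr| + w_g*|dg| + w_b*|db|.
    return float(np.dot(w[:3], np.abs(ci - cj)))

def build_cluster_tree(features, pixel_counts, w):
    # Leaves are numbered 0..n-1; every merge gets a new id and is recorded
    # as (left_id, right_id, distance) so the dendrogram can be rebuilt.
    clusters = {i: (f.copy(), c) for i, (f, c) in enumerate(zip(features, pixel_counts))}
    merges, next_id = [], len(clusters)
    while len(clusters) > 1:
        ids = list(clusters)
        # Steps 5.1/5.3: pairwise distances with the current weight vector.
        d, a, b = min((weighted_distance(clusters[p][0], clusters[q][0], w), p, q)
                      for k, p in enumerate(ids) for q in ids[k + 1:])
        # Step 5.2: merge the closest pair; the new feature is the mean color
        # of all pixels contained in the merged unit.
        (fa, na), (fb, nb) = clusters.pop(a), clusters.pop(b)
        clusters[next_id] = ((fa * na + fb * nb) / (na + nb), na + nb)
        merges.append((a, b, d))
        next_id += 1
    return merges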
Step 6: according to the clustering threshold w_θ, split the hierarchical clustering tree of step 5 into a hierarchical clustering forest, as shown in Fig. 3, where the solid lines mark the cuts that produce the different subtrees and each subtree contains its own unit nodes (dashed boxes). The color distance between any two initial hierarchical clustering basic units under the same subtree is taken as the feature vector of a positive sample, and the color distance between any two initial basic units under different subtrees is taken as the feature vector of a negative sample, thereby constructing the positive and negative sample feature vectors.
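A possible implementation of this step, under the assumption that a merge is kept exactly when its distance is below w_θ, runs a union-find over the merges recorded in step 5 and then forms sample pairs from the resulting leaf partition; the function name and the per-channel absolute color differences used as the pair feature follow the definitions above.

import numpy as np
from itertools import combinations

def split_forest_and_sample(features, merges, w_theta):
    n = len(features)
    parent = list(range(n + len(merges)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Keep only the merges cheaper than the clustering threshold w_theta;
    # the cut merges separate the dendrogram into the subtrees of the forest.
    for new_id, (a, b, d) in enumerate(merges, start=n):
        if d < w_theta:
            parent[find(a)] = parent[find(b)] = new_id
    labels = np.array([find(i) for i in range(n)])

    pos, neg = [], []
    for i, j in combinations(range(n), 2):
        x = np.abs(features[i] - features[j])     # (|dr|, |dg|, |db|)
        (pos if labels[i] == labels[j] else neg).append(x)
    return labels, np.array(pos), np.array(neg)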
Step 7: update the weight vector w.
Step 7.1: with the weight vector w and the sample features extracted in step 6, use the logistic regression function of formula (2) as the activation function to predict the class of each sample:
h_w(x) = 1 / (1 + e^(-w·x))   (2)
where x is the input vector, composed of the sample features and the intercept term -1, i.e. x = (|μ_i(r) - μ_j(r)|, |μ_i(g) - μ_j(g)|, |μ_i(b) - μ_j(b)|, -1), and h_w(x) is the prediction for the sample. y is the true label of the sample, taking the value 0 or 1.
Step 7.2: from the predicted values and the true labels, construct the likelihood function of formula (3):
L(w) = Π_{i=1..n} p(y^(i) | x^(i); w) = Π_{i=1..n} h_w(x^(i))^(y^(i)) · (1 - h_w(x^(i)))^(1 - y^(i))   (3)
where p(y^(i) | x^(i); w) is the probability of the label y^(i) given the input x^(i) under the parameter w, x^(i) and y^(i) are the input vector and the true class label of the i-th sample, and n is the total number of samples; y^(i) is 0 for a negative sample and 1 for a positive sample.
Step 7.3: to simplify solving the likelihood function, take the logarithm of both sides of formula (3) to obtain the log-likelihood of formula (4):
l(w) = Σ_{i=1..n} [ y^(i)·log h_w(x^(i)) + (1 - y^(i))·log(1 - h_w(x^(i))) ]   (4)
Step 7.4: maximize the log-likelihood l(w) with stochastic gradient ascent, thereby solving for the weight vector w. Taking the partial derivative of formula (4) with respect to w_j gives formula (5):
∂l(w)/∂w_j = Σ_{i=1..n} ( y^(i) - h_w(x^(i)) ) · x_j^(i)   (5)
Step 7.5: update the weight vector w according to formula (6), where α is the learning rate of the stochastic gradient ascent, set to 0.011:
w_j := w_j + α·(y - h_w(x))·x_j   (6)
Step 7.6: repeat steps 7.1-7.5 to update the weight vector w until the gradient ∂l(w)/∂w_j is close to 0, at which point the likelihood function is considered to have reached its maximum.
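A minimal sketch of the weight update of step 7, following formulas (2)-(6) as written; the learning rate α = 0.011 comes from the text, while the epoch count, the stopping tolerance and the random seed are assumptions of this example.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def learn_weights(pos, neg, w_init, lr=0.011, epochs=200, tol=1e-4):
    # Each sample is x = (|dr|, |dg|, |db|, -1) with label 1 (positive) or 0 (negative).
    X = np.hstack([np.vstack([pos, neg]), -np.ones((len(pos) + len(neg), 1))])
    y = np.concatenate([np.ones(len(pos)), np.zeros(len(neg))])
    w = np.asarray(w_init, dtype=np.float64).copy()
    rng = np.random.default_rng(0)
    for _ in range(epochs):
        max_grad = 0.0
        for i in rng.permutation(len(X)):
            grad = (y[i] - sigmoid(X[i] @ w)) * X[i]   # formula (5) for one sample
            w += lr * grad                             # formula (6)
            max_grad = max(max_grad, np.abs(grad).max())
        if max_grad < tol:                             # step 7.6: gradient close to 0
            break
    return w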
Step 8: with the weight vector w updated in step 7, repeat steps 5-7 until the maximum of the log-likelihood l(w) of step 7 converges, obtaining the optimal weight vector and constructing the optimal hierarchical clustering tree.
Step 9: according to the final clustering threshold w_θ, split the hierarchical clustering tree to obtain the hierarchical clustering forest.
Step 10: for each subtree of the hierarchical clustering forest obtained in step 9, merge the pixels of all the initial hierarchical clustering units it contains to construct the corresponding color layer; the resulting color layers are shown in Fig. 2(b), where the pixels of the same color layer are marked with the same color.
Step 11: label the connected components in each color layer to obtain the candidate characters; the result is shown in Fig. 2(c), where the pixels of the same character are marked with the same color.
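Steps 10-11 can be sketched as follows, assuming each color layer has already been turned into a boolean pixel mask; the 8-connectivity structure and the minimum component size used to discard noise are assumptions of this example, not values given by the patent.

import numpy as np
from scipy import ndimage

def candidate_characters(layer_masks, min_pixels=10):
    # layer_masks: one boolean image per color layer; every connected
    # component of a layer becomes one candidate character mask.
    candidates = []
    structure = np.ones((3, 3), dtype=bool)        # 8-connectivity
    for mask in layer_masks:
        labels, count = ndimage.label(mask, structure=structure)
        for k in range(1, count + 1):
            component = labels == k
            if component.sum() >= min_pixels:      # drop tiny noise components
                candidates.append(component)
    return candidates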
Step 12: extract the geometric features and HOG features of each candidate character and feed them into the trained Adaboost classifier for candidate character verification, removing non-text components and retaining the text characters; the result is shown in Fig. 2(d).
The Adaboost classifier is trained as follows:
First, steps 1-11 are executed on every image of the training set of the ICDAR2013 database to obtain candidate characters;
Then, each candidate character is matched pixel-wise against the ground-truth characters: if the matched pixels account for 60% or more of the pixels of both the candidate character and the ground-truth character, the match is considered successful and the candidate is taken as a positive sample, otherwise as a negative sample, thereby constructing the sample set.
Finally, 30000 positive samples and 30000 negative samples are randomly selected from the sample set as the training set of the Adaboost classifier. The geometric features and HOG features of every sample in the training set are extracted and the Adaboost classifier is trained, yielding the candidate character verification model.
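The feature extraction and training of step 12 could be sketched as below with scikit-image and scikit-learn. The patent names geometric features and HOG features but does not enumerate the geometric ones, so the aspect ratio and fill ratio used here, as well as the 32 × 32 resize, the HOG cell layout and the number of boosting rounds, are assumptions of this example.

import numpy as np
from skimage.feature import hog
from skimage.transform import resize
from sklearn.ensemble import AdaBoostClassifier

def character_features(mask):
    # Geometric features plus HOG for one candidate character mask.
    ys, xs = np.nonzero(mask)
    h = ys.max() - ys.min() + 1
    w = xs.max() - xs.min() + 1
    crop = mask[ys.min():ys.max() + 1, xs.min():xs.max() + 1].astype(float)
    geometric = [w / h, mask.sum() / (w * h)]      # aspect ratio, fill ratio (assumed set)
    hog_vec = hog(resize(crop, (32, 32)), orientations=9,
                  pixels_per_cell=(8, 8), cells_per_block=(2, 2))
    return np.concatenate([geometric, hog_vec])

def train_character_classifier(pos_masks, neg_masks):
    # 30000 positive and 30000 negative character masks are assumed as input lists.
    X = np.array([character_features(m) for m in pos_masks + neg_masks])
    y = np.array([1] * len(pos_masks) + [0] * len(neg_masks))
    clf = AdaBoostClassifier(n_estimators=100)     # number of rounds is an assumption
    clf.fit(X, y)
    return clf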
Step 13: merge the characters to construct the text lines; the result is shown in Fig. 2(e).
The characters verified in step 12 are combined in pairs to form character pairs. A pair whose aspect ratios, horizontal distance and color distance satisfy the conditions of formulas (7)-(9) is regarded as a text character pair; text character pairs that share a connected component are merged to construct the text lines.
where w(·) and h(·) denote the width and height of a character, and h_d and v_d denote the horizontal and vertical distances between the centers of the character regions R_1 and R_2.
|mean(R_1) - mean(R_2)| < 80   (9)
where mean(R) denotes the mean color of the pixels in character region R.
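A sketch of the pairing-and-merging logic of step 13 follows. Only the color condition of formula (9) is reproduced in the text above; the height-ratio and horizontal-distance tests below therefore stand in for formulas (7)-(8) with thresholds that are purely assumptions, and the per-channel comparison of the mean colors is likewise an assumed reading of formula (9).

import numpy as np

def is_character_pair(c1, c2, max_ratio=2.0, max_gap=2.0, color_thresh=80):
    # Each character is (height, width, center_x, center_y, mean_rgb).
    h1, w1, cx1, cy1, mean1 = c1
    h2, w2, cx2, cy2, mean2 = c2
    if not (1 / max_ratio < h1 / h2 < max_ratio):          # similar heights (assumed form of (7))
        return False
    if abs(cx1 - cx2) > max_gap * max(w1, w2):             # horizontally close (assumed form of (8))
        return False
    return np.abs(np.asarray(mean1) - np.asarray(mean2)).max() < color_thresh   # formula (9)

def merge_pairs_into_lines(characters):
    # Union characters that form pairs; every resulting group is one text line.
    parent = list(range(len(characters)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(characters)):
        for j in range(i + 1, len(characters)):
            if is_character_pair(characters[i], characters[j]):
                parent[find(i)] = find(j)
    lines = {}
    for i in range(len(characters)):
        lines.setdefault(find(i), []).append(i)
    return list(lines.values())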
Step 14: post-process the text lines to obtain the text detection result, as shown in Fig. 2(f);
Step 14.1: for each text line constructed in step 13, judge the horizontal gap d_h between every two adjacent characters according to formula (10); whenever the condition of formula (10) is met, the line is split at that gap, yielding the words after division;
where d_h is the horizontal gap between adjacent characters, d̄ is the mean of all character gaps, α is the scaling factor of the average character spacing, set to 1.5, and β is the median of all character gaps.
Step 14.2: after the text lines have been split in step 14.1, words that contain only a single character are verified again with the Adaboost classifier of step 12, and only words whose score exceeds 0.8 are kept, giving the final text detection result.
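Step 14 could be sketched as follows. Formula (10) itself is not reproduced in the text, so the split test below, which combines the mean gap scaled by α with the median gap β, is an assumed reading of it; the classifier is the Adaboost model of step 12 and is assumed to expose a probability score, as scikit-learn's AdaBoostClassifier does.

import numpy as np

def split_line_into_words(chars_sorted, classifier=None, alpha=1.5, score_thresh=0.8):
    # chars_sorted: (x_left, x_right, feature_vector) tuples sorted by x_left.
    gaps = [chars_sorted[i + 1][0] - chars_sorted[i][1] for i in range(len(chars_sorted) - 1)]
    if not gaps:
        words = [chars_sorted]
    else:
        mean_gap, beta = np.mean(gaps), np.median(gaps)
        words, current = [], [chars_sorted[0]]
        for ch, gap in zip(chars_sorted[1:], gaps):
            if gap > max(alpha * mean_gap, beta):   # assumed form of formula (10)
                words.append(current)
                current = []
            current.append(ch)
        words.append(current)
    # Step 14.2: keep single-character words only if the classifier score exceeds 0.8.
    if classifier is not None:
        words = [wd for wd in words
                 if len(wd) > 1 or classifier.predict_proba([wd[0][2]])[0, 1] > score_thresh]
    return words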
The above is merely a preferred embodiment of the present invention and is intended to be illustrative rather than restrictive. Those skilled in the art will understand that many modifications can be made to it within the scope of the claims of the present invention, all of which fall within the scope of protection of the present invention.

Claims (4)

1. A natural scene text detection method based on self-learning color clustering, characterized by comprising the following steps:
Step 1: project the R, G, B color values of every pixel of the image I to be detected into a three-dimensional color space and divide the color space into equally sized cubes; each cube serves as a basic unit of hierarchical clustering;
take the color mean of all pixels in each cube as the feature c of the corresponding hierarchical clustering basic unit;
Step 2: initialize the feature weight vector w of the hierarchical clustering basic units, w = (w_r, w_g, w_b, w_θ);
where w_r, w_g and w_b are the color distance weights of the R, G and B components of the basic unit pixels, and w_θ is the clustering threshold;
Step 3: using the feature weight vector, compute the color distance between every pair of hierarchical clustering basic units;
Step 4: merge the two hierarchical clustering basic units with the smallest color distance into a new basic unit, compute the feature c of the new unit, and build the corresponding hierarchical clustering tree from the merges; return to step 3 until only one basic unit remains;
Step 5: construct the feature vectors of the positive and negative samples;
according to the clustering threshold w_θ, split the hierarchical clustering tree built in step 4 to obtain a hierarchical clustering forest; take the color distance between any two initial hierarchical clustering basic units under the same subtree of the forest as the feature vector of a positive sample, and the color distance between any two initial basic units under different subtrees as the feature vector of a negative sample;
Step 6: with the current value of the feature weight vector w, apply an activation function to the positive and negative sample feature vectors built in step 5 to predict the sample classes; using the predicted values and the true class labels of the samples, construct the likelihood function of the weight vector w, and obtain a new feature weight vector w by maximizing the likelihood; if the updated w makes the maximum of the likelihood function converge, rebuild the hierarchical clustering forest with the new feature weight vector w, otherwise return to step 3;
Step 7: for each subtree of the hierarchical clustering forest obtained in step 6, merge the pixels of all the initial hierarchical clustering units it contains to construct the corresponding color layer;
Step 8: extract connected components from each color layer as candidate characters, screen the candidate characters with a classifier, merge the screened characters into text lines, and split the text lines into words to obtain the text detection result;
the classifier used in step 8 to screen the candidate characters is an Adaboost classifier, trained as follows:
first, steps 1-7 are executed on every image of the training set of the ICDAR2013 database, and candidate characters are extracted from the resulting color layers;
then, the candidate characters are matched pixel-wise against the ground-truth characters to build the positive and negative training sample sets;
next, 30000 positive training samples and 30000 negative training samples are randomly selected from these sets as the training set of the Adaboost classifier;
finally, the geometric features and HOG features of every sample in the training set are extracted and the Adaboost classifier is trained, yielding the Adaboost classifier used to verify candidate characters;
screening the candidate characters with the classifier means extracting the geometric features and HOG features of each candidate character, feeding them into the trained Adaboost classifier for candidate character verification, removing non-text components and retaining the text characters;
merging the screened candidate characters into text lines proceeds as follows:
the verified characters are combined in pairs to form character pairs; a pair whose aspect ratios, horizontal distance and color distance satisfy the following conditions is regarded as a text character pair, and text character pairs that share a connected component are merged to construct the text lines:
|mean(R_1) - mean(R_2)| < 80
where w(·) and h(·) denote the width and height of a character; h_d and v_d denote the horizontal and vertical distances between the centers of the character regions R_1 and R_2; mean(R) denotes the mean color of the pixels in character region R;
splitting a text line into words means judging the horizontal gap d_h between two adjacent characters: whenever d_h satisfies the splitting condition defined by α, d̄ and β, the line is split at that gap, yielding the words after division;
where d_h is the horizontal gap between adjacent characters, d̄ is the mean of all character gaps, α is the scaling factor of the average character spacing with a value between 1 and 2, and β is the median of all character gaps.
2. The method according to claim 1, characterized in that the activation function used in step 6 is the logistic regression function:
h_w(x) = 1 / (1 + e^(-w·x))
where h_w(x) is the prediction for the input vector x; x is composed of the feature vector of a positive or negative sample and the intercept term -1, x = (|μ_i(r) - μ_j(r)|, |μ_i(g) - μ_j(g)|, |μ_i(b) - μ_j(b)|, -1).
3. The method according to claim 2, characterized in that the likelihood function of the weight vector w constructed in step 6 is as follows:
L(w) = Π_{i=1..n} p(y^(i) | x^(i); w) = Π_{i=1..n} h_w(x^(i))^(y^(i)) · (1 - h_w(x^(i)))^(1 - y^(i))
where p(y^(i) | x^(i); w) is the probability of the label y^(i) given the input x^(i) under the parameter w, x^(i) and y^(i) are the input vector and the true class label of the i-th sample, and n is the total number of samples; y^(i) takes the value 0 or 1, where 0 denotes a negative sample and 1 denotes a positive sample.
4. The method according to claim 1, characterized in that words in the divided result that contain only a single character are verified with the Adaboost classifier, and only words whose score exceeds 0.8 are kept, giving the final text detection result.
CN201710021572.1A 2017-01-12 2017-01-12 A method for natural scene text detection based on self-learning color clustering Expired - Fee Related CN106874905B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710021572.1A CN106874905B (en) 2017-01-12 2017-01-12 A method for natural scene text detection based on self-learning color clustering

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710021572.1A CN106874905B (en) 2017-01-12 2017-01-12 A method for natural scene text detection based on self-learning color clustering

Publications (2)

Publication Number Publication Date
CN106874905A CN106874905A (en) 2017-06-20
CN106874905B true CN106874905B (en) 2019-06-11

Family

ID=59158105

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710021572.1A Expired - Fee Related CN106874905B (en) A method for natural scene text detection based on self-learning color clustering

Country Status (1)

Country Link
CN (1) CN106874905B (en)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106874905B (en) * 2017-01-12 2019-06-11 中南大学 A method for natural scene text detection based on self-learning color clustering
CN108038458B (en) * 2017-12-20 2021-04-09 首都师范大学 Method for automatically acquiring outdoor scene text in video based on characteristic abstract diagram
CN108229386B (en) * 2017-12-29 2021-12-14 百度在线网络技术(北京)有限公司 Method, apparatus, and medium for detecting lane line
CN109582833B (en) * 2018-11-06 2023-09-22 创新先进技术有限公司 Abnormal text detection method and device
CN109558876B (en) * 2018-11-20 2021-11-16 浙江口碑网络技术有限公司 Character recognition processing method and device
CN109598272B (en) * 2019-01-11 2021-08-06 北京字节跳动网络技术有限公司 Character line image recognition method, device, equipment and medium
US11423436B2 (en) * 2019-02-19 2022-08-23 Nec Corporation Interpretable click-through rate prediction through hierarchical attention
CN116468959B (en) * 2023-06-15 2023-09-08 清软微视(杭州)科技有限公司 Industrial defect classification method, device, electronic equipment and storage medium

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101615252A (en) * 2008-06-25 2009-12-30 中国科学院自动化研究所 A kind of method for extracting text information from adaptive images
JP2014229314A (en) * 2013-05-24 2014-12-08 キヤノン株式会社 Method and device for text detection
CN104809481A (en) * 2015-05-21 2015-07-29 中南大学 Natural scene text detection method based on adaptive color clustering
CN106874905A (en) * 2017-01-12 2017-06-20 中南大学 A kind of method of the natural scene text detection based on self study Color-based clustering

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
A novel approach to text detection and extraction from videos by discriminative features and density; WEI Baogang et al.; Chinese Journal of Electronics; 2014-04-20; vol. 23, no. 2; pp. 322-328

Also Published As

Publication number Publication date
CN106874905A (en) 2017-06-20

Similar Documents

Publication Publication Date Title
CN106874905B (en) A method for natural scene text detection based on self-learning color clustering
CN110334765B (en) Remote sensing image classification method based on attention mechanism multi-scale deep learning
CN106997597B (en) It is a kind of based on have supervision conspicuousness detection method for tracking target
CN103514456B (en) Image classification method and device based on compressed sensing multi-core learning
CN104809481B (en) A kind of natural scene Method for text detection based on adaptive Color-based clustering
CN106897670A (en) A kind of express delivery violence sorting recognition methods based on computer vision
CN104573685B (en) A kind of natural scene Method for text detection based on linear structure extraction
CN105447503B (en) Pedestrian detection method based on rarefaction representation LBP and HOG fusion
CN105279519B (en) Remote sensing image Clean water withdraw method and system based on coorinated training semi-supervised learning
CN104504362A (en) Face detection method based on convolutional neural network
CN109961145A (en) A kind of confrontation sample generating method for image recognition category of model boundary sensitivity
CN109583379A (en) A kind of pedestrian's recognition methods again being aligned network based on selective erasing pedestrian
Li et al. A generative/discriminative learning algorithm for image classification
CN105095884B (en) A kind of pedestrian's identifying system and processing method based on random forest support vector machines
CN106909902A (en) A kind of remote sensing target detection method based on the notable model of improved stratification
CN105488809A (en) Indoor scene meaning segmentation method based on RGBD descriptor
CN109543632A (en) A kind of deep layer network pedestrian detection method based on the guidance of shallow-layer Fusion Features
CN111753874A (en) Image scene classification method and system combined with semi-supervised clustering
CN108681735A (en) Optical character recognition method based on convolutional neural networks deep learning model
CN114067444A (en) Face spoofing detection method and system based on meta-pseudo label and illumination invariant feature
CN107239777A (en) A kind of tableware detection and recognition methods based on various visual angles graph model
CN109840512A (en) A kind of Facial action unit recognition methods and identification device
CN106874825A (en) The training method of Face datection, detection method and device
CN106845513A (en) Staff detector and method based on condition random forest
CN109190472A (en) Combine pedestrian's attribute recognition approach of guidance with attribute based on image

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20190611

Termination date: 20200112