CN104809481B - Natural scene text detection method based on adaptive color clustering - Google Patents

Natural scene text detection method based on adaptive color clustering

Info

Publication number
CN104809481B
CN104809481B (application CN201510263154.4A)
Authority
CN
China
Prior art keywords
color
image
connected region
text
character
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN201510263154.4A
Other languages
Chinese (zh)
Other versions
CN104809481A (en)
Inventor
邹北骥
吴慧
郭建京
赵于前
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Central South University
Original Assignee
Central South University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Central South University filed Critical Central South University
Priority to CN201510263154.4A priority Critical patent/CN104809481B/en
Publication of CN104809481A publication Critical patent/CN104809481A/en
Application granted granted Critical
Publication of CN104809481B publication Critical patent/CN104809481B/en


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2411 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/211 Selection of the most significant subset of features
    • G06F18/2111 Selection of the most significant subset of features by using evolutionary computational techniques, e.g. genetic algorithms
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2413 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns
    • G06F18/24133 Distances to prototypes
    • G06F18/24137 Distances to cluster centroids
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/285 Selection of pattern recognition techniques, e.g. of classifiers in a multi-classifier system

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Physiology (AREA)
  • Image Analysis (AREA)

Abstract

The present invention provides a natural scene text detection method based on adaptive color clustering. The method first proposes an adaptive color clustering scheme: for images of differing complexity, the scheme clusters pixels into different numbers of color layers, effectively extracting text connected regions. Then, an extreme learning machine (ELM) is trained to build a neighbor-character model, and adjacent characters are merged into text strings, improving the robustness of the method. Finally, to further improve detection performance, the method combines a convolutional neural network (CNN) with a support vector machine (SVM) to verify candidate text strings, improving detection accuracy over conventional methods.

Description

Natural scene text detection method based on adaptive color clustering
Technical field
The invention belongs to the technical field of pattern recognition, and relates to a natural scene text detection method based on adaptive color clustering.
Background art
With the popularity of mobile phones and camera devices, the number of images and videos keeps growing. These images and videos contain a great deal of important information, so extracting and understanding the information in an image has become particularly important. Text is the most important and most direct information in an image; extracting and recognizing the text in an image helps a computer understand the image content. At present, the detection of printed text has made great progress and is widely applied. However, text in natural scene images varies greatly in font size and style and is affected by illumination, shadow and shooting angle, so its detection results are poor. Natural scene text detection therefore remains a challenging task.
Existing natural scene text detection methods can be divided into two major classes: methods based on sliding windows and methods based on connected regions. Sliding-window methods, also called region-based methods, work as follows: first, the original image is scanned with sliding windows of different scales, obtaining a series of sub-regions; then, texture features of the sub-regions, such as gradient histograms and wavelet transforms, are extracted; finally, a classifier trained on the extracted features verifies the sub-regions, giving the final detected text. Because sub-regions are extracted by multi-scale sliding windows, the time complexity of this approach is high, and because the sub-regions are verified with hand-designed features, its detection results are poor. In recent years, text detection methods based on connected regions have received wide attention from scholars. Such a method mainly includes 3 steps: 1) extract connected regions from the image using features such as pixel color and stroke width; 2) analyze the features of the connected regions and merge them into text strings according to character-merging rules; 3) verify the strings and remove non-text regions, giving the final text detection result. Compared with sliding-window methods, connected-region methods achieve higher accuracy with lower time complexity.
Because text in natural scene images is highly variable and its background shows different degrees of complexity, how to extract text connected regions from images of differing complexity and reasonably remove non-text regions is the key to connected-region-based text detection.
Summary of the invention
The invention provides a natural scene text detection method based on adaptive color clustering, aiming to overcome the low accuracy of prior-art text detection against complex backgrounds.
A natural scene text detection method based on adaptive color clustering comprises the following steps:
Step 1: obtain the edge image I_e of the image I to be processed for text detection;
Step 2: remove the pixels of the edge image I_e from the image I, obtaining the dominant color image I_m;
Step 3: initialize the color cluster center (μ_0(r), μ_0(g), μ_0(b));
Step 3.1: project the pixels of the dominant color image I_m into a three-dimensional color space;
Step 3.2: set a step size S and quantize the color space, obtaining (256/S)^3 sub-cubes of equal size;
Step 3.3: count the pixels in each sub-cube as the density of that sub-cube, and find the sub-cube with the highest density;
Step 3.4: compute the average color of all pixels in the densest sub-cube, and take this value as the initial color cluster center (μ_0(r), μ_0(g), μ_0(b));
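The initialization of step 3 can be sketched as follows. This is only an illustration under stated assumptions, not the patented code: pixels are plain (r, g, b) tuples, and the sub-cube index is the channel value divided by the step size S.

```python
# Quantize RGB pixels into (256/S)^3 sub-cubes and take the mean color of the
# densest sub-cube as the initial cluster center (sketch of step 3).
from collections import Counter

def init_cluster_center(pixels, step=32):
    """pixels: list of (r, g, b) tuples; returns the initial center (r, g, b)."""
    # Map each pixel to the index of its sub-cube.
    cube_of = lambda p: (p[0] // step, p[1] // step, p[2] // step)
    density = Counter(cube_of(p) for p in pixels)
    densest = density.most_common(1)[0][0]
    # Average color over the pixels that fall in the densest sub-cube.
    members = [p for p in pixels if cube_of(p) == densest]
    n = len(members)
    return tuple(sum(p[i] for p in members) / n for i in range(3))

pixels = [(10, 12, 14), (11, 13, 15), (12, 10, 12), (200, 50, 60)]
print(init_cluster_center(pixels, step=32))  # densest cube holds the three dark pixels
```

With S = 32, as in the embodiment below, the space splits into 8 × 8 × 8 sub-cubes.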
Step 4: update the color cluster center;
Step 4.1: set the initial value of the iteration counter t to 0; the cluster center obtained at iteration t is (μ_t(r), μ_t(g), μ_t(b));
Step 4.2: compute the distance d_c from each pixel p of the dominant color image I_m to the current cluster center; the R, G, B channel values of p are denoted p_r, p_g and p_b in turn;
Step 4.3: find all pixels of I_m satisfying the condition d_c < l, and compute the average color of the pixels satisfying the condition as the new color cluster center (μ_{t+1}(r), μ_{t+1}(g), μ_{t+1}(b));
l denotes the color distance threshold, with value range [24, 88];
because the colors within a text string are close, experiments on the standard text databases ICDAR2003, ICDAR2011 and ICDAR2013 show that l should lie in [24, 88];
Step 4.4: judge whether (μ_t(r), μ_t(g), μ_t(b)) equals (μ_{t+1}(r), μ_{t+1}(g), μ_{t+1}(b)); if equal, take (μ_{t+1}(r), μ_{t+1}(g), μ_{t+1}(b)) as the final color cluster center (μ(r), μ(g), μ(b)); otherwise set t = t + 1 and return to step 4.2, until the value of the cluster center no longer changes;
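The update loop of step 4 can be sketched as below, a minimal illustration rather than the patented implementation: the center is repeatedly replaced by the mean color of the pixels within distance l of it until it stops moving.

```python
import math

def refine_center(pixels, center, l=48):
    """Iteratively replace the center with the mean color of the pixels within
    distance l of it, until the center is stable. Assumes at least one pixel
    lies within l of the initial center (guaranteed when the center comes from
    the densest sub-cube of step 3)."""
    while True:
        near = [p for p in pixels if math.dist(p, center) < l]
        new_center = tuple(sum(p[i] for p in near) / len(near) for i in range(3))
        if new_center == center:
            return center
        center = new_center

print(refine_center([(10, 10, 10), (20, 20, 20), (200, 200, 200)], (10, 10, 10)))
# → (15.0, 15.0, 15.0)
```

The default l = 48 follows the value the embodiment reports as giving the highest accuracy within the range [24, 88].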
Step 5: build color layer images;
Step 5.1: with the final cluster center (μ(r), μ(g), μ(b)) obtained in step 4, traverse all pixels of I_m and I_e and compute the distance d from each pixel q to the cluster center (μ(r), μ(g), μ(b));
Step 5.2: the pixels q satisfying the condition d < l form one color layer image, denoted C_i, where i indexes the color layer obtained in the i-th round; these pixels are then removed from I_m and I_e, giving a new dominant color image and a new edge image; i is initialized to 1;
Step 5.3: with the new dominant color image obtained in step 5.2, set i = i + 1 and return to step 3, until every pixel of the dominant color image I_m of step 2 has been assigned to some color layer image, yielding all color layer images;
Step 6: binarize all color layer images to obtain the corresponding binary images, and extract the connected regions of all binary images to form the connected region set CCs;
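Step 6 reduces each binarized color layer to a set of connected regions. A minimal 4-connectivity flood-fill labeling over a binary grid, offered only as an illustrative sketch of that extraction:

```python
def connected_regions(grid):
    """grid: list of rows of 0/1; returns a list of regions, each a set of (y, x)."""
    h, w = len(grid), len(grid[0])
    seen, regions = set(), []
    for y in range(h):
        for x in range(w):
            if grid[y][x] and (y, x) not in seen:
                stack, region = [(y, x)], set()
                seen.add((y, x))
                while stack:  # flood fill from this seed pixel
                    cy, cx = stack.pop()
                    region.add((cy, cx))
                    for ny, nx in ((cy + 1, cx), (cy - 1, cx), (cy, cx + 1), (cy, cx - 1)):
                        if 0 <= ny < h and 0 <= nx < w and grid[ny][nx] \
                                and (ny, nx) not in seen:
                            seen.add((ny, nx))
                            stack.append((ny, nx))
                regions.append(region)
    return regions

grid = [[1, 1, 0, 1],
        [0, 0, 0, 1]]
print(len(connected_regions(grid)))  # → 2
```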
Step 7: build the training set for the extreme learning machine classifier;
First, images of the ICDAR2013 training set are chosen as training samples, and steps 1-6 are performed on each image of the training samples, obtaining its connected region set CCs;
Then, adjacent connected regions in CCs are paired two by two. If the two connected regions of a pair lie in the same text string and are adjacent, the pair of adjacent connected regions is taken as a positive sample; if both regions of a pair are text but their vertical overlap is 0, i.e., the two regions belong to 2 different strings, or if one region of the pair is non-text, the pair of adjacent connected regions is taken as a negative sample;
From all positive and negative samples, 10000 positive and 10000 negative samples are randomly selected as the training set for the extreme learning machine classifier;
Step 8: train the classifier with the feature vector of each sample of the training set, obtaining the neighbor-character model;
The feature vector of each sample comprises 5 features: height ratio R_h, average stroke width ratio R_sw, vertical overlap ratio R_vol, horizontal distance D and color similarity CS;
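The five pairwise features drive the neighbor-character model. The patent's formula images did not survive extraction, so the expressions below are a common formulation offered purely as an assumption; they only match the variable definitions given in the text (heights h, average stroke widths sw, vertical overlap v_LR, horizontal gap d_LR, mean R/G/B channel values).

```python
import math

def pair_features(L, R, v_lr, d_lr):
    """L, R: dicts with height h, stroke width sw and mean color rgb;
    v_lr, d_lr: vertical overlap length and horizontal gap of the pair.
    All formulas below are assumed, not taken from the patent."""
    Rh = min(L["h"], R["h"]) / max(L["h"], R["h"])       # height ratio (assumed form)
    Rsw = min(L["sw"], R["sw"]) / max(L["sw"], R["sw"])  # stroke width ratio (assumed form)
    Rvol = v_lr / min(L["h"], R["h"])                    # vertical overlap ratio (assumed form)
    D = d_lr / max(L["h"], R["h"])                       # normalized horizontal gap (assumed form)
    CS = math.dist(L["rgb"], R["rgb"])                   # color distance (assumed form)
    return [Rh, Rsw, Rvol, D, CS]

left = {"h": 20, "sw": 3, "rgb": (10, 10, 10)}
right = {"h": 22, "sw": 3, "rgb": (12, 10, 10)}
print(pair_features(left, right, v_lr=18, d_lr=5))
```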
Step 9: merge adjacent characters;
The connected regions in the set CCs of the image I to be processed are numbered from top to bottom and left to right, denoted (CC_1, CC_2, ..., CC_n), where n is the number of connected regions;
The extreme learning machine classifier obtained in step 8 classifies (CC_1, CC_2, ..., CC_n) as neighbor characters or not; adjacent characters are merged into text strings, completing text detection.
The neighbor-character classification of (CC_1, CC_2, ..., CC_n) with the classifier of step 8, and the merging of adjacent characters, proceed as follows:
Step 9.1: choose the lowest-numbered connected region as the initial connected region, denoted CC_L; find a connected region adjacent to CC_L, denoted CC_R; CC_L and CC_R form one test sample;
Step 9.2: obtain the feature vector of the test sample;
Step 9.3: with the trained extreme learning machine classifier of step 8, judge from the 5 features of the test sample whether CC_L and CC_R are neighbor characters;
Step 9.4: if CC_L and CC_R are judged non-neighbors, CC_L is saved as a string and deleted from CCs; if CC_L and CC_R are judged neighbors, they are deleted from CCs and merged into one connected region, which becomes the new CC_L;
Then a new adjacent region CC_R of CC_L is chosen to form a new test sample, and the process returns to step 9.2 until no connected region remains in the set CCs, yielding all merged strings.
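The merging loop of steps 9.1-9.4 can be sketched as a greedy scan: take the first region, absorb neighbors the classifier accepts, and emit a string whenever the classifier rejects. In this sketch `is_neighbor` stands in for the trained ELM, and regions are deliberately simplified to tuples of x-positions.

```python
def merge_strings(regions, is_neighbor):
    """regions: ordered list of tuples; is_neighbor(last_elem, next_region) -> bool;
    returns a list of merged strings (tuples)."""
    strings = []
    pending = list(regions)
    while pending:
        current = pending.pop(0)
        # Absorb successive neighbors into the current string.
        while pending and is_neighbor(current[-1], pending[0]):
            current = current + pending.pop(0)
        strings.append(current)
    return strings

# Toy usage: neighbors are regions at most 2 positions apart.
regions = [(0,), (1,), (2,), (10,), (11,)]
close = lambda a, b: abs(a - b[0]) <= 2
print(merge_strings(regions, close))  # → [(0, 1, 2), (10, 11)]
```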
The feature vector of a test sample comprises the height ratio R_h, average stroke width ratio R_sw, vertical overlap ratio R_vol, horizontal distance D and color similarity CS, computed by the following formulas:
Height ratio R_h:
Average stroke width ratio R_sw:
Vertical overlap ratio R_vol:
Horizontal distance D:
Color similarity CS:
where CC_L is the connected region on the left of a pair and CC_R the one on the right; h_L and h_R respectively denote the heights of CC_L and CC_R; sw_L and sw_R respectively denote their average stroke widths; v_LR and d_LR respectively denote the vertical overlap length and the horizontal distance between CC_L and CC_R; r_L, g_L, b_L are the average R, G, B channel values of CC_L, and r_R, g_R, b_R those of CC_R.
The trained convolutional neural network CNN performs a first round of verification on the text strings of step 9, removing part of the non-text, as follows:
Step 10.1: compute the confidence of each text string with the convolutional neural network CNN, denoted Score;
Step 10.2: according to the confidence Score, divide the text strings into 3 classes, High, Middle and Low, by the following rules:
High = {Score | Score > 1.4}
Middle = {Score | 0.6 ≤ Score ≤ 1.4}
Low = {Score | Score < 0.6}
Step 10.3: strings whose confidence Score falls in the Low class are deleted directly from the candidate text; strings in the High class are output directly as final detection results; strings in the Middle class are kept as strings to be verified.
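The three-way split of steps 10.2-10.3 can be written directly, using the thresholds 0.6 and 1.4 from the text; the function and variable names here are illustrative only.

```python
def split_by_confidence(scored, low=0.6, high=1.4):
    """scored: list of (string, score) pairs.
    Returns (accepted, to_verify); the Low class is dropped entirely."""
    accepted = [s for s, score in scored if score > high]           # High class
    to_verify = [s for s, score in scored if low <= score <= high]  # Middle class
    return accepted, to_verify

scored = [("STOP", 1.7), ("Sale", 1.0), ("brick", 0.2)]
print(split_by_confidence(scored))  # → (['STOP'], ['Sale'])
```

The Middle strings are exactly those later passed to the SVM for the second round of verification.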
Candidate text strings are classified with the convolutional neural network described in the Proceedings of the 21st International Conference on Pattern Recognition (2012), pp. 3304-3308;
The strings to be verified from step 10 are verified with a support vector machine classifier, giving the optimized detection result, as follows:
For the candidate strings of the Middle class obtained in step 10.3, HOG features are extracted to construct feature vectors, and the trained support vector machine classifier verifies the strings, removing non-text strings and giving the optimized detection result;
The support vector machine classifier is trained as follows:
Step 11.1: choose images of the ICDAR2013 training set as training samples; perform steps 1-10 on the images of the training set, obtaining candidate strings; classify the candidate strings, taking those containing characters as positive samples and the rest as negative samples;
Step 11.2: for the positive and negative samples of step 11.1, extract their histogram of oriented gradients features, construct feature vectors, and train the support vector machine classifier.
Beneficial effects
The invention proposes a natural scene text detection method based on adaptive color clustering. The method first proposes an adaptive color clustering scheme: for images of differing complexity, the scheme clusters pixels into different numbers of color layers, effectively extracting text connected regions. Then, an extreme learning machine (ELM) is trained to build a neighbor-character model, and adjacent characters are merged into strings, improving the robustness of the method. Finally, to further improve the text detection performance of the system, the method combines a convolutional neural network (CNN) with a support vector machine (SVM) to verify text strings, improving detection accuracy over conventional methods.
Brief description of the drawings
Fig. 1 is a flow chart of the method of the invention;
Fig. 2 shows the text detection process, in which (a) is the image to be detected; (b) is the first color layer image; (c) is the result of merging neighbor characters within one color layer; (d) is the result after verification by the convolutional neural network (CNN); (e) is the final detection result.
Fig. 3 shows sample pictures for training, in which solid lines indicate the merging of positive samples and dotted lines indicate the two merging modes of negative samples.
Embodiments
To make the object, technical solution and advantages of the invention clearer, the invention is described in more detail below with reference to a specific embodiment and the accompanying drawings.
A natural scene text detection method based on adaptive color clustering, as shown in Fig. 1, comprises the following steps, illustrated by detecting the text of Fig. 2(a):
Step 1: input the image to be detected, e.g. Fig. 2(a);
Step 2: extract the edge pixels of the image to be detected with the Canny edge detection algorithm, forming the edge image, denoted I_e; then remove the pixels of I_e from the original image, obtaining the dominant color image, denoted I_m;
Step 3: initialize the color cluster center:
First, project the pixels of the dominant color image I_m of step 2 into a three-dimensional color space; then set the step size S = 32 and quantize the color space, obtaining 8 × 8 × 8 sub-cubes of equal size; next, count the pixels of each sub-cube as its density and find the densest sub-cube; finally, compute the average color of all pixels in the densest sub-cube and take it as the initial color cluster center, denoted (μ(r), μ(g), μ(b));
Step 4: update the color cluster center:
First, compute the distance d_c from each pixel p of I_m (whose R, G, B channel values are denoted p_r, p_g, p_b) to the current cluster center, as in formula (1):
where t denotes the t-th iteration and (μ_t(r), μ_t(g), μ_t(b)) is the color center of the layer C_i at iteration t.
Then, find all pixels of I_m satisfying the condition d_c < l and compute their average color as the new color center. Because the colors of a text string are close, experiments on the standard text databases ICDAR2003, ICDAR2011 and ICDAR2013 show that l should lie in [24, 88]; detection accuracy is highest at l = 48, so l is set to 48 in this patent. Repeat step 4 until the color center no longer changes, i.e. the final color center is obtained, denoted (μ_i(r), μ_i(g), μ_i(b));
Step 5: build the color layer:
First, with the final color center (μ_i(r), μ_i(g), μ_i(b)) obtained in step 4, traverse all pixels of I_m and I_e and compute the distance d_c from each pixel p to the color center (μ_i(r), μ_i(g), μ_i(b)); then the pixels of I_m and I_e satisfying the condition d_c < l (with l = 48) form the color layer C_i, as in Fig. 2(b), and these pixels are removed from I_m and I_e;
Step 6: repeat steps 3-5 until every pixel of the dominant color image I_m has been assigned to a color layer, yielding all color layers. Each color layer picture is binarized to obtain the corresponding binary picture, and a connected region extraction algorithm then yields a series of connected regions, denoted CCs.
Step 7: choose 10000 positive and 10000 negative samples to build the training set;
Step 7.1: choose the images of the ICDAR2013 training set as training samples. Perform steps 1-6 on the pictures of the training set, obtaining a series of connected regions CCs; then pair adjacent connected regions in CCs two by two, forming connected region pairs;
Step 7.2: manually divide the connected region pairs obtained in step 7.1 into positive and negative samples; the purpose is to let the positive and negative samples train the extreme learning machine (ELM), so that the ELM learns the features of positive samples and of negative samples and can automatically distinguish text from non-text.
Positive samples are the connected region pairs formed by adjacent characters within a string. Negative samples include two parts: pairs formed by a character and a non-character, and pairs of characters whose vertical overlap is zero.
Fig. 3 illustrates the construction of positive and negative samples: the connected region pairs joined by solid lines are positive samples, and those joined by dotted lines are negative samples.
Step 8: for the positive and negative samples obtained in step 7.2, extract features, construct feature vectors and train the extreme learning machine classifier (ELM), obtaining the neighbor-character model;
Step 8.1: randomly select 10000 positive and 10000 negative samples;
Step 8.2: compute the 5 features of each sample: height ratio, average stroke width ratio, vertical overlap ratio, horizontal distance and color similarity, as in formulas (2-6):
Height ratio:
Average stroke width ratio:
Vertical overlap ratio:
Horizontal distance:
Color similarity:
where h_L and h_R respectively denote the heights of CC_L and CC_R; sw_L and sw_R respectively denote their average stroke widths (CC_L is the connected region on the left of a pair and CC_R the one on the right); v_LR and d_LR respectively denote the vertical overlap length and the horizontal distance between CC_L and CC_R; r_L, g_L, b_L are the average R, G, B channel values of CC_L, and r_R, g_R, b_R those of CC_R.
Step 8.3: construct the 5 features of each sample into a feature vector and train the extreme learning machine classifier (ELM), obtaining the neighbor-character model;
Step 9: for the connected regions CCs obtained in step 6, judge with the neighbor-character model obtained in step 8 whether connected regions are adjacent characters, building strings accordingly and obtaining the preliminary detection result, as in Fig. 2(c);
Step 9.1: number the connected regions CCs of step 6 by position from top to bottom and left to right as (CC_1, CC_2, ..., CC_n);
Step 9.2: choose the lowest-numbered connected region as the initial connected region, denoted CC_L; find a connected region adjacent to CC_L, denoted CC_R; CC_L and CC_R form one test sample.
Step 9.3: perform step 8.2 on the test sample, obtaining its 5 features;
Step 9.4: with the extreme learning machine (ELM) trained in step 8, judge from the 5 features of the test sample whether CC_L and CC_R are neighbor characters;
Step 9.5: if CC_L and CC_R are judged non-neighbors, save CC_L as a string, delete CC_L from CCs and repeat steps 9.2-9.5; if CC_L and CC_R are judged neighbors, delete them from CCs, merge them into one connected region as the new CC_L, and repeat steps 9.2-9.5;
Step 9.6: repeat steps 9.2-9.5 until no connected region remains in the set CCs, obtaining all merged strings;
Step 10: with the trained convolutional neural network (CNN), compute the confidence of the candidate strings of step 9 and classify the text strings by confidence.
Step 10 further comprises the following steps:
Step 10.1: compute the confidence of each text string with the convolutional neural network (CNN), denoted Score;
Step 10.2: according to the confidence Score, divide the text strings into 3 classes, High, Middle and Low, by the following rules:
High = {Score | Score > 1.4}
Middle = {Score | 0.6 ≤ Score ≤ 1.4}
Low = {Score | Score < 0.6}
Step 10.3: strings whose confidence Score falls in the Low class are deleted directly from the candidate text; strings in the High class are output directly as final detection results, as in Fig. 2(d); strings in the Middle class are passed to step 11 for a further decision;
Candidate text strings are classified with the convolutional neural network described in the Proceedings of the 21st International Conference on Pattern Recognition (2012), pp. 3304-3308;
Step 11: perform a second round of verification on the remaining strings of step 10 with a support vector machine (SVM), obtaining the final detection result, as in Fig. 2(e).
Step 11.1: choose images of the ICDAR2013 training set as training samples; perform steps 1-10 on the images of the training set, obtaining candidate strings; classify the candidate strings, taking those containing characters as positive samples and the rest as negative samples;
Step 11.2: for the positive and negative samples of step 11.1, extract their histogram of oriented gradients features, construct feature vectors and train the support vector machine classifier.
The above is only a preferred embodiment of the invention, intended to be illustrative rather than limiting. Those skilled in the art will understand that many changes, modifications and even equivalents may be made within the spirit and scope defined by the claims of the invention, all of which fall within its protection scope.

Claims (5)

1. A natural scene text detection method based on adaptive color clustering, characterized by comprising the following steps:
Step 1: obtain the edge image I_e of the image I to be processed for text detection;
Step 2: remove the pixels of the edge image I_e from the image I, obtaining the dominant color image I_m;
Step 3: initialize the color cluster center (μ_0(r), μ_0(g), μ_0(b));
Step 3.1: project the pixels of the dominant color image I_m into a three-dimensional color space;
Step 3.2: set a step size S and quantize the color space, obtaining (256/S)^3 sub-cubes of equal size;
Step 3.3: count the pixels in each sub-cube as the density of that sub-cube, and find the sub-cube with the highest density;
Step 3.4: compute the average color of all pixels in the densest sub-cube, and take this value as the initial color cluster center (μ_0(r), μ_0(g), μ_0(b));
Step 4: update the color cluster center;
Step 4.1: set the initial value of the iteration counter t to 0; the cluster center obtained at iteration t is (μ_t(r), μ_t(g), μ_t(b));
Step 4.2: compute the distance d_c from each pixel p of the dominant color image I_m to the current cluster center; the R, G, B channel values of p are denoted p_r, p_g and p_b in turn;
d_c = sqrt((p_r − μ_t(r))^2 + (p_g − μ_t(g))^2 + (p_b − μ_t(b))^2)
Step 4.3, master color image I is found outmIn meet condition dc<L all pixels point, and calculate all pictures for the condition that meets The color average value of vegetarian refreshments, is used as new Color-based clustering center (μt+1(r),μt+1(g),μt+1(b));
L represents color distance threshold, and span is [24,88];
Step 4.4, (μ is judgedt(r),μt(g),μt(b)) with (μt+1(r),μt+1(g),μt+1(b) it is) whether equal, if equal, With (μt+1(r),μt+1(g),μt+1(b)) as final Color-based clustering center (μ (r), μ (g), μ (b)), otherwise, t=t+1 is made, Return to step 4.2, until the value at Color-based clustering center does not change;
Step 5:Build color tomographic image;
Step 5.1, according to final Color-based clustering center (μ (r), μ (g), μ (b)) is obtained in step 4, I is traveled throughmAnd IeIn own Pixel, calculates each pixel q to Color-based clustering center (μ (r), μ (g), μ (b)) apart from d;
Step 5.2, meeting condition d<L pixel q constitutes a color tomographic image, is expressed as Ci, wherein, i represents ith Obtained color tomographic image, while these pixels from ImAnd IeIt is middle to remove, obtain new master color image and edge image;i Initial value value be 1;
Step 5.3, the new master color image that step 5.2 is obtained, i=i+1, return to step 3, until the mass-tone described in step 2 Coloured picture is as ImMiddle all pixels point is all assigned in corresponding color tomographic image, constructs all color tomographic images
Step 6:Binary conversion treatment is carried out to all color tomographic images, corresponding binary image is obtained, and extract all two Connected region in value image, composition connected region set CCs;
Step 7: build the extreme learning machine classifier training set;
First, images from the ICDAR2013 training set are chosen as training samples, and steps 1-6 are performed on each image in the training samples to obtain its connected-region set CCs;
Then, adjacent connected regions in CCs are paired two by two; if the 2 connected regions of a pair lie in the same text string and are adjacent, the pair of adjacent connected regions is taken as a positive sample; if the 2 connected regions of a pair are both text but their vertical overlap rate is 0, i.e. the 2 regions are distributed over 2 different text strings, or if one of the 2 connected regions of a pair is non-text, the pair of adjacent connected regions is taken as a negative sample;
From all positive and negative samples, 10000 positive samples and 10000 negative samples are randomly selected to build the extreme learning machine classifier training set;
Step 8: train the classifier with the feature vector of each sample in the extreme learning machine training set, obtaining the adjacent-character model;
The feature vector of each sample contains 5 features: the height ratio R_h, average stroke-width ratio R_sw, vertical overlap rate R_vol, horizontal spacing D, and color similarity CS;
Step 9: merge adjacent characters;
The connected regions in the set CCs corresponding to the image I under text detection are numbered in top-to-bottom, left-to-right order, denoted (CC_1, CC_2, ..., CC_n), where n is the number of connected regions;
Using the extreme learning machine classifier obtained in step 8, adjacent-character classification is performed on (CC_1, CC_2, ..., CC_n), and adjacent characters are merged into text strings, completing the text detection.
2. The natural scene text detection method based on adaptive color clustering according to claim 1, characterized in that using the extreme learning machine classifier obtained in step 8, step 9 performs adjacent-character classification on (CC_1, CC_2, ..., CC_n) and merges adjacent characters, the detailed process being as follows:
Step 9.1: choose the front-most numbered connected region as the initial connected region, denoted CC_L; find the connected region adjacent to CC_L, denoted CC_R; take CC_L and CC_R as one test sample;
Step 9.2: obtain the feature vector of the test sample;
Step 9.3: using the extreme learning machine classifier trained in step 8, judge from the 5 sample features whether CC_L and CC_R are adjacent characters;
Step 9.4: if CC_L and CC_R are judged not to be adjacent characters, CC_L is saved as a character string and deleted from CCs; if CC_L and CC_R are judged to be adjacent characters, both are deleted from CCs and merged into one connected region, which serves as the new CC_L;
Then a connected region CC_R adjacent to CC_L is chosen again to form a new test sample, and the process returns to step 9.2 until no connected region remains in the set CCs, yielding all merged character strings.
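The greedy loop of steps 9.1-9.4 can be sketched as follows, with a stand-in predicate in place of the trained extreme learning machine classifier (all names here are illustrative, not from the patent):

```python
def merge_adjacent(ccs, is_pair):
    """Steps 9.1-9.4 as a greedy left-to-right merge. `ccs` is the ordered
    list (CC_1, ..., CC_n); each region is modelled as a list of elements.
    `is_pair` stands in for the ELM adjacent-character decision (step 9.3)."""
    ccs = list(ccs)
    strings = []
    while ccs:
        cl = ccs.pop(0)                      # step 9.1: front-most region as CC_L
        while ccs and is_pair(cl, ccs[0]):   # step 9.3 on the pair (CC_L, CC_R)
            cl = cl + ccs.pop(0)             # step 9.4: merge into the new CC_L
        strings.append(cl)                   # CC_L saved as one character string
    return strings
```

The loop terminates because every iteration removes at least one region from CCs.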
3. The natural scene text detection method based on adaptive color clustering according to claim 2, characterized in that the feature vector of the test sample contains the height ratio R_h, average stroke-width ratio R_sw, vertical overlap rate R_vol, horizontal spacing D and color similarity CS, computed according to the following formulas:
Height ratio R_h:
Average stroke-width ratio R_sw:
Vertical overlap rate R_vol:
Horizontal spacing D:
Color similarity CS:
(the five formulas appear as images in the original publication and are not reproduced here)
Wherein, CC_L is the connected region on the left of the pair and CC_R is the connected region on the right of the pair; h_L and h_R respectively denote the region heights of CC_L and CC_R; sw_L and sw_R respectively denote the average stroke widths of CC_L and CC_R; v_LR and d_LR respectively denote the vertical overlap length and the horizontal distance between CC_L and CC_R; r_L, g_L, b_L are respectively the mean R, G, B channel values of CC_L, and r_R, g_R, b_R are respectively the mean R, G, B channel values of CC_R.
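Because the formula images of claim 3 are not reproduced in this text, the sketch below uses common definitions of these five pairwise features that are consistent with the variable descriptions above; treat every formula here as an assumption, not the patent's exact expression:

```python
import math

def pair_features(ccL, ccR):
    """Five pairwise features for a (CC_L, CC_R) test sample, with assumed
    (not patent-verified) definitions. Each CC is a dict with bounding box
    x, y, w, h, mean stroke width `sw`, and mean colour `rgb`."""
    hL, hR = ccL['h'], ccR['h']
    Rh = min(hL, hR) / max(hL, hR)                                # height ratio
    Rsw = min(ccL['sw'], ccR['sw']) / max(ccL['sw'], ccR['sw'])   # stroke-width ratio
    top = max(ccL['y'], ccR['y'])
    bot = min(ccL['y'] + hL, ccR['y'] + hR)
    Rvol = max(0, bot - top) / min(hL, hR)                        # vertical overlap rate
    # horizontal gap d_LR, normalisation by the larger height assumed
    D = (ccR['x'] - (ccL['x'] + ccL['w'])) / max(hL, hR)
    # colour similarity as Euclidean distance of mean RGB values
    CS = math.sqrt(sum((a - b) ** 2 for a, b in zip(ccL['rgb'], ccR['rgb'])))
    return Rh, Rsw, Rvol, D, CS
```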
4. The natural scene text detection method based on adaptive color clustering according to any one of claims 1-3, characterized in that the trained convolutional neural network CNN is used to carry out a first round of verification on the text strings of step 9 and remove part of the non-text, the specific steps being as follows:
Step 10.1: compute the confidence of each text string with the convolutional neural network CNN, denoted Score;
Step 10.2: according to the confidence Score, divide the text strings into 3 classes, High, Middle and Low, by the following rule:
High = {Score | Score > 1.4}
Middle = {Score | 0.6 ≤ Score ≤ 1.4}
Low = {Score | Score < 0.6}
Step 10.3: text strings whose confidence Score falls in the Low class are deleted directly from the candidate text; those in the High class are output directly as final detection results; those in the Middle class are kept as character strings to be verified.
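The rule of steps 10.2-10.3 is a simple three-way threshold on the CNN confidence; as a sketch:

```python
def triage(score):
    """Classify a text string by its CNN confidence (steps 10.2-10.3):
    High is output directly, Low is deleted from the candidates, and
    Middle goes on to SVM verification."""
    if score > 1.4:
        return 'High'
    if score < 0.6:
        return 'Low'
    return 'Middle'
```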
5. The natural scene text detection method based on adaptive color clustering according to claim 4, characterized in that the character strings to be verified in step 10 are verified with a support vector machine classifier, giving the optimized detection result, the detailed process being:
For the candidate character strings of the Middle class obtained in step 10.3, HOG features are extracted and feature vectors constructed, and string verification is performed with the trained support vector machine classifier to remove non-text strings, yielding the optimized detection result;
The training process of the support vector machine classifier is as follows:
Step 11.1: images from the ICDAR2013 training set are chosen as training samples; for each image in the training set, steps 1-10 are performed to obtain candidate character strings; the candidate strings are then labeled, those containing characters being taken as positive samples and the rest as negative samples;
Step 11.2: for the positive and negative samples of step 11.1, histogram of oriented gradients (HOG) features are extracted and feature vectors constructed to train the support vector machine classifier.
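To illustrate the HOG feature of step 11.2 without pulling in an image library, here is a deliberately simplified single-cell descriptor (unsigned gradients, no block normalisation; a real pipeline would use e.g. `skimage.feature.hog`):

```python
import numpy as np

def hog_cell(patch, bins=9):
    """A much-simplified HOG descriptor: one cell, unsigned gradient
    orientations binned over [0, 180) degrees, magnitude-weighted,
    L2-normalised. Illustrative only."""
    patch = np.asarray(patch, dtype=float)
    gy, gx = np.gradient(patch)                    # image gradients
    mag = np.hypot(gx, gy)                         # gradient magnitude
    ang = np.rad2deg(np.arctan2(gy, gx)) % 180.0   # unsigned orientation
    hist = np.zeros(bins)
    idx = np.minimum((ang / (180.0 / bins)).astype(int), bins - 1)
    for i, m in zip(idx.ravel(), mag.ravel()):
        hist[i] += m                               # magnitude-weighted vote
    n = np.linalg.norm(hist)
    return hist / n if n > 0 else hist
```

A vertical edge produces purely horizontal gradients, so all the weight lands in the first (0-degree) orientation bin.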
CN201510263154.4A 2015-05-21 2015-05-21 A kind of natural scene Method for text detection based on adaptive Color-based clustering Expired - Fee Related CN104809481B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510263154.4A CN104809481B (en) 2015-05-21 2015-05-21 A kind of natural scene Method for text detection based on adaptive Color-based clustering


Publications (2)

Publication Number Publication Date
CN104809481A CN104809481A (en) 2015-07-29
CN104809481B true CN104809481B (en) 2017-10-20

Family

ID=53694292

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510263154.4A Expired - Fee Related CN104809481B (en) 2015-05-21 2015-05-21 A kind of natural scene Method for text detection based on adaptive Color-based clustering

Country Status (1)

Country Link
CN (1) CN104809481B (en)

Families Citing this family (20)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105320961A (en) * 2015-10-16 2016-02-10 重庆邮电大学 Handwriting numeral recognition method based on convolutional neural network and support vector machine
CN106599900B (en) * 2015-10-20 2020-04-21 华中科技大学 Method and device for recognizing character strings in image
CN105512683B (en) * 2015-12-08 2019-03-08 浙江宇视科技有限公司 Object localization method and device based on convolutional neural networks
CN105512640B (en) * 2015-12-30 2019-04-02 重庆邮电大学 A kind of people flow rate statistical method based on video sequence
CN105787465A (en) * 2016-03-21 2016-07-20 苏州东吴维桢信息技术有限公司 Image identification method based on position structure
CN106055675B (en) * 2016-06-06 2019-10-29 杭州量知数据科技有限公司 A kind of Relation extraction method based on convolutional neural networks and apart from supervision
CN107688576B (en) * 2016-08-04 2020-06-16 中国科学院声学研究所 Construction and tendency classification method of CNN-SVM model
CN107766774A (en) * 2016-08-17 2018-03-06 鸿富锦精密电子(天津)有限公司 Face identification system and method
CN106326921B (en) * 2016-08-18 2020-01-31 宁波傲视智绘光电科技有限公司 Text detection method
CN106874905B (en) * 2017-01-12 2019-06-11 中南大学 A method of the natural scene text detection based on self study Color-based clustering
CN107203606A (en) * 2017-05-17 2017-09-26 西北工业大学 Text detection and recognition methods under natural scene based on convolutional neural networks
US10163022B1 (en) * 2017-06-22 2018-12-25 StradVision, Inc. Method for learning text recognition, method for recognizing text using the same, and apparatus for learning text recognition, apparatus for recognizing text using the same
CN107330127B (en) * 2017-07-21 2020-06-05 湘潭大学 Similar text detection method based on text picture retrieval
CN107563379B (en) * 2017-09-02 2019-12-24 西安电子科技大学 Method for positioning text in natural scene image
CN108229397B (en) * 2018-01-04 2020-08-18 华南理工大学 Method for detecting text in image based on Faster R-CNN
CN109558876B (en) * 2018-11-20 2021-11-16 浙江口碑网络技术有限公司 Character recognition processing method and device
CN109886330B (en) * 2019-02-18 2020-11-27 腾讯科技(深圳)有限公司 Text detection method and device, computer readable storage medium and computer equipment
CN110059647A (en) * 2019-04-23 2019-07-26 杭州智趣智能信息技术有限公司 A kind of file classification method, system and associated component
CN114255467A (en) * 2020-09-22 2022-03-29 阿里巴巴集团控股有限公司 Text recognition method and device, and feature extraction neural network training method and device
CN116399401B (en) * 2023-04-14 2024-02-09 浙江年年发农业开发有限公司 Agricultural planting system and method based on artificial intelligence

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101615252A (en) * 2008-06-25 2009-12-30 中国科学院自动化研究所 A kind of method for extracting text information from adaptive images
CN104573685A (en) * 2015-01-29 2015-04-29 中南大学 Natural scene text detecting method based on extraction of linear structures

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP4893861B1 (en) * 2011-03-10 2012-03-07 オムロン株式会社 Character string detection apparatus, image processing apparatus, character string detection method, control program, and recording medium


Also Published As

Publication number Publication date
CN104809481A (en) 2015-07-29

Similar Documents

Publication Publication Date Title
CN104809481B (en) A kind of natural scene Method for text detection based on adaptive Color-based clustering
US10255691B2 (en) Method and system of detecting and recognizing a vehicle logo based on selective search
CN107346420B (en) Character detection and positioning method in natural scene based on deep learning
He et al. Accurate text localization in natural image with cascaded convolutional text network
Huang et al. Robust scene text detection with convolution neural network induced mser trees
Pan et al. A robust system to detect and localize texts in natural scene images
CN104050471B (en) Natural scene character detection method and system
CN106446015A (en) Video content access prediction and recommendation method based on user behavior preference
CN103049763B (en) Context-constraint-based target identification method
CN111079674B (en) Target detection method based on global and local information fusion
JP5522408B2 (en) Pattern recognition device
CN101719142B (en) Method for detecting picture characters by sparse representation based on classifying dictionary
CN104573685A (en) Natural scene text detecting method based on extraction of linear structures
CN105447522A (en) Complex image character identification system
CN110287952A (en) A kind of recognition methods and system for tieing up sonagram piece character
CN112766170B (en) Self-adaptive segmentation detection method and device based on cluster unmanned aerial vehicle image
CN107330027A (en) A kind of Weakly supervised depth station caption detection method
CN110008899B (en) Method for extracting and classifying candidate targets of visible light remote sensing image
Zhu et al. Deep residual text detection network for scene text
CN105654054A (en) Semi-supervised neighbor propagation learning and multi-visual dictionary model-based intelligent video analysis method
Kobchaisawat et al. Thai text localization in natural scene images using convolutional neural network
CN110728214B (en) Weak and small figure target detection method based on scale matching
CN111310820A (en) Foundation meteorological cloud chart classification method based on cross validation depth CNN feature integration
CN107357834A (en) Image retrieval method based on visual saliency fusion
CN109741351A (en) A kind of classification responsive type edge detection method based on deep learning

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
EXSB Decision made by SIPO to initiate substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20171020

Termination date: 20190521