CN104809481B - Natural scene text detection method based on adaptive color clustering - Google Patents

Natural scene text detection method based on adaptive color clustering

Info

Publication number
CN104809481B
CN104809481B (application CN201510263154.4A)
Authority
CN
China
Prior art keywords
color
image
connected region
text
character
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN201510263154.4A
Other languages
Chinese (zh)
Other versions
CN104809481A (en)
Inventor
邹北骥
吴慧
郭建京
赵于前
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Central South University
Original Assignee
Central South University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Central South University filed Critical Central South University
Priority to CN201510263154.4A priority Critical patent/CN104809481B/en
Publication of CN104809481A publication Critical patent/CN104809481A/en
Application granted granted Critical
Publication of CN104809481B publication Critical patent/CN104809481B/en


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2411 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/211 Selection of the most significant subset of features
    • G06F18/2111 Selection of the most significant subset of features by using evolutionary computational techniques, e.g. genetic algorithms
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2413 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns
    • G06F18/24133 Distances to prototypes
    • G06F18/24137 Distances to cluster centroids
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/285 Selection of pattern recognition techniques, e.g. of classifiers in a multi-classifier system

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Physiology (AREA)
  • Image Analysis (AREA)

Abstract

The present invention provides a natural scene text detection method based on adaptive color clustering. The method first proposes an adaptive color clustering scheme: for images of differing complexity, the scheme clusters pixels into different numbers of color layers, effectively extracting text connected regions. Then, an extreme learning machine (ELM) is trained to build a neighbor-character model, and adjacent characters are merged into text strings, improving the robustness of the method. Finally, to further improve detection performance, the method combines a convolutional neural network (CNN) with a support vector machine (SVM) to verify candidate text strings, improving detection accuracy over conventional methods.

Description

Natural scene text detection method based on adaptive color clustering
Technical field
The invention belongs to the technical field of pattern recognition, and relates to a natural scene text detection method based on adaptive color clustering.
Background art
With the popularity of mobile phones and camera devices, the number of images and videos keeps growing. These images and videos contain a great deal of important information, so extracting and understanding the information in an image has become particularly important. Text is the most important and most direct information in an image; extracting and recognizing the text in an image helps a computer understand the image content. At present, the detection of printed text has made great progress and is widely applied. However, text in natural scene images varies greatly in font size and style and is affected by illumination, shadow and shooting angle, so its detection results are poor. Natural scene text detection therefore remains a challenging task.
Existing natural scene text detection methods can be divided into two major classes: methods based on sliding windows and methods based on connected regions. Sliding-window methods, also called region-based methods, work as follows: first, the original image is scanned with sliding windows of different scales, obtaining a series of sub-regions; then, texture features of the sub-regions, such as gradient histograms and wavelet transforms, are extracted; finally, a classifier trained on the extracted features verifies the sub-regions, giving the final detected text. Because sub-regions are extracted by multi-scale sliding windows, the time complexity of this approach is high, and because the sub-regions are verified with hand-designed features, its detection results are poor. In recent years, text detection methods based on connected regions have received wide attention from scholars. Such a method mainly includes 3 steps: 1) extract connected regions from the image using features such as pixel color and stroke width; 2) analyze the features of the connected regions and merge them into text strings according to character-merging rules; 3) verify the strings and remove non-text regions, giving the final text detection result. Compared with sliding-window methods, connected-region methods achieve higher accuracy with lower time complexity.
Because text in natural scene images is highly variable and its background shows different degrees of complexity, how to extract text connected regions from images of differing complexity and reasonably remove non-text regions is the key to connected-region-based text detection.
Summary of the invention
The invention provides a natural scene text detection method based on adaptive color clustering, aiming to overcome the low accuracy of prior-art text detection against complex backgrounds.
A natural scene text detection method based on adaptive color clustering comprises the following steps:
Step 1: obtain the edge image I_e of the image I to be processed for text detection;
Step 2: remove the pixels of the edge image I_e from the image I, obtaining the dominant color image I_m;
Step 3: initialize the color cluster center (μ_0(r), μ_0(g), μ_0(b));
Step 3.1: project the pixels of the dominant color image I_m into a three-dimensional color space;
Step 3.2: set a step size S and quantize the color space, obtaining (256/S)^3 sub-cubes of equal size;
Step 3.3: count the pixels in each sub-cube as the density of that sub-cube, and find the sub-cube with the highest density;
Step 3.4: compute the average color of all pixels in the densest sub-cube, and take this value as the initial color cluster center (μ_0(r), μ_0(g), μ_0(b));
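The initialization of step 3 can be sketched as follows. This is only an illustration under stated assumptions, not the patented code: pixels are plain (r, g, b) tuples, and the sub-cube index is the channel value divided by the step size S.

```python
# Quantize RGB pixels into (256/S)^3 sub-cubes and take the mean color of the
# densest sub-cube as the initial cluster center (sketch of step 3).
from collections import Counter

def init_cluster_center(pixels, step=32):
    """pixels: list of (r, g, b) tuples; returns the initial center (r, g, b)."""
    # Map each pixel to the index of its sub-cube.
    cube_of = lambda p: (p[0] // step, p[1] // step, p[2] // step)
    density = Counter(cube_of(p) for p in pixels)
    densest = density.most_common(1)[0][0]
    # Average color over the pixels that fall in the densest sub-cube.
    members = [p for p in pixels if cube_of(p) == densest]
    n = len(members)
    return tuple(sum(p[i] for p in members) / n for i in range(3))

pixels = [(10, 12, 14), (11, 13, 15), (12, 10, 12), (200, 50, 60)]
print(init_cluster_center(pixels, step=32))  # densest cube holds the three dark pixels
```

With S = 32, as in the embodiment below, the space splits into 8 × 8 × 8 sub-cubes.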
Step 4: update the color cluster center;
Step 4.1: set the initial value of the iteration counter t to 0; the cluster center obtained at iteration t is (μ_t(r), μ_t(g), μ_t(b));
Step 4.2: compute the distance d_c from each pixel p of the dominant color image I_m to the current cluster center; the R, G, B channel values of p are denoted p_r, p_g and p_b in turn;
Step 4.3: find all pixels of I_m satisfying the condition d_c < l, and compute the average color of the pixels satisfying the condition as the new color cluster center (μ_{t+1}(r), μ_{t+1}(g), μ_{t+1}(b));
l denotes the color distance threshold, with value range [24, 88];
because the colors within a text string are close, experiments on the standard text databases ICDAR2003, ICDAR2011 and ICDAR2013 show that l should lie in [24, 88];
Step 4.4: judge whether (μ_t(r), μ_t(g), μ_t(b)) equals (μ_{t+1}(r), μ_{t+1}(g), μ_{t+1}(b)); if equal, take (μ_{t+1}(r), μ_{t+1}(g), μ_{t+1}(b)) as the final color cluster center (μ(r), μ(g), μ(b)); otherwise set t = t + 1 and return to step 4.2, until the value of the cluster center no longer changes;
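The update loop of step 4 can be sketched as below, a minimal illustration rather than the patented implementation: the center is repeatedly replaced by the mean color of the pixels within distance l of it until it stops moving.

```python
import math

def refine_center(pixels, center, l=48):
    """Iteratively replace the center with the mean color of the pixels within
    distance l of it, until the center is stable. Assumes at least one pixel
    lies within l of the initial center (guaranteed when the center comes from
    the densest sub-cube of step 3)."""
    while True:
        near = [p for p in pixels if math.dist(p, center) < l]
        new_center = tuple(sum(p[i] for p in near) / len(near) for i in range(3))
        if new_center == center:
            return center
        center = new_center

print(refine_center([(10, 10, 10), (20, 20, 20), (200, 200, 200)], (10, 10, 10)))
# → (15.0, 15.0, 15.0)
```

The default l = 48 follows the value the embodiment reports as giving the highest accuracy within the range [24, 88].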
Step 5: build color layer images;
Step 5.1: with the final cluster center (μ(r), μ(g), μ(b)) obtained in step 4, traverse all pixels of I_m and I_e and compute the distance d from each pixel q to the cluster center (μ(r), μ(g), μ(b));
Step 5.2: the pixels q satisfying the condition d < l form one color layer image, denoted C_i, where i indexes the color layer obtained in the i-th round; these pixels are then removed from I_m and I_e, giving a new dominant color image and a new edge image; i is initialized to 1;
Step 5.3: with the new dominant color image obtained in step 5.2, set i = i + 1 and return to step 3, until every pixel of the dominant color image I_m of step 2 has been assigned to some color layer image, yielding all color layer images;
Step 6: binarize all color layer images to obtain the corresponding binary images, and extract the connected regions of all binary images to form the connected region set CCs;
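Step 6 reduces each binarized color layer to a set of connected regions. A minimal 4-connectivity flood-fill labeling over a binary grid, offered only as an illustrative sketch of that extraction:

```python
def connected_regions(grid):
    """grid: list of rows of 0/1; returns a list of regions, each a set of (y, x)."""
    h, w = len(grid), len(grid[0])
    seen, regions = set(), []
    for y in range(h):
        for x in range(w):
            if grid[y][x] and (y, x) not in seen:
                stack, region = [(y, x)], set()
                seen.add((y, x))
                while stack:  # flood fill from this seed pixel
                    cy, cx = stack.pop()
                    region.add((cy, cx))
                    for ny, nx in ((cy + 1, cx), (cy - 1, cx), (cy, cx + 1), (cy, cx - 1)):
                        if 0 <= ny < h and 0 <= nx < w and grid[ny][nx] \
                                and (ny, nx) not in seen:
                            seen.add((ny, nx))
                            stack.append((ny, nx))
                regions.append(region)
    return regions

grid = [[1, 1, 0, 1],
        [0, 0, 0, 1]]
print(len(connected_regions(grid)))  # → 2
```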
Step 7: build the training set for the extreme learning machine classifier;
First, images of the ICDAR2013 training set are chosen as training samples, and steps 1-6 are performed on each image of the training samples, obtaining its connected region set CCs;
Then, adjacent connected regions in CCs are paired two by two. If the two connected regions of a pair lie in the same text string and are adjacent, the pair of adjacent connected regions is taken as a positive sample; if both regions of a pair are text but their vertical overlap is 0, i.e., the two regions belong to 2 different strings, or if one region of the pair is non-text, the pair of adjacent connected regions is taken as a negative sample;
From all positive and negative samples, 10000 positive and 10000 negative samples are randomly selected as the training set for the extreme learning machine classifier;
Step 8: train the classifier with the feature vector of each sample of the training set, obtaining the neighbor-character model;
The feature vector of each sample comprises 5 features: height ratio R_h, average stroke width ratio R_sw, vertical overlap ratio R_vol, horizontal distance D and color similarity CS;
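The five pairwise features drive the neighbor-character model. The patent's formula images did not survive extraction, so the expressions below are a common formulation offered purely as an assumption; they only match the variable definitions given in the text (heights h, average stroke widths sw, vertical overlap v_LR, horizontal gap d_LR, mean R/G/B channel values).

```python
import math

def pair_features(L, R, v_lr, d_lr):
    """L, R: dicts with height h, stroke width sw and mean color rgb;
    v_lr, d_lr: vertical overlap length and horizontal gap of the pair.
    All formulas below are assumed, not taken from the patent."""
    Rh = min(L["h"], R["h"]) / max(L["h"], R["h"])       # height ratio (assumed form)
    Rsw = min(L["sw"], R["sw"]) / max(L["sw"], R["sw"])  # stroke width ratio (assumed form)
    Rvol = v_lr / min(L["h"], R["h"])                    # vertical overlap ratio (assumed form)
    D = d_lr / max(L["h"], R["h"])                       # normalized horizontal gap (assumed form)
    CS = math.dist(L["rgb"], R["rgb"])                   # color distance (assumed form)
    return [Rh, Rsw, Rvol, D, CS]

left = {"h": 20, "sw": 3, "rgb": (10, 10, 10)}
right = {"h": 22, "sw": 3, "rgb": (12, 10, 10)}
print(pair_features(left, right, v_lr=18, d_lr=5))
```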
Step 9: merge adjacent characters;
The connected regions in the set CCs of the image I to be processed are numbered from top to bottom and left to right, denoted (CC_1, CC_2, ..., CC_n), where n is the number of connected regions;
The extreme learning machine classifier obtained in step 8 classifies (CC_1, CC_2, ..., CC_n) as neighbor characters or not; adjacent characters are merged into text strings, completing text detection.
The neighbor-character classification of (CC_1, CC_2, ..., CC_n) with the classifier of step 8, and the merging of adjacent characters, proceed as follows:
Step 9.1: choose the lowest-numbered connected region as the initial connected region, denoted CC_L; find a connected region adjacent to CC_L, denoted CC_R; CC_L and CC_R form one test sample;
Step 9.2: obtain the feature vector of the test sample;
Step 9.3: with the trained extreme learning machine classifier of step 8, judge from the 5 features of the test sample whether CC_L and CC_R are neighbor characters;
Step 9.4: if CC_L and CC_R are judged non-neighbors, CC_L is saved as a string and deleted from CCs; if CC_L and CC_R are judged neighbors, they are deleted from CCs and merged into one connected region, which becomes the new CC_L;
Then a new adjacent region CC_R of CC_L is chosen to form a new test sample, and the process returns to step 9.2 until no connected region remains in the set CCs, yielding all merged strings.
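The merging loop of steps 9.1-9.4 can be sketched as a greedy scan: take the first region, absorb neighbors the classifier accepts, and emit a string whenever the classifier rejects. In this sketch `is_neighbor` stands in for the trained ELM, and regions are deliberately simplified to tuples of x-positions.

```python
def merge_strings(regions, is_neighbor):
    """regions: ordered list of tuples; is_neighbor(last_elem, next_region) -> bool;
    returns a list of merged strings (tuples)."""
    strings = []
    pending = list(regions)
    while pending:
        current = pending.pop(0)
        # Absorb successive neighbors into the current string.
        while pending and is_neighbor(current[-1], pending[0]):
            current = current + pending.pop(0)
        strings.append(current)
    return strings

# Toy usage: neighbors are regions at most 2 positions apart.
regions = [(0,), (1,), (2,), (10,), (11,)]
close = lambda a, b: abs(a - b[0]) <= 2
print(merge_strings(regions, close))  # → [(0, 1, 2), (10, 11)]
```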
The feature vector of a test sample comprises the height ratio R_h, average stroke width ratio R_sw, vertical overlap ratio R_vol, horizontal distance D and color similarity CS, computed by the following formulas:
Height ratio R_h:
Average stroke width ratio R_sw:
Vertical overlap ratio R_vol:
Horizontal distance D:
Color similarity CS:
where CC_L is the connected region on the left of a pair and CC_R the one on the right; h_L and h_R respectively denote the heights of CC_L and CC_R; sw_L and sw_R respectively denote their average stroke widths; v_LR and d_LR respectively denote the vertical overlap length and the horizontal distance between CC_L and CC_R; r_L, g_L, b_L are the average R, G, B channel values of CC_L, and r_R, g_R, b_R those of CC_R.
The trained convolutional neural network CNN performs a first round of verification on the text strings of step 9, removing part of the non-text, as follows:
Step 10.1: compute the confidence of each text string with the convolutional neural network CNN, denoted Score;
Step 10.2: according to the confidence Score, divide the text strings into 3 classes, High, Middle and Low, by the following rules:
High = {Score | Score > 1.4}
Middle = {Score | 0.6 ≤ Score ≤ 1.4}
Low = {Score | Score < 0.6}
Step 10.3: strings whose confidence Score falls in the Low class are deleted directly from the candidate text; strings in the High class are output directly as final detection results; strings in the Middle class are kept as strings to be verified.
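The three-way split of steps 10.2-10.3 can be written directly, using the thresholds 0.6 and 1.4 from the text; the function and variable names here are illustrative only.

```python
def split_by_confidence(scored, low=0.6, high=1.4):
    """scored: list of (string, score) pairs.
    Returns (accepted, to_verify); the Low class is dropped entirely."""
    accepted = [s for s, score in scored if score > high]           # High class
    to_verify = [s for s, score in scored if low <= score <= high]  # Middle class
    return accepted, to_verify

scored = [("STOP", 1.7), ("Sale", 1.0), ("brick", 0.2)]
print(split_by_confidence(scored))  # → (['STOP'], ['Sale'])
```

The Middle strings are exactly those later passed to the SVM for the second round of verification.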
Candidate text strings are classified with the convolutional neural network described in the Proceedings of the 21st International Conference on Pattern Recognition (2012), pp. 3304-3308;
The strings to be verified from step 10 are verified with a support vector machine classifier, giving the optimized detection result, as follows:
For the candidate strings of the Middle class obtained in step 10.3, HOG features are extracted to construct feature vectors, and the trained support vector machine classifier verifies the strings, removing non-text strings and giving the optimized detection result;
The support vector machine classifier is trained as follows:
Step 11.1: choose images of the ICDAR2013 training set as training samples; perform steps 1-10 on the images of the training set, obtaining candidate strings; classify the candidate strings, taking those containing characters as positive samples and the rest as negative samples;
Step 11.2: for the positive and negative samples of step 11.1, extract their histogram of oriented gradients features, construct feature vectors, and train the support vector machine classifier.
Beneficial effects
The invention proposes a natural scene text detection method based on adaptive color clustering. The method first proposes an adaptive color clustering scheme: for images of differing complexity, the scheme clusters pixels into different numbers of color layers, effectively extracting text connected regions. Then, an extreme learning machine (ELM) is trained to build a neighbor-character model, and adjacent characters are merged into strings, improving the robustness of the method. Finally, to further improve the text detection performance of the system, the method combines a convolutional neural network (CNN) with a support vector machine (SVM) to verify text strings, improving detection accuracy over conventional methods.
Brief description of the drawings
Fig. 1 is a flow chart of the method of the invention;
Fig. 2 shows the text detection process, in which (a) is the image to be detected; (b) is the first color layer image; (c) is the result of merging neighbor characters within one color layer; (d) is the result after verification by the convolutional neural network (CNN); (e) is the final detection result.
Fig. 3 shows sample pictures for training, in which solid lines indicate the merging of positive samples and dotted lines indicate the two merging modes of negative samples.
Embodiments
To make the object, technical solution and advantages of the invention clearer, the invention is described in more detail below with reference to a specific embodiment and the accompanying drawings.
A natural scene text detection method based on adaptive color clustering, as shown in Fig. 1, comprises the following steps, illustrated by detecting the text of Fig. 2(a):
Step 1: input the image to be detected, e.g. Fig. 2(a);
Step 2: extract the edge pixels of the image to be detected with the Canny edge detection algorithm, forming the edge image, denoted I_e; then remove the pixels of I_e from the original image, obtaining the dominant color image, denoted I_m;
Step 3: initialize the color cluster center:
First, project the pixels of the dominant color image I_m of step 2 into a three-dimensional color space; then set the step size S = 32 and quantize the color space, obtaining 8 × 8 × 8 sub-cubes of equal size; next, count the pixels of each sub-cube as its density and find the densest sub-cube; finally, compute the average color of all pixels in the densest sub-cube and take it as the initial color cluster center, denoted (μ(r), μ(g), μ(b));
Step 4: update the color cluster center:
First, compute the distance d_c from each pixel p of I_m (whose R, G, B channel values are denoted p_r, p_g, p_b) to the current cluster center, as in formula (1):
where t denotes the t-th iteration and (μ_t(r), μ_t(g), μ_t(b)) is the color center of the layer C_i at iteration t.
Then, find all pixels of I_m satisfying the condition d_c < l and compute their average color as the new color center. Because the colors of a text string are close, experiments on the standard text databases ICDAR2003, ICDAR2011 and ICDAR2013 show that l should lie in [24, 88]; detection accuracy is highest at l = 48, so l is set to 48 in this patent. Repeat step 4 until the color center no longer changes, i.e. the final color center is obtained, denoted (μ_i(r), μ_i(g), μ_i(b));
Step 5: build the color layer:
First, with the final color center (μ_i(r), μ_i(g), μ_i(b)) obtained in step 4, traverse all pixels of I_m and I_e and compute the distance d_c from each pixel p to the color center (μ_i(r), μ_i(g), μ_i(b)); then the pixels of I_m and I_e satisfying the condition d_c < l (with l = 48) form the color layer C_i, as in Fig. 2(b), and these pixels are removed from I_m and I_e;
Step 6: repeat steps 3-5 until every pixel of the dominant color image I_m has been assigned to a color layer, yielding all color layers. Each color layer picture is binarized to obtain the corresponding binary picture, and a connected region extraction algorithm then yields a series of connected regions, denoted CCs.
Step 7: choose 10000 positive and 10000 negative samples to build the training set;
Step 7.1: choose the images of the ICDAR2013 training set as training samples. Perform steps 1-6 on the pictures of the training set, obtaining a series of connected regions CCs; then pair adjacent connected regions in CCs two by two, forming connected region pairs;
Step 7.2: manually divide the connected region pairs obtained in step 7.1 into positive and negative samples; the purpose is to let the positive and negative samples train the extreme learning machine (ELM), so that the ELM learns the features of positive samples and of negative samples and can automatically distinguish text from non-text.
Positive samples are the connected region pairs formed by adjacent characters within a string. Negative samples include two parts: pairs formed by a character and a non-character, and pairs of characters whose vertical overlap is zero.
Fig. 3 illustrates the construction of positive and negative samples: the connected region pairs joined by solid lines are positive samples, and those joined by dotted lines are negative samples.
Step 8: for the positive and negative samples obtained in step 7.2, extract features, construct feature vectors and train the extreme learning machine classifier (ELM), obtaining the neighbor-character model;
Step 8.1: randomly select 10000 positive and 10000 negative samples;
Step 8.2: compute the 5 features of each sample: height ratio, average stroke width ratio, vertical overlap ratio, horizontal distance and color similarity, as in formulas (2-6):
Height ratio:
Average stroke width ratio:
Vertical overlap ratio:
Horizontal distance:
Color similarity:
where h_L and h_R respectively denote the heights of CC_L and CC_R; sw_L and sw_R respectively denote their average stroke widths (CC_L is the connected region on the left of a pair and CC_R the one on the right); v_LR and d_LR respectively denote the vertical overlap length and the horizontal distance between CC_L and CC_R; r_L, g_L, b_L are the average R, G, B channel values of CC_L, and r_R, g_R, b_R those of CC_R.
Step 8.3: construct the 5 features of each sample into a feature vector and train the extreme learning machine classifier (ELM), obtaining the neighbor-character model;
Step 9: for the connected regions CCs obtained in step 6, judge with the neighbor-character model obtained in step 8 whether connected regions are adjacent characters, building strings accordingly and obtaining the preliminary detection result, as in Fig. 2(c);
Step 9.1: number the connected regions CCs of step 6 by position from top to bottom and left to right as (CC_1, CC_2, ..., CC_n);
Step 9.2: choose the lowest-numbered connected region as the initial connected region, denoted CC_L; find a connected region adjacent to CC_L, denoted CC_R; CC_L and CC_R form one test sample.
Step 9.3: perform step 8.2 on the test sample, obtaining its 5 features;
Step 9.4: with the extreme learning machine (ELM) trained in step 8, judge from the 5 features of the test sample whether CC_L and CC_R are neighbor characters;
Step 9.5: if CC_L and CC_R are judged non-neighbors, save CC_L as a string, delete CC_L from CCs and repeat steps 9.2-9.5; if CC_L and CC_R are judged neighbors, delete them from CCs, merge them into one connected region as the new CC_L, and repeat steps 9.2-9.5;
Step 9.6: repeat steps 9.2-9.5 until no connected region remains in the set CCs, obtaining all merged strings;
Step 10: with the trained convolutional neural network (CNN), compute the confidence of the candidate strings of step 9 and classify the text strings by confidence.
Step 10 further comprises the following steps:
Step 10.1: compute the confidence of each text string with the convolutional neural network (CNN), denoted Score;
Step 10.2: according to the confidence Score, divide the text strings into 3 classes, High, Middle and Low, by the following rules:
High = {Score | Score > 1.4}
Middle = {Score | 0.6 ≤ Score ≤ 1.4}
Low = {Score | Score < 0.6}
Step 10.3: strings whose confidence Score falls in the Low class are deleted directly from the candidate text; strings in the High class are output directly as final detection results, as in Fig. 2(d); strings in the Middle class are passed to step 11 for a further decision;
Candidate text strings are classified with the convolutional neural network described in the Proceedings of the 21st International Conference on Pattern Recognition (2012), pp. 3304-3308;
Step 11: perform a second round of verification on the remaining strings of step 10 with a support vector machine (SVM), obtaining the final detection result, as in Fig. 2(e).
Step 11.1: choose images of the ICDAR2013 training set as training samples; perform steps 1-10 on the images of the training set, obtaining candidate strings; classify the candidate strings, taking those containing characters as positive samples and the rest as negative samples;
Step 11.2: for the positive and negative samples of step 11.1, extract their histogram of oriented gradients features, construct feature vectors and train the support vector machine classifier.
The above is only a preferred embodiment of the invention, intended to be illustrative rather than limiting. Those skilled in the art will understand that many changes, modifications and even equivalents may be made within the spirit and scope defined by the claims of the invention, all of which fall within its protection scope.

Claims (5)

1. A natural scene text detection method based on adaptive color clustering, characterized by comprising the following steps:
Step 1: obtain the edge image I_e of the image I to be processed for text detection;
Step 2: remove the pixels of the edge image I_e from the image I, obtaining the dominant color image I_m;
Step 3: initialize the color cluster center (μ_0(r), μ_0(g), μ_0(b));
Step 3.1: project the pixels of the dominant color image I_m into a three-dimensional color space;
Step 3.2: set a step size S and quantize the color space, obtaining (256/S)^3 sub-cubes of equal size;
Step 3.3: count the pixels in each sub-cube as the density of that sub-cube, and find the sub-cube with the highest density;
Step 3.4: compute the average color of all pixels in the densest sub-cube, and take this value as the initial color cluster center (μ_0(r), μ_0(g), μ_0(b));
Step 4: update the color cluster center;
Step 4.1: set the initial value of the iteration counter t to 0; the cluster center obtained at iteration t is (μ_t(r), μ_t(g), μ_t(b));
Step 4.2: compute the distance d_c from each pixel p of the dominant color image I_m to the current cluster center; the R, G, B channel values of p are denoted p_r, p_g and p_b in turn;
d_c = sqrt((p_r − μ_t(r))^2 + (p_g − μ_t(g))^2 + (p_b − μ_t(b))^2)
Step 4.3, master color image I is found outmIn meet condition dc<L all pixels point, and calculate all pictures for the condition that meets The color average value of vegetarian refreshments, is used as new Color-based clustering center (μt+1(r),μt+1(g),μt+1(b));
L represents color distance threshold, and span is [24,88];
Step 4.4, (μ is judgedt(r),μt(g),μt(b)) with (μt+1(r),μt+1(g),μt+1(b) it is) whether equal, if equal, With (μt+1(r),μt+1(g),μt+1(b)) as final Color-based clustering center (μ (r), μ (g), μ (b)), otherwise, t=t+1 is made, Return to step 4.2, until the value at Color-based clustering center does not change;
Step 5:Build color tomographic image;
Step 5.1, according to final Color-based clustering center (μ (r), μ (g), μ (b)) is obtained in step 4, I is traveled throughmAnd IeIn own Pixel, calculates each pixel q to Color-based clustering center (μ (r), μ (g), μ (b)) apart from d;
Step 5.2, meeting condition d<L pixel q constitutes a color tomographic image, is expressed as Ci, wherein, i represents ith Obtained color tomographic image, while these pixels from ImAnd IeIt is middle to remove, obtain new master color image and edge image;i Initial value value be 1;
Step 5.3, the new master color image that step 5.2 is obtained, i=i+1, return to step 3, until the mass-tone described in step 2 Coloured picture is as ImMiddle all pixels point is all assigned in corresponding color tomographic image, constructs all color tomographic images
Step 6:Binary conversion treatment is carried out to all color tomographic images, corresponding binary image is obtained, and extract all two Connected region in value image, composition connected region set CCs;
Step 7: build the extreme learning machine classifier training set;
First, images from the ICDAR2013 training set are chosen as training samples, and steps 1-6 are performed on each image in the training samples to obtain its connected-region set CCs;
Then, adjacent connected regions in CCs are paired two by two; if the 2 connected regions of a pair lie in the same text string and are adjacent, the pair of adjacent connected regions is taken as a positive sample; if the 2 connected regions of a pair are both text but their vertical overlap rate is 0, i.e. the 2 regions are distributed over 2 different text strings, or if one of the 2 connected regions of a pair is non-text, the pair of adjacent connected regions is taken as a negative sample;
From all positive and negative samples, 10000 positive samples and 10000 negative samples are randomly selected to build the extreme learning machine classifier training set;
Step 8: train the classifier with the feature vector of each sample in the extreme learning machine training set, obtaining the adjacent-character model;
The feature vector of each sample contains 5 features: the height ratio R_h, average stroke-width ratio R_sw, vertical overlap rate R_vol, horizontal spacing D, and color similarity CS;
Step 9: merge adjacent characters;
The connected regions in the set CCs corresponding to the image I under text detection are numbered in top-to-bottom, left-to-right order, denoted (CC_1, CC_2, ..., CC_n), where n is the number of connected regions;
Using the extreme learning machine classifier obtained in step 8, adjacent-character classification is performed on (CC_1, CC_2, ..., CC_n), and adjacent characters are merged into text strings, completing the text detection.
2. The natural scene text detection method based on adaptive color clustering according to claim 1, characterized in that using the extreme learning machine classifier obtained in step 8, step 9 performs adjacent-character classification on (CC_1, CC_2, ..., CC_n) and merges adjacent characters, the detailed process being as follows:
Step 9.1: choose the front-most numbered connected region as the initial connected region, denoted CC_L; find the connected region adjacent to CC_L, denoted CC_R; take CC_L and CC_R as one test sample;
Step 9.2: obtain the feature vector of the test sample;
Step 9.3: using the extreme learning machine classifier trained in step 8, judge from the 5 sample features whether CC_L and CC_R are adjacent characters;
Step 9.4: if CC_L and CC_R are judged not to be adjacent characters, CC_L is saved as a character string and deleted from CCs; if CC_L and CC_R are judged to be adjacent characters, both are deleted from CCs and merged into one connected region, which serves as the new CC_L;
Then a connected region CC_R adjacent to CC_L is chosen again to form a new test sample, and the process returns to step 9.2 until no connected region remains in the set CCs, yielding all merged character strings.
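The greedy loop of steps 9.1-9.4 can be sketched as follows, with a stand-in predicate in place of the trained extreme learning machine classifier (all names here are illustrative, not from the patent):

```python
def merge_adjacent(ccs, is_pair):
    """Steps 9.1-9.4 as a greedy left-to-right merge. `ccs` is the ordered
    list (CC_1, ..., CC_n); each region is modelled as a list of elements.
    `is_pair` stands in for the ELM adjacent-character decision (step 9.3)."""
    ccs = list(ccs)
    strings = []
    while ccs:
        cl = ccs.pop(0)                      # step 9.1: front-most region as CC_L
        while ccs and is_pair(cl, ccs[0]):   # step 9.3 on the pair (CC_L, CC_R)
            cl = cl + ccs.pop(0)             # step 9.4: merge into the new CC_L
        strings.append(cl)                   # CC_L saved as one character string
    return strings
```

The loop terminates because every iteration removes at least one region from CCs.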
3. The natural scene text detection method based on adaptive color clustering according to claim 2, characterized in that the feature vector of the test sample contains the height ratio R_h, average stroke-width ratio R_sw, vertical overlap rate R_vol, horizontal spacing D and color similarity CS, computed according to the following formulas:
Height ratio R_h:
Average stroke-width ratio R_sw:
Vertical overlap rate R_vol:
Horizontal spacing D:
Color similarity CS:
(the five formulas appear as images in the original publication and are not reproduced here)
Wherein, CC_L is the connected region on the left of the pair and CC_R is the connected region on the right of the pair; h_L and h_R respectively denote the region heights of CC_L and CC_R; sw_L and sw_R respectively denote the average stroke widths of CC_L and CC_R; v_LR and d_LR respectively denote the vertical overlap length and the horizontal distance between CC_L and CC_R; r_L, g_L, b_L are respectively the mean R, G, B channel values of CC_L, and r_R, g_R, b_R are respectively the mean R, G, B channel values of CC_R.
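Because the formula images of claim 3 are not reproduced in this text, the sketch below uses common definitions of these five pairwise features that are consistent with the variable descriptions above; treat every formula here as an assumption, not the patent's exact expression:

```python
import math

def pair_features(ccL, ccR):
    """Five pairwise features for a (CC_L, CC_R) test sample, with assumed
    (not patent-verified) definitions. Each CC is a dict with bounding box
    x, y, w, h, mean stroke width `sw`, and mean colour `rgb`."""
    hL, hR = ccL['h'], ccR['h']
    Rh = min(hL, hR) / max(hL, hR)                                # height ratio
    Rsw = min(ccL['sw'], ccR['sw']) / max(ccL['sw'], ccR['sw'])   # stroke-width ratio
    top = max(ccL['y'], ccR['y'])
    bot = min(ccL['y'] + hL, ccR['y'] + hR)
    Rvol = max(0, bot - top) / min(hL, hR)                        # vertical overlap rate
    # horizontal gap d_LR, normalisation by the larger height assumed
    D = (ccR['x'] - (ccL['x'] + ccL['w'])) / max(hL, hR)
    # colour similarity as Euclidean distance of mean RGB values
    CS = math.sqrt(sum((a - b) ** 2 for a, b in zip(ccL['rgb'], ccR['rgb'])))
    return Rh, Rsw, Rvol, D, CS
```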
4. The natural scene text detection method based on adaptive color clustering according to any one of claims 1-3, characterized in that the trained convolutional neural network CNN is used to carry out a first round of verification on the text strings of step 9 and remove part of the non-text, the specific steps being as follows:
Step 10.1: compute the confidence of each text string with the convolutional neural network CNN, denoted Score;
Step 10.2: according to the confidence Score, divide the text strings into 3 classes, High, Middle and Low, by the following rule:
High = {Score | Score > 1.4}
Middle = {Score | 0.6 ≤ Score ≤ 1.4}
Low = {Score | Score < 0.6}
Step 10.3: text strings whose confidence Score falls in the Low class are deleted directly from the candidate text; those in the High class are output directly as final detection results; those in the Middle class are kept as character strings to be verified.
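The rule of steps 10.2-10.3 is a simple three-way threshold on the CNN confidence; as a sketch:

```python
def triage(score):
    """Classify a text string by its CNN confidence (steps 10.2-10.3):
    High is output directly, Low is deleted from the candidates, and
    Middle goes on to SVM verification."""
    if score > 1.4:
        return 'High'
    if score < 0.6:
        return 'Low'
    return 'Middle'
```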
5. The natural scene text detection method based on adaptive color clustering according to claim 4, characterized in that the character strings to be verified in step 10 are verified with a support vector machine classifier, giving the optimized detection result, the detailed process being:
For the candidate character strings of the Middle class obtained in step 10.3, HOG features are extracted and feature vectors constructed, and string verification is performed with the trained support vector machine classifier to remove non-text strings, yielding the optimized detection result;
The training process of the support vector machine classifier is as follows:
Step 11.1: images from the ICDAR2013 training set are chosen as training samples; for each image in the training set, steps 1-10 are performed to obtain candidate character strings; the candidate strings are then labeled, those containing characters being taken as positive samples and the rest as negative samples;
Step 11.2: for the positive and negative samples of step 11.1, histogram of oriented gradients (HOG) features are extracted and feature vectors constructed to train the support vector machine classifier.
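To illustrate the HOG feature of step 11.2 without pulling in an image library, here is a deliberately simplified single-cell descriptor (unsigned gradients, no block normalisation; a real pipeline would use e.g. `skimage.feature.hog`):

```python
import numpy as np

def hog_cell(patch, bins=9):
    """A much-simplified HOG descriptor: one cell, unsigned gradient
    orientations binned over [0, 180) degrees, magnitude-weighted,
    L2-normalised. Illustrative only."""
    patch = np.asarray(patch, dtype=float)
    gy, gx = np.gradient(patch)                    # image gradients
    mag = np.hypot(gx, gy)                         # gradient magnitude
    ang = np.rad2deg(np.arctan2(gy, gx)) % 180.0   # unsigned orientation
    hist = np.zeros(bins)
    idx = np.minimum((ang / (180.0 / bins)).astype(int), bins - 1)
    for i, m in zip(idx.ravel(), mag.ravel()):
        hist[i] += m                               # magnitude-weighted vote
    n = np.linalg.norm(hist)
    return hist / n if n > 0 else hist
```

A vertical edge produces purely horizontal gradients, so all the weight lands in the first (0-degree) orientation bin.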
CN201510263154.4A 2015-05-21 2015-05-21 A kind of natural scene Method for text detection based on adaptive Color-based clustering Expired - Fee Related CN104809481B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510263154.4A CN104809481B (en) 2015-05-21 2015-05-21 A kind of natural scene Method for text detection based on adaptive Color-based clustering


Publications (2)

Publication Number Publication Date
CN104809481A CN104809481A (en) 2015-07-29
CN104809481B true CN104809481B (en) 2017-10-20

Family

ID=53694292

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510263154.4A Expired - Fee Related CN104809481B (en) 2015-05-21 2015-05-21 A kind of natural scene Method for text detection based on adaptive Color-based clustering

Country Status (1)

Country Link
CN (1) CN104809481B (en)

Families Citing this family (20)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105320961A (en) * 2015-10-16 2016-02-10 重庆邮电大学 Handwriting numeral recognition method based on convolutional neural network and support vector machine
CN106599900B (en) * 2015-10-20 2020-04-21 华中科技大学 Method and device for recognizing character strings in image
CN105512683B (en) * 2015-12-08 2019-03-08 浙江宇视科技有限公司 Object localization method and device based on convolutional neural networks
CN105512640B (en) * 2015-12-30 2019-04-02 重庆邮电大学 A kind of people flow rate statistical method based on video sequence
CN105787465A (en) * 2016-03-21 2016-07-20 苏州东吴维桢信息技术有限公司 Image identification method based on position structure
CN106055675B (en) * 2016-06-06 2019-10-29 杭州量知数据科技有限公司 A kind of Relation extraction method based on convolutional neural networks and apart from supervision
CN107688576B (en) * 2016-08-04 2020-06-16 中国科学院声学研究所 Construction and tendency classification method of CNN-SVM model
CN107766774A (en) * 2016-08-17 2018-03-06 鸿富锦精密电子(天津)有限公司 Face identification system and method
CN106326921B (en) * 2016-08-18 2020-01-31 宁波傲视智绘光电科技有限公司 Text detection method
CN106874905B (en) * 2017-01-12 2019-06-11 中南大学 A method of the natural scene text detection based on self study Color-based clustering
CN107203606A (en) * 2017-05-17 2017-09-26 西北工业大学 Text detection and recognition methods under natural scene based on convolutional neural networks
US10163022B1 (en) * 2017-06-22 2018-12-25 StradVision, Inc. Method for learning text recognition, method for recognizing text using the same, and apparatus for learning text recognition, apparatus for recognizing text using the same
CN107330127B (en) * 2017-07-21 2020-06-05 湘潭大学 Similar text detection method based on text picture retrieval
CN107563379B (en) * 2017-09-02 2019-12-24 西安电子科技大学 Method for positioning text in natural scene image
CN108229397B (en) * 2018-01-04 2020-08-18 华南理工大学 Method for detecting text in image based on Faster R-CNN
CN109558876B (en) * 2018-11-20 2021-11-16 浙江口碑网络技术有限公司 Character recognition processing method and device
CN109886330B (en) * 2019-02-18 2020-11-27 腾讯科技(深圳)有限公司 Text detection method and device, computer readable storage medium and computer equipment
CN110059647A (en) * 2019-04-23 2019-07-26 杭州智趣智能信息技术有限公司 A kind of file classification method, system and associated component
CN114255467A (en) * 2020-09-22 2022-03-29 阿里巴巴集团控股有限公司 Text recognition method and device, and feature extraction neural network training method and device
CN116399401B (en) * 2023-04-14 2024-02-09 浙江年年发农业开发有限公司 Agricultural planting system and method based on artificial intelligence

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101615252A (en) * 2008-06-25 2009-12-30 中国科学院自动化研究所 A kind of method for extracting text information from adaptive images
CN104573685A (en) * 2015-01-29 2015-04-29 中南大学 Natural scene text detecting method based on extraction of linear structures

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP4893861B1 (en) * 2011-03-10 2012-03-07 オムロン株式会社 Character string detection apparatus, image processing apparatus, character string detection method, control program, and recording medium


Also Published As

Publication number Publication date
CN104809481A (en) 2015-07-29

Similar Documents

Publication Publication Date Title
CN104809481B (en) A kind of natural scene Method for text detection based on adaptive Color-based clustering
US10255691B2 (en) Method and system of detecting and recognizing a vehicle logo based on selective search
CN107346420B (en) Character detection and positioning method in natural scene based on deep learning
He et al. Accurate text localization in natural image with cascaded convolutional text network
Huang et al. Robust scene text detection with convolution neural network induced mser trees
Pan et al. A robust system to detect and localize texts in natural scene images
CN104050471B (en) Natural scene character detection method and system
CN106446015A (en) Video content access prediction and recommendation method based on user behavior preference
CN103049763B (en) Context-constraint-based target identification method
CN111079674B (en) Target detection method based on global and local information fusion
JP5522408B2 (en) Pattern recognition device
CN101719142B (en) Method for detecting picture characters by sparse representation based on classifying dictionary
CN104573685A (en) Natural scene text detecting method based on extraction of linear structures
CN105447522A (en) Complex image character identification system
CN110287952A (en) A kind of recognition methods and system for tieing up sonagram piece character
CN112766170B (en) Self-adaptive segmentation detection method and device based on cluster unmanned aerial vehicle image
CN107330027A (en) A kind of Weakly supervised depth station caption detection method
CN110008899B (en) Method for extracting and classifying candidate targets of visible light remote sensing image
Zhu et al. Deep residual text detection network for scene text
CN105654054A (en) Semi-supervised neighbor propagation learning and multi-visual dictionary model-based intelligent video analysis method
Kobchaisawat et al. Thai text localization in natural scene images using convolutional neural network
CN110728214B (en) Weak and small figure target detection method based on scale matching
CN111310820A (en) Foundation meteorological cloud chart classification method based on cross validation depth CNN feature integration
CN107357834A (en) Image retrieval method based on visual saliency fusion
CN109741351A (en) A kind of classification responsive type edge detection method based on deep learning

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
EXSB Decision made by SIPO to initiate substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20171020

Termination date: 20190521