CN103942550A - Scene text recognition method based on sparse coding characteristics

Info

Publication number: CN103942550A
Application number: CN201410184072.6A
Authority: CN
Inventors: 王菡子; 王大寒; 章冬
Original assignee: Xiamen University
Current assignee: Xiamen University
Priority date: 2014-05-04
Filing date: 2014-05-04
Publication date: 2014-07-23
Anticipated expiration: 2034-05-04
Also published as: CN103942550B

Abstract

The invention discloses a scene text recognition method based on sparse coding features and relates to computer vision and pattern recognition. The method comprises: inputting a natural scene text image to be recognized; scanning the image with a multi-scale sliding window and applying a character classifier to each window, where, for each character class, windows with a high classifier output are taken as candidate character regions and windows with a low output are treated as background; applying non-maximum suppression so that, among regions with a large overlap ratio, only the region with the highest classifier output and its character class are retained, which removes redundant candidate character regions and yields the character detection result; combining the detected characters into a word or text line; and outputting the scene text recognition result. The structural features of characters are represented and extracted more effectively, which increases the scene text recognition rate.

Description

A scene text recognition method based on sparse coding features

Technical field

The present invention relates to computer vision and pattern recognition, and in particular to a scene text recognition method based on sparse coding features.

Background technology

As products such as smartphones and digital cameras become more widespread, pictures and videos have become easy to acquire, and the analysis and understanding of images and video has become a research direction with broad application prospects. Text in images and video carries important semantic information and is valuable for understanding them: book covers, roadside billboards, signs, and video captions all contain a great deal of information that is easy for both humans and computers to understand and store. Scene image text recognition has therefore attracted increasing attention in computer vision. Because scene images have complex backgrounds, scene text varies in size, font, and color, and images are affected by illumination changes and degradation, scene text recognition remains highly challenging.

Traditional OCR (optical character recognition) works well on scanned text documents with simple backgrounds, but its recognition rate on scene text is very low. Scene text differs from scanned documents: because the background is more complex, text regions must first be detected before the text can be recognized. In a scanned document, simple binarization is enough to obtain clean text regions, on which OCR gives good results. Scene text recognition therefore involves not only recognizing text but also detecting it.

Current scene text recognition mainly borrows the idea of object detection from computer vision and performs text detection and recognition simultaneously. The basic idea is to treat each character class as a visual object and detect character regions in the scene text image, which also yields the recognized class and recognition score of each candidate character region. On the basis of character detection and recognition, the candidate character regions and their character classes are then linked together to obtain the scene text recognition result. This approach of simultaneous detection and recognition was proposed at ICCV 2011, a leading international conference, and showed recognition performance better than traditional OCR. Many studies in the following years pursued this direction and improved scene text recognition performance. However, in these object-detection-based methods, the character classifier (character detection and recognition use the same classifier, referred to below simply as the character classifier) uses the Histogram of Oriented Gradients (HOG) feature that is common in object detection. HOG features represent the local appearance and shape of an object well and are insensitive to illumination, so they are widely used in computer vision tasks such as face detection and pedestrian detection. The scene text recognition algorithms proposed so far also use HOG as the feature extraction algorithm of the character classifier.

Although HOG features can represent local features such as edges, they cannot effectively express structural information. For character recognition in particular, the structural information of a character is very important: it distinguishes the structural differences between characters and thereby improves the character recognition rate. No patent or publication has yet reported a scene text recognition method based on sparse coding features.

Summary of the invention

The object of the invention is to provide a scene text recognition method based on sparse coding features, addressing problems such as the inability of the character classifier's feature extraction in current scene text recognition to effectively express character structure information.

The present invention comprises the following steps:

Step S1: input the natural scene text image to be recognized;

Step S2: scan the image with a multi-scale sliding window and apply the character classifier to each window region for detection and recognition. For each character class, regions with a high classifier output are taken as candidate character regions and regions with a low output are treated as background, which yields the candidate character regions contained in the image. Non-maximum suppression is then applied: among regions with a large overlap ratio, only the region with the highest classifier output and its character class are retained, which removes redundant candidate character regions and gives the character detection result;
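The thresholding and non-maximum suppression in step S2 can be sketched as follows. This is a minimal illustration, not the patented implementation: the `Detection` type, the two threshold values, and the use of intersection-over-union as the overlap ratio are all assumptions, and the text does not say whether suppression is applied per class or across classes (the sketch applies it across all candidates).

```python
from typing import List, NamedTuple


class Detection(NamedTuple):
    """One scored window (hypothetical container, not from the patent)."""
    x: float
    y: float
    w: float
    h: float
    label: str    # character class with the highest classifier output
    score: float  # that classifier output


def overlap_ratio(a: Detection, b: Detection) -> float:
    """Intersection over union of two axis-aligned boxes (assumed overlap measure)."""
    ix = max(0.0, min(a.x + a.w, b.x + b.w) - max(a.x, b.x))
    iy = max(0.0, min(a.y + a.h, b.y + b.h) - max(a.y, b.y))
    inter = ix * iy
    union = a.w * a.h + b.w * b.h - inter
    return inter / union if union > 0 else 0.0


def detect_characters(windows: List[Detection],
                      score_threshold: float = 0.5,
                      overlap_threshold: float = 0.5) -> List[Detection]:
    """Keep high-scoring windows, then suppress overlapping lower-scoring ones."""
    # Low classifier output is treated as background.
    candidates = [d for d in windows if d.score >= score_threshold]
    # Visit candidates from the highest score down.
    candidates.sort(key=lambda d: d.score, reverse=True)
    kept: List[Detection] = []
    for d in candidates:
        # Retain a window only if it does not heavily overlap a better one.
        if all(overlap_ratio(d, k) < overlap_threshold for k in kept):
            kept.append(d)
    return kept
```

Greedy suppression in descending score order is the usual way to guarantee that the highest-scoring region in each overlapping group is the one that survives.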

In step S2, the character classifier may use features based on sparse coding, and the classifier may be a Random Ferns classifier or an SVM classifier, both of which are relatively simple to train and fast at recognition. The sparse coding feature extraction process comprises the following steps:

Step S201: learn a generally applicable sparse coding dictionary from a large amount of natural scene image data using the K-SVD algorithm;

In step S201, when the K-SVD algorithm learns the dictionary (denoted D), each element of D is a 9 × 9 image patch representing a common structural feature obtained by learning, and D contains 100 elements in total (that is, the dictionary size is 100). This gives the dictionary high representational power while keeping the computational cost within an acceptable range.

Step S202: save the learned sparse coding dictionary, in which each element describes an important piece of structural information;

Step S203: extract the sparse coding features of the image using the dictionary saved in step S202;

In step S203, the sparse coding features of the image may be extracted as follows: for each pixel of the image, compute the sparse code of the pixel with the Orthogonal Matching Pursuit (OMP) algorithm, then aggregate the resulting sparse codes into a histogram of sparse codes (Histogram of Sparse Codes, HSC), which gives the sparse coding feature of the image, i.e. the HSC feature;
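The per-pixel coding step can be illustrated with a small pure-Python Orthogonal Matching Pursuit. It is a sketch under assumptions: the dictionary is given as a list of unit-norm atoms, the signal is the flattened patch around one pixel, and the `sparsity` limit is a hypothetical parameter, since the text does not state how many non-zero coefficients are kept.

```python
from typing import List, Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(p * q for p, q in zip(a, b))


def solve(matrix: List[List[float]], rhs: List[float]) -> List[float]:
    """Solve a small dense system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def omp(dictionary: List[List[float]], signal: List[float],
        sparsity: int) -> List[float]:
    """Return a sparse code with at most `sparsity` non-zero coefficients."""
    residual = list(signal)
    support: List[int] = []
    coeffs: List[float] = []
    for _ in range(min(sparsity, len(dictionary))):
        # Select the unused atom most correlated with the current residual.
        corr = [abs(dot(atom, residual)) for atom in dictionary]
        best = max((j for j in range(len(dictionary)) if j not in support),
                   key=lambda j: corr[j])
        if corr[best] < 1e-12:
            break  # the residual is already explained
        support.append(best)
        # Least-squares refit on all selected atoms via the normal equations.
        gram = [[dot(dictionary[i], dictionary[j]) for j in support]
                for i in support]
        proj = [dot(dictionary[i], signal) for i in support]
        coeffs = solve(gram, proj)
        residual = [s - sum(c * dictionary[j][t] for c, j in zip(coeffs, support))
                    for t, s in enumerate(signal)]
    code = [0.0] * len(dictionary)
    for j, c in zip(support, coeffs):
        code[j] = c
    return code
```

Refitting all selected coefficients after each new atom is what distinguishes OMP from plain matching pursuit; it keeps the residual orthogonal to every atom already chosen.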

The sparse codes may be aggregated into the histogram of sparse codes, and thus into the sparse coding feature of the image, with a method similar to that of the Histogram of Oriented Gradients (HOG) feature. The specific steps are:

First, divide the input image into cells of 8 × 8 pixels and accumulate the sparse codes within each cell;

Then, use bilinear interpolation over the neighboring cells to compute the sparse coding feature of each cell, so that the feature of each cell is interpolated over a 16 × 16 neighborhood;

Finally, concatenate the feature vectors of all cells to obtain the sparse coding feature of the whole image, i.e. the HSC feature.
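The three aggregation steps above can be sketched as follows. Several details are assumptions rather than statements from the text: the input is taken to be the already computed per-pixel sparse codes, absolute coefficient values are accumulated, the bilinear interpolation is read as HOG-style soft binning (each pixel votes into its four nearest cell centers, so each cell draws on roughly a 16 × 16 pixel neighborhood), and no normalization is applied.

```python
import math
from typing import List


def hsc_feature(codes: List[List[List[float]]], cell: int = 8) -> List[float]:
    """Pool per-pixel sparse codes (height x width x K) into cell histograms."""
    height, width = len(codes), len(codes[0])
    k = len(codes[0][0])
    rows, cols = height // cell, width // cell
    hist = [[[0.0] * k for _ in range(cols)] for _ in range(rows)]
    for y in range(height):
        for x in range(width):
            # Pixel center expressed in units of cells, relative to cell centers.
            cy = (y + 0.5) / cell - 0.5
            cx = (x + 0.5) / cell - 0.5
            y0, x0 = math.floor(cy), math.floor(cx)
            fy, fx = cy - y0, cx - x0
            # Bilinear weights for the four surrounding cell centers.
            for r, wy in ((y0, 1.0 - fy), (y0 + 1, fy)):
                for c, wx in ((x0, 1.0 - fx), (x0 + 1, fx)):
                    if 0 <= r < rows and 0 <= c < cols:
                        weight = wy * wx
                        target = hist[r][c]
                        for j, value in enumerate(codes[y][x]):
                            if value:
                                target[j] += weight * abs(value)
    # Concatenate all cell histograms in row-major order.
    return [v for row in hist for cell_hist in row for v in cell_hist]
```

For a window of 16 × 16 pixels and a 100-element dictionary, this sketch yields a 2 × 2 × 100 = 400-dimensional feature vector.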

Step S3: combine the detected characters into a word or text line;

In step S3, because a large number of candidate character regions are retained for each character class, there are many possible ways to combine characters into a word. A dynamic programming algorithm may therefore be used to search for the character combination with the highest recognition score, which gives the final text recognition result;

Searching with dynamic programming for the highest-scoring character combination requires an objective function that scores each combination. The objective function may be designed as follows:

Let w = (c_1, c_2, ..., c_n) denote a candidate word, where c_i (i = 1, 2, ..., n) is a character class contained in the candidate word, n is the number of characters (i.e. the text length), and x_i is the candidate character region of c_i. The objective function is designed as:

O = Σ_{i = 1}^{n} S (c_{i}, x_{i}) + α Σ_{i = 1}^{n - 1} g (x_{i}, x_{i + 1}) + βn,

where S(c_i, x_i) is the score with which the character classifier recognizes candidate character x_i as c_i, g(x_i, x_{i+1}) is the output of the geometric model, which describes the geometric compatibility of candidate characters x_i and x_{i+1}, and α and β are two tuning parameters.
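As a worked illustration of the objective function, the sketch below evaluates O for one candidate word from precomputed scores. The function name `word_score` and the list-based inputs are hypothetical; the numeric values in the usage note are made up purely to show the arithmetic.

```python
from typing import Sequence


def word_score(char_scores: Sequence[float],
               geometry_scores: Sequence[float],
               alpha: float,
               beta: float) -> float:
    """Evaluate O = sum_i S(c_i, x_i) + alpha * sum_i g(x_i, x_{i+1}) + beta * n.

    char_scores[i] stands for S(c_i, x_i); geometry_scores[i] stands for
    g(x_i, x_{i+1}), so it must have exactly one entry fewer than char_scores.
    """
    n = len(char_scores)
    if n == 0 or len(geometry_scores) != n - 1:
        raise ValueError("need n character scores and n - 1 geometry scores")
    return sum(char_scores) + alpha * sum(geometry_scores) + beta * n
```

For example, with character scores (1.0, 2.0, 0.5), geometry scores (0.5, -0.5), α = 2.0 and β = -0.1, the three terms are 3.5, 0.0 and -0.3, so O is approximately 3.2.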

In the objective function, the geometric model g(x_i, x_{i+1}) decides whether the geometric features between two adjacent candidates are those of genuinely adjacent characters, which is a two-class classification problem. The geometric features are modeled with an SVM classifier; the features extracted for modeling include scale similarity, the overlap ratio of adjacent characters, the distance between their top and bottom boundaries, and so on.

The objective function takes the text length into account and can therefore overcome the influence of the number of characters on the recognition result, which improves the text recognition rate. (Other methods do not consider the number of characters: the more characters there are, the larger the objective function becomes, so their recognition results tend toward texts with more characters.)

In the objective function, the tuning parameters α and β are learned on a scene text database using Minimum Classification Error (MCE) training.
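The dynamic programming search over this objective can be sketched as a best-path problem. The recurrence below is an assumption, since the text does not spell it out: candidates are ordered left to right, a word is any left-to-right chain of candidates, and `geometry` is a caller-supplied stand-in for the SVM-based geometric model g.

```python
from typing import Callable, List, Optional, Sequence, Tuple

# (x position, character class, classifier score); a hypothetical layout.
Candidate = Tuple[float, str, float]


def best_word(candidates: Sequence[Candidate],
              geometry: Callable[[Candidate, Candidate], float],
              alpha: float,
              beta: float) -> Tuple[str, float]:
    """Return the chain of candidates that maximizes the objective O."""
    if not candidates:
        return "", 0.0
    order = sorted(range(len(candidates)), key=lambda i: candidates[i][0])
    best: List[float] = [0.0] * len(candidates)
    prev: List[Optional[int]] = [None] * len(candidates)
    for pos, j in enumerate(order):
        unary = candidates[j][2] + beta  # S(c_j, x_j) plus the per-character term
        best[j] = unary                  # option: the word starts at j
        for i in order[:pos]:            # option: extend a word that ends at i
            score = best[i] + alpha * geometry(candidates[i], candidates[j]) + unary
            if score > best[j]:
                best[j], prev[j] = score, i
    end = max(range(len(candidates)), key=lambda j: best[j])
    chars: List[str] = []
    node: Optional[int] = end
    while node is not None:              # backtrack from the best end point
        chars.append(candidates[node][1])
        node = prev[node]
    return "".join(reversed(chars)), best[end]
```

Because β is added once per character, a negative value would act as a length penalty and a positive value as a length bonus; the text leaves its sign to MCE training.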

Step S4: output the scene text recognition result.

The present invention proposes a scene text recognition method based on sparse coding features. The character classifier of the invention uses a feature extraction method based on sparse coding, which represents and extracts the structural features of characters more effectively and thereby improves the scene text recognition rate.

The sparse coding feature used in the invention, the Histogram of Sparse Codes (HSC) feature, learns and represents the structural information of characters automatically, so it describes characters better and improves the text recognition rate. In addition, the text recognition method of the invention integrates the output of the character classifier and the output of the geometric model, and accounts for the influence of text length (the number of characters in the text) on the recognition result. The parameters used in text recognition are learned automatically by MCE training, which gives higher performance than parameters set empirically. The invention can be widely applied to scene text recognition and similar settings.

Compared with other methods, the scene text recognition method based on sparse coding features provided by the invention has the following advantages and beneficial effects:

1. The character classifier of the invention uses a feature extraction algorithm based on sparse coding (HSC), which represents rich structural information better and improves the discriminative power of the features, so characters are detected and recognized more accurately.

2. When the sparse-coding-based feature extraction algorithm of the invention extracts features, the features are learned directly through the sparse coding process and therefore need no manual design.

3. When searching for the optimal character combination, the method of the invention also considers the geometric compatibility between candidate characters (the geometric model), which makes effective use of information such as the geometric features between characters and thus improves the text recognition rate.

4. In the method of the invention, the objective function accounts for text length and can therefore overcome the influence of text length on the recognition result, which improves the scene text recognition rate.

5. In the method of the invention, the parameters of the objective function are learned automatically by MCE training, which gives a better recognition result.

6. The method of the invention is applicable to scene text recognition in languages such as Chinese or English; the character classifier is simply trained on the corresponding character database.

Brief description of the drawings

Fig. 1 is a flow block diagram of the method of the present invention.

Fig. 2 shows sparse coding dictionaries learned with the K-SVD algorithm, where (a) uses 5 × 5 elements, (b) 7 × 7 elements, and (c) 9 × 9 elements.

Fig. 3 is a comparison of HSC and HOG feature extraction results, where (1) is the original character image, (2) is the HSC feature visualization, and (3) is the HOG feature visualization.

Fig. 4 shows an example of the recognition process and result obtained by implementing the invention.

Embodiment

To further explain the technical method and advantages of the invention, the invention is described below with reference to the drawings and specific embodiments.

As shown in the flow chart of Fig. 1, the invention comprises the following steps:

Step S1: input the natural scene text image to be recognized;

Step S2: scan the image with a multi-scale sliding window and apply the character classifier to each window region for detection and recognition. For each character class, regions with a high classifier output are taken as candidate character regions and regions with a low output are treated as background, which yields the candidate character regions contained in the image. Non-maximum suppression is then applied: among regions with a large overlap ratio, only the region with the highest classifier output and its character class are retained, which removes a large number of redundant candidate character regions and gives the character detection result;
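The multi-scale sliding-window scan in step S2 can be sketched as a generator of square windows. The base window size, the set of scales, and the stride ratio are hypothetical parameters chosen for illustration; the text does not give values for them.

```python
from typing import Iterator, Sequence, Tuple


def sliding_windows(width: int, height: int,
                    base: int = 32,
                    scales: Sequence[float] = (1.0, 1.5, 2.0),
                    stride_ratio: float = 0.25) -> Iterator[Tuple[int, int, int]]:
    """Yield (x, y, size) for square windows at several scales."""
    for scale in scales:
        size = int(round(base * scale))
        if size > width or size > height:
            continue  # this scale does not fit inside the image
        step = max(1, int(size * stride_ratio))
        for y in range(0, height - size + 1, step):
            for x in range(0, width - size + 1, step):
                yield (x, y, size)
```

Each yielded window would then be cropped, resized to the classifier's input size, and scored by the character classifier for every character class.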

This step uses a character classifier trained in advance. The character classifier of the invention uses the feature extraction method based on sparse coding, and the classifier is the commonly used Random Ferns or SVM classifier; other machine learning algorithms such as Boosting or neural networks can also be used to learn the character classifier. The training database is a single-character database; an English database (for English text recognition) or a Chinese database (for Chinese text recognition) can be selected as needed.

The feature extraction algorithm based on sparse coding described above proceeds as follows:

Step S201: learn a generally applicable sparse coding dictionary from a large amount of natural scene image data using the K-SVD algorithm.

When the K-SVD algorithm learns the dictionary (denoted D), each element of D is a 9 × 9 image patch representing a common structural feature obtained by learning, and D contains 100 elements in total (that is, the dictionary size is 100). This gives the dictionary high representational power while keeping the computational cost within an acceptable range. Fig. 2 gives examples of dictionaries learned with the K-SVD algorithm; each dictionary contains 100 elements, and each element may be an image of 5 × 5, 7 × 7, or 9 × 9 pixels. The more pixels an element has, the richer the structural information it can express, but the greater the computational cost. In the specific embodiment of the invention, the element size is 9 × 9.
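With 9 × 9 elements, each dictionary element and each image patch can be viewed as an 81-dimensional vector, so a 100-element dictionary has more elements than dimensions. The sketch below shows one way to cut such a patch around a pixel. The border handling (clamping coordinates to the image) and the mean removal are assumptions; the text does not describe any patch preprocessing.

```python
from typing import List, Sequence


def extract_patch(image: Sequence[Sequence[float]],
                  cx: int, cy: int, size: int = 9) -> List[float]:
    """Return the size x size patch centered on (cx, cy) as a flat vector."""
    half = size // 2
    height, width = len(image), len(image[0])
    patch: List[float] = []
    for dy in range(-half, half + 1):
        # Clamp to the image so border pixels still get a full patch (assumption).
        y = min(max(cy + dy, 0), height - 1)
        for dx in range(-half, half + 1):
            x = min(max(cx + dx, 0), width - 1)
            patch.append(float(image[y][x]))
    # Remove the mean so the code reflects structure, not brightness (assumption).
    mean = sum(patch) / len(patch)
    return [v - mean for v in patch]
```

Vectors produced this way could serve both as training samples for dictionary learning and as the signals that are sparse-coded for each pixel.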

Step S202: save the learned sparse coding dictionary, in which each element describes an important piece of structural information.

Step S203: extract the sparse coding features of the image using the dictionary saved in step S202. For each pixel of the image, compute the sparse code of the pixel with the Orthogonal Matching Pursuit (OMP) algorithm, then aggregate the resulting sparse codes into a histogram of sparse codes (Histogram of Sparse Codes, HSC), which gives the sparse coding feature of the image, i.e. the HSC feature.

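The sparse codes are aggregated into the HSC feature with a method similar to that of the Histogram of Oriented Gradients (HOG) feature. As set out in the summary above, the specific steps are: divide the input image into cells of 8 × 8 pixels and accumulate the sparse codes within each cell; use bilinear interpolation over the neighboring cells so that the feature of each cell is interpolated over a 16 × 16 neighborhood; and concatenate the feature vectors of all cells to obtain the HSC feature of the whole image.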

Fig. 3 compares visualizations of the features extracted by the HSC and HOG algorithms for several character and non-character samples. HSC features represent richer structural information, such as texture, edges, and corners, whereas HOG features mainly capture edges and provide less structural information than HSC.

Step S3: combine the detected characters into a word (or text line). Because a large number of candidate character regions are retained for each character class, there are many possible ways to combine characters into a word. This step therefore uses a dynamic programming algorithm to search, according to the objective function below, for the character combination with the highest recognition score, which gives the final text recognition result. The objective function is:

O = Σ_{i = 1}^{n} S (c_{i}, x_{i}) + α Σ_{i = 1}^{n - 1} g (x_{i}, x_{i + 1}) + βn,

where S(c_i, x_i) is the score with which the character classifier recognizes candidate character x_i as c_i, g(x_i, x_{i+1}) is the geometric compatibility of candidate characters x_i and x_{i+1}, and α and β are two tuning parameters. The tuning parameters α and β are learned by Minimum Classification Error (MCE) training on a text database. The optimal character combination found by the dynamic programming search is the final text recognition result.

Step S4: output the scene text recognition result.

Fig. 4 shows an example of the recognition process and result obtained by implementing the invention.

The essential feature of the invention is that the character classifier uses a feature extraction method based on sparse coding, which describes the structural information of characters better and effectively improves both the character recognition rate and the accuracy of character detection, thereby improving the scene text recognition rate. The invention overcomes the inability of HOG features to describe character features adequately and, taking the characteristics of scene text detection and recognition into account, provides a scene text recognition method based on sparse coding features.

Claims

1. A scene text recognition method based on sparse coding features, characterized by comprising the following steps:

Step S1: input the natural scene text image to be recognized;

Step S3: combine the detected characters into a word or text line;

Step S4: output the scene text recognition result.

2. The scene text recognition method based on sparse coding features as claimed in claim 1, characterized in that in step S2 the character classifier uses features based on sparse coding, and the classifier is a Random Ferns classifier or an SVM classifier, both of which are relatively simple to train and fast at recognition.

3. The scene text recognition method based on sparse coding features as claimed in claim 2, characterized in that the sparse coding feature extraction process comprises the following steps:

Step S203: extract the sparse coding features of the image using the dictionary saved in step S202.

4. The scene text recognition method based on sparse coding features as claimed in claim 3, characterized in that in step S201, when the K-SVD algorithm learns the dictionary, each element of the dictionary is a 9 × 9 image patch representing a common structural feature obtained by learning, and the dictionary contains 100 elements in total, that is, the dictionary size is 100, which gives the dictionary high representational power while keeping the computational cost within an acceptable range.

5. The scene text recognition method based on sparse coding features as claimed in claim 3, characterized in that in step S203 the sparse coding features of the image are extracted as follows: for each pixel of the image, the sparse code of the pixel is computed with the Orthogonal Matching Pursuit algorithm, and the resulting sparse codes are then aggregated into a histogram of sparse codes, which gives the sparse coding feature of the image, i.e. the HSC feature.

6. The scene text recognition method based on sparse coding features as claimed in claim 5, characterized in that the sparse codes are aggregated into the histogram of sparse codes, and thus into the sparse coding feature of the image, with a method similar to that of the histogram of oriented gradients feature, the specific steps comprising:

7. The scene text recognition method based on sparse coding features as claimed in claim 1, characterized in that in step S3, when the detected characters are combined into a word or text line, a large number of candidate character regions have been retained for each character class and there are therefore many possible ways to combine characters into a word, so a dynamic programming algorithm is used to search for the character combination with the highest recognition score, which gives the final text recognition result.

8. The scene text recognition method based on sparse coding features as claimed in claim 7, characterized in that searching with dynamic programming for the highest-scoring character combination requires an objective function that scores each combination, and the objective function is designed as follows:

O = Σ_{i = 1}^{n} S (c_{i}, x_{i}) + α Σ_{i = 1}^{n - 1} g (x_{i}, x_{i + 1}) + βn,

9. The scene text recognition method based on sparse coding features as claimed in claim 8, characterized in that in the objective function the geometric model g(x_i, x_{i+1}) decides whether the geometric features between two adjacent candidates are those of genuinely adjacent characters, which is a two-class classification problem; the geometric features are modeled with an SVM classifier, and the geometric features extracted for modeling include scale similarity, the overlap ratio of adjacent characters, and the distance between their top and bottom boundaries.

10. The scene text recognition method based on sparse coding features as claimed in claim 8, characterized in that the objective function takes the text length into account and can therefore overcome the influence of the number of characters on the recognition result, which improves the text recognition rate; the tuning parameters α and β may be learned on a scene text database using minimum classification error training.