CN106709494A - Coupled spatial learning-based scene character recognition method - Google Patents

Coupled spatial learning-based scene character recognition method

Info

Publication number
CN106709494A
Authority
CN
China
Prior art keywords
dictionary
scene character
space
dist
character image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201710014236.4A
Other languages
Chinese (zh)
Other versions
CN106709494B (en)
Inventor
张重
王红
刘爽
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhongfang Information Technology (Tianjin) Co., Ltd.
Original Assignee
Tianjin Normal University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tianjin Normal University filed Critical Tianjin Normal University
Priority to CN201710014236.4A priority Critical patent/CN106709494B/en
Publication of CN106709494A publication Critical patent/CN106709494A/en
Application granted granted Critical
Publication of CN106709494B publication Critical patent/CN106709494B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/60Type of objects
    • G06V20/62Text, e.g. of license plates, overlay texts or captions on TV images
    • G06V20/63Scene text, e.g. street names
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/23Clustering techniques
    • G06F18/232Non-hierarchical techniques
    • G06F18/2321Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2411Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10Character recognition

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Probability & Statistics with Applications (AREA)
  • Multimedia (AREA)
  • Character Discrimination (AREA)
  • Image Analysis (AREA)

Abstract

The embodiments of the invention disclose a coupled spatial learning-based scene character recognition method. The method includes the following steps: input scene character images are preprocessed to obtain training scene character images; discriminative features are extracted from the training scene character images to obtain a spatial dictionary; the spatial dictionary is used to spatially encode the discriminative features of the corresponding images, yielding the corresponding spatial coding vectors; maximization extraction is performed on the spatial coding vectors to obtain feature vectors; a linear support vector machine is trained on these feature vectors to obtain a scene character recognition classification model; and the feature vector of a test scene character image is obtained and input into the scene character recognition classification model to obtain the scene character recognition result. By creating a spatial dictionary and using it to perform spatial coding, the method effectively integrates spatial contextual information into the feature vectors, so that spatial information is effectively mined and the accuracy of scene character recognition is improved.

Description

Scene character recognition method based on coupled space learning
Technical field
The invention belongs to the technical field of pattern recognition, and in particular relates to a scene character recognition method based on coupled space learning.
Background technology
Scene character recognition plays an important role in the field of pattern recognition and can be applied directly to fields such as image retrieval, intelligent transportation and human-computer interaction. In practical applications, scene character recognition is a very challenging research direction, because scene characters are affected by external factors such as uneven illumination, distortion and complex backgrounds.
Scene character recognition has been widely studied in recent decades. Some early methods used OCR for scene character recognition, but OCR has significant limitations, for example the binarization operation required on scene character images. In recent years, a large number of scene character recognition methods have been proposed and considerable progress has been made. Among them, the most representative work is the scene character recognition method based on object recognition. Methods based on object recognition skip the binarization of scene character images, regard each scene character as a special object, and have achieved some success in the field of pattern recognition. For example, Newell et al. used multi-scale HOG (Histogram of Oriented Gradients) features for feature representation. Zhang et al. extracted histograms of sparse codes (HSC) for feature representation. Shi et al. considered local feature information together with global structure information. Although these methods achieve a certain effect, they largely ignore spatial contextual information. Because different characters may contain the same feature information at different positions, reconstruction errors can occur. To solve this problem, Gao et al. proposed a stroke bank in the feature representation stage to take spatial contextual information into account. The method of Shi et al. is an extension of the method of Gao et al.; they use a discriminative multi-scale stroke bank to represent features. Tian et al. proposed to consider the co-occurrence between HOG features in order to add spatial contextual information. In addition, Gao et al. also proposed a position-embedded dictionary to consider spatial contextual information. Although the above methods have achieved considerable success, they consider spatial contextual information from a single aspect only, i.e. either the dictionary learning stage or the coding stage, so effective spatial contextual information cannot be sufficiently retained.
Summary of the invention
The purpose of the present invention is to solve the technical problem that spatial contextual information has a large influence on scene character recognition results; therefore, the present invention provides a scene character recognition method based on coupled space learning.
In order to achieve this purpose, the scene character recognition method based on coupled space learning of the present invention includes the following steps:
Step S1: preprocessing is performed on each of N input scene character images to obtain N training scene character images;
Step S2: discriminative features are extracted from the N training scene character images to obtain a spatial dictionary;
Step S3: the spatial dictionary is used to spatially encode the discriminative features of each training scene character image, yielding the corresponding spatial coding vectors;
Step S4: maximization extraction is performed on the spatial coding vectors of each training scene character image to obtain the feature vector of the training scene character image;
Step S5: a linear support vector machine is trained on the feature vectors of the training scene character images to obtain a scene character recognition classification model;
Step S6: the feature vector of a test scene character image is obtained according to steps S1-S4 and input into the scene character recognition classification model to obtain the scene character recognition result.
Optionally, step S1 includes the following steps:
Step S11: the input scene character image is converted into a grayscale scene character image;
Step S12: the size of the grayscale scene character image is normalized to H × W, and the normalized grayscale scene character image is taken as the training scene character image, where H and W denote the height and width of the grayscale scene character image, respectively.
Optionally, step S2 includes the following steps:
Step S21: a discriminative feature is extracted at each position P_i (i = 1, 2, ..., m) of each training scene character image, where m is the number of discriminative-feature extraction positions of each training scene character image;
Step S22: for the N training scene character images, all discriminative features extracted at position P_i are clustered to obtain a sub-dictionary C_i (i = 1, 2, ..., m), and the position of the sub-dictionary C_i is recorded as P_i;
Step S23: the m sub-dictionaries carrying position information are concatenated to obtain the spatial dictionary.
Optionally, the discriminative feature is a HOG feature.
Optionally, in step S22, the discriminative features are clustered using the k-means clustering algorithm.
Optionally, the spatial dictionary is expressed as:
D = {C, P} = {(C_1, P_1), (C_2, P_2), ..., (C_m, P_m)},
where D denotes the spatial dictionary, C = (C_1, C_2, ..., C_m) is the set of the m sub-dictionaries, and P = (P_1, P_2, ..., P_m) denotes the set of position information of the sub-dictionary set C.
Optionally, in step S3, the discriminative features of a training scene character image are spatially encoded through the objective function shown below:
min_A Σ_j ( ||f_j − C a_j||² + α ||d_jF ⊙ a_j||² + β ||d_jE ⊙ a_j||² ),  s.t. 1ᵀ a_j = 1,
where || · ||₂ denotes the l₂ norm, ⊙ denotes the element-wise product of corresponding elements in two matrices, f_j denotes a discriminative feature, a_j denotes the spatial coding vector corresponding to f_j, and A = [a_1, a_2, ..., a_j, ...] denotes the set of all spatial coding vectors; ||f_j − C a_j||² denotes the error generated by reconstructing the discriminative feature with the spatial dictionary; ||d_jF ⊙ a_j||² is the local regularization term, representing the distance constraint in feature space between the discriminative feature and the codewords in the sub-dictionaries; ||d_jE ⊙ a_j||² is the spatial regularization term, representing the constraint on the positional relationship in Euclidean space between the feature and the codewords in the sub-dictionaries; α and β are regularization parameters; the constraint 1ᵀ a_j = 1 means that all elements of the spatial coding vector a_j sum to 1; d_jF denotes the distances in feature space between the discriminative feature and the codewords in the sub-dictionaries, and d_jE denotes the distances in Euclidean space between the position corresponding to discriminative feature f_j and the positions corresponding to the codewords in the sub-dictionaries.
Optionally, the distance d_jF in feature space between the discriminative feature and the codewords in the sub-dictionaries is expressed as:
d_jF = exp( dist(f_j, C) / σ_F ),
where σ_F is a parameter for adjusting the decay speed of the weights of d_jF, and dist(f_j, C) is defined as:
dist(f_j, C) = [dist(f_j, C_1), dist(f_j, C_2), ..., dist(f_j, C_m)]ᵀ,
where dist(f_j, C_i) (i = 1, 2, ..., m) denotes the Euclidean distances between the feature f_j and all codewords in sub-dictionary C_i.
Optionally, the distance d_jE in Euclidean space between the position corresponding to discriminative feature f_j and the positions corresponding to the codewords in the sub-dictionaries is expressed as:
d_jE = exp( dist(l_j, P) / σ_E ),
where σ_E is a parameter for adjusting the decay speed of the weights of d_jE, and dist(l_j, P) is defined as:
dist(l_j, P) = [dist(l_j, P_1), ..., dist(l_j, P_1), dist(l_j, P_2), ..., dist(l_j, P_2), ..., dist(l_j, P_m), ..., dist(l_j, P_m)]ᵀ,
where dist(l_j, P_i) (i = 1, 2, ..., m) denotes the Euclidean distance between the position l_j of discriminative feature f_j and the position P_i of sub-dictionary C_i, each dist(l_j, P_i) being repeated once for every codeword of sub-dictionary C_i.
Optionally, in step S4, maximization extraction is performed on the spatial coding vectors of each training scene character image using the following formula:
a = max{ a_1, a_2, ..., a_j, ..., a_m },
where a denotes the feature vector of the training scene character image and a_j (j = 1, 2, ..., m) denotes a spatial coding vector.
The beneficial effects of the present invention are as follows: by creating a spatial dictionary and using the created spatial dictionary for spatial coding, the present invention can effectively integrate spatial contextual information into the feature vectors, achieving the goal of effectively mining spatial information and thereby improving the accuracy of scene character recognition.
It should be noted that the present invention was supported by National Natural Science Foundation of China projects No. 61401309 and No. 61501327, Tianjin Applied Basic and Frontier Technology Research Plan youth fund project No. 15JCQNJC01700, and Tianjin Normal University doctoral fund projects No. 5RL134 and No. 52XB1405.
Brief description of the drawings
Fig. 1 is a flowchart of the scene character recognition method based on coupled space learning proposed according to one embodiment of the present invention.
Specific embodiment
To make the objectives, technical solutions and advantages of the present invention clearer, the present invention is described in more detail below with reference to specific embodiments and the accompanying drawing. It should be understood that these descriptions are merely exemplary and are not intended to limit the scope of the present invention. In addition, descriptions of well-known structures and technologies are omitted in the following description so as not to unnecessarily obscure the concepts of the present invention.
Fig. 1 is a flowchart of the scene character recognition method based on coupled space learning proposed according to one embodiment of the present invention; the implementation flow of the present invention is described below with reference to Fig. 1. The method of the present invention is a scene character recognition method based on coupled space learning, and its specific steps are as follows.
Step S1: preprocessing is performed on each of N input scene character images to obtain N training scene character images.
The preprocessing operation includes the following steps:
Step S11: the input scene character image is converted into a grayscale scene character image;
Step S12: the size of the grayscale scene character image is normalized to H × W, and the normalized grayscale scene character image is taken as the training scene character image, where H and W denote the height and width of the grayscale scene character image, respectively.
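As an illustration of steps S11-S12, the following is a minimal sketch in Python using scikit-image, which the patent does not prescribe; the default values H = 64 and W = 32 are taken from the experiment reported below and are otherwise free parameters.

```python
import numpy as np
from skimage.color import rgb2gray
from skimage.transform import resize

def preprocess(image, H=64, W=32):
    """Steps S11-S12: convert an input scene character image to grayscale
    and normalize its size to H x W (the training scene character image)."""
    gray = rgb2gray(image) if image.ndim == 3 else image.astype(np.float64)
    return resize(gray, (H, W))
```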
Step S2: discriminative features are extracted from the N training scene character images to obtain a spatial dictionary.
Further, step S2 includes the following steps:
Step S21: a discriminative feature is extracted at each position P_i (i = 1, 2, ..., m) of each training scene character image, where m is the number of feature-extraction positions of each training scene character image, so that m discriminative features are obtained for each training scene character image.
The discriminative feature may be a HOG feature or another kind of discriminative feature.
Step S22: for the N training scene character images, all discriminative features extracted at position P_i are clustered to obtain a sub-dictionary C_i (i = 1, 2, ..., m), and the position of the sub-dictionary C_i is recorded as P_i; in this way, m sub-dictionaries are obtained for the m feature-extraction positions.
The clustering operation is performed using a clustering algorithm such as k-means.
Step S23: the m sub-dictionaries carrying position information are concatenated to obtain the spatial dictionary.
The spatial dictionary can be expressed as:
D = {C, P} = {(C_1, P_1), (C_2, P_2), ..., (C_m, P_m)},
where D denotes the spatial dictionary, C = (C_1, C_2, ..., C_m) is the set of the m sub-dictionaries, and the corresponding P = (P_1, P_2, ..., P_m) denotes the set of position information of the sub-dictionary set C.
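A possible implementation of steps S21-S23 is sketched below. It assumes HOG features (computed with scikit-image) extracted from fixed-size patches at m grid positions and clustered with k-means; the patch size and the number of codewords K per sub-dictionary are illustrative choices, not values fixed by the patent.

```python
import numpy as np
from skimage.feature import hog
from sklearn.cluster import KMeans

def extract_features(img, positions, patch=(16, 16)):
    """Step S21: extract one discriminative (HOG) feature at each position P_i."""
    feats = []
    for (y, x) in positions:
        window = img[y:y + patch[0], x:x + patch[1]]
        feats.append(hog(window, orientations=9,
                         pixels_per_cell=(8, 8), cells_per_block=(1, 1)))
    return np.array(feats)                     # shape (m, feature_dim)

def build_space_dictionary(train_imgs, positions, K=20):
    """Steps S22-S23: for each position P_i, cluster the features collected
    over all N training images into a sub-dictionary C_i, then concatenate
    the m sub-dictionaries with their positions into the spatial dictionary D."""
    C, P = [], []
    for pos in positions:
        feats_i = np.vstack([extract_features(img, [pos]) for img in train_imgs])
        km = KMeans(n_clusters=K, n_init=10).fit(feats_i)
        C.append(km.cluster_centers_)          # sub-dictionary C_i, shape (K, dim)
        P.append(pos)                          # its position P_i
    return {"C": C, "P": P}                    # D = {C, P}
```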
Step S3: the spatial dictionary is used to spatially encode the m discriminative features of each training scene character image, yielding the m corresponding spatial coding vectors.
In step S3, the m discriminative features of each training scene character image are spatially encoded with the spatial dictionary through the following objective function:
min_A Σ_j ( ||f_j − C a_j||² + α ||d_jF ⊙ a_j||² + β ||d_jE ⊙ a_j||² ),  s.t. 1ᵀ a_j = 1,
where || · ||₂ denotes the l₂ norm, ⊙ denotes the element-wise product of corresponding elements in two matrices, f_j denotes a discriminative feature, a_j denotes the spatial coding vector corresponding to f_j, and the corresponding A = [a_1, a_2, ..., a_j, ...] denotes the set of all spatial coding vectors; ||f_j − C a_j||² denotes the error generated by reconstructing the discriminative feature with the spatial dictionary; ||d_jF ⊙ a_j||² is the local regularization term, representing the distance constraint in feature space between the discriminative feature and the codewords in the sub-dictionaries; ||d_jE ⊙ a_j||² is the spatial regularization term, representing the constraint on the positional relationship in Euclidean space between the feature and the codewords in the sub-dictionaries; α and β are regularization parameters; the constraint 1ᵀ a_j = 1 means that all elements of the spatial coding vector a_j sum to 1; d_jF denotes the distances in feature space between the discriminative feature and the codewords in the sub-dictionaries, and takes the specific form:
d_jF = exp( dist(f_j, C) / σ_F ),
where σ_F is a parameter for adjusting the decay speed of the weights of d_jF, and dist(f_j, C) is defined as follows:
dist(f_j, C) = [dist(f_j, C_1), dist(f_j, C_2), ..., dist(f_j, C_m)]ᵀ,
where dist(f_j, C_i) (i = 1, 2, ..., m) denotes the Euclidean distances between the feature f_j and all codewords in sub-dictionary C_i.
d_jE denotes the distances in Euclidean space between the position l_j corresponding to discriminative feature f_j and P, and takes the specific form:
d_jE = exp( dist(l_j, P) / σ_E ),
where σ_E is a parameter for adjusting the decay speed of the weights of d_jE, and dist(l_j, P) is defined as follows:
dist(l_j, P) = [dist(l_j, P_1), ..., dist(l_j, P_1), dist(l_j, P_2), ..., dist(l_j, P_2), ..., dist(l_j, P_m), ..., dist(l_j, P_m)]ᵀ,
where dist(l_j, P_i) (i = 1, 2, ..., m) denotes the Euclidean distance between the position l_j of discriminative feature f_j and the position P_i of sub-dictionary C_i, each dist(l_j, P_i) being repeated once for every codeword of sub-dictionary C_i.
The above objective function uses the local regularization term to select a group of codewords in feature space for reconstructing the discriminative feature, and at the same time uses the spatial regularization term in Euclidean space to constrain the positional relationship between the discriminative feature and the codewords in the sub-dictionaries.
Taking the derivative of the above objective function yields an analytic solution:
ã_j = ( A_j + α·diag(d_jF ⊙ d_jF) + β·diag(d_jE ⊙ d_jE) )⁻¹ · 1,
where A_j = (Cᵀ − 1 f_jᵀ)(Cᵀ − 1 f_jᵀ)ᵀ denotes the covariance matrix, and the solved ã_j is normalized by the formula a_j = ã_j / (1ᵀ ã_j).
By means of this analytic solution, the spatial coding vector corresponding to a discriminative feature can be solved directly, avoiding a complex optimization process.
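The following sketch illustrates the spatial coding of one discriminative feature (step S3) following the objective function and analytic solution above. The values of α, β, σ_F and σ_E are placeholders, and each positional distance dist(l_j, P_i) is repeated once per codeword of sub-dictionary C_i so that d_jE matches the length of the coding vector, an assumption consistent with the definition of dist(l_j, P) given above.

```python
import numpy as np

def spatial_coding(f_j, l_j, D, alpha=1e-2, beta=1e-2, sigma_F=1.0, sigma_E=1.0):
    """Step S3: compute the spatial coding vector a_j of one discriminative
    feature f_j extracted at position l_j, using the spatial dictionary D."""
    C = np.vstack(D["C"])                      # all codewords stacked, shape (m*K, dim)
    K = D["C"][0].shape[0]                     # codewords per sub-dictionary
    # d_jF: feature-space distances to every codeword.
    d_F = np.exp(np.linalg.norm(C - f_j, axis=1) / sigma_F)
    # d_jE: Euclidean-space distances to the sub-dictionary positions,
    # each repeated K times to match the coding-vector length.
    pos_dist = np.array([np.linalg.norm(np.asarray(l_j, float) - np.asarray(p, float))
                         for p in D["P"]])
    d_E = np.exp(np.repeat(pos_dist, K) / sigma_E)
    # Analytic solution: covariance matrix A_j plus the two diagonal regularizers.
    B = C - f_j                                # rows are c_k - f_j
    A_j = B @ B.T
    M = A_j + alpha * np.diag(d_F ** 2) + beta * np.diag(d_E ** 2)
    a_tilde = np.linalg.solve(M, np.ones(C.shape[0]))
    return a_tilde / a_tilde.sum()             # normalize so the elements sum to 1
```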
Step S4: maximization extraction is performed on the spatial coding vectors of each training scene character image to obtain the feature vector of the training scene character image.
In step S4, maximization extraction is performed on the spatial coding vectors of each training scene character image using the following formula:
a = max{ a_1, a_2, ..., a_j, ..., a_m },
where a_j (j = 1, 2, ..., m) denotes a spatial coding vector and a denotes the feature vector of the training scene character image.
That is, the maximum is taken in each dimension over the m spatial coding vectors of a training scene character image to obtain its feature vector a.
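For example, the dimension-wise maximum of step S4 can be written as the following minimal sketch:

```python
import numpy as np

def image_feature_vector(coding_vectors):
    """Step S4: take the maximum in each dimension over the m spatial coding
    vectors of one training scene character image (a = max{a_1, ..., a_m})."""
    return np.max(np.vstack(coding_vectors), axis=0)
```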
Step S5: a linear support vector machine is trained on the feature vectors of the training scene character images to obtain a scene character recognition classification model.
Step S6: the feature vector of a test scene character image is obtained according to steps S1-S4 and input into the scene character recognition classification model to obtain the scene character recognition result.
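Steps S5 and S6 amount to training and applying a standard linear SVM on the feature vectors; the sketch below uses scikit-learn's LinearSVC as one possible linear support vector machine, an implementation choice not specified by the patent.

```python
import numpy as np
from sklearn.svm import LinearSVC

def train_recognition_model(train_features, labels):
    """Step S5: train the scene character recognition classification model
    on the feature vectors of the training scene character images."""
    return LinearSVC().fit(np.asarray(train_features), labels)

def recognize(model, test_feature):
    """Step S6: recognition result for the feature vector of a test image."""
    return model.predict(np.asarray(test_feature).reshape(1, -1))[0]
```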
Publicly available scene text image databases are used as test objects. For example, on the ICDAR2003 database, when H × W = 64 × 32 and the number of positions m is 128, the accuracy of scene character recognition is 83.2%, which demonstrates the effectiveness of the method of the invention.
It should be appreciated that the above specific embodiments of the present invention are only used to exemplarily illustrate or explain the principles of the present invention and are not to be construed as limiting the present invention. Therefore, any modifications, equivalent substitutions, improvements and the like made without departing from the spirit and scope of the present invention shall be included within the protection scope of the present invention. In addition, the appended claims of the present invention are intended to cover all changes and modifications that fall within the scope and boundaries of the claims, or the equivalents of such scope and boundaries.

Claims (10)

1. A scene character recognition method based on coupled space learning, characterized in that the method comprises the following steps:
Step S1: performing preprocessing on each of N input scene character images to obtain N training scene character images;
Step S2: extracting discriminative features from the N training scene character images to obtain a spatial dictionary;
Step S3: spatially encoding the discriminative features of each training scene character image using the spatial dictionary to obtain the corresponding spatial coding vectors;
Step S4: performing maximization extraction on the spatial coding vectors of each training scene character image to obtain the feature vector of the training scene character image;
Step S5: training a linear support vector machine based on the feature vectors of the training scene character images to obtain a scene character recognition classification model;
Step S6: obtaining the feature vector of a test scene character image according to steps S1-S4 and inputting it into the scene character recognition classification model to obtain the scene character recognition result.
2. The method according to claim 1, characterized in that step S1 comprises the following steps:
Step S11: converting the input scene character image into a grayscale scene character image;
Step S12: normalizing the size of the grayscale scene character image to H × W and taking the normalized grayscale scene character image as the training scene character image, where H and W denote the height and width of the grayscale scene character image, respectively.
3. The method according to claim 1, characterized in that step S2 comprises the following steps:
Step S21: extracting a discriminative feature at each position P_i (i = 1, 2, ..., m) of each training scene character image, where m is the number of discriminative-feature extraction positions of each training scene character image;
Step S22: for the N training scene character images, clustering all discriminative features extracted at position P_i to obtain a sub-dictionary C_i (i = 1, 2, ..., m), and recording the position of the sub-dictionary C_i as P_i;
Step S23: concatenating the m sub-dictionaries carrying position information to obtain the spatial dictionary.
4. The method according to claim 3, characterized in that the discriminative feature is a HOG feature.
5. The method according to claim 3, characterized in that in step S22, the discriminative features are clustered using the k-means clustering algorithm.
6. The method according to claim 3, characterized in that the spatial dictionary is expressed as:
D = {C, P} = {(C_1, P_1), (C_2, P_2), ..., (C_m, P_m)},
where D denotes the spatial dictionary, C = (C_1, C_2, ..., C_m) is the set of the m sub-dictionaries, and P = (P_1, P_2, ..., P_m) denotes the set of position information of the sub-dictionary set C.
7. The method according to claim 1, characterized in that in step S3, the discriminative features of a training scene character image are spatially encoded through the objective function shown below:
min_A Σ_j ( ||f_j − C a_j||² + α ||d_jF ⊙ a_j||² + β ||d_jE ⊙ a_j||² ),  s.t. 1ᵀ a_j = 1,
where || · ||₂ denotes the l₂ norm, ⊙ denotes the element-wise product of corresponding elements in two matrices, f_j denotes a discriminative feature, a_j denotes the spatial coding vector corresponding to f_j, and A = [a_1, a_2, ..., a_j, ...] denotes the set of all spatial coding vectors; ||f_j − C a_j||² denotes the error generated by reconstructing the discriminative feature with the spatial dictionary; ||d_jF ⊙ a_j||² is the local regularization term, representing the distance constraint in feature space between the discriminative feature and the codewords in the sub-dictionaries; ||d_jE ⊙ a_j||² is the spatial regularization term, representing the constraint on the positional relationship in Euclidean space between the feature and the codewords in the sub-dictionaries; α and β are regularization parameters; the constraint 1ᵀ a_j = 1 means that all elements of the spatial coding vector a_j sum to 1; d_jF denotes the distances in feature space between the discriminative feature and the codewords in the sub-dictionaries, and d_jE denotes the distances in Euclidean space between the position corresponding to discriminative feature f_j and the positions corresponding to the codewords in the sub-dictionaries.
8. The method according to claim 7, characterized in that the distance d_jF in feature space between the discriminative feature and the codewords in the sub-dictionaries is expressed as:
d_jF = exp( dist(f_j, C) / σ_F ),
where σ_F is a parameter for adjusting the decay speed of the weights of d_jF, and dist(f_j, C) is defined as:
dist(f_j, C) = [dist(f_j, C_1), dist(f_j, C_2), ..., dist(f_j, C_m)]ᵀ,
where dist(f_j, C_i) (i = 1, 2, ..., m) denotes the Euclidean distances between the feature f_j and all codewords in sub-dictionary C_i.
9. The method according to claim 7, characterized in that the distance d_jE in Euclidean space between the position corresponding to discriminative feature f_j and the positions corresponding to the codewords in the sub-dictionaries is expressed as:
d_jE = exp( dist(l_j, P) / σ_E ),
where σ_E is a parameter for adjusting the decay speed of the weights of d_jE, and dist(l_j, P) is defined as:
dist(l_j, P) = [dist(l_j, P_1), ..., dist(l_j, P_1), dist(l_j, P_2), ..., dist(l_j, P_2), ..., dist(l_j, P_m), ..., dist(l_j, P_m)]ᵀ,
where dist(l_j, P_i) (i = 1, 2, ..., m) denotes the Euclidean distance between the position l_j of discriminative feature f_j and the position P_i of sub-dictionary C_i.
10. The method according to claim 1, characterized in that in step S4, maximization extraction is performed on the spatial coding vectors of each training scene character image using the following formula:
a = max{ a_1, a_2, ..., a_j, ..., a_m },
where a denotes the feature vector of the training scene character image and a_j (j = 1, 2, ..., m) denotes a spatial coding vector.
CN201710014236.4A 2017-01-10 2017-01-10 Scene character recognition method based on coupling space learning Active CN106709494B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710014236.4A CN106709494B (en) 2017-01-10 2017-01-10 Scene character recognition method based on coupling space learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710014236.4A CN106709494B (en) 2017-01-10 2017-01-10 Scene character recognition method based on coupling space learning

Publications (2)

Publication Number Publication Date
CN106709494A true CN106709494A (en) 2017-05-24
CN106709494B CN106709494B (en) 2019-12-24

Family

ID=58908090

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710014236.4A Active CN106709494B (en) 2017-01-10 2017-01-10 Scene character recognition method based on coupling space learning

Country Status (1)

Country Link
CN (1) CN106709494B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107679505A (en) * 2017-10-13 2018-02-09 成都准星云学科技有限公司 A kind of method realized to handwritten character rejection
CN108764233A (en) * 2018-05-08 2018-11-06 天津师范大学 A kind of scene character recognition method based on continuous convolution activation

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104537362A (en) * 2015-01-16 2015-04-22 中国科学院自动化研究所 Domain-based self-adaptive English scene character recognition method
CN105760821A (en) * 2016-01-31 2016-07-13 中国石油大学(华东) Classification and aggregation sparse representation face identification method based on nuclear space

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104537362A (en) * 2015-01-16 2015-04-22 中国科学院自动化研究所 Domain-based self-adaptive English scene character recognition method
CN105760821A (en) * 2016-01-31 2016-07-13 中国石油大学(华东) Classification and aggregation sparse representation face identification method based on nuclear space

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107679505A (en) * 2017-10-13 2018-02-09 成都准星云学科技有限公司 A kind of method realized to handwritten character rejection
CN107679505B (en) * 2017-10-13 2020-04-21 林辉 Method for realizing rejection of handwritten character
CN108764233A (en) * 2018-05-08 2018-11-06 天津师范大学 A kind of scene character recognition method based on continuous convolution activation
CN108764233B (en) * 2018-05-08 2021-10-15 天津师范大学 Scene character recognition method based on continuous convolution activation

Also Published As

Publication number Publication date
CN106709494B (en) 2019-12-24

Similar Documents

Publication Publication Date Title
CN107256246B (en) printed fabric image retrieval method based on convolutional neural network
CN110532900B (en) Facial expression recognition method based on U-Net and LS-CNN
Yan et al. The fastest deformable part model for object detection
Hussain et al. Feature sets and dimensionality reduction for visual object detection
Guo et al. Pixel-wise classification method for high resolution remote sensing imagery using deep neural networks
Wang et al. Leaf recognition based on PCNN
Lin et al. Masked face detection via a modified LeNet
CN105095902B (en) Picture feature extracting method and device
CN110399821B (en) Customer satisfaction acquisition method based on facial expression recognition
CN106056082B (en) A kind of video actions recognition methods based on sparse low-rank coding
CN104951791B (en) data classification method and device
CN105139041A (en) Method and device for recognizing languages based on image
Shi et al. Fisher vector for scene character recognition: A comprehensive evaluation
Li et al. Lemon‐YOLO: An efficient object detection method for lemons in the natural environment
CN105117740A (en) Font identification method and device
CN110335206B (en) Intelligent filter method, device and computer readable storage medium
Qi et al. A multiscale deeply described correlatons-based model for land-use scene classification
CN112766229A (en) Human face point cloud image intelligent identification system and method based on attention mechanism
CN104794455A (en) Dongba hieroglyphic recognizing method
CN113435254A (en) Sentinel second image-based farmland deep learning extraction method
Liu et al. Discriminant sparse coding for image classification
CN116758609A (en) Lightweight face recognition method based on feature model improvement
CN106709494A (en) Coupled spatial learning-based scene character recognition method
CN105868711A (en) Method for identifying human body behaviors based on sparse and low rank
CN111242183A (en) Image identification and classification method and device based on attention mechanism

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant
TR01 Transfer of patent right
TR01 Transfer of patent right

Effective date of registration: 20221130

Address after: 300392 Room 603, Building 1, No. 1, Huixue Road, Xuefu Industrial Zone, Xiqing District, Tianjin

Patentee after: Zhongfang Information Technology (Tianjin) Co.,Ltd.

Address before: 300387 Tianjin city Xiqing District West Binshui Road No. 393

Patentee before: TIANJIN NORMAL University