CN103310237A - Handwritten digit recognition method and system - Google Patents
- Publication number: CN103310237A
- Application number: CN201310286449A
- Authority: CN (China)
- Legal status: Granted (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Abstract
An embodiment of the invention discloses a handwritten digit recognition method and system. During dimensionality reduction of the handwritten digits, each image datum is represented linearly by its K nearest neighbors, and the weighting coefficients of that linear representation are obtained through an orthogonal matching pursuit algorithm. A weighting-coefficient matrix is then built to reduce the dimensionality of the training image data, and the images to be recognized are reduced through their weight vectors and the reduced vector data of their K neighbors. Experiments show that the handwritten digit recognition method improves the recognition rate of handwritten digit recognition.
Description
Technical field
The present invention relates to the field of pattern recognition and, more particularly, to a handwritten digit recognition method and system.
Background technology
With the rapid development of computer technology and digital image processing technology, handwritten digit recognition has found practical application in e-commerce, automatic machine input, and similar occasions.
However, handwritten digits are high-dimensional data. Recognizing them directly is time-consuming and computationally complex, so the digits are usually reduced in dimensionality before recognition. Based on this, Saul et al. proposed a digit recognition method based on linearized locally linear embedding. That method, however, solves for the locally linear representation coefficients of the data with the least squares method, which involves inverting a matrix. If the matrix is singular it has no inverse, so the coefficients have no solution, and the recognition rate of handwritten digit recognition is therefore lower when the locally linear representation coefficients are solved with the least squares method.
Therefore, how to improve the recognition rate of handwritten digit recognition has become a problem demanding a prompt solution.
Summary of the invention
The purpose of this invention is to provide a handwritten digit recognition method and system that improve the recognition rate of handwritten digit recognition.
To achieve the above object, the invention provides the following technical scheme:
A handwritten digit recognition method, characterized in that it comprises:
obtaining an image data set, the image data set comprising a training image data subset and an image data subset to be recognized;
stretching each image datum in the image data set to obtain a vector data set, the vector data set comprising a first vector data subset corresponding to the training image data subset and a second vector data subset corresponding to the image data subset to be recognized;
obtaining, according to the distance between the i-th vector datum in the vector data set and each vector datum in the first vector data subset, the K neighbor vector data of the i-th vector datum;
representing the i-th vector datum linearly by its K neighbor vector data:

X_i ≈ Σ_{j=1}^{K} w_{ij} x_j^{(i)}

where X_i is the i-th vector datum, x_j^{(i)} is the j-th vector datum among the K neighbor vector data, and w_{ij} is the weighting coefficient corresponding to the j-th neighbor, obtained by an orthogonal matching pursuit algorithm;
when the i-th vector datum belongs to the first vector data subset, obtaining the first weight vector W_i corresponding to the i-th vector datum, whose m-th element W_i(m) corresponds to the m-th vector datum in the first vector data subset, where: when the m-th vector datum in the first vector data subset is the j-th vector datum among the K neighbor vector data of the i-th vector datum, W_i(m) = w_{ij}; when the m-th vector datum does not belong to the K neighbor vector data of the i-th vector datum, W_i(m) = 0;

reducing the dimensionality of the first vector data subset according to the first weight vectors, comprising:
obtaining the weighting coefficient matrix W = [W_1, W_2, ..., W_M], where W_m (m = 1, 2, ..., M) is the first weight vector corresponding to the m-th vector datum in the first vector data subset;
constructing the matrix M_train = (I − W)^T (I − W), where I is the identity matrix;
performing an eigendecomposition of the matrix M_train to obtain its eigenvalues, where the q-th eigenvalue is λ_q and its corresponding eigenvector v_q is a column vector of dimension M × 1;
sorting the eigenvalues by value in ascending order, taking the eigenvectors corresponding to the 2nd through the (d+1)-th eigenvalues, and composing from them the vector data matrix Y_train = [v_2, v_3, ..., v_{d+1}], where d is the preset reduced dimension; the reduced vector datum x_m corresponding to the m-th vector datum X_m in the first vector data subset is the m-th row vector of the vector data matrix Y_train;
when the i-th vector datum belongs to the second vector data subset, obtaining the second weight vector W_i^(2) corresponding to the i-th vector datum, whose j-th element corresponds to the j-th vector datum among the K neighbor vector data of the i-th vector datum, W_i^(2)(j) = w_{ij};

reducing the dimensionality of the second vector data subset according to the second weight vectors, comprising:
obtaining, according to the second weight vector W_n^(2) corresponding to the n-th vector datum in the second vector data subset and the reduced vector data set Y_n^(K) corresponding to the K neighbor vector data of that datum, the reduced vector datum x_n corresponding to the n-th vector datum X_n in the second vector data subset, specifically x_n = (W_n^(2))^T Y_n^(K);
obtaining, according to the distance between the n-th vector datum in the reduced second vector data subset and each vector datum in the reduced first vector data subset, the K neighbor vector data of the n-th vector datum in the reduced second vector data subset;
determining, according to the digit types of the image data corresponding to the K neighbor vector data of the n-th reduced vector datum, the digit type of the image datum to be recognized corresponding to the n-th vector datum in the second vector data subset.
In the above method, preferably, obtaining the K neighbor vector data of the i-th vector datum according to the distance between the i-th vector datum in the vector data set and each vector datum in the first vector data subset comprises:
obtaining the K neighbor vector data of the i-th vector datum according to the Euclidean distance between the i-th vector datum in the vector data set and each vector datum in the first vector data subset.
In the above method, preferably, obtaining the weighting coefficients w_{ij} by the orthogonal matching pursuit algorithm comprises at least one iterative operation:
in each iterative operation, selecting, among the K neighbor vector data of the i-th vector datum, a vector datum whose weight has not yet been determined, such that the 2-norm of the representation residual is minimized, thereby determining the weight corresponding to the vector datum selected in the f-th iterative operation; f = 1, 2, ..., u, where u is the preset number of iterations.
In the above method, preferably, stretching each image datum in the image data set comprises:
stretching each image datum in the image data set by row or by column.
In the above method, preferably, the preset reduced dimension is 2 or 3.
In the above method, preferably, the method further comprises:
displaying each vector datum in the reduced second vector data subset in a d-dimensional coordinate system.
A handwritten digit recognition system, characterized in that it comprises:
an image data processing module, configured to obtain an image data set comprising a training image data subset and an image data subset to be recognized; stretch each image datum in the image data set to obtain a vector data set comprising a first vector data subset corresponding to the training image data subset and a second vector data subset corresponding to the image data subset to be recognized; obtain, according to the distance between the i-th vector datum in the vector data set and each vector datum in the first vector data subset, the K neighbor vector data of the i-th vector datum; and represent the i-th vector datum linearly by its K neighbor vector data:

X_i ≈ Σ_{j=1}^{K} w_{ij} x_j^{(i)}

where X_i is the i-th vector datum, x_j^{(i)} is the j-th vector datum among the K neighbor vector data, and w_{ij} is the weighting coefficient corresponding to the j-th neighbor, obtained by an orthogonal matching pursuit algorithm;
a first dimension-reduction module, configured to obtain, when the i-th vector datum belongs to the first vector data subset, the first weight vector W_i corresponding to the i-th vector datum, whose m-th element corresponds to the m-th vector datum in the first vector data subset (W_i(m) = w_{ij} when the m-th vector datum is the j-th vector datum among the K neighbor vector data of the i-th vector datum, and W_i(m) = 0 otherwise), and to reduce the dimensionality of the first vector data subset according to the first weight vectors, comprising: obtaining the weighting coefficient matrix W = [W_1, W_2, ..., W_M], where W_m (m = 1, 2, ..., M) is the first weight vector corresponding to the m-th vector datum in the first vector data subset; constructing the matrix M_train = (I − W)^T (I − W), where I is the identity matrix; performing an eigendecomposition of M_train to obtain its eigenvalues, where the q-th eigenvalue is λ_q and its corresponding eigenvector v_q is a column vector of dimension M × 1; sorting the eigenvalues in ascending order, taking the eigenvectors corresponding to the 2nd through the (d+1)-th eigenvalues, and composing the vector data matrix Y_train = [v_2, v_3, ..., v_{d+1}], where d is the preset reduced dimension; the reduced vector datum x_m corresponding to the m-th vector datum X_m in the first vector data subset is the m-th row vector of Y_train;
a second dimension-reduction module, configured to obtain, when the i-th vector datum belongs to the second vector data subset, the second weight vector W_i^(2) corresponding to the i-th vector datum, whose j-th element corresponds to the j-th vector datum among the K neighbor vector data of the i-th vector datum (W_i^(2)(j) = w_{ij}), and to reduce the dimensionality of the second vector data subset according to the second weight vectors, comprising: obtaining, according to the second weight vector W_n^(2) corresponding to the n-th vector datum in the second vector data subset and the reduced vector data set Y_n^(K) corresponding to the K neighbor vector data of that datum, the reduced vector datum x_n corresponding to the n-th vector datum X_n, specifically x_n = (W_n^(2))^T Y_n^(K);
an identification module, configured to obtain, according to the distance between the n-th vector datum in the reduced second vector data subset and each vector datum in the reduced first vector data subset, the K neighbor vector data of the n-th vector datum in the reduced second vector data subset, and to determine, according to the digit types of the image data corresponding to those K neighbor vector data, the digit type of the image datum to be recognized corresponding to the n-th vector datum in the second vector data subset.
As can be seen from the above scheme, in the handwritten digit recognition method provided by the application, during dimensionality reduction of the handwritten digits each image datum is represented linearly by its K neighbors, the weighting coefficients of that linear representation are obtained by the orthogonal matching pursuit algorithm, the training image data are reduced by constructing a weighting coefficient matrix, and the images to be recognized are reduced through their weight vectors and the reduced vector data of their K neighbors. Experiments show that the handwritten digit recognition method provided by the embodiments of the application improves the recognition rate of handwritten digit recognition.
Description of drawings
To illustrate the embodiments of the invention or the technical schemes of the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Apparently, the drawings described below are only some embodiments of the invention; those of ordinary skill in the art can obtain other drawings from them without creative work.
Fig. 1 is a flow chart of a handwritten digit recognition method provided by an embodiment of the application;
Fig. 2 is a flow chart of the method, provided by an embodiment of the application, of reducing the dimensionality of the first vector data subset according to the first weight vectors;
Fig. 3 is a structural diagram of a handwritten digit recognition system provided by an embodiment of the application.
The terms "first", "second", "third", "fourth", etc. (if present) in the specification, the claims, and the above drawings are used to distinguish similar parts and need not describe a specific order or precedence. It should be appreciated that data so used may be interchanged where appropriate, so that the embodiments of the application described herein can be implemented in orders other than those illustrated here.
Embodiment
The technical schemes in the embodiments of the invention are described clearly and completely below in conjunction with the drawings in the embodiments. Obviously, the described embodiments are only part, rather than all, of the embodiments of the invention. All other embodiments obtained by those of ordinary skill in the art from the embodiments of the invention without creative work fall within the scope of protection of the invention.
Referring to Fig. 1, a flow chart of a handwritten digit recognition method provided by an embodiment of the application, the method comprises:
Step S101: obtaining an image data set, the image data set comprising a training image data subset and an image data subset to be recognized;
In the embodiment of the application, the image data set comprises two classes of image data: training image data and image data to be recognized. The digit types of the training image data are known; that is, which digit each training image datum represents is known.
Step S102: stretching each image datum in the image data set to obtain a vector data set, the vector data set comprising a first vector data subset corresponding to the training image data subset and a second vector data subset corresponding to the image data subset to be recognized;
Suppose a raw image datum has dimensions a × b; then, after stretching, the vector datum obtained has dimensions ab × 1.
An image datum can be stretched by row or by column; note, however, that all image data must be stretched in the same way.
The following illustrates how an image datum is stretched. Since an image datum is a two-dimensional matrix, suppose an image datum is

[1 1 1
 2 2 2
 3 3 3]

Stretching this image datum by row means connecting the rows, starting from the first row of the matrix, one after another into a vector, obtaining the vector datum [1 1 1 2 2 2 3 3 3]^T.
Stretching this image datum by column means connecting the columns, starting from the first column of the matrix, one after another into a vector, obtaining the vector datum [1 2 3 1 2 3 1 2 3]^T.
Step S103: obtaining, according to the distance between the i-th vector datum in the vector data set and each vector datum in the first vector data subset, the K neighbor vector data of the i-th vector datum. That is to say, whether the i-th vector datum belongs to the first vector data subset or to the second vector data subset, its K neighbor vector data are always sought in the first vector data subset.
The distance can be the absolute distance, that is,

s(x, y) = Σ_k |x_k − y_k|

where s(x, y) denotes the distance between vector datum x and vector datum y, x_k denotes the k-th element of x, and y_k denotes the k-th element of y.
Preferably, in the embodiment of the application, the distance is the Euclidean distance, that is,

s(x, y) = sqrt( Σ_k (x_k − y_k)^2 )

with s(x, y), x_k, and y_k as above.
The K neighbor vector data of the i-th vector datum can be all vector data in the first vector data subset whose distance to the i-th vector datum is less than a preset value; they can also be the K vector data in the first vector data subset with the smallest distance to the i-th vector datum.
K is a positive integer whose value can be determined from experience or by simulation; preferably, K takes a value between 3 and 20.
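A minimal sketch of the neighbor search just described, keeping the K training vectors at the smallest Euclidean distance (the function name and the toy data are illustrative, not from the patent):

```python
import numpy as np

def k_neighbors(x, train, k):
    """Indices of the k rows of `train` closest to x in Euclidean distance."""
    dist = np.sqrt(((train - x) ** 2).sum(axis=1))
    return np.argsort(dist)[:k]

# Toy data: four training vectors in R^2, query at the origin.
train = np.array([[0.0, 0.0], [1.0, 0.0], [5.0, 5.0], [0.0, 1.1]])
idx = k_neighbors(np.array([0.0, 0.0]), train, k=2)   # -> indices 0 and 1
```

The same routine serves both step S103 (neighbors of full-dimensional vectors) and step S109 (neighbors of reduced vectors), since only the row dimension of `train` changes.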
Step S104: representing the i-th vector datum linearly by its K neighbor vector data:

X_i ≈ Σ_{j=1}^{K} w_{ij} x_j^{(i)}

where X_i is the i-th vector datum, x_j^{(i)} is the j-th vector datum among the K neighbor vector data, and w_{ij} is the weighting coefficient corresponding to the j-th neighbor, obtained by an orthogonal matching pursuit algorithm.
Step S105: when the i-th vector datum belongs to the first vector data subset, obtaining the first weight vector W_i corresponding to the i-th vector datum, a vector of dimension M × 1 whose m-th element corresponds to the m-th vector datum in the first vector data subset: when the m-th vector datum in the first vector data subset is the j-th vector datum among the K neighbor vector data of the i-th vector datum, W_i(m) = w_{ij}; when the m-th vector datum does not belong to the K neighbor vector data of the i-th vector datum, W_i(m) = 0.
In other words, in the embodiment of the application, when the m-th vector datum in the first vector data subset is one of the K neighbor vector data of the i-th vector datum, the value of the corresponding weighting coefficient is the one obtained by the orthogonal matching pursuit algorithm above; otherwise, its value is 0.
Step S106: reducing the dimensionality of the first vector data subset according to the first weight vectors; as shown in Fig. 2, the concrete steps can comprise:
Step S1061: obtaining the weighting coefficient matrix W = [W_1, W_2, ..., W_M], a matrix of dimension M × M, where W_m (m = 1, 2, ..., M) is the first weight vector corresponding to the m-th vector datum in the first vector data subset;
Step S1062: constructing the matrix M_train = (I − W)^T (I − W), where I is the identity matrix;
Step S1063: performing an eigendecomposition of the matrix M_train to obtain its eigenvalues, where the q-th eigenvalue is λ_q and its corresponding eigenvector v_q is a column vector of dimension M × 1;
How the eigendecomposition is carried out, and how the eigenvalues and their corresponding eigenvectors are obtained, are common knowledge in the field and are not repeated here.
Step S1064: sorting the eigenvalues by value in ascending order, taking the eigenvectors corresponding to the 2nd through the (d+1)-th eigenvalues, and composing from them the vector data matrix Y_train = [v_2, v_3, ..., v_{d+1}], a matrix of dimension M × d, where d is the preset reduced dimension; the reduced vector datum x_m corresponding to the m-th vector datum X_m in the first vector data subset is the m-th row vector of the vector data matrix Y_train.
Preferably, in the embodiment of the application, to ease the analysis of the image data, the reduced dimension d can be 2 or 3; of course, it can also be 4, 5, 6, or another integer value.
For example, when d = 2, Y_train = [v_2, v_3]; when d = 3, Y_train = [v_2, v_3, v_4].
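Steps S1061 through S1064 can be sketched with NumPy as follows. This is an illustrative reading under one assumption: the weight vectors are stacked so that row m of W holds the first weight vector of the m-th training vector (transpose W first if they are stacked as columns).

```python
import numpy as np

def embed_training(W, d):
    """S1061-S1064 sketch: build M_train = (I - W)^T (I - W), eigendecompose,
    sort eigenvalues ascending, and keep the eigenvectors of the 2nd through
    (d+1)-th smallest eigenvalues as the columns of Y_train."""
    M = W.shape[0]
    I = np.eye(M)
    M_train = (I - W).T @ (I - W)
    vals, vecs = np.linalg.eigh(M_train)   # eigh returns ascending eigenvalues
    Y_train = vecs[:, 1:d + 1]             # skip the smallest eigenvalue
    return Y_train                          # row m is x_m, the reduced datum

# Toy weighting-coefficient matrix for M = 4 training vectors.
W = np.array([[0.0, 0.5, 0.5, 0.0],
              [0.5, 0.0, 0.0, 0.5],
              [0.5, 0.0, 0.0, 0.5],
              [0.0, 0.5, 0.5, 0.0]])
Y = embed_training(W, d=2)   # 4 x 2 matrix of reduced training vectors
```

Discarding the eigenvector of the very smallest eigenvalue and keeping the next d is what makes row m of Y_train the reduced counterpart of the m-th training vector.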
Step S107: when the i-th vector datum belongs to the second vector data subset, obtaining the second weight vector W_i^(2) corresponding to the i-th vector datum, a vector of dimension K × 1 whose j-th element corresponds to the j-th vector datum among the K neighbor vector data of the i-th vector datum, W_i^(2)(j) = w_{ij}. That is to say, in the embodiment of the application, the second weight vector has only K elements.
Step S108: reducing the dimensionality of the n-th vector datum in the second vector data subset according to the second weight vector W_n^(2) corresponding to it, which can comprise:
obtaining, according to the second weight vector W_n^(2) corresponding to the n-th vector datum in the second vector data subset and the reduced vector data set Y_n^(K) corresponding to the K neighbor vector data of that datum, the reduced vector datum x_n corresponding to the n-th vector datum X_n in the second vector data subset, specifically:

x_n = (W_n^(2))^T Y_n^(K)

where T denotes the transpose of the vector W_n^(2).
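A sketch of the combination in step S108 (all names illustrative): the reduced test vector is simply the second weight vector applied to the rows of reduced neighbor vectors.

```python
import numpy as np

def embed_test_point(w2, Y_neighbors):
    """S108 sketch: w2 is the K-element second weight vector; Y_neighbors
    stacks, row by row, the reduced d-dim vectors of the K neighbors drawn
    from Y_train. Returns x_n = (W_n^(2))^T Y_n^(K)."""
    return w2 @ Y_neighbors   # (K,) @ (K, d) -> (d,)

# Toy example: K = 3 neighbors already reduced to d = 2.
w2 = np.array([0.5, 0.5, 0.0])
Y_neighbors = np.array([[0.0, 0.0],
                        [2.0, 2.0],
                        [9.0, 9.0]])
x_n = embed_test_point(w2, Y_neighbors)   # -> [1.0, 1.0]
```

The design choice this reflects is that test images never enter the eigendecomposition: they are placed into the reduced space purely as weighted combinations of already-reduced training vectors.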
Step S109: obtaining, according to the distance between the n-th vector datum in the reduced second vector data subset and each vector datum in the reduced first vector data subset, the K neighbor vector data of the n-th vector datum in the reduced second vector data subset;
that is to say, in this step, the K neighbor vector data of the n-th reduced vector datum are sought in the reduced first vector data subset.
The distance can be the absolute distance or the Euclidean distance; for the concrete method of obtaining the K neighbor vector data, see the foregoing; it is not repeated here.
Step S110: determining, according to the digit types of the image data corresponding to the K neighbor vector data of the n-th reduced vector datum, the digit type of the image datum to be recognized corresponding to the n-th vector datum in the second vector data subset.
In the embodiment of the application, when, among the K neighbor vector data, the digit types of the image data corresponding to a preset proportion of the vector data are all the same digit type, the digit type corresponding to the n-th vector datum is determined to be that digit type.
For example, if the digit type of the image data corresponding to the preset proportion of neighbors is 6, that is, the handwritten digit represented by each of those image data is 6, then the digit type of the image datum to be recognized corresponding to the n-th vector datum is 6; that is, the handwritten digit represented by the image datum to be recognized is 6.
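Steps S109-S110 can be sketched as follows; the `ratio` threshold stands in for the "preset proportion" the text leaves unspecified, and all names are illustrative:

```python
import numpy as np
from collections import Counter

def classify(x_reduced, Y_train, labels, k, ratio=0.5):
    """S109-S110 sketch: find the k nearest reduced training vectors of the
    reduced test vector; if more than `ratio` of their labels agree, return
    that digit type, otherwise None (no decision)."""
    dist = np.sqrt(((Y_train - x_reduced) ** 2).sum(axis=1))
    near = np.argsort(dist)[:k]
    digit, count = Counter(labels[i] for i in near).most_common(1)[0]
    return digit if count / k > ratio else None

# Toy data: three reduced training vectors labeled 7 near the query, one 3 far away.
Y_train = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [0.0, 0.1]])
labels = [7, 7, 3, 7]
result = classify(np.array([0.0, 0.0]), Y_train, labels, k=3)   # -> 7
```

Returning None when no digit reaches the proportion is one possible reading; the patent does not state what happens when the preset proportion is not met.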
In the handwritten digit recognition method provided by the application, during dimensionality reduction of the handwritten digits each image datum is represented linearly by its K neighbors, the weighting coefficients of that linear representation are obtained by the orthogonal matching pursuit algorithm, the training image data are reduced by constructing a weighting coefficient matrix, and the images to be recognized are reduced through their weight vectors and the reduced vector data of their K neighbors. Experiments show that the handwritten digit recognition method provided by the embodiments of the application improves the recognition rate of handwritten digit recognition.
In the above embodiment, preferably, obtaining the weighting coefficients by the orthogonal matching pursuit algorithm can comprise at least one iterative operation:
in each iterative operation, selecting, among the K neighbor vector data of the i-th vector datum, a vector datum whose weight has not yet been determined, such that the 2-norm of the representation residual is minimized, thereby determining the weight corresponding to the vector datum selected in the f-th iterative operation; f = 1, 2, ..., u, where u is the preset number of iterations.
After all iterative operations are finished, the weights corresponding to the vector data never selected among the K neighbor vector data of the i-th vector datum are 0.
As an illustration, suppose the i-th vector datum has 3 neighbor vector data, x_1^{(i)}, x_2^{(i)}, and x_3^{(i)}, and the preset number of iterations is 2.
In the first iteration, the optimizing algorithm selects, among x_1^{(i)}, x_2^{(i)}, and x_3^{(i)}, the one with the smallest 2-norm residual value; suppose x_2^{(i)} gives the smallest value, so the weight corresponding to x_2^{(i)} is determined. For convenience of narration, x_2^{(i)} is the vector datum selected in the 1st iterative operation.
In the second iteration, the optimizing algorithm selects, between the remaining x_1^{(i)} and x_3^{(i)}, the one with the smallest 2-norm residual value; suppose x_1^{(i)} gives the smallest value, so the weight corresponding to x_1^{(i)} is determined.
The number of iterations has now reached the preset number, so the value corresponding to the remaining vector datum is determined to be 0; that is, among the 3 neighbors, the weight corresponding to x_3^{(i)} is 0.
Which optimizing algorithm is concretely adopted to select the minimum among the above 2-norm residual values and thereby determine the weights is common knowledge in the field and is not repeated here.
As another example, suppose the i-th vector datum has 3 neighbor vector data, x_1^{(i)}, x_2^{(i)}, and x_3^{(i)}, and the preset number of iterations is 1; then only one iteration is needed. Concretely:
in the first iteration, the optimizing algorithm selects, among x_1^{(i)}, x_2^{(i)}, and x_3^{(i)}, the one with the smallest 2-norm residual value; suppose x_2^{(i)} gives the smallest value, so the weight corresponding to x_2^{(i)} is determined. For convenience of narration, x_2^{(i)} is the vector datum selected in the 1st iterative operation.
After this iterative operation, the number of iterations has reached the preset number of iterations, so the weights corresponding to the other two vector data are determined to be 0; that is, among the 3 neighbors, the weight corresponding to x_1^{(i)} is 0 and the weight corresponding to x_3^{(i)} is also 0.
In the above embodiment, preferably, the preset number of iterations is 2, that is, u = 2: the weighting coefficients of two of the K neighbor vector data are obtained by two iterations, and the weight values of the other vector data among the K neighbor vector data are 0.
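The iteration described above can be sketched as follows. This is an illustrative greedy reading, not necessarily the patent's exact algorithm: at each of u steps it picks the unselected neighbor that, jointly with those already selected, minimizes the least-squares residual, and neighbors never selected keep weight 0.

```python
import numpy as np

def omp_weights(x, neighbors, n_iter):
    """Greedy pursuit sketch: `neighbors` holds the K neighbor vectors as
    columns; at each step, try each unselected column together with the
    already-selected ones, keep the choice with the smallest residual
    ||x - N_S w||_2, and refit the weights over the selected set."""
    K = neighbors.shape[1]
    selected = []
    w = np.zeros(K)
    for _ in range(min(n_iter, K)):
        best, best_res, best_w = None, np.inf, None
        for j in range(K):
            if j in selected:
                continue
            cols = selected + [j]
            coef, _, _, _ = np.linalg.lstsq(neighbors[:, cols], x, rcond=None)
            r = np.linalg.norm(x - neighbors[:, cols] @ coef)
            if r < best_res:
                best, best_res, best_w = j, r, coef
        selected.append(best)
        w[:] = 0
        w[selected] = best_w   # unselected neighbors keep weight 0
    return w

# Toy example: 3 neighbors in R^2, u = 2 iterations.
neighbors = np.array([[1.0, 0.0, 1.0],
                      [0.0, 1.0, 1.0]])   # columns are the neighbor vectors
x = np.array([2.0, 3.0])
w = omp_weights(x, neighbors, n_iter=2)   # at most two nonzero weights
```

With u = 2, at most two entries of the returned weight vector are nonzero, matching the preferred embodiment above.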
In the above embodiment, preferably, to further ease visual observation and analysis of the image data by the staff, when the preset reduced dimension d is 2 or 3, each vector datum in the reduced second vector data subset is displayed in a d-dimensional coordinate system.
Corresponding to the method embodiment, Fig. 3 shows the structural diagram of a handwritten digit recognition system provided by an embodiment of the application, comprising:
an image data processing module 301, a first dimension-reduction module 302, a second dimension-reduction module 303, and an identification module 304, wherein:
the image data processing module 301 is configured to obtain an image data set comprising a training image data subset and an image data subset to be recognized; stretch each image datum in the image data set to obtain a vector data set comprising a first vector data subset corresponding to the training image data subset and a second vector data subset corresponding to the image data subset to be recognized; obtain, according to the distance between the i-th vector datum in the vector data set and each vector datum in the first vector data subset, the K neighbor vector data of the i-th vector datum; and represent the i-th vector datum linearly by its K neighbor vector data:

X_i ≈ Σ_{j=1}^{K} w_{ij} x_j^{(i)}

where X_i is the i-th vector datum, x_j^{(i)} is the j-th vector datum among the K neighbor vector data, and w_{ij} is the weighting coefficient corresponding to the j-th neighbor, obtained by an orthogonal matching pursuit algorithm;
the first dimension-reduction module 302 is configured to obtain, when the i-th vector datum belongs to the first vector data subset, the first weight vector W_i corresponding to the i-th vector datum, whose m-th element corresponds to the m-th vector datum in the first vector data subset (W_i(m) = w_{ij} when the m-th vector datum is the j-th vector datum among the K neighbor vector data of the i-th vector datum, and W_i(m) = 0 otherwise), and to reduce the dimensionality of the first vector data subset according to the first weight vectors, comprising: obtaining the weighting coefficient matrix W = [W_1, W_2, ..., W_M], where W_m (m = 1, 2, ..., M) is the first weight vector corresponding to the m-th vector datum in the first vector data subset; constructing the matrix M_train = (I − W)^T (I − W), where I is the identity matrix; performing an eigendecomposition of M_train to obtain its eigenvalues, where the q-th eigenvalue is λ_q and its corresponding eigenvector v_q is a column vector of dimension M × 1; sorting the eigenvalues in ascending order, taking the eigenvectors corresponding to the 2nd through the (d+1)-th eigenvalues, and composing the vector data matrix Y_train = [v_2, v_3, ..., v_{d+1}], where d is the preset reduced dimension; the reduced vector datum x_m corresponding to the m-th vector datum X_m in the first vector data subset is the m-th row vector of Y_train;
the second dimension-reduction module 303 is configured to obtain, when the i-th vector datum belongs to the second vector data subset, the second weight vector W_i^(2) corresponding to the i-th vector datum, whose j-th element corresponds to the j-th vector datum among the K neighbor vector data of the i-th vector datum (W_i^(2)(j) = w_{ij}), and to reduce the dimensionality of the second vector data subset according to the second weight vectors, comprising: obtaining, according to the second weight vector W_n^(2) corresponding to the n-th vector datum in the second vector data subset and the reduced vector data set Y_n^(K) corresponding to the K neighbor vector data of that datum, the reduced vector datum x_n corresponding to the n-th vector datum X_n, specifically x_n = (W_n^(2))^T Y_n^(K).
The following verifies and illustrates the scheme of the present application with a concrete example:
The present application was tested in MATLAB (MATrix LABoratory) software using image data from the MNIST handwritten digit database. The MNIST database contains the ten handwritten digits 0-9, with a total of 60,000 training samples (of known digit type) and 10,000 test samples (of unknown digit type); each digit corresponds to several training samples and several test samples.
In this example, 200 training samples and 500 test samples are randomly selected for each digit from the MNIST database, giving a total of 2,000 training samples and 5,000 test samples.
For convenience of narration, the image data set composed of the 2,000 training samples and 5,000 test samples is denoted {I_i}, where I_i ∈ R^{m×n} is the i-th image datum, m is the number of pixel rows and n the number of pixel columns of the image data; in this example, m = n = 28.
In {I_i}, the first 2,000 images are labeled; that is, l_i ∈ {1, 2, ..., 10} is the label corresponding to I_i, used to indicate the digit type of I_i. The first 2,000 images form the training image data subset, and the remaining 5,000 images, which are unlabeled, form the test image data subset.
In this example, the image data are reduced to 3 dimensions. The detailed procedure of this example is as follows:
Obtain the image data set {I_i};
Stretch each image datum in {I_i} into a vector to obtain the vector data set {x_i}, where x_i ∈ R^{mn×1} is obtained by stretching the image datum I_i row by row. The vector data set comprises the first vector data subset X_train, corresponding to the training image data subset, and the second vector data subset X_test, corresponding to the image data subset to be tested.
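The row-by-row stretching step can be sketched in a few lines (an illustrative sketch in Python rather than the MATLAB used in the experiments; the array names and the toy random data are chosen for illustration only):

```python
import numpy as np

# Toy stand-in for the MNIST image set: 5 random 28x28 grayscale images.
rng = np.random.default_rng(0)
images = rng.integers(0, 256, size=(5, 28, 28))

# Stretch each m x n image row by row into an mn-element vector; the
# vectors are collected as the rows of the data matrix X.
X = images.reshape(len(images), -1)  # shape (5, 784)

print(X.shape)
```

Because NumPy reshapes in row-major order, the first 28 entries of each stretched vector are exactly the first pixel row of the corresponding image, matching the row-by-row stretching described above.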
For each element x_i in the first vector data subset X_train, determine, according to the Euclidean distances between x_i and the other elements of X_train, the K vector data with the smallest Euclidean distance to x_i as the K neighbors of x_i. For convenience of narration, these K vector data are denoted the neighbor point set of element x_i in X_train. In this example, K = 9.
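The neighbor-selection step can be sketched as follows (an illustrative Python sketch; `k_neighbors` is a name chosen here, not from the original, and in practice a training point would be excluded from its own neighbor set):

```python
import numpy as np

def k_neighbors(X_train, x, K=9):
    """Return indices of the K training vectors closest to x in Euclidean distance."""
    dists = np.linalg.norm(X_train - x, axis=1)  # Euclidean distance to each row
    return np.argsort(dists)[:K]                 # indices of the K smallest

# Tiny example: five 2-D points; query near the second point.
X_train = np.array([[0.0, 0], [1, 0], [2, 0], [10, 0], [11, 0]])
idx = k_neighbors(X_train, np.array([1.1, 0.0]), K=3)
print(sorted(idx.tolist()))  # the three points nearest to (1.1, 0)
```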
Use the neighbor point set to linearly represent the element x_i in the first vector data subset, namely x_i ≈ Σ_{j=1}^{K} w_ij x_j^(i), where w_ij is the weighting coefficient of the j-th neighbor x_j^(i), obtained by the orthogonal matching pursuit algorithm; specifically, this comprises two iterative operations:
In the first iterative operation, a vector data x_{j1}^(i) is selected from the K neighbor vector data of the i-th vector data such that the residual ||x_i − w_{i,j1} x_{j1}^(i)|| is minimized, thereby determining the weight w_{i,j1} corresponding to x_{j1}^(i).
In the second iterative operation, a vector data x_{j2}^(i) is selected from the remaining K−1 neighbor vector data such that the residual ||x_i − w_{i,j1} x_{j1}^(i) − w_{i,j2} x_{j2}^(i)|| is minimized, thereby determining the weight w_{i,j2} corresponding to x_{j2}^(i). The weights corresponding to the remaining K−2 vector data are all 0.
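The two-iteration weight computation can be sketched with a standard orthogonal matching pursuit loop (a sketch under assumptions: standard OMP selects atoms by correlation with the residual and refits the selected weights by least squares, which realizes the residual-minimizing selection described above when the neighbor vectors are normalized; `omp_weights` is an illustrative name, not from the original):

```python
import numpy as np

def omp_weights(x, neighbors, n_iter=2):
    """Greedy orthogonal-matching-pursuit weights of x over its K neighbor vectors.

    neighbors: (K, D) array. Each iteration adds the unselected neighbor most
    correlated with the current residual, then refits all selected weights by
    least squares. Unselected neighbors keep weight 0.
    """
    K = neighbors.shape[0]
    w = np.zeros(K)
    selected = []
    residual = x.copy()
    for _ in range(n_iter):
        scores = [abs(neighbors[k] @ residual) if k not in selected else -np.inf
                  for k in range(K)]
        selected.append(int(np.argmax(scores)))
        A = neighbors[selected].T                    # (D, len(selected))
        coef, *_ = np.linalg.lstsq(A, x, rcond=None) # refit selected weights
        residual = x - A @ coef
    w[selected] = coef
    return w

# x lies exactly in the span of the first two neighbors, so after two
# iterations the third neighbor's weight stays 0.
nb = np.array([[1.0, 0, 0], [0, 1, 0], [0, 0, 1]])
w = omp_weights(np.array([2.0, 3.0, 0.0]), nb)
print(w)
```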
The first weight vector W_i corresponding to the element x_i in the first vector data subset is obtained; W_i comprises 2,000 elements, whose j-th element corresponds to the j-th element in the first vector data subset: its value is the weighting coefficient w_ij when the j-th element is among the K neighbors of x_i, and 0 otherwise.
The weight vectors corresponding to the elements of the first vector data subset X_train are assembled into the weighting coefficient matrix W_train; obviously, W_train is a matrix of dimension 2000 × 2000.
Construct the matrix M_train = (I − W_train)^T (I − W_train), where I is the identity matrix;
Perform an eigendecomposition of M_train, letting its j-th eigenvalue be λ_j and the corresponding eigenvector be v_j, where v_j contains 2,000 elements. Assuming the eigenvalues are arranged in ascending order, the eigenvectors corresponding to the 2nd through 4th eigenvalues form the reduced first vector data matrix Y_train = [v_2, v_3, v_4]; obviously, Y_train is a matrix of dimension 2000 × 3.
The reduced vector corresponding to the element x_i in the first vector data subset is the i-th row vector of Y_train.
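The embedding computation for the training subset can be sketched as follows (an illustrative Python sketch of the generic locally-linear-embedding eigenproblem, not the patented implementation; `lle_embedding` and the toy weight matrix are assumptions made for the example):

```python
import numpy as np

def lle_embedding(W, d=3):
    """Embed training data from an (M, M) reconstruction-weight matrix W.

    Builds M_train = (I - W)^T (I - W), eigendecomposes it, and returns the
    eigenvectors of the 2nd through (d+1)-th smallest eigenvalues as the
    (M, d) embedding Y_train (the bottom eigenvector is skipped).
    """
    M = W.shape[0]
    I = np.eye(M)
    M_train = (I - W).T @ (I - W)               # symmetric positive semidefinite
    eigvals, eigvecs = np.linalg.eigh(M_train)  # eigenvalues in ascending order
    return eigvecs[:, 1:d + 1]                  # columns v_2 .. v_{d+1}

# Toy weight matrix for 5 points (each row sums to 1 over that point's neighbors).
rng = np.random.default_rng(1)
W = rng.random((5, 5))
np.fill_diagonal(W, 0.0)
W /= W.sum(axis=1, keepdims=True)

Y_train = lle_embedding(W, d=3)
print(Y_train.shape)  # (5, 3)
```

Because M_train is symmetric, `numpy.linalg.eigh` is the appropriate decomposition and returns orthonormal eigenvectors sorted by ascending eigenvalue, matching the sorting step described above.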
The above describes the process of reducing the dimensionality of the training image data subset; the following describes the process of reducing the dimensionality of the image data subset to be recognized:
For each element x_i in the second vector data subset X_test, determine, according to the Euclidean distances between x_i and the elements of the first vector data subset X_train, the 9 vector data with the smallest Euclidean distance to x_i as the 9 neighbors of x_i. For convenience of narration, these 9 vector data are denoted the neighbor point set of element x_i in X_test.
Use the neighbor point set to linearly represent the element x_i in the second vector data subset, namely x_i ≈ Σ_{j=1}^{9} w_ij x_j^(i), where w_ij is the weighting coefficient of the j-th neighbor x_j^(i), obtained by the orthogonal matching pursuit algorithm; for details refer to the preceding method, which is not repeated here.
The second weight vector W_i corresponding to the element x_i in the second vector data subset is obtained; W_i comprises 9 elements, whose j-th element w_ij corresponds to the j-th vector data among the 9 neighbor vector data of x_i.
According to the second weight vector W_i and the reduced vector data set Y_i^(9) corresponding to the 9 neighbor vector data of the element x_i in the second vector data subset, the reduced vector x_i of the element x_i in the second vector data subset is obtained, specifically: x_i = (W_i)^T Y_i^(9), where T denotes the transpose operation.
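The out-of-sample formula can be illustrated numerically (a sketch under the assumption, consistent with the formula above, that the reduced test vector is the transpose of the 9-element weight vector multiplied by the 9 × 3 matrix of the neighbors' reduced vectors; `w_test` and `Y_nb` are illustrative names and values):

```python
import numpy as np

# Second weight vector of one test point: only 2 nonzero weights, as produced
# by the two-iteration orthogonal matching pursuit described earlier.
w_test = np.array([0.5, 0.5, 0, 0, 0, 0, 0, 0, 0])

# Stand-in reduced vectors of the point's 9 neighbors (one 3-D row each).
Y_nb = np.arange(27, dtype=float).reshape(9, 3)

# x = (W)^T Y: the reduced test vector is the weighted combination of its
# neighbors' reduced vectors.
x_low = w_test @ Y_nb
print(x_low)  # midpoint of the first two neighbor embeddings
```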
After all image data have been dimension-reduced, obtain, according to the Euclidean distances between each vector data in the reduced first vector data subset and the n-th vector data in the reduced second vector data subset, the 9 neighbor vector data of that n-th vector data;
Determine the digit type of the image to be recognized corresponding to the n-th vector data in the reduced second vector data subset according to the digit types of the image data corresponding to the 9 neighbor vector data of the n-th vector data.
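The final classification step can be sketched as a majority vote (the description says only that the digit type is determined from the neighbors' digit types; majority voting is one natural reading and is assumed here, as are the names `classify` and the toy clusters):

```python
import numpy as np
from collections import Counter

def classify(y_test_point, Y_train, train_labels, K=9):
    """Predict the digit type of one reduced test vector by majority vote
    over the digit types of its K nearest reduced training vectors."""
    dists = np.linalg.norm(Y_train - y_test_point, axis=1)
    nearest = np.argsort(dists)[:K]
    votes = Counter(train_labels[k] for k in nearest)
    return votes.most_common(1)[0][0]

# Toy 3-D embeddings: two well-separated clusters labeled 1 and 2.
Y_train = np.vstack([np.zeros((5, 3)), np.ones((5, 3))])
labels = np.array([1] * 5 + [2] * 5)

# A point near the first cluster is assigned that cluster's digit type.
pred = classify(np.array([0.1, 0.0, 0.1]), Y_train, labels, K=3)
print(pred)
```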
The recognition rate of the handwritten digit recognition method provided by the present application is compared below with that of a digit recognition method based on linearized locally linear embedding; see Table 1 for details. It can be seen that, with the number of recognitions essentially unchanged, the handwritten digit recognition method provided by the embodiments of the present application improves the recognition rate of handwritten digit recognition.
The above description of the disclosed embodiments enables those skilled in the art to implement or use the present invention. Various modifications to these embodiments will be apparent to those skilled in the art, and the general principles defined herein may be implemented in other embodiments without departing from the spirit or scope of the present invention. Therefore, the present invention is not to be limited to the embodiments shown herein, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
Table 1: Comparison of recognition rates
Claims (7)
1. A handwritten digit recognition method, characterized in that it comprises:
Obtaining an image data set, the image data set comprising a training image data subset and an image data subset to be recognized;
Stretching each image datum in the image data set into a vector to obtain a vector data set, the vector data set comprising a first vector data subset corresponding to the training image data subset and a second vector data subset corresponding to the image data subset to be recognized;
Obtaining, according to the distance between the i-th vector data in the vector data set and each vector data in the first vector data subset, K neighbor vector data of the i-th vector data;
Linearly representing the i-th vector data with the K neighbor vector data: X_i ≈ Σ_{j=1}^{K} w_ij X_j^(i), wherein X_i is the i-th vector data, X_j^(i) is the j-th vector data among the K neighbor vector data, and w_ij is the weighting coefficient corresponding to the j-th vector data X_j^(i) among the K neighbor vector data, obtained by an orthogonal matching pursuit algorithm;
When the i-th vector data belongs to the first vector data subset, obtaining a first weight vector W_i corresponding to the i-th vector data, the m-th element of which corresponds to the m-th vector data in the first vector data subset, wherein when the m-th vector data in the first vector data subset is the j-th vector data among the K neighbor vector data of the i-th vector data, the m-th element equals w_ij, and when the m-th vector data does not belong to the K neighbor vector data of the i-th vector data, the m-th element is 0; and reducing the dimensionality of the first vector data subset according to the first weight vectors, comprising:
Obtaining a weighting coefficient matrix W_train, wherein the m-th row of W_train is W_m (m = 1, 2, ..., M), the weight vector corresponding to the m-th vector data in the first vector data subset;
Constructing a matrix M_train = (I − W_train)^T (I − W_train), wherein I is an identity matrix;
Performing an eigendecomposition of the matrix M_train to obtain eigenvalues, wherein the q-th eigenvalue is λ_q and the eigenvector corresponding to the q-th eigenvalue is v_q, a column vector of dimension M × 1;
Sorting the eigenvalues by magnitude, obtaining the eigenvectors corresponding to the 2nd through (d+1)-th eigenvalues in ascending order, and composing the obtained eigenvectors into a vector data matrix Y_train = [v_2, v_3, ..., v_{d+1}], wherein d is a preset dimensionality after reduction, and the reduced vector data x_m corresponding to the m-th vector data X_m in the first vector data subset is the m-th row vector of Y_train;
When the i-th vector data belongs to the second vector data subset, obtaining a second weight vector W_i corresponding to the i-th vector data, the j-th element of which corresponds to the j-th vector data among the K neighbor vector data of the i-th vector data; and reducing the dimensionality of the second vector data subset according to the second weight vectors, comprising:
Obtaining, according to the second weight vector W_n corresponding to the n-th vector data in the second vector data subset and the reduced vector data set Y_n^(K) corresponding to the K neighbor vector data of the n-th vector data in the second vector data subset, the reduced vector data x_n corresponding to the n-th vector data X_n in the second vector data subset, specifically: x_n = (W_n)^T Y_n^(K);
Obtaining, according to the distance between each vector data in the reduced first vector data subset and the n-th vector data in the reduced second vector data subset, K neighbor vector data of the n-th vector data in the reduced second vector data subset;
Determining the digit type of the image to be recognized corresponding to the n-th vector data in the reduced second vector data subset according to the digit types of the image data corresponding to the K neighbor vector data of the n-th vector data.
2. The method according to claim 1, characterized in that obtaining, according to the distance between the i-th vector data in the vector data set and each vector data in the first vector data subset, the K neighbor vector data of the i-th vector data comprises:
Obtaining, according to the Euclidean distance between the i-th vector data in the vector data set and each vector data in the first vector data subset, the K neighbor vector data of the i-th vector data.
3. The method according to claim 1, characterized in that obtaining the weighting coefficients w_ij by the orthogonal matching pursuit algorithm comprises at least one iterative operation:
In each iterative operation, a vector data whose weight has not yet been determined is selected from the K neighbor vector data of the i-th vector data such that the residual of the linear representation is minimized, thereby determining the weight corresponding to the selected vector data.
4. The method according to claim 1, characterized in that stretching each image datum in the image data set comprises:
Stretching each image datum in the image data set row by row or column by column.
5. The method according to claim 1, characterized in that the preset dimensionality after reduction is 2 or 3.
6. The method according to claim 5, characterized in that the method further comprises:
Displaying each vector data in the reduced second vector data subset in a d-dimensional coordinate system.
7. A handwritten digit recognition system, characterized in that it comprises:
An image data processing module, configured to obtain an image data set, the image data set comprising a training image data subset and an image data subset to be recognized; stretch each image datum in the image data set into a vector to obtain a vector data set, the vector data set comprising a first vector data subset corresponding to the training image data subset and a second vector data subset corresponding to the image data subset to be recognized; obtain, according to the distance between the i-th vector data in the vector data set and each vector data in the first vector data subset, K neighbor vector data of the i-th vector data; and linearly represent the i-th vector data with the K neighbor vector data: X_i ≈ Σ_{j=1}^{K} w_ij X_j^(i), wherein X_i is the i-th vector data, X_j^(i) is the j-th vector data among the K neighbor vector data, and w_ij is the weighting coefficient corresponding to the j-th vector data X_j^(i) among the K neighbor vector data, obtained by an orthogonal matching pursuit algorithm;
A first dimension-reduction processing module, configured to obtain a first weight vector W_i corresponding to the i-th vector data, the m-th element of which corresponds to the m-th vector data in the first vector data subset, wherein when the m-th vector data in the first vector data subset is the j-th vector data among the K neighbor vector data of the i-th vector data, the m-th element equals w_ij, and when the m-th vector data does not belong to the K neighbor vector data of the i-th vector data, the m-th element is 0; and to reduce the dimensionality of the first vector data subset according to the first weight vectors, comprising: obtaining a weighting coefficient matrix W_train, wherein the m-th row of W_train is W_m (m = 1, 2, ..., M), the weight vector corresponding to the m-th vector data in the first vector data subset; constructing a matrix M_train = (I − W_train)^T (I − W_train), wherein I is an identity matrix; performing an eigendecomposition of the matrix M_train to obtain eigenvalues, wherein the q-th eigenvalue is λ_q and its corresponding eigenvector v_q is a column vector of dimension M × 1; sorting the eigenvalues by magnitude, obtaining the eigenvectors corresponding to the 2nd through (d+1)-th eigenvalues in ascending order, and composing the obtained eigenvectors into a vector data matrix Y_train = [v_2, v_3, ..., v_{d+1}], wherein d is a preset dimensionality after reduction, and the reduced vector data x_m corresponding to the m-th vector data X_m in the first vector data subset is the m-th row vector of Y_train;
A second dimension-reduction processing module, configured to obtain a second weight vector W_i corresponding to the i-th vector data, the j-th element of which corresponds to the j-th vector data among the K neighbor vector data of the i-th vector data; and to reduce the dimensionality of the second vector data subset according to the second weight vectors, comprising: obtaining, according to the second weight vector W_n corresponding to the n-th vector data in the second vector data subset and the reduced vector data set Y_n^(K) corresponding to the K neighbor vector data of the n-th vector data in the second vector data subset, the reduced vector data x_n corresponding to the n-th vector data X_n in the second vector data subset, specifically: x_n = (W_n)^T Y_n^(K);
An identification module, configured to obtain, according to the distance between each vector data in the reduced first vector data subset and the n-th vector data in the reduced second vector data subset, K neighbor vector data of the n-th vector data in the reduced second vector data subset; and determine the digit type of the image to be recognized corresponding to the n-th vector data in the reduced second vector data subset according to the digit types of the image data corresponding to the K neighbor vector data of the n-th vector data.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201310286449.4A CN103310237B (en) | 2013-07-09 | 2013-07-09 | Handwritten Numeral Recognition Method and system |
Publications (2)
Publication Number | Publication Date |
---|---|
CN103310237A true CN103310237A (en) | 2013-09-18 |
CN103310237B CN103310237B (en) | 2016-08-24 |
Family
ID=49135431
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201310286449.4A Active CN103310237B (en) | 2013-07-09 | 2013-07-09 | Handwritten Numeral Recognition Method and system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN103310237B (en) |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103400161A (en) * | 2013-07-18 | 2013-11-20 | 苏州大学 | Handwritten numeral recognition method and system |
CN103679207A (en) * | 2014-01-02 | 2014-03-26 | 苏州大学 | Handwriting number identification method and system |
CN104778478A (en) * | 2015-04-22 | 2015-07-15 | 中国石油大学(华东) | Handwritten numeral identification method |
CN106257495A (en) * | 2015-06-19 | 2016-12-28 | 阿里巴巴集团控股有限公司 | A kind of digit recognition method and device |
CN106650820A (en) * | 2016-12-30 | 2017-05-10 | 山东大学 | Matching recognition method of handwritten electrical component symbols and standard electrical component symbols |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102722713A (en) * | 2012-02-22 | 2012-10-10 | 苏州大学 | Handwritten numeral recognition method based on lie group structure data and system thereof |
CN103164701A (en) * | 2013-04-10 | 2013-06-19 | 苏州大学 | Method and device for recognizing handwritten numbers |
-
2013
- 2013-07-09 CN CN201310286449.4A patent/CN103310237B/en active Active
Non-Patent Citations (3)
Title |
---|
BANGJUN-WANG 等: "A Classification Algorithm in Li-K Nearest Neighbor", 《2013 FOURTH GLOBAL CONGRESS ON INTELLIGENT SYSTEMS》 * |
LI ZHANG 等: "Density-induced margin support vector machines", 《PATTERN RECOGNITION》 * |
苗壮: "基于DFS的手写数字识别模型及其应用研究", 《中国优秀硕士学位论文全文数据库 信息科技辑》 * |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103400161A (en) * | 2013-07-18 | 2013-11-20 | 苏州大学 | Handwritten numeral recognition method and system |
CN103679207A (en) * | 2014-01-02 | 2014-03-26 | 苏州大学 | Handwriting number identification method and system |
CN104778478A (en) * | 2015-04-22 | 2015-07-15 | 中国石油大学(华东) | Handwritten numeral identification method |
CN106257495A (en) * | 2015-06-19 | 2016-12-28 | 阿里巴巴集团控股有限公司 | A kind of digit recognition method and device |
CN106650820A (en) * | 2016-12-30 | 2017-05-10 | 山东大学 | Matching recognition method of handwritten electrical component symbols and standard electrical component symbols |
CN106650820B (en) * | 2016-12-30 | 2020-04-24 | 山东大学 | Matching and recognizing method for handwritten electric component symbol and standard electric component symbol |
Also Published As
Publication number | Publication date |
---|---|
CN103310237B (en) | 2016-08-24 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN104463247B (en) | The abstracting method of spectral vector cross-correlation feature in a kind of classification hyperspectral imagery | |
CN103235947B (en) | A kind of Handwritten Numeral Recognition Method and device | |
CN105069811B (en) | A kind of Multitemporal Remote Sensing Images change detecting method | |
Shimizu et al. | Discovery of non-gaussian linear causal models using ICA | |
CN103310237A (en) | Handwritten digit recognition method and system | |
CN104616029B (en) | Data classification method and device | |
Małek et al. | The VIMOS Public Extragalactic Redshift Survey (VIPERS)-A support vector machine classification of galaxies, stars, and AGNs | |
CN102609681A (en) | Face recognition method based on dictionary learning models | |
CN108229588B (en) | Machine learning identification method based on deep learning | |
Badrinarayanan et al. | Understanding symmetries in deep networks | |
CN103164701B (en) | Handwritten Numeral Recognition Method and device | |
CN110659378B (en) | Fine-grained image retrieval method based on contrast similarity loss function | |
CN104217438A (en) | Image significance detection method based on semi-supervision | |
CN108764310A (en) | SAR target identification methods based on multiple dimensioned multiple features depth forest | |
CN104966075B (en) | A kind of face identification method and system differentiating feature based on two dimension | |
CN105389486A (en) | Authentication method based on mouse behavior | |
CN103440508A (en) | Remote sensing image target recognition method based on visual word bag model | |
Hussein et al. | A texture-based approach for content based image retrieval system for plant leaves images | |
CN106778714A (en) | LDA face identification methods based on nonlinear characteristic and model combination | |
CN103310205B (en) | A kind of Handwritten Numeral Recognition Method and device | |
CN109448842B (en) | The determination method, apparatus and electronic equipment of human body intestinal canal Dysbiosis | |
CN104572930B (en) | Data classification method and device | |
CN107392249A (en) | A kind of density peak clustering method of k nearest neighbor similarity optimization | |
CN108345942B (en) | Machine learning identification method based on embedded code learning | |
CN105975940A (en) | Palm print image identification method based on sparse directional two-dimensional local discriminant projection |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C14 | Grant of patent or utility model | ||
GR01 | Patent grant | ||
CP02 | Change in the address of a patent holder |
Address after: Suzhou City, Jiangsu province 215123 Xiangcheng District Ji Road No. 8 Patentee after: Soochow University Address before: 215123 Suzhou Industrial Park, Jiangsu Road, No. 199 Patentee before: Soochow University |
|
CP02 | Change in the address of a patent holder |