CN103310237A - Handwritten digit recognition method and system - Google Patents
- Publication number: CN103310237A
- Application number: CN201310286449A
- Authority: CN (China)
- Legal status: Granted (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Abstract
An embodiment of the invention discloses a handwritten digit recognition method and system. During dimensionality reduction of the handwritten digits, each image datum is represented linearly by its K nearest neighbors, and the weighting coefficients of that linear representation are obtained through an orthogonal matching pursuit algorithm. A weighting-coefficient matrix is then built to reduce the dimensionality of the training image data, and the images to be recognized are reduced through their weight vectors and the reduced vector data of their K neighbors. Experiments show that the handwritten digit recognition method improves the recognition rate of handwritten digit recognition.
Description
Technical field
The present invention relates to the field of pattern recognition and, more particularly, to a handwritten digit recognition method and system.
Background technology
With the rapid development of computer technology and digital image processing technology, handwritten digit recognition has found practical application in e-commerce, automatic machine input, and similar occasions.
However, handwritten digits are high-dimensional data. Recognizing them directly is time-consuming and computationally complex, so the digits are usually reduced in dimensionality before recognition. Based on this, Saul et al. proposed a digit recognition method based on linearized locally linear embedding. That method, however, solves for the locally linear representation coefficients of the data with the least squares method, which involves inverting a matrix. If the matrix is singular it has no inverse, so the coefficients have no solution, and the recognition rate of handwritten digit recognition is therefore lower when the locally linear representation coefficients are solved with the least squares method.
Therefore, how to improve the recognition rate of handwritten digit recognition has become a problem demanding a prompt solution.
Summary of the invention
The purpose of this invention is to provide a handwritten digit recognition method and system that improve the recognition rate of handwritten digit recognition.
To achieve the above object, the invention provides the following technical scheme:
A handwritten digit recognition method, characterized in that it comprises:
obtaining an image data set, the image data set comprising a training image data subset and an image data subset to be recognized;
stretching each image datum in the image data set to obtain a vector data set, the vector data set comprising a first vector data subset corresponding to the training image data subset and a second vector data subset corresponding to the image data subset to be recognized;
obtaining, according to the distance between the i-th vector datum in the vector data set and each vector datum in the first vector data subset, the K neighbor vector data of the i-th vector datum;
representing the i-th vector datum linearly by its K neighbor vector data:

X_i ≈ Σ_{j=1}^{K} w_{ij} x_j^{(i)}

where X_i is the i-th vector datum, x_j^{(i)} is the j-th vector datum among the K neighbor vector data, and w_{ij} is the weighting coefficient corresponding to the j-th neighbor, obtained by an orthogonal matching pursuit algorithm;
when the i-th vector datum belongs to the first vector data subset, obtaining the first weight vector W_i corresponding to the i-th vector datum, whose m-th element W_i(m) corresponds to the m-th vector datum in the first vector data subset, where: when the m-th vector datum in the first vector data subset is the j-th vector datum among the K neighbor vector data of the i-th vector datum, W_i(m) = w_{ij}; when the m-th vector datum does not belong to the K neighbor vector data of the i-th vector datum, W_i(m) = 0;

reducing the dimensionality of the first vector data subset according to the first weight vectors, comprising:
obtaining the weighting coefficient matrix W = [W_1, W_2, ..., W_M], where W_m (m = 1, 2, ..., M) is the first weight vector corresponding to the m-th vector datum in the first vector data subset;
constructing the matrix M_train = (I − W)^T (I − W), where I is the identity matrix;
performing an eigendecomposition of the matrix M_train to obtain its eigenvalues, where the q-th eigenvalue is λ_q and its corresponding eigenvector v_q is a column vector of dimension M × 1;
sorting the eigenvalues by value in ascending order, taking the eigenvectors corresponding to the 2nd through the (d+1)-th eigenvalues, and composing from them the vector data matrix Y_train = [v_2, v_3, ..., v_{d+1}], where d is the preset reduced dimension; the reduced vector datum x_m corresponding to the m-th vector datum X_m in the first vector data subset is the m-th row vector of the vector data matrix Y_train;
when the i-th vector datum belongs to the second vector data subset, obtaining the second weight vector W_i^(2) corresponding to the i-th vector datum, whose j-th element corresponds to the j-th vector datum among the K neighbor vector data of the i-th vector datum, W_i^(2)(j) = w_{ij};

reducing the dimensionality of the second vector data subset according to the second weight vectors, comprising:
obtaining, according to the second weight vector W_n^(2) corresponding to the n-th vector datum in the second vector data subset and the reduced vector data set Y_n^(K) corresponding to the K neighbor vector data of that datum, the reduced vector datum x_n corresponding to the n-th vector datum X_n in the second vector data subset, specifically x_n = (W_n^(2))^T Y_n^(K);
obtaining, according to the distance between the n-th vector datum in the reduced second vector data subset and each vector datum in the reduced first vector data subset, the K neighbor vector data of the n-th vector datum in the reduced second vector data subset;
determining, according to the digit types of the image data corresponding to the K neighbor vector data of the n-th reduced vector datum, the digit type of the image datum to be recognized corresponding to the n-th vector datum in the second vector data subset.
In the above method, preferably, obtaining the K neighbor vector data of the i-th vector datum according to the distance between the i-th vector datum in the vector data set and each vector datum in the first vector data subset comprises:
obtaining the K neighbor vector data of the i-th vector datum according to the Euclidean distance between the i-th vector datum in the vector data set and each vector datum in the first vector data subset.
In the above method, preferably, obtaining the weighting coefficients w_{ij} by the orthogonal matching pursuit algorithm comprises at least one iterative operation:
in each iterative operation, selecting, among the K neighbor vector data of the i-th vector datum, a vector datum whose weight has not yet been determined, such that the 2-norm of the representation residual is minimized, thereby determining the weight corresponding to the vector datum selected in the f-th iterative operation; f = 1, 2, ..., u, where u is the preset number of iterations.
In the above method, preferably, stretching each image datum in the image data set comprises:
stretching each image datum in the image data set by row or by column.
In the above method, preferably, the preset reduced dimension is 2 or 3.
In the above method, preferably, the method further comprises:
displaying each vector datum in the reduced second vector data subset in a d-dimensional coordinate system.
A handwritten digit recognition system, characterized in that it comprises:
an image data processing module, configured to obtain an image data set comprising a training image data subset and an image data subset to be recognized; stretch each image datum in the image data set to obtain a vector data set comprising a first vector data subset corresponding to the training image data subset and a second vector data subset corresponding to the image data subset to be recognized; obtain, according to the distance between the i-th vector datum in the vector data set and each vector datum in the first vector data subset, the K neighbor vector data of the i-th vector datum; and represent the i-th vector datum linearly by its K neighbor vector data:

X_i ≈ Σ_{j=1}^{K} w_{ij} x_j^{(i)}

where X_i is the i-th vector datum, x_j^{(i)} is the j-th vector datum among the K neighbor vector data, and w_{ij} is the weighting coefficient corresponding to the j-th neighbor, obtained by an orthogonal matching pursuit algorithm;
a first dimension-reduction module, configured to obtain, when the i-th vector datum belongs to the first vector data subset, the first weight vector W_i corresponding to the i-th vector datum, whose m-th element corresponds to the m-th vector datum in the first vector data subset (W_i(m) = w_{ij} when the m-th vector datum is the j-th vector datum among the K neighbor vector data of the i-th vector datum, and W_i(m) = 0 otherwise), and to reduce the dimensionality of the first vector data subset according to the first weight vectors, comprising: obtaining the weighting coefficient matrix W = [W_1, W_2, ..., W_M], where W_m (m = 1, 2, ..., M) is the first weight vector corresponding to the m-th vector datum in the first vector data subset; constructing the matrix M_train = (I − W)^T (I − W), where I is the identity matrix; performing an eigendecomposition of M_train to obtain its eigenvalues, where the q-th eigenvalue is λ_q and its corresponding eigenvector v_q is a column vector of dimension M × 1; sorting the eigenvalues in ascending order, taking the eigenvectors corresponding to the 2nd through the (d+1)-th eigenvalues, and composing the vector data matrix Y_train = [v_2, v_3, ..., v_{d+1}], where d is the preset reduced dimension; the reduced vector datum x_m corresponding to the m-th vector datum X_m in the first vector data subset is the m-th row vector of Y_train;
a second dimension-reduction module, configured to obtain, when the i-th vector datum belongs to the second vector data subset, the second weight vector W_i^(2) corresponding to the i-th vector datum, whose j-th element corresponds to the j-th vector datum among the K neighbor vector data of the i-th vector datum (W_i^(2)(j) = w_{ij}), and to reduce the dimensionality of the second vector data subset according to the second weight vectors, comprising: obtaining, according to the second weight vector W_n^(2) corresponding to the n-th vector datum in the second vector data subset and the reduced vector data set Y_n^(K) corresponding to the K neighbor vector data of that datum, the reduced vector datum x_n corresponding to the n-th vector datum X_n, specifically x_n = (W_n^(2))^T Y_n^(K);
an identification module, configured to obtain, according to the distance between the n-th vector datum in the reduced second vector data subset and each vector datum in the reduced first vector data subset, the K neighbor vector data of the n-th vector datum in the reduced second vector data subset, and to determine, according to the digit types of the image data corresponding to those K neighbor vector data, the digit type of the image datum to be recognized corresponding to the n-th vector datum in the second vector data subset.
As can be seen from the above scheme, in the handwritten digit recognition method provided by the application, during dimensionality reduction of the handwritten digits each image datum is represented linearly by its K neighbors, the weighting coefficients of that linear representation are obtained by the orthogonal matching pursuit algorithm, the training image data are reduced by constructing a weighting coefficient matrix, and the images to be recognized are reduced through their weight vectors and the reduced vector data of their K neighbors. Experiments show that the handwritten digit recognition method provided by the embodiments of the application improves the recognition rate of handwritten digit recognition.
Description of drawings
To illustrate the embodiments of the invention or the technical schemes of the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Apparently, the drawings described below are only some embodiments of the invention; those of ordinary skill in the art can obtain other drawings from them without creative work.
Fig. 1 is a flow chart of a handwritten digit recognition method provided by an embodiment of the application;
Fig. 2 is a flow chart of the method, provided by an embodiment of the application, of reducing the dimensionality of the first vector data subset according to the first weight vectors;
Fig. 3 is a structural diagram of a handwritten digit recognition system provided by an embodiment of the application.
The terms "first", "second", "third", "fourth", etc. (if present) in the specification, the claims, and the above drawings are used to distinguish similar parts and need not describe a specific order or precedence. It should be appreciated that data so used may be interchanged where appropriate, so that the embodiments of the application described herein can be implemented in orders other than those illustrated here.
Embodiment
The technical schemes in the embodiments of the invention are described clearly and completely below in conjunction with the drawings in the embodiments. Obviously, the described embodiments are only part, rather than all, of the embodiments of the invention. All other embodiments obtained by those of ordinary skill in the art from the embodiments of the invention without creative work fall within the scope of protection of the invention.
Referring to Fig. 1, a flow chart of a handwritten digit recognition method provided by an embodiment of the application, the method comprises:
Step S101: obtaining an image data set, the image data set comprising a training image data subset and an image data subset to be recognized;
In the embodiment of the application, the image data set comprises two classes of image data: training image data and image data to be recognized. The digit types of the training image data are known; that is, which digit each training image datum represents is known.
Step S102: stretching each image datum in the image data set to obtain a vector data set, the vector data set comprising a first vector data subset corresponding to the training image data subset and a second vector data subset corresponding to the image data subset to be recognized;
Suppose a raw image datum has dimensions a × b; then, after stretching, the vector datum obtained has dimensions ab × 1.
An image datum can be stretched by row or by column; note, however, that all image data must be stretched in the same way.
The following illustrates how an image datum is stretched. Since an image datum is a two-dimensional matrix, suppose an image datum is

[1 1 1
 2 2 2
 3 3 3]

Stretching this image datum by row means connecting the rows, starting from the first row of the matrix, one after another into a vector, obtaining the vector datum [1 1 1 2 2 2 3 3 3]^T.
Stretching this image datum by column means connecting the columns, starting from the first column of the matrix, one after another into a vector, obtaining the vector datum [1 2 3 1 2 3 1 2 3]^T.
Step S103: obtaining, according to the distance between the i-th vector datum in the vector data set and each vector datum in the first vector data subset, the K neighbor vector data of the i-th vector datum. That is to say, whether the i-th vector datum belongs to the first vector data subset or to the second vector data subset, its K neighbor vector data are always sought in the first vector data subset.
The distance can be the absolute distance, that is,

s(x, y) = Σ_k |x_k − y_k|

where s(x, y) denotes the distance between vector datum x and vector datum y, x_k denotes the k-th element of x, and y_k denotes the k-th element of y.
Preferably, in the embodiment of the application, the distance is the Euclidean distance, that is,

s(x, y) = sqrt( Σ_k (x_k − y_k)^2 )

with s(x, y), x_k, and y_k as above.
The K neighbor vector data of the i-th vector datum can be all vector data in the first vector data subset whose distance to the i-th vector datum is less than a preset value; they can also be the K vector data in the first vector data subset with the smallest distance to the i-th vector datum.
K is a positive integer whose value can be determined from experience or by simulation; preferably, K takes a value between 3 and 20.
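A minimal sketch of the neighbor search just described, keeping the K training vectors at the smallest Euclidean distance (the function name and the toy data are illustrative, not from the patent):

```python
import numpy as np

def k_neighbors(x, train, k):
    """Indices of the k rows of `train` closest to x in Euclidean distance."""
    dist = np.sqrt(((train - x) ** 2).sum(axis=1))
    return np.argsort(dist)[:k]

# Toy data: four training vectors in R^2, query at the origin.
train = np.array([[0.0, 0.0], [1.0, 0.0], [5.0, 5.0], [0.0, 1.1]])
idx = k_neighbors(np.array([0.0, 0.0]), train, k=2)   # -> indices 0 and 1
```

The same routine serves both step S103 (neighbors of full-dimensional vectors) and step S109 (neighbors of reduced vectors), since only the row dimension of `train` changes.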
Step S104: representing the i-th vector datum linearly by its K neighbor vector data:

X_i ≈ Σ_{j=1}^{K} w_{ij} x_j^{(i)}

where X_i is the i-th vector datum, x_j^{(i)} is the j-th vector datum among the K neighbor vector data, and w_{ij} is the weighting coefficient corresponding to the j-th neighbor, obtained by an orthogonal matching pursuit algorithm.
Step S105: when the i-th vector datum belongs to the first vector data subset, obtaining the first weight vector W_i corresponding to the i-th vector datum, a vector of dimension M × 1 whose m-th element corresponds to the m-th vector datum in the first vector data subset: when the m-th vector datum in the first vector data subset is the j-th vector datum among the K neighbor vector data of the i-th vector datum, W_i(m) = w_{ij}; when the m-th vector datum does not belong to the K neighbor vector data of the i-th vector datum, W_i(m) = 0.
In other words, in the embodiment of the application, when the m-th vector datum in the first vector data subset is one of the K neighbor vector data of the i-th vector datum, the value of the corresponding weighting coefficient is the one obtained by the orthogonal matching pursuit algorithm above; otherwise, its value is 0.
Step S106: reducing the dimensionality of the first vector data subset according to the first weight vectors; as shown in Fig. 2, the concrete steps can comprise:
Step S1061: obtaining the weighting coefficient matrix W = [W_1, W_2, ..., W_M], a matrix of dimension M × M, where W_m (m = 1, 2, ..., M) is the first weight vector corresponding to the m-th vector datum in the first vector data subset;
Step S1062: constructing the matrix M_train = (I − W)^T (I − W), where I is the identity matrix;
Step S1063: performing an eigendecomposition of the matrix M_train to obtain its eigenvalues, where the q-th eigenvalue is λ_q and its corresponding eigenvector v_q is a column vector of dimension M × 1;
How the eigendecomposition is carried out, and how the eigenvalues and their corresponding eigenvectors are obtained, are common knowledge in the field and are not repeated here.
Step S1064: sorting the eigenvalues by value in ascending order, taking the eigenvectors corresponding to the 2nd through the (d+1)-th eigenvalues, and composing from them the vector data matrix Y_train = [v_2, v_3, ..., v_{d+1}], a matrix of dimension M × d, where d is the preset reduced dimension; the reduced vector datum x_m corresponding to the m-th vector datum X_m in the first vector data subset is the m-th row vector of the vector data matrix Y_train.
Preferably, in the embodiment of the application, to ease the analysis of the image data, the reduced dimension d can be 2 or 3; of course, it can also be 4, 5, 6, or another integer value.
For example, when d = 2, Y_train = [v_2, v_3]; when d = 3, Y_train = [v_2, v_3, v_4].
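Steps S1061 through S1064 can be sketched with NumPy as follows. This is an illustrative reading under one assumption: the weight vectors are stacked so that row m of W holds the first weight vector of the m-th training vector (transpose W first if they are stacked as columns).

```python
import numpy as np

def embed_training(W, d):
    """S1061-S1064 sketch: build M_train = (I - W)^T (I - W), eigendecompose,
    sort eigenvalues ascending, and keep the eigenvectors of the 2nd through
    (d+1)-th smallest eigenvalues as the columns of Y_train."""
    M = W.shape[0]
    I = np.eye(M)
    M_train = (I - W).T @ (I - W)
    vals, vecs = np.linalg.eigh(M_train)   # eigh returns ascending eigenvalues
    Y_train = vecs[:, 1:d + 1]             # skip the smallest eigenvalue
    return Y_train                          # row m is x_m, the reduced datum

# Toy weighting-coefficient matrix for M = 4 training vectors.
W = np.array([[0.0, 0.5, 0.5, 0.0],
              [0.5, 0.0, 0.0, 0.5],
              [0.5, 0.0, 0.0, 0.5],
              [0.0, 0.5, 0.5, 0.0]])
Y = embed_training(W, d=2)   # 4 x 2 matrix of reduced training vectors
```

Discarding the eigenvector of the very smallest eigenvalue and keeping the next d is what makes row m of Y_train the reduced counterpart of the m-th training vector.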
Step S107: when the i-th vector datum belongs to the second vector data subset, obtaining the second weight vector W_i^(2) corresponding to the i-th vector datum, a vector of dimension K × 1 whose j-th element corresponds to the j-th vector datum among the K neighbor vector data of the i-th vector datum, W_i^(2)(j) = w_{ij}. That is to say, in the embodiment of the application, the second weight vector has only K elements.
Step S108: reducing the dimensionality of the n-th vector datum in the second vector data subset according to the second weight vector W_n^(2) corresponding to it, which can comprise:
obtaining, according to the second weight vector W_n^(2) corresponding to the n-th vector datum in the second vector data subset and the reduced vector data set Y_n^(K) corresponding to the K neighbor vector data of that datum, the reduced vector datum x_n corresponding to the n-th vector datum X_n in the second vector data subset, specifically:

x_n = (W_n^(2))^T Y_n^(K)

where T denotes the transpose of the vector W_n^(2).
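A sketch of the combination in step S108 (all names illustrative): the reduced test vector is simply the second weight vector applied to the rows of reduced neighbor vectors.

```python
import numpy as np

def embed_test_point(w2, Y_neighbors):
    """S108 sketch: w2 is the K-element second weight vector; Y_neighbors
    stacks, row by row, the reduced d-dim vectors of the K neighbors drawn
    from Y_train. Returns x_n = (W_n^(2))^T Y_n^(K)."""
    return w2 @ Y_neighbors   # (K,) @ (K, d) -> (d,)

# Toy example: K = 3 neighbors already reduced to d = 2.
w2 = np.array([0.5, 0.5, 0.0])
Y_neighbors = np.array([[0.0, 0.0],
                        [2.0, 2.0],
                        [9.0, 9.0]])
x_n = embed_test_point(w2, Y_neighbors)   # -> [1.0, 1.0]
```

The design choice this reflects is that test images never enter the eigendecomposition: they are placed into the reduced space purely as weighted combinations of already-reduced training vectors.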
Step S109: obtaining, according to the distance between the n-th vector datum in the reduced second vector data subset and each vector datum in the reduced first vector data subset, the K neighbor vector data of the n-th vector datum in the reduced second vector data subset;
that is to say, in this step, the K neighbor vector data of the n-th reduced vector datum are sought in the reduced first vector data subset.
The distance can be the absolute distance or the Euclidean distance; for the concrete method of obtaining the K neighbor vector data, see the foregoing; it is not repeated here.
Step S110: determining, according to the digit types of the image data corresponding to the K neighbor vector data of the n-th reduced vector datum, the digit type of the image datum to be recognized corresponding to the n-th vector datum in the second vector data subset.
In the embodiment of the application, when, among the K neighbor vector data, the digit types of the image data corresponding to a preset proportion of the vector data are all the same digit type, the digit type corresponding to the n-th vector datum is determined to be that digit type.
For example, if the digit type of the image data corresponding to the preset proportion of neighbors is 6, that is, the handwritten digit represented by each of those image data is 6, then the digit type of the image datum to be recognized corresponding to the n-th vector datum is 6; that is, the handwritten digit represented by the image datum to be recognized is 6.
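Steps S109-S110 can be sketched as follows; the `ratio` threshold stands in for the "preset proportion" the text leaves unspecified, and all names are illustrative:

```python
import numpy as np
from collections import Counter

def classify(x_reduced, Y_train, labels, k, ratio=0.5):
    """S109-S110 sketch: find the k nearest reduced training vectors of the
    reduced test vector; if more than `ratio` of their labels agree, return
    that digit type, otherwise None (no decision)."""
    dist = np.sqrt(((Y_train - x_reduced) ** 2).sum(axis=1))
    near = np.argsort(dist)[:k]
    digit, count = Counter(labels[i] for i in near).most_common(1)[0]
    return digit if count / k > ratio else None

# Toy data: three reduced training vectors labeled 7 near the query, one 3 far away.
Y_train = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [0.0, 0.1]])
labels = [7, 7, 3, 7]
result = classify(np.array([0.0, 0.0]), Y_train, labels, k=3)   # -> 7
```

Returning None when no digit reaches the proportion is one possible reading; the patent does not state what happens when the preset proportion is not met.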
In the handwritten digit recognition method provided by the application, during dimensionality reduction of the handwritten digits each image datum is represented linearly by its K neighbors, the weighting coefficients of that linear representation are obtained by the orthogonal matching pursuit algorithm, the training image data are reduced by constructing a weighting coefficient matrix, and the images to be recognized are reduced through their weight vectors and the reduced vector data of their K neighbors. Experiments show that the handwritten digit recognition method provided by the embodiments of the application improves the recognition rate of handwritten digit recognition.
In the above embodiment, preferably, obtaining the weighting coefficients by the orthogonal matching pursuit algorithm can comprise at least one iterative operation:
in each iterative operation, selecting, among the K neighbor vector data of the i-th vector datum, a vector datum whose weight has not yet been determined, such that the 2-norm of the representation residual is minimized, thereby determining the weight corresponding to the vector datum selected in the f-th iterative operation; f = 1, 2, ..., u, where u is the preset number of iterations.
After all iterative operations are finished, the weights corresponding to the vector data never selected among the K neighbor vector data of the i-th vector datum are 0.
As an illustration, suppose the i-th vector datum has 3 neighbor vector data, x_1^{(i)}, x_2^{(i)}, and x_3^{(i)}, and the preset number of iterations is 2.
In the first iteration, the optimizing algorithm selects, among x_1^{(i)}, x_2^{(i)}, and x_3^{(i)}, the one with the smallest 2-norm residual value; suppose x_2^{(i)} gives the smallest value, so the weight corresponding to x_2^{(i)} is determined. For convenience of narration, x_2^{(i)} is the vector datum selected in the 1st iterative operation.
In the second iteration, the optimizing algorithm selects, between the remaining x_1^{(i)} and x_3^{(i)}, the one with the smallest 2-norm residual value; suppose x_1^{(i)} gives the smallest value, so the weight corresponding to x_1^{(i)} is determined.
The number of iterations has now reached the preset number, so the value corresponding to the remaining vector datum is determined to be 0; that is, among the 3 neighbors, the weight corresponding to x_3^{(i)} is 0.
Which optimizing algorithm is concretely adopted to select the minimum among the above 2-norm residual values and thereby determine the weights is common knowledge in the field and is not repeated here.
As another example, suppose the i-th vector datum has 3 neighbor vector data, x_1^{(i)}, x_2^{(i)}, and x_3^{(i)}, and the preset number of iterations is 1; then only one iteration is needed. Concretely:
in the first iteration, the optimizing algorithm selects, among x_1^{(i)}, x_2^{(i)}, and x_3^{(i)}, the one with the smallest 2-norm residual value; suppose x_2^{(i)} gives the smallest value, so the weight corresponding to x_2^{(i)} is determined. For convenience of narration, x_2^{(i)} is the vector datum selected in the 1st iterative operation.
After this iterative operation, the number of iterations has reached the preset number of iterations, so the weights corresponding to the other two vector data are determined to be 0; that is, among the 3 neighbors, the weight corresponding to x_1^{(i)} is 0 and the weight corresponding to x_3^{(i)} is also 0.
In the above embodiment, preferably, the preset number of iterations is 2, that is, u = 2: the weighting coefficients of two of the K neighbor vector data are obtained by two iterations, and the weight values of the other vector data among the K neighbor vector data are 0.
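The iteration described above can be sketched as follows. This is an illustrative greedy reading, not necessarily the patent's exact algorithm: at each of u steps it picks the unselected neighbor that, jointly with those already selected, minimizes the least-squares residual, and neighbors never selected keep weight 0.

```python
import numpy as np

def omp_weights(x, neighbors, n_iter):
    """Greedy pursuit sketch: `neighbors` holds the K neighbor vectors as
    columns; at each step, try each unselected column together with the
    already-selected ones, keep the choice with the smallest residual
    ||x - N_S w||_2, and refit the weights over the selected set."""
    K = neighbors.shape[1]
    selected = []
    w = np.zeros(K)
    for _ in range(min(n_iter, K)):
        best, best_res, best_w = None, np.inf, None
        for j in range(K):
            if j in selected:
                continue
            cols = selected + [j]
            coef, _, _, _ = np.linalg.lstsq(neighbors[:, cols], x, rcond=None)
            r = np.linalg.norm(x - neighbors[:, cols] @ coef)
            if r < best_res:
                best, best_res, best_w = j, r, coef
        selected.append(best)
        w[:] = 0
        w[selected] = best_w   # unselected neighbors keep weight 0
    return w

# Toy example: 3 neighbors in R^2, u = 2 iterations.
neighbors = np.array([[1.0, 0.0, 1.0],
                      [0.0, 1.0, 1.0]])   # columns are the neighbor vectors
x = np.array([2.0, 3.0])
w = omp_weights(x, neighbors, n_iter=2)   # at most two nonzero weights
```

With u = 2, at most two entries of the returned weight vector are nonzero, matching the preferred embodiment above.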
In the above embodiment, preferably, to further ease visual observation and analysis of the image data by the staff, when the preset reduced dimension d is 2 or 3, each vector datum in the reduced second vector data subset is displayed in a d-dimensional coordinate system.
Corresponding to the method embodiment, Fig. 3 shows the structural diagram of a handwritten digit recognition system provided by an embodiment of the application, comprising:
an image data processing module 301, a first dimension-reduction module 302, a second dimension-reduction module 303, and an identification module 304, wherein:
the image data processing module 301 is configured to obtain an image data set comprising a training image data subset and an image data subset to be recognized; stretch each image datum in the image data set to obtain a vector data set comprising a first vector data subset corresponding to the training image data subset and a second vector data subset corresponding to the image data subset to be recognized; obtain, according to the distance between the i-th vector datum in the vector data set and each vector datum in the first vector data subset, the K neighbor vector data of the i-th vector datum; and represent the i-th vector datum linearly by its K neighbor vector data:

X_i ≈ Σ_{j=1}^{K} w_{ij} x_j^{(i)}

where X_i is the i-th vector datum, x_j^{(i)} is the j-th vector datum among the K neighbor vector data, and w_{ij} is the weighting coefficient corresponding to the j-th neighbor, obtained by an orthogonal matching pursuit algorithm;
the first dimension-reduction module 302 is configured to obtain, when the i-th vector datum belongs to the first vector data subset, the first weight vector W_i corresponding to the i-th vector datum, whose m-th element corresponds to the m-th vector datum in the first vector data subset (W_i(m) = w_{ij} when the m-th vector datum is the j-th vector datum among the K neighbor vector data of the i-th vector datum, and W_i(m) = 0 otherwise), and to reduce the dimensionality of the first vector data subset according to the first weight vectors, comprising: obtaining the weighting coefficient matrix W = [W_1, W_2, ..., W_M], where W_m (m = 1, 2, ..., M) is the first weight vector corresponding to the m-th vector datum in the first vector data subset; constructing the matrix M_train = (I − W)^T (I − W), where I is the identity matrix; performing an eigendecomposition of M_train to obtain its eigenvalues, where the q-th eigenvalue is λ_q and its corresponding eigenvector v_q is a column vector of dimension M × 1; sorting the eigenvalues in ascending order, taking the eigenvectors corresponding to the 2nd through the (d+1)-th eigenvalues, and composing the vector data matrix Y_train = [v_2, v_3, ..., v_{d+1}], where d is the preset reduced dimension; the reduced vector datum x_m corresponding to the m-th vector datum X_m in the first vector data subset is the m-th row vector of Y_train;
the second dimension-reduction module 303 is configured to obtain, when the i-th vector datum belongs to the second vector data subset, the second weight vector W_i^(2) corresponding to the i-th vector datum, whose j-th element corresponds to the j-th vector datum among the K neighbor vector data of the i-th vector datum (W_i^(2)(j) = w_{ij}), and to reduce the dimensionality of the second vector data subset according to the second weight vectors, comprising: obtaining, according to the second weight vector W_n^(2) corresponding to the n-th vector datum in the second vector data subset and the reduced vector data set Y_n^(K) corresponding to the K neighbor vector data of that datum, the reduced vector datum x_n corresponding to the n-th vector datum X_n, specifically x_n = (W_n^(2))^T Y_n^(K).
The following verifies and illustrates the scheme of the present application with a concrete example:
The present application was tested in MATLAB (MATrix LABoratory) software using image data from the MNIST handwritten digit database. The MNIST database contains the ten handwritten digits 0-9, with a total of 60,000 training samples (of known digit type) and 10,000 test samples (of unknown digit type); each digit corresponds to several training samples and several test samples.
In this example, 200 training samples and 500 test samples are randomly selected for each digit from the MNIST database, giving a total of 2,000 training samples and 5,000 test samples.
For convenience of narration, the image data set composed of the 2,000 training samples and 5,000 test samples is denoted {I_i}, where I_i ∈ R^{m×n} is the i-th image datum, m is the number of pixel rows and n the number of pixel columns of the image data; in this example, m = n = 28.
In {I_i}, the first 2,000 images are labeled; that is, l_i ∈ {1, 2, ..., 10} is the label corresponding to I_i, used to indicate the digit type of I_i. The first 2,000 images form the training image data subset, and the remaining 5,000 images, which are unlabeled, form the test image data subset.
In this example, the image data are reduced to 3 dimensions. The detailed procedure of this example is as follows:
Obtain the image data set {I_i};
Stretch each image datum in {I_i} into a vector to obtain the vector data set {x_i}, where x_i ∈ R^{mn×1} is obtained by stretching the image datum I_i row by row. The vector data set comprises the first vector data subset X_train, corresponding to the training image data subset, and the second vector data subset X_test, corresponding to the image data subset to be tested.
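The row-by-row stretching step can be sketched in a few lines (an illustrative sketch in Python rather than the MATLAB used in the experiments; the array names and the toy random data are chosen for illustration only):

```python
import numpy as np

# Toy stand-in for the MNIST image set: 5 random 28x28 grayscale images.
rng = np.random.default_rng(0)
images = rng.integers(0, 256, size=(5, 28, 28))

# Stretch each m x n image row by row into an mn-element vector; the
# vectors are collected as the rows of the data matrix X.
X = images.reshape(len(images), -1)  # shape (5, 784)

print(X.shape)
```

Because NumPy reshapes in row-major order, the first 28 entries of each stretched vector are exactly the first pixel row of the corresponding image, matching the row-by-row stretching described above.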
For each element x_i in the first vector data subset X_train, determine, according to the Euclidean distances between x_i and the other elements of X_train, the K vector data with the smallest Euclidean distance to x_i as the K neighbors of x_i. For convenience of narration, these K vector data are denoted the neighbor point set of element x_i in X_train. In this example, K = 9.
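The neighbor-selection step can be sketched as follows (an illustrative Python sketch; `k_neighbors` is a name chosen here, not from the original, and in practice a training point would be excluded from its own neighbor set):

```python
import numpy as np

def k_neighbors(X_train, x, K=9):
    """Return indices of the K training vectors closest to x in Euclidean distance."""
    dists = np.linalg.norm(X_train - x, axis=1)  # Euclidean distance to each row
    return np.argsort(dists)[:K]                 # indices of the K smallest

# Tiny example: five 2-D points; query near the second point.
X_train = np.array([[0.0, 0], [1, 0], [2, 0], [10, 0], [11, 0]])
idx = k_neighbors(X_train, np.array([1.1, 0.0]), K=3)
print(sorted(idx.tolist()))  # the three points nearest to (1.1, 0)
```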
Use the neighbor point set to linearly represent the element x_i in the first vector data subset, namely x_i ≈ Σ_{j=1}^{K} w_ij x_j^(i), where w_ij is the weighting coefficient of the j-th neighbor x_j^(i), obtained by the orthogonal matching pursuit algorithm; specifically, this comprises two iterative operations:
In the first iterative operation, a vector data x_{j1}^(i) is selected from the K neighbor vector data of the i-th vector data such that the residual ||x_i − w_{i,j1} x_{j1}^(i)|| is minimized, thereby determining the weight w_{i,j1} corresponding to x_{j1}^(i).
In the second iterative operation, a vector data x_{j2}^(i) is selected from the remaining K−1 neighbor vector data such that the residual ||x_i − w_{i,j1} x_{j1}^(i) − w_{i,j2} x_{j2}^(i)|| is minimized, thereby determining the weight w_{i,j2} corresponding to x_{j2}^(i). The weights corresponding to the remaining K−2 vector data are all 0.
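The two-iteration weight computation can be sketched with a standard orthogonal matching pursuit loop (a sketch under assumptions: standard OMP selects atoms by correlation with the residual and refits the selected weights by least squares, which realizes the residual-minimizing selection described above when the neighbor vectors are normalized; `omp_weights` is an illustrative name, not from the original):

```python
import numpy as np

def omp_weights(x, neighbors, n_iter=2):
    """Greedy orthogonal-matching-pursuit weights of x over its K neighbor vectors.

    neighbors: (K, D) array. Each iteration adds the unselected neighbor most
    correlated with the current residual, then refits all selected weights by
    least squares. Unselected neighbors keep weight 0.
    """
    K = neighbors.shape[0]
    w = np.zeros(K)
    selected = []
    residual = x.copy()
    for _ in range(n_iter):
        scores = [abs(neighbors[k] @ residual) if k not in selected else -np.inf
                  for k in range(K)]
        selected.append(int(np.argmax(scores)))
        A = neighbors[selected].T                    # (D, len(selected))
        coef, *_ = np.linalg.lstsq(A, x, rcond=None) # refit selected weights
        residual = x - A @ coef
    w[selected] = coef
    return w

# x lies exactly in the span of the first two neighbors, so after two
# iterations the third neighbor's weight stays 0.
nb = np.array([[1.0, 0, 0], [0, 1, 0], [0, 0, 1]])
w = omp_weights(np.array([2.0, 3.0, 0.0]), nb)
print(w)
```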
The first weight vector W_i corresponding to the element x_i in the first vector data subset is obtained; W_i comprises 2,000 elements, whose j-th element corresponds to the j-th element in the first vector data subset: its value is the weighting coefficient w_ij when the j-th element is among the K neighbors of x_i, and 0 otherwise.
The weight vectors corresponding to the elements of the first vector data subset X_train are assembled into the weighting coefficient matrix W_train; obviously, W_train is a matrix of dimension 2000 × 2000.
Construct the matrix M_train = (I − W_train)^T (I − W_train), where I is the identity matrix;
Perform an eigendecomposition of M_train, letting its j-th eigenvalue be λ_j and the corresponding eigenvector be v_j, where v_j contains 2,000 elements. Assuming the eigenvalues are arranged in ascending order, the eigenvectors corresponding to the 2nd through 4th eigenvalues form the reduced first vector data matrix Y_train = [v_2, v_3, v_4]; obviously, Y_train is a matrix of dimension 2000 × 3.
The reduced vector corresponding to the element x_i in the first vector data subset is the i-th row vector of Y_train.
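The embedding computation for the training subset can be sketched as follows (an illustrative Python sketch of the generic locally-linear-embedding eigenproblem, not the patented implementation; `lle_embedding` and the toy weight matrix are assumptions made for the example):

```python
import numpy as np

def lle_embedding(W, d=3):
    """Embed training data from an (M, M) reconstruction-weight matrix W.

    Builds M_train = (I - W)^T (I - W), eigendecomposes it, and returns the
    eigenvectors of the 2nd through (d+1)-th smallest eigenvalues as the
    (M, d) embedding Y_train (the bottom eigenvector is skipped).
    """
    M = W.shape[0]
    I = np.eye(M)
    M_train = (I - W).T @ (I - W)               # symmetric positive semidefinite
    eigvals, eigvecs = np.linalg.eigh(M_train)  # eigenvalues in ascending order
    return eigvecs[:, 1:d + 1]                  # columns v_2 .. v_{d+1}

# Toy weight matrix for 5 points (each row sums to 1 over that point's neighbors).
rng = np.random.default_rng(1)
W = rng.random((5, 5))
np.fill_diagonal(W, 0.0)
W /= W.sum(axis=1, keepdims=True)

Y_train = lle_embedding(W, d=3)
print(Y_train.shape)  # (5, 3)
```

Because M_train is symmetric, `numpy.linalg.eigh` is the appropriate decomposition and returns orthonormal eigenvectors sorted by ascending eigenvalue, matching the sorting step described above.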
The above describes the process of reducing the dimensionality of the training image data subset; the following describes the process of reducing the dimensionality of the image data subset to be recognized:
For each element x_i in the second vector data subset X_test, determine, according to the Euclidean distances between x_i and the elements of the first vector data subset X_train, the 9 vector data with the smallest Euclidean distance to x_i as the 9 neighbors of x_i. For convenience of narration, these 9 vector data are denoted the neighbor point set of element x_i in X_test.
Use the neighbor point set to linearly represent the element x_i in the second vector data subset, namely x_i ≈ Σ_{j=1}^{9} w_ij x_j^(i), where w_ij is the weighting coefficient of the j-th neighbor x_j^(i), obtained by the orthogonal matching pursuit algorithm; for details refer to the preceding method, which is not repeated here.
The second weight vector W_i corresponding to the element x_i in the second vector data subset is obtained; W_i comprises 9 elements, whose j-th element w_ij corresponds to the j-th vector data among the 9 neighbor vector data of x_i.
According to the second weight vector W_i and the reduced vector data set Y_i^(9) corresponding to the 9 neighbor vector data of the element x_i in the second vector data subset, the reduced vector x_i of the element x_i in the second vector data subset is obtained, specifically: x_i = (W_i)^T Y_i^(9), where T denotes the transpose operation.
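The out-of-sample formula can be illustrated numerically (a sketch under the assumption, consistent with the formula above, that the reduced test vector is the transpose of the 9-element weight vector multiplied by the 9 × 3 matrix of the neighbors' reduced vectors; `w_test` and `Y_nb` are illustrative names and values):

```python
import numpy as np

# Second weight vector of one test point: only 2 nonzero weights, as produced
# by the two-iteration orthogonal matching pursuit described earlier.
w_test = np.array([0.5, 0.5, 0, 0, 0, 0, 0, 0, 0])

# Stand-in reduced vectors of the point's 9 neighbors (one 3-D row each).
Y_nb = np.arange(27, dtype=float).reshape(9, 3)

# x = (W)^T Y: the reduced test vector is the weighted combination of its
# neighbors' reduced vectors.
x_low = w_test @ Y_nb
print(x_low)  # midpoint of the first two neighbor embeddings
```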
After all image data have been dimension-reduced, obtain, according to the Euclidean distances between each vector data in the reduced first vector data subset and the n-th vector data in the reduced second vector data subset, the 9 neighbor vector data of that n-th vector data;
Determine the digit type of the image to be recognized corresponding to the n-th vector data in the reduced second vector data subset according to the digit types of the image data corresponding to the 9 neighbor vector data of the n-th vector data.
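The final classification step can be sketched as a majority vote (the description says only that the digit type is determined from the neighbors' digit types; majority voting is one natural reading and is assumed here, as are the names `classify` and the toy clusters):

```python
import numpy as np
from collections import Counter

def classify(y_test_point, Y_train, train_labels, K=9):
    """Predict the digit type of one reduced test vector by majority vote
    over the digit types of its K nearest reduced training vectors."""
    dists = np.linalg.norm(Y_train - y_test_point, axis=1)
    nearest = np.argsort(dists)[:K]
    votes = Counter(train_labels[k] for k in nearest)
    return votes.most_common(1)[0][0]

# Toy 3-D embeddings: two well-separated clusters labeled 1 and 2.
Y_train = np.vstack([np.zeros((5, 3)), np.ones((5, 3))])
labels = np.array([1] * 5 + [2] * 5)

# A point near the first cluster is assigned that cluster's digit type.
pred = classify(np.array([0.1, 0.0, 0.1]), Y_train, labels, K=3)
print(pred)
```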
The recognition rate of the handwritten digit recognition method provided by the present application is compared below with that of a digit recognition method based on linearized locally linear embedding; see Table 1 for details. It can be seen that, with the number of recognitions essentially unchanged, the handwritten digit recognition method provided by the embodiments of the present application improves the recognition rate of handwritten digit recognition.
The above description of the disclosed embodiments enables those skilled in the art to implement or use the present invention. Various modifications to these embodiments will be apparent to those skilled in the art, and the general principles defined herein may be implemented in other embodiments without departing from the spirit or scope of the present invention. Therefore, the present invention is not to be limited to the embodiments shown herein, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
Table 1: Comparison of recognition rates
Claims (7)
1. A handwritten digit recognition method, characterized in that it comprises:
Obtaining an image data set, the image data set comprising a training image data subset and an image data subset to be recognized;
Stretching each image datum in the image data set into a vector to obtain a vector data set, the vector data set comprising a first vector data subset corresponding to the training image data subset and a second vector data subset corresponding to the image data subset to be recognized;
Obtaining, according to the distance between the i-th vector data in the vector data set and each vector data in the first vector data subset, K neighbor vector data of the i-th vector data;
Linearly representing the i-th vector data with the K neighbor vector data: X_i ≈ Σ_{j=1}^{K} w_ij X_j^(i), wherein X_i is the i-th vector data, X_j^(i) is the j-th vector data among the K neighbor vector data, and w_ij is the weighting coefficient corresponding to the j-th vector data X_j^(i) among the K neighbor vector data, obtained by an orthogonal matching pursuit algorithm;
When the i-th vector data belongs to the first vector data subset, obtaining a first weight vector W_i corresponding to the i-th vector data, the m-th element of which corresponds to the m-th vector data in the first vector data subset, wherein when the m-th vector data in the first vector data subset is the j-th vector data among the K neighbor vector data of the i-th vector data, the m-th element equals w_ij, and when the m-th vector data does not belong to the K neighbor vector data of the i-th vector data, the m-th element is 0; and reducing the dimensionality of the first vector data subset according to the first weight vectors, comprising:
Obtaining a weighting coefficient matrix W_train, wherein the m-th row of W_train is W_m (m = 1, 2, ..., M), the weight vector corresponding to the m-th vector data in the first vector data subset;
Constructing a matrix M_train = (I − W_train)^T (I − W_train), wherein I is an identity matrix;
Performing an eigendecomposition of the matrix M_train to obtain eigenvalues, wherein the q-th eigenvalue is λ_q and the eigenvector corresponding to the q-th eigenvalue is v_q, a column vector of dimension M × 1;
Sorting the eigenvalues by magnitude, obtaining the eigenvectors corresponding to the 2nd through (d+1)-th eigenvalues in ascending order, and composing the obtained eigenvectors into a vector data matrix Y_train = [v_2, v_3, ..., v_{d+1}], wherein d is a preset dimensionality after reduction, and the reduced vector data x_m corresponding to the m-th vector data X_m in the first vector data subset is the m-th row vector of Y_train;
When the i-th vector data belongs to the second vector data subset, obtaining a second weight vector W_i corresponding to the i-th vector data, the j-th element of which corresponds to the j-th vector data among the K neighbor vector data of the i-th vector data; and reducing the dimensionality of the second vector data subset according to the second weight vectors, comprising:
Obtaining, according to the second weight vector W_n corresponding to the n-th vector data in the second vector data subset and the reduced vector data set Y_n^(K) corresponding to the K neighbor vector data of the n-th vector data in the second vector data subset, the reduced vector data x_n corresponding to the n-th vector data X_n in the second vector data subset, specifically: x_n = (W_n)^T Y_n^(K);
Obtaining, according to the distance between each vector data in the reduced first vector data subset and the n-th vector data in the reduced second vector data subset, K neighbor vector data of the n-th vector data in the reduced second vector data subset;
Determining the digit type of the image to be recognized corresponding to the n-th vector data in the reduced second vector data subset according to the digit types of the image data corresponding to the K neighbor vector data of the n-th vector data.
2. The method according to claim 1, characterized in that obtaining, according to the distance between the i-th vector data in the vector data set and each vector data in the first vector data subset, the K neighbor vector data of the i-th vector data comprises:
Obtaining, according to the Euclidean distance between the i-th vector data in the vector data set and each vector data in the first vector data subset, the K neighbor vector data of the i-th vector data.
3. The method according to claim 1, characterized in that obtaining the weighting coefficients w_ij by the orthogonal matching pursuit algorithm comprises at least one iterative operation:
In each iterative operation, a vector data whose weight has not yet been determined is selected from the K neighbor vector data of the i-th vector data such that the residual of the linear representation is minimized, thereby determining the weight corresponding to the selected vector data.
4. The method according to claim 1, characterized in that stretching each image datum in the image data set comprises:
Stretching each image datum in the image data set row by row or column by column.
5. The method according to claim 1, characterized in that the preset dimensionality after reduction is 2 or 3.
6. The method according to claim 5, characterized in that the method further comprises:
Displaying each vector data in the reduced second vector data subset in a d-dimensional coordinate system.
7. A handwritten digit recognition system, characterized in that it comprises:
An image data processing module, configured to obtain an image data set, the image data set comprising a training image data subset and an image data subset to be recognized; stretch each image datum in the image data set into a vector to obtain a vector data set, the vector data set comprising a first vector data subset corresponding to the training image data subset and a second vector data subset corresponding to the image data subset to be recognized; obtain, according to the distance between the i-th vector data in the vector data set and each vector data in the first vector data subset, K neighbor vector data of the i-th vector data; and linearly represent the i-th vector data with the K neighbor vector data: X_i ≈ Σ_{j=1}^{K} w_ij X_j^(i), wherein X_i is the i-th vector data, X_j^(i) is the j-th vector data among the K neighbor vector data, and w_ij is the weighting coefficient corresponding to the j-th vector data X_j^(i) among the K neighbor vector data, obtained by an orthogonal matching pursuit algorithm;
A first dimension-reduction processing module, configured to obtain a first weight vector W_i corresponding to the i-th vector data, the m-th element of which corresponds to the m-th vector data in the first vector data subset, wherein when the m-th vector data in the first vector data subset is the j-th vector data among the K neighbor vector data of the i-th vector data, the m-th element equals w_ij, and when the m-th vector data does not belong to the K neighbor vector data of the i-th vector data, the m-th element is 0; and to reduce the dimensionality of the first vector data subset according to the first weight vectors, comprising: obtaining a weighting coefficient matrix W_train, wherein the m-th row of W_train is W_m (m = 1, 2, ..., M), the weight vector corresponding to the m-th vector data in the first vector data subset; constructing a matrix M_train = (I − W_train)^T (I − W_train), wherein I is an identity matrix; performing an eigendecomposition of the matrix M_train to obtain eigenvalues, wherein the q-th eigenvalue is λ_q and its corresponding eigenvector v_q is a column vector of dimension M × 1; sorting the eigenvalues by magnitude, obtaining the eigenvectors corresponding to the 2nd through (d+1)-th eigenvalues in ascending order, and composing the obtained eigenvectors into a vector data matrix Y_train = [v_2, v_3, ..., v_{d+1}], wherein d is a preset dimensionality after reduction, and the reduced vector data x_m corresponding to the m-th vector data X_m in the first vector data subset is the m-th row vector of Y_train;
A second dimension-reduction processing module, configured to obtain a second weight vector W_i corresponding to the i-th vector data, the j-th element of which corresponds to the j-th vector data among the K neighbor vector data of the i-th vector data; and to reduce the dimensionality of the second vector data subset according to the second weight vectors, comprising: obtaining, according to the second weight vector W_n corresponding to the n-th vector data in the second vector data subset and the reduced vector data set Y_n^(K) corresponding to the K neighbor vector data of the n-th vector data in the second vector data subset, the reduced vector data x_n corresponding to the n-th vector data X_n in the second vector data subset, specifically: x_n = (W_n)^T Y_n^(K);
An identification module, configured to obtain, according to the distance between each vector data in the reduced first vector data subset and the n-th vector data in the reduced second vector data subset, K neighbor vector data of the n-th vector data in the reduced second vector data subset; and determine the digit type of the image to be recognized corresponding to the n-th vector data in the reduced second vector data subset according to the digit types of the image data corresponding to the K neighbor vector data of the n-th vector data.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201310286449.4A CN103310237B (en) | 2013-07-09 | 2013-07-09 | Handwritten Numeral Recognition Method and system |
Publications (2)
Publication Number | Publication Date |
---|---|
CN103310237A true CN103310237A (en) | 2013-09-18 |
CN103310237B CN103310237B (en) | 2016-08-24 |
Family
ID=49135431
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201310286449.4A Active CN103310237B (en) | 2013-07-09 | 2013-07-09 | Handwritten Numeral Recognition Method and system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN103310237B (en) |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103400161A (en) * | 2013-07-18 | 2013-11-20 | 苏州大学 | Handwritten numeral recognition method and system |
CN103679207A (en) * | 2014-01-02 | 2014-03-26 | 苏州大学 | Handwriting number identification method and system |
CN104778478A (en) * | 2015-04-22 | 2015-07-15 | 中国石油大学(华东) | Handwritten numeral identification method |
CN106257495A (en) * | 2015-06-19 | 2016-12-28 | 阿里巴巴集团控股有限公司 | A kind of digit recognition method and device |
CN106650820A (en) * | 2016-12-30 | 2017-05-10 | 山东大学 | Matching recognition method of handwritten electrical component symbols and standard electrical component symbols |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102722713A (en) * | 2012-02-22 | 2012-10-10 | 苏州大学 | Handwritten numeral recognition method based on lie group structure data and system thereof |
CN103164701A (en) * | 2013-04-10 | 2013-06-19 | 苏州大学 | Method and device for recognizing handwritten numbers |
-
2013
- 2013-07-09 CN CN201310286449.4A patent/CN103310237B/en active Active
Non-Patent Citations (3)
Title |
---|
BANGJUN-WANG 等: "A Classification Algorithm in Li-K Nearest Neighbor", 《2013 FOURTH GLOBAL CONGRESS ON INTELLIGENT SYSTEMS》 * |
LI ZHANG 等: "Density-induced margin support vector machines", 《PATTERN RECOGNITION》 * |
苗壮: "基于DFS的手写数字识别模型及其应用研究", 《中国优秀硕士学位论文全文数据库 信息科技辑》 * |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103400161A (en) * | 2013-07-18 | 2013-11-20 | 苏州大学 | Handwritten numeral recognition method and system |
CN103679207A (en) * | 2014-01-02 | 2014-03-26 | 苏州大学 | Handwriting number identification method and system |
CN104778478A (en) * | 2015-04-22 | 2015-07-15 | 中国石油大学(华东) | Handwritten numeral identification method |
CN106257495A (en) * | 2015-06-19 | 2016-12-28 | 阿里巴巴集团控股有限公司 | A kind of digit recognition method and device |
CN106650820A (en) * | 2016-12-30 | 2017-05-10 | 山东大学 | Matching recognition method of handwritten electrical component symbols and standard electrical component symbols |
CN106650820B (en) * | 2016-12-30 | 2020-04-24 | 山东大学 | Matching and recognizing method for handwritten electric component symbol and standard electric component symbol |
Also Published As
Publication number | Publication date |
---|---|
CN103310237B (en) | 2016-08-24 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN104463247B (en) | The abstracting method of spectral vector cross-correlation feature in a kind of classification hyperspectral imagery | |
CN103235947B (en) | A kind of Handwritten Numeral Recognition Method and device | |
CN105069811B (en) | A kind of Multitemporal Remote Sensing Images change detecting method | |
Shimizu et al. | Discovery of non-gaussian linear causal models using ICA | |
CN103310237A (en) | Handwritten digit recognition method and system | |
CN104616029B (en) | Data classification method and device | |
Małek et al. | The VIMOS Public Extragalactic Redshift Survey (VIPERS)-A support vector machine classification of galaxies, stars, and AGNs | |
CN102609681A (en) | Face recognition method based on dictionary learning models | |
CN108229588B (en) | Machine learning identification method based on deep learning | |
Badrinarayanan et al. | Understanding symmetries in deep networks | |
CN103164701B (en) | Handwritten Numeral Recognition Method and device | |
CN110659378B (en) | Fine-grained image retrieval method based on contrast similarity loss function | |
CN104217438A (en) | Image significance detection method based on semi-supervision | |
CN108764310A (en) | SAR target identification methods based on multiple dimensioned multiple features depth forest | |
CN104966075B (en) | A kind of face identification method and system differentiating feature based on two dimension | |
CN105389486A (en) | Authentication method based on mouse behavior | |
CN103440508A (en) | Remote sensing image target recognition method based on visual word bag model | |
Hussein et al. | A texture-based approach for content based image retrieval system for plant leaves images | |
CN106778714A (en) | LDA face identification methods based on nonlinear characteristic and model combination | |
CN103310205B (en) | A kind of Handwritten Numeral Recognition Method and device | |
CN109448842B (en) | The determination method, apparatus and electronic equipment of human body intestinal canal Dysbiosis | |
CN104572930B (en) | Data classification method and device | |
CN107392249A (en) | A kind of density peak clustering method of k nearest neighbor similarity optimization | |
CN108345942B (en) | Machine learning identification method based on embedded code learning | |
CN105975940A (en) | Palm print image identification method based on sparse directional two-dimensional local discriminant projection |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C14 | Grant of patent or utility model | ||
GR01 | Patent grant | ||
CP02 | Change in the address of a patent holder |
Address after: Suzhou City, Jiangsu province 215123 Xiangcheng District Ji Road No. 8 Patentee after: Soochow University Address before: 215123 Suzhou Industrial Park, Jiangsu Road, No. 199 Patentee before: Soochow University |
|
CP02 | Change in the address of a patent holder |