Embodiment
Through research, the inventor found that the support vector machine algorithm offers advantages such as structural risk minimization and good generalization ability. Implementing handwritten digit recognition with a support vector machine algorithm based on Lie group structured data can therefore address the recognition of handwritten digits under small-sample, nonlinear, and high-dimensional conditions, and thus overcome the deficiency of existing handwritten digit recognition techniques in capturing the nonlinear characteristics of Lie group structured data. The inventor also found, however, that Lie group structured data is matrix data rather than vector data, and that the support vector machine algorithms in standard use at present do not support matrix data; the standard support vector machine method therefore cannot process Lie group structured data directly.
After further research, the inventor found that by constructing a matrix Gaussian kernel function and applying the support vector machine algorithm, a corresponding classifier model can be established to classify Lie group structured data, thereby achieving the object of the present invention.
In line with the above inventive idea, the technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art on the basis of the embodiments of the present invention, without creative effort, fall within the scope of protection of the present invention.
Fig. 1 is a flowchart of a handwritten digit recognition method based on Lie group structured data according to the present invention. Referring to Fig. 1, the method may comprise:
Step S100: extract a corresponding number of Lie group structured data items from the original handwritten digit image data;
For ease of description, in the embodiments of the present invention handwritten digit image data is denoted by x and Lie group structured data by z. Let the number of original handwritten digit image data items be l (l is an integer, $l \geq 1$), so that the original handwritten digit image data are $x_1, \dots, x_l$ and the corresponding Lie group structured data are $z_1, \dots, z_l$;
Taking the extraction of Lie group structured data from a single handwritten digit image as an example: let rv be a reference vector, and randomly take k points on the stroke region of the image data x to form k vectors $v_i$. Here $r_i$ denotes the modulus of $v_i$ and $n_i$ denotes the direction of $v_i$, with $\|n_i\| = 1$ and $n_i = M_i \times rv$, where $\|\cdot\|$ denotes the modulus of a vector. The Lie group structured data z corresponding to the handwritten digit image data sample x is then obtained as:
Step S200: take the correspondence between each Lie group structured data item and the class label of its corresponding handwritten digit image data as a training sample, obtain the training sample set corresponding to said number of Lie group structured data items, and at the same time construct a matrix Gaussian kernel function for processing Lie group structured data;
Let y be the class label of the handwritten digit image data x, with $y \in \{1, \dots, c\}$, where c is the number of classes of handwritten digit image data. Combining the extracted Lie group structured data $z_1, \dots, z_l$ with the class labels $y_1, \dots, y_l$ of the corresponding handwritten digit image data yields the training sample set $\{(z_1, y_1), \dots, (z_l, y_l)\}$, which records, for each Lie group structured data item, the class label of its handwritten digit image;
The support vector machine algorithms in standard use at present do not support Lie group structured data, so the kernel functions of existing support vector machine algorithms are unsuitable for the present invention. To apply the support vector machine algorithm to Lie group structured data, a matrix Gaussian kernel function can be constructed for the algorithm, making Lie group structured data compatible with it. The specific formula of the matrix Gaussian kernel function is as follows:
Here $z_a$ and $z_b$ denote any two Lie group structured data items, a and b are integers in $\{1, \dots, l\}$ with $a \neq b$, $p > 0$ is the kernel parameter, and $\|\cdot\|_F$ is the matrix (Frobenius) norm.
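The kernel formula itself is not reproduced in this text. As a minimal sketch, assuming the common Gaussian form $\exp(-p\,\|z_a - z_b\|_F^2)$ with the Frobenius norm (this form, the function names, and the two example matrices are assumptions, not taken from the source), a kernel on matrix-valued data could be evaluated as follows:

```python
import math


def frobenius_norm(m):
    """Frobenius norm: square root of the sum of squared entries."""
    return math.sqrt(sum(v * v for row in m for v in row))


def matrix_gaussian_kernel(z_a, z_b, p):
    """Assumed form exp(-p * ||z_a - z_b||_F^2) with kernel parameter p > 0."""
    if p <= 0:
        raise ValueError("kernel parameter p must be positive")
    # Entry-wise difference of two matrices of the same shape.
    diff = [[a - b for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(z_a, z_b)]
    return math.exp(-p * frobenius_norm(diff) ** 2)


# Two arbitrary 2x2 matrices used only for illustration.
z1 = [[1.0, 0.0], [0.0, 1.0]]
z2 = [[0.0, -1.0], [1.0, 0.0]]

k_same = matrix_gaussian_kernel(z1, z1, p=2 ** -6)
k_diff = matrix_gaussian_kernel(z1, z2, p=2 ** -6)
```

Because the kernel takes two matrices and returns a scalar, a kernel-based support vector machine can consume matrix data through it without flattening the data into vectors first.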
Step S300: using the support vector machine algorithm with said matrix Gaussian kernel function as the kernel function, input the training samples and train to obtain classifier models;
Those skilled in the art are familiar with conventional methods of machine training using the support vector machine algorithm. The embodiment of the present invention provides a machine training method that supports multi-class processing of Lie group structured data. Specifically: each Lie group structured data item is input into each of the $\binom{c}{2}$ classifier models (one for each combination of 2 out of the c classes), so that one Lie group structured data item yields $\binom{c}{2}$ classifier outputs; from these outputs, the value with which the item is assigned to each of the c classes is tallied; the maximum of these values is found; and the class attaining the maximum is determined to be the digit class of the handwritten digit image data corresponding to the item.
To make the machine training method of the embodiment easier to understand in detail, a concrete training process is given below.
The training sample set $\{(z_1, y_1), \dots, (z_l, y_l)\}$ has c classes. Take from it the samples corresponding to any two class labels: the samples taken carry only these two labels, and no sample left in the training sample set carries either of them. Treating the samples of each pair of class labels as one combination gives $\binom{c}{2}$ combinations. For convenience, the training of a classifier model is illustrated below using the samples of one such combination, classes i and j ($i, j \in \{1, \dots, c\}$, $i \neq j$). The concrete training process is:
After the samples of classes i and j are extracted from the training sample set $\{(z_1, y_1), \dots, (z_l, y_l)\}$, they are rewritten in a uniform form, in which the superscript ij indicates data related to classes i and j, the subscript m is an index, $z_m^{ij}$ denotes the Lie group structured data related to classes i and j, $l_{ij}$ denotes the total number of samples in classes i and j, and $y_m^{ij}$ is the class label corresponding to $z_m^{ij}$, whose value is determined by whether the sample belongs to class i or to class j.
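The pairing and relabeling described above can be sketched as follows. This is an illustration under stated assumptions: the label convention (class i becomes +1, class j becomes -1) is the usual binary support vector machine convention and is not confirmed by this text, and the function names and the placeholder samples are hypothetical.

```python
from itertools import combinations


def all_pairs(c):
    """All 'c choose 2' unordered class pairs (i, j) with i < j."""
    return list(combinations(range(1, c + 1), 2))


def two_class_subset(samples, i, j):
    """Keep only samples labelled i or j.

    Assumed convention: label i is mapped to +1 and label j to -1.
    """
    subset = []
    for z, y in samples:
        if y == i:
            subset.append((z, +1))
        elif y == j:
            subset.append((z, -1))
    return subset


# Placeholder strings stand in for Lie group structured data matrices.
samples = [("z1", 1), ("z2", 2), ("z3", 3), ("z4", 1), ("z5", 3)]

pairs = all_pairs(3)
subset_13 = two_class_subset(samples, 1, 3)
```

Here `len(subset_13)` plays the role of $l_{ij}$ for the pair (1, 3), and one binary classifier would be trained per entry of `pairs`.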
The present invention recognizes handwritten digit image data by the support vector machine method on the basis of Lie group structured data. When the support vector machine method is used to handle the two-class classification of classes i and j, the following optimization problem needs to be solved:
where m and n both denote indices and are integers in $\{1, \dots, l_{ij}\}$, $y_n^{ij}$ is the class label corresponding to $z_n^{ij}$, the coefficients being solved for are the model coefficients produced by support vector machine training, and S is the regularization parameter required by support vector machine training. The above optimization yields the following classifier model:
with $i, j = 1, \dots, c$ and $i \neq j$;
In the above formula, $\mathrm{sgn}(\cdot)$ denotes the sign function and $b_{ij}$ is the model threshold, which can be calculated by the following formula:
where $z_{sv}$ is a support vector, selected according to the value of its corresponding coefficient.
The above gives the classifier model for the samples of classes i and j. For any remaining combinations extracted from the training samples, the classifier models are trained on the same principle, so the description is not repeated here.
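The optimization problem, classifier model, and threshold formula are not reproduced in this text. For orientation only, the textbook soft-margin support vector machine dual for classes i and j has the form below. The symbol $\alpha_m^{ij}$ for the model coefficients, the labels $y_m^{ij} \in \{+1, -1\}$, and the symbol $k(\cdot,\cdot)$ for the matrix Gaussian kernel are assumptions, and the source's own formulas may differ in detail.

```latex
% Assumed notation: \alpha_m^{ij} are the model coefficients,
% y_m^{ij} \in \{+1,-1\}, k(\cdot,\cdot) is the matrix Gaussian kernel,
% S is the regularization parameter.
\max_{\alpha^{ij}} \; \sum_{m=1}^{l_{ij}} \alpha_m^{ij}
  - \frac{1}{2} \sum_{m=1}^{l_{ij}} \sum_{n=1}^{l_{ij}}
    \alpha_m^{ij} \alpha_n^{ij} \, y_m^{ij} y_n^{ij} \, k(z_m^{ij}, z_n^{ij})
\quad \text{s.t.} \quad
\sum_{m=1}^{l_{ij}} \alpha_m^{ij} y_m^{ij} = 0, \qquad 0 \le \alpha_m^{ij} \le S

% Resulting two-class decision function
f_{ij}(z) = \mathrm{sgn}\Big( \sum_{m=1}^{l_{ij}} \alpha_m^{ij} y_m^{ij} \, k(z_m^{ij}, z) + b_{ij} \Big)

% Threshold from any support vector z_{sv} with 0 < \alpha_{sv}^{ij} < S
b_{ij} = y_{sv}^{ij} - \sum_{m=1}^{l_{ij}} \alpha_m^{ij} y_m^{ij} \, k(z_m^{ij}, z_{sv})
```

In this form the data enter only through kernel evaluations $k(\cdot,\cdot)$, which is why replacing a vector kernel with a matrix kernel is enough to let the algorithm handle matrix-valued samples.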
Step S400: input the Lie group structured data corresponding to the handwritten digit image data to be recognized into each of the trained classifier models to obtain the corresponding digit class.
After the classifier models are obtained in step S300, the corresponding Lie group structured data can be extracted from the handwritten digit image data to be recognized, so that this image data can be classified and its digit class obtained. Note that the original handwritten digit image data in step S100 serves to train the classifier models and can be regarded as a large handwritten digit image database, whereas the handwritten digit image data to be recognized in step S400 is the object recognized by the method of the present invention, namely the image data of the handwritten digits that need to be identified.
The classifier models trained in the embodiment of the present invention can handle multi-class problems on Lie group structured data. Concretely, classification can proceed as follows: the Lie group structured data corresponding to the handwritten digit image data to be recognized is input into each of the $\binom{c}{2}$ classifier models, so that one Lie group structured data item yields $\binom{c}{2}$ classifier outputs; from these outputs, the value with which the item is assigned to each of the c classes is tallied; the maximum of these values is found; and the class attaining the maximum is determined to be the digit class of the handwritten digit image data corresponding to the item;
The classifier output can be denoted $f_{ij}(z)$, with $i, j = 1, \dots, c$ and $i \neq j$. The value with which the Lie group structured data item is assigned to class i can be tallied by the following formula:
The above formula gives c values, one for each class to which the item may be assigned. The maximum among these c values is then found by formula, and the digit class corresponding to the maximum found is the class of the handwritten digit image data corresponding to this Lie group structured data item.
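The tallying and maximum search can be sketched as a one-vs-one vote. The tally formula is not reproduced in this text, so the sketch assumes that each pairwise classifier returns +1 for class i and -1 for class j, that one classifier is kept per unordered pair with i < j, and that the tallied value is a simple vote count. The toy classifiers on scalar inputs are hypothetical stand-ins for trained models.

```python
from itertools import combinations


def classify_by_voting(z, classifiers, c):
    """Count pairwise wins per class and return the class with most votes.

    `classifiers` maps (i, j), i < j, to a function returning a positive
    value for class i and a non-positive value for class j (assumed
    convention). Ties are broken in favour of the lowest class label.
    """
    votes = {label: 0 for label in range(1, c + 1)}
    for i, j in combinations(range(1, c + 1), 2):
        winner = i if classifiers[(i, j)](z) > 0 else j
        votes[winner] += 1
    best = max(votes, key=votes.get)
    return best, votes


# Toy stand-ins: each class has a scalar centre, and a pairwise
# classifier picks whichever of its two centres is nearer to z.
centers = {1: 0.0, 2: 10.0, 3: 20.0}


def make_toy_classifier(i, j):
    return lambda z: 1 if abs(z - centers[i]) < abs(z - centers[j]) else -1


classifiers = {(i, j): make_toy_classifier(i, j)
               for i, j in combinations(centers, 2)}

label, votes = classify_by_voting(11.0, classifiers, c=3)
```

With real models, each entry of `classifiers` would wrap a trained two-class decision function evaluated on the Lie group structured data of the image to be recognized.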
By constructing a matrix Gaussian kernel function and using the support vector machine algorithm to process the Lie group structured data corresponding to handwritten digit image data, the embodiment of the present invention exploits the algorithm's strength in recognizing images under small-sample, nonlinear, and high-dimensional conditions, and thereby captures the nonlinear characteristics of Lie group structured data;
In addition, by reducing the multi-class problem on Lie group structured data to multiple two-class problems and classifying each with the support vector machine algorithm, multi-class processing of Lie group structured data is achieved, so that handwritten digit recognition is better realized.
The beneficial effects of the present invention are verified by the following experiment:
A handwritten digit database generally stores 10 classes of handwritten digit image data. Four of these classes, the digits 1, 2, 7, and 9, are selected for testing. For each class, the first 200 samples are taken from the training set and from the test set respectively, so that each class has 200 training samples and 200 test samples. Parameters are then selected on the training samples by ten-fold cross-validation, with the regularization factor ranging over $\{2^{-1}, 2^{0}, \dots, 2^{4}\}$ and the matrix Gaussian kernel parameter ranging over $\{2^{-10}, 2^{-9}, \dots, 2^{-6}\}$. A model is then retrained with the selected parameters, and its performance is evaluated on the test set to obtain the recognition rate. Further, to examine the effect of the number of points in the point cloud on the recognition rate, the number of points is taken from {5, 10, 15, 20, 25, 30, 35, 40, 50}. Because the point clouds are generated randomly, the experiment can be repeated 5 times and the average result reported. Figs. 2 to 4 show the experimental results obtained with the Lie group mean algorithm, the Lie group Fisher algorithm, and the technical solution of the present invention.
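The parameter selection described above amounts to a grid search with ten-fold cross-validation. The sketch below builds the two parameter ranges given in the text and loops over folds. The `evaluate` callback, which would train on the training indices and return a validation score, is a hypothetical placeholder, and the contiguous fold split is a simplification (in practice the samples would usually be shuffled or stratified first).

```python
from itertools import product

# Parameter ranges from the experiment description.
reg_values = [2.0 ** e for e in range(-1, 5)]        # 2^-1 ... 2^4
kernel_values = [2.0 ** e for e in range(-10, -5)]   # 2^-10 ... 2^-6
grid = list(product(reg_values, kernel_values))


def k_fold_indices(n, k=10):
    """Split range(n) into k contiguous folds of near-equal size."""
    folds, start = [], 0
    for f in range(k):
        size = n // k + (1 if f < n % k else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def select_parameters(n, evaluate, k=10):
    """Return the (S, p) pair with the best mean validation score.

    `evaluate(train_idx, val_idx, S, p)` is a hypothetical callback that
    trains with regularization S and kernel parameter p and returns a score.
    """
    folds = k_fold_indices(n, k)
    best, best_score = None, float("-inf")
    for S, p in grid:
        scores = []
        for val_idx in folds:
            held_out = set(val_idx)
            train_idx = [t for t in range(n) if t not in held_out]
            scores.append(evaluate(train_idx, val_idx, S, p))
        mean_score = sum(scores) / len(scores)
        if mean_score > best_score:
            best, best_score = (S, p), mean_score
    return best
```

With 6 regularization values and 5 kernel values, the grid has 30 candidate pairs, each evaluated on 10 folds before the final model is retrained on all training samples with the selected pair.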
Fig. 2 compares the classification performance of the Lie group mean algorithm, the Lie group Fisher algorithm, and the method of the present invention on digits 1 and 7. As Fig. 2 shows, the classification performance of the present invention is clearly better than that of the Lie group mean algorithm and the Lie group Fisher algorithm, and the recognition rate tends to increase as the number of points taken per sample increases. Fig. 3 compares the classification performance of the Lie group mean algorithm and the method of the present invention on digits 1, 7, and 9, and Fig. 4 compares them on digits 1, 2, 7, and 9. As Figs. 3 and 4 show, the multi-class performance of the present invention is clearly better than that of the Lie group mean algorithm.
Fig. 5 is a structural block diagram of a handwritten digit recognition system based on Lie group structured data according to the present invention. Referring to Fig. 5, the system may comprise:
a Lie group structured data extraction module 100, configured to extract a corresponding number of Lie group structured data items from the original handwritten digit image data;
a preprocessing module 200, configured to take the correspondence between each Lie group structured data item and the class label of its corresponding handwritten digit image data as a training sample, obtain the training sample set corresponding to said number of Lie group structured data items, and at the same time construct the matrix Gaussian kernel function for processing Lie group structured data:
where $z_a$ and $z_b$ denote any two Lie group structured data items with $a \neq b$, $p > 0$ is the kernel parameter, and $\|\cdot\|_F$ is the matrix (Frobenius) norm;
a model training module 300, configured to use the support vector machine algorithm with said matrix Gaussian kernel function as the kernel function, input the training samples, and train to obtain classifier models;
a classification module 400, configured to input the Lie group structured data corresponding to the handwritten digit image data to be recognized into each of the trained classifier models to obtain the corresponding digit class.
The structure of the model training module 300 may be as shown in Fig. 6 and comprise:
a combination acquiring unit 301, configured to take from said training sample set the samples corresponding to any two class labels, obtaining $\binom{c}{2}$ combinations, where c is the number of classes of handwritten digit image data;
a cyclic training unit 302, configured to take each combination as a unit and, using the support vector machine algorithm with said matrix Gaussian kernel function as the kernel function, input the samples corresponding to each combination and train to obtain $\binom{c}{2}$ classifier models.
Further, the cyclic training unit 302 may comprise:
a training subunit (not shown), configured to extract a combination comprising the samples of classes i and j, with $i, j \in \{1, \dots, c\}$ and $i \neq j$, and to execute the classifier model training flow on these samples rewritten in a uniform form, where l denotes the number of handwritten digit image data items, x denotes handwritten digit image data, z denotes Lie group structured data, y is the class label of the handwritten digit image data x with $y \in \{1, \dots, c\}$, the superscript ij indicates data related to classes i and j, the subscript m is an index, $z_m^{ij}$ denotes the Lie group structured data related to classes i and j, $l_{ij}$ denotes the total number of samples in classes i and j, and $y_m^{ij}$ is the class label corresponding to $z_m^{ij}$, whose value is determined by whether the sample belongs to class i or to class j; and to solve the corresponding optimization problem, in which m and n both denote indices and are integers in $\{1, \dots, l_{ij}\}$, the coefficients being solved for are the model coefficients produced by support vector machine training, and S is the regularization parameter required by support vector machine training; the classifier model is obtained from the solution, where $\mathrm{sgn}(\cdot)$ denotes the sign function and $b_{ij}$ is the model threshold;
a loop subunit (not shown), configured to, after said training subunit completes the above classifier model training flow, extract another combination and execute the training flow again, until $\binom{c}{2}$ classifier models are obtained.
The structure of the classification module 400 may be as shown in Fig. 7 and comprise:
a computing unit 401, configured to input the Lie group structured data corresponding to the handwritten digit image data to be recognized into each of the $\binom{c}{2}$ classifier models, so that one Lie group structured data item yields $\binom{c}{2}$ classifier outputs;
a statistics unit 402, configured to tally, from said outputs, the value with which the Lie group structured data item is assigned to each of the c classes, and to find the maximum among these values;
a determining unit 403, configured to determine the class attaining the maximum found by said statistics unit as the digit class of the handwritten digit image data corresponding to the Lie group structured data item.
Further, the statistics unit 402 may comprise:
a class value statistics subunit (not shown), configured to tally by formula, for $i \in \{1, \dots, c\}$, the value with which the Lie group structured data item is assigned to class i in said outputs, where class i is any one of the c classes to be tallied;
a maximum search subunit (not shown), configured to find by formula the maximum among the values tallied by said class value statistics subunit.
The handwritten digit recognition system based on Lie group structured data according to the present invention corresponds to the handwritten digit recognition method based on Lie group structured data. For the specific implementation of the system's functions, refer to the corresponding method; the details are not repeated here.
Those skilled in the art will further appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented in electronic hardware, computer software, or a combination of the two. To illustrate the interchangeability of hardware and software clearly, the composition and steps of each example have been described above in general terms according to their functions. Whether these functions are carried out in hardware or in software depends on the particular application and the design constraints of the technical solution. Skilled practitioners may implement the described functions in different ways for each particular application, but such implementations should not be regarded as going beyond the scope of the present invention.
The steps of the method or algorithm described in connection with the embodiments disclosed herein may be implemented directly in hardware, in a software module executed by a processor, or in a combination of the two. The software module may reside in random access memory (RAM), main memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.
The above description of the disclosed embodiments enables those skilled in the art to implement or use the present invention. Various modifications to these embodiments will be apparent to those skilled in the art, and the general principles defined herein may be implemented in other embodiments without departing from the spirit or scope of the present invention. The present invention is therefore not limited to the embodiments shown herein, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.