Embodiment
Through research, the inventor found that the support vector machine (SVM) algorithm offers structural risk minimization and good generalization ability. Using the SVM algorithm to recognize handwritten digits on the basis of Lie group structured data can address handwritten digit recognition under small-sample, nonlinear and high-dimensional conditions, and thereby overcomes the shortcoming of existing handwritten digit recognition techniques in capturing the nonlinear characteristics of Lie group structured data. The inventor also found, however, that Lie group structured data are matrix data rather than vector data. The SVM algorithm in standard use does not support matrix data, so the standard SVM method cannot process Lie group structured data directly.
Through further research, the inventor found that by constructing a matrix Gaussian kernel function and applying the SVM algorithm, a corresponding classifier model can be built to classify Lie group structured data, thereby achieving the object of the invention.
In line with the above inventive idea, the technical solutions in the embodiments of the invention are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the invention. All other embodiments obtained by a person of ordinary skill in the art on the basis of these embodiments without creative effort fall within the scope of protection of the invention.
Fig. 1 is a flowchart of a handwritten digit recognition method based on Lie group structured data according to the invention. Referring to Fig. 1, the method may comprise:
Step S100: extract a corresponding number of Lie group structured data from the original handwritten digit image data;
For ease of description, in the embodiments of the invention x denotes handwritten digit image data and z denotes Lie group structured data. Let the number of original handwritten digit image data be l (l is an integer and l >= 1); the original handwritten digit image data are then x_1, ..., x_l, and the corresponding Lie group structured data are z_1, ..., z_l.
Taking the extraction of Lie group structured data from one handwritten digit image as an example, let rv be a reference vector. k points are taken at random from the stroke region of the image data x, forming k vectors v_i = r_i n_i, i = 1, ..., k, where r_i is the modulus (length) of the vector v_i, n_i is the direction of v_i with ‖n_i‖ = 1, and n_i = M_i * rv. Here ‖·‖ denotes the modulus of a vector. The Lie group structured data z corresponding to the handwritten digit image data sample x is then obtained as:
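The sampling step above can be sketched in Python. This is only an illustration under stated assumptions, not the patent's own procedure: the points are assumed to be 2-D, each vector v_i is assumed to run from the coordinate origin to the sampled point, M_i is assumed to be the 2x2 rotation matrix that maps rv onto n_i, and the names `rotation_to` and `stroke_vectors` are hypothetical. How the M_i are combined into z is not reproduced here, so the sketch stops at M_i.

```python
import math
import random

def rotation_to(reference, direction):
    """Return the 2x2 rotation matrix M with M * reference = direction.

    Both arguments are assumed to be unit-length 2-D vectors.
    """
    rx, ry = reference
    nx, ny = direction
    cos_t = rx * nx + ry * ny      # cosine of the angle from reference to direction
    sin_t = rx * ny - ry * nx      # sine of that angle (2-D cross product)
    return [[cos_t, -sin_t], [sin_t, cos_t]]

def stroke_vectors(stroke_points, k, rv=(1.0, 0.0), seed=0):
    """Sample k stroke points and return (r_i, n_i, M_i) for each one.

    Assumption: v_i is the position vector of the sampled point, so
    v_i = r_i * n_i with r_i its length and n_i its unit direction.
    """
    rng = random.Random(seed)
    records = []
    for px, py in rng.sample(stroke_points, k):
        r_i = math.hypot(px, py)
        if r_i == 0.0:
            raise ValueError("a point at the origin has no direction")
        n_i = (px / r_i, py / r_i)
        records.append((r_i, n_i, rotation_to(rv, n_i)))
    return records
```

Each returned M_i satisfies n_i = M_i * rv by construction, which matches the relation stated in the text; a different choice of origin or reference vector would change the matrices but not that relation.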
Step S200: take the correspondence between each Lie group structured data and the class label of its corresponding handwritten digit image data as a training sample, obtaining a training sample set corresponding to the said number of Lie group structured data; at the same time, construct a matrix Gaussian kernel function for processing Lie group structured data;
Let y be the class label of the handwritten digit image data x, with y ∈ {1, ..., c}, where c is the number of classes of handwritten digit image data. Combining the extracted Lie group structured data z_1, ..., z_l with the class labels y_1, ..., y_l of the corresponding handwritten digit image data gives the training sample set {(z_1, y_1), ..., (z_l, y_l)}, which records the correspondence between each Lie group structured data and its class label;
The SVM algorithm in standard use does not support Lie group structured data, so the kernel functions of existing SVM algorithms are not compatible with the invention. To apply the SVM algorithm to Lie group structured data, a matrix Gaussian kernel function can be constructed for the SVM algorithm so that Lie group structured data become compatible with it. The specific formula of the matrix Gaussian kernel function is as follows:
where z_a and z_b denote any two Lie group structured data, a and b are integers with a, b ∈ {1, ..., l} and a ≠ b, p > 0 is the kernel parameter, and ‖·‖_F is the matrix norm.
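For illustration, one kernel consistent with the symbols defined above is the usual Gaussian kernel with the Euclidean distance replaced by a matrix norm, k(z_a, z_b) = exp(-p ‖z_a - z_b‖_F^2). The sketch below assumes this form and assumes that ‖·‖_F is the Frobenius norm; neither is confirmed by the text, and the function names are hypothetical. Matrices are represented as lists of rows.

```python
import math

def frobenius_sq(z_a, z_b):
    """Squared Frobenius norm of the difference of two equally sized matrices."""
    return sum((a - b) ** 2
               for row_a, row_b in zip(z_a, z_b)
               for a, b in zip(row_a, row_b))

def matrix_gaussian_kernel(z_a, z_b, p):
    """Assumed form of the matrix Gaussian kernel: exp(-p * ||z_a - z_b||_F^2)."""
    if p <= 0:
        raise ValueError("the kernel parameter p must be positive")
    return math.exp(-p * frobenius_sq(z_a, z_b))

# Usage: the identity matrix versus a 90-degree rotation matrix.
identity = [[1.0, 0.0], [0.0, 1.0]]
quarter_turn = [[0.0, -1.0], [1.0, 0.0]]
value = matrix_gaussian_kernel(identity, quarter_turn, p=0.5)
```

Under this assumed form the kernel is symmetric and equals 1 when the two matrices coincide, which is what allows it to stand in for the vector Gaussian kernel inside an SVM solver that accepts a user-supplied kernel.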
Step S300: using the SVM algorithm with the said matrix Gaussian kernel function as the kernel function, input the training samples and train to obtain classifier models;
Those skilled in the art will know the conventional way of performing machine training with the SVM algorithm. The embodiments of the invention additionally provide a machine training method that supports multi-class problems on Lie group structured data. Specifically: each Lie group structured data is input into each of the C(c, 2) = c(c - 1)/2 classifier models (one per pair of classes), yielding C(c, 2) classifier outputs; from these outputs, the value with which the Lie group structured data is assigned to each of the c classes is counted, the maximum of these values is found, and the class attaining the maximum is determined to be the digit class of the handwritten digit image data corresponding to that Lie group structured data.
To aid detailed understanding of the machine training method of the embodiments, a concrete training process is given below.
The training sample set {(z_1, y_1), ..., (z_l, y_l)} has c classes. Samples corresponding to any two class labels are taken from it, so that the extracted samples carry only these two labels. Treating the samples of two class labels as one combination gives C(c, 2) combinations. For convenience, the training process of a classifier model is described below using one of these combinations, the samples of classes i and j (i, j ∈ {1, ..., c}, i ≠ j), as an example. The concrete training process is:
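The C(c, 2) class combinations described above can be enumerated with the Python standard library. The helper name `class_pairs` is hypothetical; classes are numbered 1 to c as in the text.

```python
from itertools import combinations

def class_pairs(c):
    """Return every unordered pair (i, j) with 1 <= i < j <= c."""
    return list(combinations(range(1, c + 1), 2))

# Usage: ten digit classes give 10 * 9 / 2 = 45 pairwise classifiers.
pairs_for_digits = class_pairs(10)
```

The number of classifiers grows quadratically with c, which is modest for ten digit classes.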
After the samples of classes i and j are extracted from the training sample set {(z_1, y_1), ..., (z_l, y_l)}, they are rewritten in the following form: let the extracted samples be {(z_m^{ij}, y_m^{ij})}, m = 1, ..., l_ij, where the superscript ij indicates data related to classes i and j, the subscript m is an index, z_m^{ij} denotes the Lie group structured data related to classes i and j, l_ij denotes the total number of samples of classes i and j, and y_m^{ij} is the class label corresponding to z_m^{ij}: when the original label of the sample is i, then y_m^{ij} = +1; when it is j, then y_m^{ij} = -1.
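A minimal sketch of this extraction and relabelling step follows. It assumes the usual SVM convention that class i is mapped to +1 and class j to -1; the function name `pair_subset` and the placeholder strings standing in for Lie group matrices are hypothetical.

```python
def pair_subset(samples, i, j):
    """Keep only the samples of classes i and j and relabel them as +1 / -1.

    samples is a list of (z, y) pairs with y in {1, ..., c}.
    Assumption: class i becomes +1 and class j becomes -1.
    """
    subset = []
    for z, y in samples:
        if y == i:
            subset.append((z, +1))
        elif y == j:
            subset.append((z, -1))
    return subset

# Usage with placeholder data: strings stand in for Lie group matrices.
training_set = [("z1", 1), ("z2", 7), ("z3", 9), ("z4", 7), ("z5", 1)]
subset_1_7 = pair_subset(training_set, 1, 7)
```

The length of the returned list plays the role of l_ij, the total number of samples in the two classes.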
The invention uses the SVM method to recognize handwritten digit image data on the basis of Lie group structured data. When the SVM method is used to separate classes i and j of the handwritten digit image data, the following optimization problem therefore needs to be solved:
where m and n both denote indices, y_n^{ij} is the class label corresponding to z_n^{ij}, m and n are integers with m, n ∈ {1, ..., l_ij}, α_m^{ij} are the model coefficients produced by SVM training, and S is the regularization parameter required for SVM training. The above optimization yields the following classifier model:
i, j = 1, ..., c, with i ≠ j;
In the above formula, sgn(·) denotes the sign function and b_ij is the model threshold, which can be computed by the following formula:
where the coefficient value corresponding to z_sv is
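For reference, the symbols named above (indices m and n, labels of value +1 or -1, model coefficients, regularization parameter S, sign function, threshold b_ij and a sample z_sv) match the textbook soft-margin SVM dual. The block below writes out that textbook form as an assumption; it is not a reproduction of the patent's own formulas, and the symbols α and k(·,·) are notation chosen here for the coefficients and the matrix Gaussian kernel.

```latex
% Assumed textbook soft-margin dual for the pair (i, j)
\max_{\alpha^{ij}} \; \sum_{m=1}^{l_{ij}} \alpha_m^{ij}
  - \frac{1}{2} \sum_{m=1}^{l_{ij}} \sum_{n=1}^{l_{ij}}
    \alpha_m^{ij} \alpha_n^{ij} \, y_m^{ij} y_n^{ij} \, k\!\left(z_m^{ij}, z_n^{ij}\right)
\quad \text{s.t.} \quad
\sum_{m=1}^{l_{ij}} \alpha_m^{ij} y_m^{ij} = 0, \qquad
0 \le \alpha_m^{ij} \le S .

% Resulting pairwise classifier
f_{ij}(z) = \operatorname{sgn}\!\left( \sum_{m=1}^{l_{ij}}
    \alpha_m^{ij} y_m^{ij} \, k\!\left(z_m^{ij}, z\right) + b_{ij} \right)

% Threshold from any sample z_{sv} whose coefficient satisfies 0 < \alpha_{sv}^{ij} < S
b_{ij} = y_{sv}^{ij} - \sum_{m=1}^{l_{ij}}
    \alpha_m^{ij} y_m^{ij} \, k\!\left(z_m^{ij}, z_{sv}\right)
```

In this standard form, a sample whose coefficient lies strictly between 0 and S sits exactly on the margin, which is why it can be used to recover the threshold.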
This gives the classifier model for the samples of classes i and j. For every other combination extracted from the training samples, the classifier model is trained on the same principle and is not described again here.
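Once the coefficients for a pair of classes are available, evaluating the pairwise classifier only needs the kernel, the coefficients and the threshold. The sketch below assumes the textbook SVM decision rule sgn(sum of coefficient * label * kernel + b_ij), a kernel of the form exp(-p ‖z_a - z_b‖_F^2), and a threshold taken from a sample whose coefficient lies strictly between 0 and S. It does not solve the optimization problem itself, and all function names are hypothetical.

```python
import math

def matrix_gaussian_kernel(z_a, z_b, p):
    """Assumed kernel form: exp(-p * squared Frobenius distance)."""
    sq = sum((a - b) ** 2 for ra, rb in zip(z_a, z_b) for a, b in zip(ra, rb))
    return math.exp(-p * sq)

def threshold(subset, alphas, p, S, tol=1e-9):
    """Compute b_ij from the first sample whose coefficient is strictly inside (0, S)."""
    for (z_sv, y_sv), a_sv in zip(subset, alphas):
        if tol < a_sv < S - tol:
            total = sum(a * y * matrix_gaussian_kernel(z, z_sv, p)
                        for (z, y), a in zip(subset, alphas))
            return y_sv - total
    raise ValueError("no coefficient strictly between 0 and S")

def classify(z, subset, alphas, b, p):
    """Return +1 or -1; a score of exactly zero is mapped to +1 by convention."""
    score = sum(a * y * matrix_gaussian_kernel(z_m, z, p)
                for (z_m, y), a in zip(subset, alphas)) + b
    return 1 if score >= 0 else -1

# Usage on a two-sample toy problem whose dual solution is known in closed form:
# with one sample per class, both coefficients equal 1 / (1 - k(z_1, z_2)).
p, S = 0.1, 10.0
z_pos = [[1.0, 0.0], [0.0, 1.0]]
z_neg = [[-1.0, 0.0], [0.0, -1.0]]
toy_subset = [(z_pos, +1), (z_neg, -1)]
alpha = 1.0 / (1.0 - matrix_gaussian_kernel(z_pos, z_neg, p))
toy_alphas = [alpha, alpha]
toy_b = threshold(toy_subset, toy_alphas, p, S)
```

In practice the coefficients would come from an SVM solver that accepts a custom or precomputed kernel; the toy values here only exercise the evaluation code.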
Step S400: input the Lie group structured data corresponding to the handwritten digit image data to be recognized into each of the trained classifier models to obtain the corresponding digit class.
After the classifier models are obtained in step S300, the corresponding Lie group structured data can be extracted from the handwritten digit image data to be recognized, so that the image data can be classified and the corresponding digit class obtained. Note that the original handwritten digit image data in step S100 serve to train the classifier models and may be regarded as a large handwritten digit image database, whereas the handwritten digit image data to be recognized in step S400 are the object of the recognition method, i.e. the image data of the handwritten digits that need to be recognized.
The classifier models trained in the embodiments can handle multi-class problems on Lie group structured data. Concretely, classification can proceed as follows: the Lie group structured data corresponding to the handwritten digit image data to be recognized is input into each of the C(c, 2) classifier models, yielding C(c, 2) classifier outputs; from these outputs, the value with which the Lie group structured data is assigned to each of the c classes is counted, the maximum of these values is found, and the class attaining the maximum is determined to be the digit class of the handwritten digit image data corresponding to that Lie group structured data;
The classifier output can be denoted f_ij(z), i, j = 1, ..., c, i ≠ j. Specifically, the value with which the Lie group structured data is assigned to class i can be computed by the following formula:
Applying this formula for each class i yields c values. The maximum of these c values is then sought, and the digit class corresponding to the maximum found is the class of the handwritten digit image data corresponding to this Lie group structured data.
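The counting and maximum search described above amount to pairwise voting. The sketch below assumes that an output of +1 from the classifier for the pair (i, j) counts as a vote for class i and -1 as a vote for class j, and that ties are broken in favour of the smallest class number; both conventions and the name `vote` are assumptions made for illustration.

```python
def vote(pair_outputs, c):
    """Count votes per class from pairwise outputs and return (winner, counts).

    pair_outputs maps each pair (i, j) to the classifier output, +1 or -1.
    Assumption: +1 is a vote for class i, -1 is a vote for class j.
    """
    counts = {cls: 0 for cls in range(1, c + 1)}
    for (i, j), output in pair_outputs.items():
        counts[i if output > 0 else j] += 1
    # max() returns the first maximal key in insertion order, i.e. the smallest class.
    winner = max(counts, key=counts.get)
    return winner, counts

# Usage: three classes, three pairwise classifiers.
outputs = {(1, 2): -1, (1, 3): +1, (2, 3): +1}
predicted, tallies = vote(outputs, 3)
```

Here class 2 wins both of its pairwise comparisons, so it collects two of the three votes.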
By constructing a matrix Gaussian kernel function, the embodiments of the invention use the SVM algorithm to process the Lie group structured data corresponding to handwritten digit image data. Drawing on the strength of the SVM algorithm in recognizing images under small-sample, nonlinear and high-dimensional conditions, they capture the nonlinear characteristics of Lie group structured data.
In addition, the multi-class problem on Lie group structured data is reduced to a number of two-class problems, each of which is handled by the SVM algorithm. This achieves multi-class processing of Lie group structured data and thus better handwritten digit recognition.
The beneficial effects of the invention are verified by the following experiment:
A handwritten digit database generally holds handwritten digit image data of 10 classes. Four of these classes, the digits 1, 2, 7 and 9, are selected for the experiment. For each class, the first 200 samples are taken from the training set and from the test set respectively, so that each class has 200 training samples and 200 test samples. Parameters are then selected on the training samples by ten-fold cross-validation, where the regularization factor ranges over {2^-1, 2^0, ..., 2^4} and the matrix Gaussian kernel parameter ranges over {2^-10, 2^-9, ..., 2^-6}. A model is then retrained with the selected parameters and evaluated on the test set to obtain the recognition rate. In addition, the influence of the number of points in the point cloud on the recognition rate can be examined. The number of points takes values in {5, 10, 15, 20, 25, 30, 35, 40, 50}; because the point cloud is generated at random, the experiment can be repeated 5 times and the average result reported. Fig. 2 to Fig. 4 show the experimental results obtained with the Lie group mean algorithm, the Lie group Fisher algorithm and the technical solution of the invention.
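The parameter search described above can be sketched as a grid search with ten-fold cross-validation. The sketch assumes consecutive integer exponents within each stated range and a simple interleaved fold assignment; `score_fn` is a hypothetical callback that would train on the training indices with the given parameters and return the accuracy on the validation indices.

```python
from itertools import product

def parameter_grid():
    """All (S, p) pairs: S in {2^-1, ..., 2^4}, p in {2^-10, ..., 2^-6}."""
    reg_values = [2.0 ** e for e in range(-1, 5)]
    kernel_values = [2.0 ** e for e in range(-10, -5)]
    return list(product(reg_values, kernel_values))

def fold_indices(n_samples, folds=10):
    """Split sample indices 0..n_samples-1 into interleaved folds."""
    return [list(range(f, n_samples, folds)) for f in range(folds)]

def select_parameters(n_samples, score_fn, folds=10):
    """Return (mean_score, S, p) for the best grid point under k-fold validation."""
    splits = fold_indices(n_samples, folds)
    best = None
    for S, p in parameter_grid():
        total = 0.0
        for f, val_idx in enumerate(splits):
            train_idx = [t for g, fold in enumerate(splits) if g != f for t in fold]
            total += score_fn(S, p, train_idx, val_idx)
        mean_score = total / folds
        if best is None or mean_score > best[0]:
            best = (mean_score, S, p)
    return best
```

With six regularization values and five kernel values, the grid has 30 points, so ten-fold validation trains 300 models per pairwise problem before the final retraining on the full training set.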
Fig. 2 compares the classification performance of the Lie group mean algorithm, the Lie group Fisher algorithm and the method of the invention on the digits 1 and 7. Referring to Fig. 2, the classification performance of the invention is clearly better than that of the Lie group mean algorithm and the Lie group Fisher algorithm, and the recognition rate tends to increase as the number of points taken per sample grows. Fig. 3 compares the classification performance of the Lie group mean algorithm and the method of the invention on the digits 1, 7 and 9, and Fig. 4 compares them on the digits 1, 2, 7 and 9. Referring to Fig. 3 and Fig. 4, the multi-class performance of the invention is clearly better than that of the Lie group mean algorithm.
Fig. 5 is a structural block diagram of a handwritten digit recognition system based on Lie group structured data according to the invention. Referring to Fig. 5, the system may comprise:
a Lie group structured data extraction module 100, configured to extract a corresponding number of Lie group structured data from the original handwritten digit image data;
a preprocessing module 200, configured to take the correspondence between each Lie group structured data and the class label of its corresponding handwritten digit image data as a training sample, obtaining a training sample set corresponding to the said number of Lie group structured data, and at the same time to construct the matrix Gaussian kernel function for processing Lie group structured data:
where z_a and z_b denote any two Lie group structured data with a ≠ b, p > 0 is the kernel parameter, and ‖·‖_F is the matrix norm;
a model training module 300, configured to use the SVM algorithm with the said matrix Gaussian kernel function as the kernel function, input the training samples and train to obtain classifier models;
a classification module 400, configured to input the Lie group structured data corresponding to the handwritten digit image data to be recognized into each of the trained classifier models to obtain the corresponding digit class.
The structure of the model training module 300 may be as shown in Fig. 6 and comprise:
a combination acquisition unit 301, configured to take from the said training sample set the samples corresponding to any two class labels, obtaining C(c, 2) combinations, where c is the number of classes of handwritten digit image data;
a loop training unit 302, configured to take each combination as a unit and, for each one, use the SVM algorithm with the said matrix Gaussian kernel function as the kernel function, input the samples of that combination and train, obtaining C(c, 2) classifier models.
Further, the loop training unit 302 may comprise:
a training subunit (not shown), configured to extract a combination containing the samples of classes i and j, with i, j ∈ {1, ..., c} and i ≠ j, and to execute the classifier model training procedure: let the extracted samples be {(z_m^{ij}, y_m^{ij})}, m = 1, ..., l_ij, where l denotes the number of handwritten digit image data, x denotes handwritten digit image data, z denotes Lie group structured data, y is the class label of the handwritten digit image data x with y ∈ {1, ..., c}, the superscript ij indicates data related to classes i and j, the subscript m is an index, z_m^{ij} denotes the Lie group structured data related to classes i and j, l_ij denotes the total number of samples of classes i and j, and y_m^{ij} is the class label corresponding to z_m^{ij}: when the original label of the sample is i, then y_m^{ij} = +1; when it is j, then y_m^{ij} = -1;
and to solve
where m and n both denote indices, m and n are integers with m, n ∈ {1, ..., l_ij}, α_m^{ij} are the model coefficients produced by SVM training, and S is the regularization parameter required for SVM training; and to obtain from the above solution the classifier model
where sgn(·) denotes the sign function and b_ij is the model threshold;
a loop subunit (not shown), configured to extract another combination after the said training subunit completes the above classifier model training procedure and to execute the procedure again, until C(c, 2) classifier models are obtained.
The structure of the classification module 400 may be as shown in Fig. 7 and comprise:
a computing unit 401, configured to input the Lie group structured data corresponding to the handwritten digit image data to be recognized into each of the said C(c, 2) classifier models, yielding C(c, 2) classifier outputs;
a statistics unit 402, configured to count from the said outputs the value with which the Lie group structured data is assigned to each of the c classes, and to find the maximum of these values;
a determination unit 403, configured to determine the class attaining the maximum found by the said statistics unit to be the digit class of the handwritten digit image data corresponding to the Lie group structured data.
Further, the statistics unit 402 may comprise:
a class-value statistics subunit (not shown), configured to count, for each i ∈ {1, ..., c}, the value with which the Lie group structured data in the said outputs is assigned to class i, class i being any one of the c classes to be counted;
a maximum search subunit (not shown), configured to find the maximum among the values counted by the said class-value statistics subunit.
The handwritten digit recognition system based on Lie group structured data of the invention corresponds to the handwritten digit recognition method based on Lie group structured data. For the concrete implementation of the system functions, reference may be made to the corresponding method; the details are not repeated here.
Those skilled in the art will further appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented in electronic hardware, computer software, or a combination of the two. To illustrate the interchangeability of hardware and software clearly, the composition and steps of each example have been described above in general terms of their function. Whether these functions are carried out in hardware or in software depends on the particular application and the design constraints of the technical solution. Skilled persons may implement the described functions in different ways for each particular application, but such implementations should not be regarded as going beyond the scope of the invention.
The steps of the method or algorithm described in connection with the embodiments disclosed herein can be implemented directly in hardware, in a software module executed by a processor, or in a combination of the two. The software module may reside in random access memory (RAM), memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.
The above description of the disclosed embodiments enables those skilled in the art to implement or use the invention. Various modifications to these embodiments will be apparent to those skilled in the art, and the general principles defined herein can be realized in other embodiments without departing from the spirit or scope of the invention. The invention is therefore not limited to the embodiments shown herein, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.