Sketch recognition method based on character string nuclear
Technical field
The present invention relates to cartographical sketching identification based on kernel method.
Background technology
Background technology involved in the present invention comprises following three aspects:
One, cartographical sketching identification
Cartographical sketching identification is the fuzzy sketch that pen type obtains alternately to be expressed map to accurate avatars; Promptly from the man-machine interaction process, excavate the sketch shape constraining in the ever-increasing sketch information; On the basis of understanding the original intention of user, freely, irregular sketch recognition become rule, geometric configuration accurately.
The at present main pattern recognition method that has three kinds of modes is promptly based on the method for stroke representation; The method of representing based on pels such as straight line, arc, curves; Pattern recognition method based on geometric properties.
1, based on the method for expressing of gesticulating
Generally, the sketch that obtains through the pen type interactive mode of sketch recognition system shows as some strokes that are made up of the sampling spot sequence of user between starting to write and starting writing.Stroke is the configuration information of figure fractal key.It is symbol that stroke is looked by early stage system, gives its specific implication, the identification of stroke is equal to the identification of figure.In the one stroke identification, comparatively classical is: the Rubine method, and this method is simply effective, but it requires the user to sketch the contours figure with fixed mode stroke, and figure constitutes fairly simple.
2, based on the method for expressing of pel
Represent that based on pel with diagrammatic representation be the combinations of pels such as straight line, arc, curve under the certain space relation.Based on the method for pel comprise usually cut apart, step such as match, regular, identification.The sketch that the user draws can split into pel and relation thereof automatically, and the training of graphics template does not need user intervention, the ATL easy expansion, and adaptability is strong.But train template inevitably and all oneself attributes of definition of graphics test have been increased system overhead.
3, based on the identification of geometric properties
Method based on geometric properties is directly treated as recognition unit with figure, and the geometric properties that directly extracts sketch is used for classification.Usually its extraction that will cut apart the different patterns geometric properties in some way earlier comes down to stroke information dimensionality reduction; But because the complicacy of figure; Be difficult to the fixing complete information of the geometric properties reservation stroke of dimension, partial information has been lost in the dimensionality reduction process.Therefore, the geometric properties of the dissimilar figure of the not higher vision of accuracy of the method for geometric properties identification expresses possibility close.
Two, character string nuclear (Sting Kernel)
Kernel method is represented one group of relevant machine learning and data mining algorithm.The key components of kernel method are kernel functions, and kernel function can be measured the similarity of input data.Based on these kernel functions, can pass through SVMs (SVM) and accomplish tasks such as classification, recurrence.
Character string is endorsed and handled the input data type is the data of character string, through the similarity of two input of character string of character string kernel function tolerance.
Character string endorse be divided into multiple, for example: spectrum-similarly, calculate the character string nuclear of the public substring of two input of character string, based on the character string nuclear of comparison, the character string nuclear that obtains by probability model.
Character string nuclear has been applied to fields such as protein homology detection, text classification.
Three, SVMs (Support Vector Machine, SVM)
(Support Vector Machine SVM) is the famous system based on kernel method to SVMs.
(support vector machine SVM) solves the new tool of machine learning problem to SVMs by means of optimization method.It is introduced in the meeting of computer learning in 1992 theory and is got into the machine learning field by Vapnik and co-worker's invention thereof, has received afterwards widely and having paid close attention to.All obtain breakthrough aspect its theoretical research and the algorithm realization in recent years, and becoming the powerful measure that overcomes " dimension disaster " and tradition difficulties such as " crossing study ".The object that the theoretical system of SVM contains is very extensive, like dual representation, feature space, the theories of learning, optimum theory and algorithm etc.SVM has obtained reasonable application in fields such as text classification, handwriting recognition, image classification, bioinformatics.
This algorithm becomes sky at set L, stops when perhaps λ is enough little.
Summary of the invention
Based on above-mentioned prior art; The present invention proposes a kind of Freehand Sketchy Graphics Recognition based on character string nuclear; In conjunction with cartographical sketching identification, character string nuclear, these three prior arts of SVMs, realize a kind of new Freehand Sketchy Graphics Recognition that can continuous training/accumulation.
The present invention proposes a kind of Freehand Sketchy Graphics Recognition based on character string nuclear, this method may further comprise the steps:
A kind of Freehand Sketchy Graphics Recognition based on character string nuclear, this method may further comprise the steps:
Step 1 is mapped as feature string with cartographical sketching, and the cartographical sketching of input is carried out equidistant sampling, and the sampled distance threshold value is rule of thumb selected 5 pixels, and limits sketch of continuous sampling in the time threshold 0.7 second of sampling; The cartographical sketching of sampling is mapped to feature string, further comprising the steps of:
Get a positive integer n; The boundary rectangle of the cartographical sketching that calculating sampling arrives, and the boundary rectangle of sketch is divided into n
2Part; Each cuts apart little rectangle that the back obtains can use two-dimensional coordinate x, and y representes, 1≤x wherein, y≤n and x, y ∈ N; When the central point of and if only if a little rectangle had dropped in the zone that cartographical sketching surrounds, we thought that this little rectangle has been filled; It is n that a cartographical sketching has been mapped to a length
2Feature string;
Step 2 is checked the sampling sketch training as training sample through SVMs based on character string, obtains sorter; Using the lattice search to carry out parameter transfers excellent; Select best penalty factor C, gama, and use the best penalty factor C that obtains; Positive integer n selected in gama and the step 1 is trained whole training set, obtains supporting vector machine model; Character string nuclear is measured the similarity of these two character strings through the editing distance between two character strings; Said editing distance is that character string one is transformed to the used minimum character manipulation number of character string two, and said character manipulation comprises: character of (1) deletion; (2) insert a character; (3) change a character into another character;
Step 3, the sorter that obtains through training is classified to sketch to be identified and is discerned, and will blur, irregular cartographical sketching is mapped as accurate geometric configuration;
Said best penalty factor C, the value of gama is respectively: 5,0.01, the positive integer n value is 9.
Compared with prior art, a kind of sketch recognition method of the present invention based on character string nuclear, the position of this method and sketch, size, the drafting mode is irrelevant, allows the user to carry out sketch drafting according to the mode of oneself being accustomed to.Sketch recognition method recognition accuracy based on character string nuclear is higher, and realizes simple.
Description of drawings
Fig. 1 is the leg-of-mutton feature string mapping of Freehandhand-drawing (n=5) synoptic diagram;
Fig. 2 is experiment graphical-set synoptic diagram;
Fig. 3 is cartographical sketching and recognition result figure thereof;
Fig. 4 is the overall flow figure of the Freehand Sketchy Graphics Recognition based on character string nuclear of the present invention.
Embodiment
The thought of at first filling based on the zone is mapped as feature string with cartographical sketching; Secondly through SVMs (Support Vector Machine; SVM) based on character string nuclear (String Kernelho) training sample is trained, obtain sorter, the sorter that obtains through training is then classified to sketch to be identified and is discerned; To blur, irregular cartographical sketching is mapped as accurate geometric configuration.
System of the present invention runs under the visual C++6.0 environment, based on the libsvm software package, uses C Plus Plus to develop.
At first, in visual C++6.0, set up the engineering of a single document view; The libsvm software package that has added character string nuclear (String Kernel) is transplanted in the engineering of foundation, is write the code of feature string mapping block.Prepare training data: gathered 1150 samples, wherein 1000 samples are as training data, and 150 as test data, and sketch is mapped as feature string, the feature string that obtains is write in the file preserve.Training classifier: use characteristic character string training classifier, and use the mode of lattice search to carry out parameter regulation, i.e. penalty factor C, gama, n regulates.
Describe in detail as follows:
One, the cartographical sketching sampling & as the present invention's input discerns
What in this step, need do is, the cartographical sketching to be identified that samples is mapped as feature string, and the sorter that uses training to obtain is classified to it, thereby accomplishes the identification of cartographical sketching.
1) pre-service
Cartographical sketching to input carries out equidistant sampling; The sampled distance threshold value is rule of thumb selected 5 pixels; And limit sketch of (0.7 second) continuous sampling in the time threshold of sampling, order and direction no requirement (NR) that input is gesticulated, even the user start to write and the last time between starting writing interval greater than 0.7 second; Then user's a last sketch EOI is thought by system, and begins to draw a new sketch.As the gap that pixel resamples, this variable-value rule of thumb obtains.
The cartographical sketching of sampling is mapped to feature string, may further comprise the steps:
At first get a positive integer n;
The boundary rectangle of the cartographical sketching that calculating sampling arrives, and the boundary rectangle of sketch is divided into n
2Part.Each is cut apart little rectangle that the back obtains and can use two-dimensional coordinate (x y) representes, 1≤x wherein, y≤n and x, y ∈ N.When the central point of and if only if a little rectangle had dropped in the zone that cartographical sketching surrounds, we thought that this little rectangle has been filled.The leg-of-mutton feature string mapping of Freehandhand-drawing as shown in Figure 1 (n=5Pixel) synoptic diagram (dash area is expressed as and is filled).The feature string S=0100002222033004400050000 of Freehandhand-drawing triangle mapping among the figure.The computing formula of mapping process is following: at first, and defined function
f:N
2→{0,1,2,…,n}
Secondly, the process of calculated characteristics character string is following:
Initialize:x=1,y=1,n=k,k∈□
+,S=″″;
For(x=1;x≤n;x++)
For(y=1;y≤n;y++)
{
Int?tmp=f(x,y);
S=strcat(S,tmp);
Output?S;
So far we just a cartographical sketching to be mapped to a length be n
2Feature string.
Pre-service of the present invention comprises the sample collection of the cartographical sketching of various ways, and for example, the present invention has selected six kinds of common geometric figures such as straight line, triangle, rectangle, ellipse as the experiment graphical-set.
As shown in Figure 2, to above-mentioned graphical-set, in experiment, collect sample data.The user is free to the skeletonizing figure, 1000 of the total sample number that the present invention adopts.Mainly be used for of the influence of the different sample numbers of comparison to training time and recognition accuracy.
Two, examine training classifier through SVMs based on character string
Use character string nuclear training classifier.It is excellent that the parameter accent is carried out in the search of use lattice, selects best penalty factor C, gama, n.Character string kernel function in the native system is measured the similarity of these two character strings through the editing distance between two character strings (EDIT DISTANCE).
The definition of editing distance:
Suppose these 2 character strings of A and B.To convert character string A into character string B with minimum character manipulation.Here said character manipulation comprises:
(1) character of deletion;
(2) insert a character;
(3) change a character into another character.
Character string A is transformed to the used minimum character manipulation number of character string B is called the editing distance of character string A to B, be designated as K (A, B).
Experimental result, as shown in Figure 3.And following experimental verification data are arranged:
The accuracy of identification of form 1 difformity figure and recall ratio (get n=9, C=5, gama=0.01)
The recognition accuracy of the cartographical sketching of form 2 on different training collection scale (C=5, gama=0.01)