CN104992166B - Handwritten character recognition method and system based on a robust metric - Google Patents

Handwritten character recognition method and system based on a robust metric

Info

Publication number
CN104992166B
Authority
CN
China
Prior art keywords
sample
projection matrix
label
classification
image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201510450358.9A
Other languages
Chinese (zh)
Other versions
CN104992166A (en)
Inventor
张召
汪笑宇
张莉
李凡长
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Suzhou University
Original Assignee
Suzhou University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Suzhou University
Priority to CN201510450358.9A
Publication of CN104992166A
Application granted
Publication of CN104992166B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06K RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
    • G06K9/00 Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
    • G06K9/00402 Recognising digital ink, i.e. recognising temporal sequences of handwritten position coordinates
    • G06K9/00409 Preprocessing; Feature extraction
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06K RECOGNITION OF DATA; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
    • G06K9/00 Methods or arrangements for reading or recognising printed or written characters or for recognising patterns, e.g. fingerprints
    • G06K9/00402 Recognising digital ink, i.e. recognising temporal sequences of handwritten position coordinates
    • G06K9/00422 Matching; classification

Abstract

The invention discloses a handwritten character recognition method and system based on a robust metric. Similarity learning is performed on the handwriting training samples to construct a weighted similarity graph, which compacts the local within-class scatter and separates the local between-class scatter while preserving the local structure of all training samples. To make the handwriting description more robust, the 1-norm metric is applied to a semi-supervised feature learning model, giving a robust recognition method and system that outputs a projection matrix P usable for feature extraction from both in-sample and out-of-sample handwriting images. An out-of-sample image is handled by projecting the test sample onto P; the extracted features are then fed to an efficient label propagation classifier, and the position of the largest probability in the corresponding soft class label determines the class of the test sample, giving the character recognition result. In addition, formulating the model as a ratio reduces the number of model parameters, and the projection matrix P satisfies an orthogonality constraint.

Description

Handwritten character recognition method and system based on a robust metric
Technical field
The present invention relates to the technical field of computer vision and image recognition, and in particular to a handwritten character recognition method and system based on a robust metric.
Background
We live in an era of information explosion, and daily life produces large amounts of valuable high-dimensional multimedia information. Offline handwritten character recognition is one example of extracting features from such high-dimensional information and putting them to use. A paper document is digitized by a computer to obtain stored character images; machine learning methods then extract image features and classify them, and the characters are finally recognized. An efficient and accurate character recognition method could be applied to fields such as office automation and machine translation and would bring large social and economic benefits. However, because effectively extracting handwriting image features is difficult, offline handwritten character recognition (referred to simply as handwriting recognition in the present invention) still falls short of practical requirements. Most current research focuses on the feature extraction problem for handwriting images and has achieved some results. Handwriting images collected in the real world, however, often contain noise, outliers, or missing data, and differences in writing habits lead to problems such as irregular strokes, so a more robust algorithm is needed for feature extraction.
In recent years, several robust models based on the 1-norm have been proposed, such as principal component analysis based on the 1-norm (PCA-L1) and discriminant locality preserving projections based on the 1-norm (DLPP-L1). The idea behind these algorithms is that traditional algorithms based on the 2-norm distance metric are sensitive to noise and outliers in the data, whereas a distance metric based on the 1-norm overcomes this weakness and makes the model more robust. These algorithms do give more robust results, but the existing ones are either unsupervised or fully supervised and cannot make full use of both labeled and unlabeled data, so their accuracy still has considerable room for improvement. In addition, some empirical parameters in these algorithms are difficult to set optimally.
Therefore, providing a handwritten character recognition method and system based on a robust metric that extracts handwritten character image features robustly, while improving both the representational power of those features and the recognition accuracy, is a problem that those skilled in the art urgently need to solve.
Summary of the invention
In view of this, the present invention provides a handwritten character recognition method and system based on a robust metric that extracts handwritten character image features robustly while improving both the representational power of those features and the recognition accuracy, so as to overcome the limitation of the prior art, which uses only labeled data or only unlabeled data and does not fully account for the nature of real-world data.
To solve the above technical problem, the present invention provides a handwritten character recognition method based on a robust metric. It is built on a 1-norm projection that combines discrimination on the labeled data with locality preservation over all samples, and includes:
performing similarity learning on the handwriting training samples and constructing a weighted similarity graph, so as to compact the local within-class scatter and separate the local between-class scatter while preserving the local structure of all training samples; building a robust semi-supervised handwritten character image feature learning model based on the 1-norm metric, whose optimization outputs a projection matrix P usable for both in-sample and out-of-sample image feature extraction; and formulating the model as a ratio, which reduces the number of model parameters, with the optimized projection matrix P satisfying an orthogonality constraint;
using the projection matrix P to extract features from the handwriting test samples, where out-of-sample images are handled mainly by mapping the test samples onto the projection matrix P;
using a label propagation classifier to classify the dimension-reduced test sample features and output the soft class label of each test sample, and taking the position of the largest probability in that soft class label as the class of the test sample, which gives the character recognition result;
wherein the values in the soft class label represent the probabilities that the test sample belongs to each class.
In the above method, optionally, building the robust semi-supervised handwritten character image feature learning model based on the 1-norm metric, whose optimization outputs a projection matrix P usable for in-sample and out-of-sample image feature extraction, includes:
Given an original training sample set that may contain noise, $X = [x_1, x_2, \ldots, x_N] \in \mathbb{R}^{n \times N}$, where n is the dimension of the training samples and N is the number of training samples. The training set contains a labeled sample set $X_L = [x_1, \ldots, x_l] \in \mathbb{R}^{n \times l}$ covering c classes and an unlabeled sample set $X_U = [x_{l+1}, \ldots, x_{l+u}] \in \mathbb{R}^{n \times u}$, where c is an integer greater than 2 and the sample counts satisfy l + u = N. The labels of the l labeled samples are given, and the label of sample $x_i$ is $y_i$ ($i \le l$).
A projection matrix $P \in \mathbb{R}^{n \times d}$ with both discriminative and locality-preserving properties is computed from the original training sample set. Specifically, the projection matrix P that can extract features of out-of-sample handwritten character images is obtained by solving an optimization problem under the orthogonality constraint
s.t. $P^{T}P = I_d$
where $\|\cdot\|_1$ is the 1-norm, defined as $\|S\|_1 = \sum_{i,j} |S_{i,j}|$, $S_{i,j}$ denotes the (i, j)-th element of matrix S, and W and a second weight matrix are the weight coefficient matrices.
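The two quantities that survive in the text above, the entry-wise 1-norm and the orthogonality constraint on P, can be checked numerically. The following is a minimal sketch in NumPy; the function names and the demo matrices are illustrative assumptions and do not come from the patent.

```python
import numpy as np

def entrywise_l1_norm(S):
    """Return ||S||_1 = sum over (i, j) of |S[i, j]|."""
    return float(np.abs(S).sum())

def is_column_orthonormal(P, tol=1e-8):
    """Check the constraint P^T P = I_d for an n x d matrix P."""
    d = P.shape[1]
    return bool(np.allclose(P.T @ P, np.eye(d), atol=tol))

S = np.array([[1.0, -2.0],
              [0.5, 3.0]])
l1_value = entrywise_l1_norm(S)  # 1 + 2 + 0.5 + 3

# Illustrative only: a QR factorization gives some n x d matrix with
# orthonormal columns; it is not the learned projection matrix.
rng = np.random.default_rng(0)
P_demo, _ = np.linalg.qr(rng.standard_normal((5, 2)))
```

Because each entry contributes its absolute value rather than its square, a single large outlier influences the 1-norm linearly, which is the robustness argument made in the background section.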
In the above method, optionally, using the projection matrix P to extract features from the handwriting test sample images, where out-of-sample images are handled mainly by mapping the test sample images onto the projection matrix P, includes:
projecting the training samples and the test samples with the projection matrix P to complete handwritten character image feature extraction.
The present invention also provides a handwriting recognition system based on a robust metric, including:
a training module, configured to perform similarity learning on the handwriting training samples and construct a weighted similarity graph, so as to compact the local within-class scatter and separate the local between-class scatter while preserving the local structure of all training samples; to build a robust semi-supervised handwritten character image feature learning model based on the 1-norm metric, whose optimization outputs a projection matrix P usable for both in-sample and out-of-sample image feature extraction; and to formulate the model as a ratio, which reduces the number of model parameters, with the optimized projection matrix P satisfying an orthogonality constraint;
a test preprocessing module, configured to use the projection matrix P to extract features from the handwriting test samples, where out-of-sample images are handled mainly by mapping the test samples onto the projection matrix P;
a test module, configured to use a label propagation classifier to classify the dimension-reduced test sample features and output the soft class label of each test sample, and to take the position of the largest probability in that soft class label as the class of the test sample, which gives the character recognition result;
wherein the values in the soft class label represent the probabilities that the test sample belongs to each class.
As can be seen from the above technical solution, compared with the prior art, the invention discloses a handwritten character recognition method and system based on a robust metric. Similarity learning is performed on the handwriting training samples to construct a weighted similarity graph, which compacts the local within-class scatter and separates the local between-class scatter while preserving the local structure of all training samples. To make the handwriting description more robust, a robust semi-supervised handwritten character image feature learning model based on the 1-norm metric is built, and its optimization outputs a projection matrix P usable for both in-sample and out-of-sample image feature extraction. An out-of-sample image is handled by projecting the test sample onto P; the extracted features are then fed to an efficient label propagation classifier, and the position of the largest probability in the corresponding soft class label determines the class of the test sample, giving the character recognition result. In addition, formulating the model as a ratio reduces the number of model parameters, and the projection matrix P satisfies the orthogonality constraint.
Brief description of the drawings
To explain the embodiments of the present invention or the technical solutions of the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below are only embodiments of the present invention, and a person of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a flowchart of a handwritten character recognition method based on a robust metric provided by an embodiment of the present invention;
Fig. 2 is a structural block diagram of a handwriting recognition system based on a robust metric provided by an embodiment of the present invention;
Fig. 3 is a schematic diagram of recognition by a handwritten character recognition method based on a robust metric provided by an embodiment of the present invention.
Detailed description of the embodiments
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings. Obviously, the described embodiments are only some of the embodiments of the present invention, not all of them. All other embodiments obtained by a person of ordinary skill in the art from the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
The core of the present invention is to provide a handwritten character recognition method and system based on a robust metric that extracts handwritten character image features robustly while improving both the representational power of those features and the recognition accuracy, so as to overcome the limitation of the prior art, which uses only labeled data or only unlabeled data and does not fully account for the nature of real-world data.
The invention discloses a handwritten character recognition method and system based on a robust metric. Similarity learning is performed on the handwriting training samples to construct a weighted similarity graph, which compacts the local within-class scatter and separates the local between-class scatter while preserving the local structure of all training samples. To make the handwriting description more robust, a robust semi-supervised handwritten character image feature learning model based on the 1-norm metric is built, and its optimization outputs a projection matrix P usable for both in-sample and out-of-sample image feature extraction. An out-of-sample image is handled by projecting the test sample onto P; the extracted features are then fed to an efficient label propagation classifier, and the position of the largest probability in the corresponding soft class label determines the class of the test sample, giving the character recognition result. In addition, formulating the model as a ratio reduces the number of model parameters, and the projection matrix P satisfies the orthogonality constraint.
To help those skilled in the art better understand the solution of the present invention, the invention is described in further detail below with reference to the drawings and specific embodiments.
The present invention was tested on three handwritten character image databases: CAS, USPS and ORHD. CAS is the handwriting database of the Institute of Automation, Chinese Academy of Sciences, containing 3755 Chinese characters and 171 letters, digits or symbols. USPS is the handwritten digit database of the United States Postal Service, containing 9298 handwritten digits 0-9. ORHD comes from the University of California, Irvine (UCI) machine learning repository and contains 5620 digit samples, each described by integers in the range 0-16. These databases were collected from different sources, so the test results are broadly representative.
A handwriting sample set is given and divided into a training set and a test set, which contain the original training samples and the test samples respectively.
Referring to Fig. 1, which shows a flowchart of a handwritten character recognition method based on a robust metric provided by an embodiment of the present invention: the method is built on a 1-norm projection that combines discrimination on the labeled data with locality preservation over all samples, and may specifically include the following steps:
Step S100: perform similarity learning on the handwriting training samples and construct a weighted similarity graph, so as to compact the local within-class scatter and separate the local between-class scatter while preserving the local structure of all training samples. To make the system more robust, step S100 also builds a robust semi-supervised handwritten character image feature learning model based on the 1-norm metric, whose optimization outputs a projection matrix P usable for both in-sample and out-of-sample image feature extraction. In addition, formulating the model as a ratio reduces the number of model parameters, and the optimized projection matrix P satisfies an orthogonality constraint.
Step S101: use the projection matrix P obtained in step S100 to extract features from the handwriting test samples, where out-of-sample images are handled mainly by mapping the test samples onto the projection matrix P.
Step S102: use a label propagation classifier to classify the dimension-reduced test sample features and output the soft class label of each test sample, and take the position of the largest probability in that soft class label as the class of the test sample, which gives the character recognition result. The values in the soft class label represent the probabilities that the test sample belongs to each class.
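The final decision in step S102 is an arg-max over the soft class label. A minimal sketch follows, assuming the soft labels are stored as one row per test sample and one column per class; this layout and the function name are assumptions made for illustration.

```python
import numpy as np

def predict_from_soft_labels(soft_labels):
    """Return, for each row of class probabilities, the index of the largest one.

    soft_labels: array of shape (num_samples, c); row i holds the probabilities
    that test sample i belongs to each of the c classes.
    """
    soft_labels = np.asarray(soft_labels, dtype=float)
    return np.argmax(soft_labels, axis=1)

soft = np.array([[0.1, 0.7, 0.2],
                 [0.6, 0.3, 0.1]])
predicted = predict_from_soft_labels(soft)  # class indices 1 and 0
```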
In the present invention, discriminative and geometry-preserving learning is performed on the handwriting training images according to step S100 above, and a robust semi-supervised feature learning model based on the 1-norm metric is proposed whose optimization outputs a projection matrix P usable for feature extraction from out-of-sample test images. The detailed procedure is as follows:
Given an original set of handwriting vectors that may contain noise, $X = [x_1, x_2, \ldots, x_N] \in \mathbb{R}^{n \times N}$, called the original training sample set (where n is the dimension of a handwriting sample and N is the number of samples), the training set contains a labeled sample set $X_L = [x_1, \ldots, x_l] \in \mathbb{R}^{n \times l}$ covering c classes and an unlabeled sample set $X_U = [x_{l+1}, \ldots, x_{l+u}] \in \mathbb{R}^{n \times u}$, where c is an integer greater than 2 and the sample counts satisfy l + u = N. The labels of the l labeled samples are given, and the label of sample $x_i$ is $y_i$ ($i \le l$).
A projection matrix $P \in \mathbb{R}^{n \times d}$ with both discriminative and locality-preserving properties is computed from the original training set. Specifically, the projection matrix P that can extract features of out-of-sample handwritten character images is obtained by solving an optimization problem under the orthogonality constraint
s.t. $P^{T}P = I_d$
where $\|\cdot\|_1$ is the 1-norm, defined as
$\|S\|_1 = \sum_{i,j} |S_{i,j}|$
where $S_{i,j}$ denotes the (i, j)-th element of matrix S. The weight coefficient matrices (W and a second weight matrix) and one further matrix are defined using M and γ, where M is the reconstruction coefficient matrix and $\gamma \in [0, 1]$ is a trade-off parameter between feature extraction from labeled data and from unlabeled data.
The reconstruction coefficient matrix M can be obtained by solving an optimization problem
in which $\|\cdot\|$ is the 2-norm, defined for a vector v as $\|v\| = \sqrt{\sum_i v_i^2}$.
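The optimization problem for the reconstruction coefficient matrix M is not reproduced in this text. As a hedged illustration only, the sketch below assumes the common locally-linear-embedding formulation, in which each sample is reconstructed from its k nearest neighbours under the 2-norm with weights that sum to 1. The neighbourhood size `k`, the regularization `reg` and the function name are assumptions, not values from the patent.

```python
import numpy as np

def reconstruction_weights(X, k=3, reg=1e-3):
    """Assumed LLE-style reconstruction coefficients.

    X: (n, N) matrix with one sample per column.
    Returns M of shape (N, N); row i holds the weights that reconstruct x_i
    from its k nearest neighbours, and every row sums to 1.
    """
    n, N = X.shape
    M = np.zeros((N, N))
    # Pairwise squared 2-norm distances between the columns of X.
    sq = np.sum(X ** 2, axis=0)
    dist = sq[:, None] + sq[None, :] - 2.0 * (X.T @ X)
    for i in range(N):
        order = np.argsort(dist[i])
        neighbours = order[order != i][:k]
        Z = X[:, neighbours] - X[:, [i]]               # neighbours centred on x_i
        G = Z.T @ Z                                    # local Gram matrix, k x k
        G += reg * (np.trace(G) + 1e-12) * np.eye(k)   # regularize for stability
        w = np.linalg.solve(G, np.ones(k))
        M[i, neighbours] = w / w.sum()                 # enforce the sum-to-one constraint
    return M

rng = np.random.default_rng(0)
X_demo = rng.standard_normal((4, 10))
M_demo = reconstruction_weights(X_demo, k=3)
```

Whatever the exact formulation, M encodes how each sample relates to its neighbours, and it is this local geometry that the weight matrices carry into the projection learning.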
For the computation, this embodiment uses an iterative method that approaches a locally optimal solution step by step. Consider first the case of reducing the dimension to 1. The reconstruction coefficient matrix M is first obtained by solving its optimization problem with an existing method.
The two weight coefficient matrices are then computed,
which gives the weight coefficient matrix W used in the problem above.
The 1-norm optimization is solved next. A sign function is introduced
and substituted into the original objective function, denoted F(p) below.
An increment δ(t) is then computed,
and the projection vector is updated as $p(t+1) = p(t) + \beta\,\delta(t)$, where β is a small positive number. If the value of F(p(t+1)) no longer increases appreciably, $p^{*} = p(t+1)$ is output; otherwise the iteration continues until convergence.
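The update rule $p(t+1) = p(t) + \beta\,\delta(t)$ with a sign function can be illustrated on a simpler objective. Since the patent's objective F is not reproduced in this text, the sketch below swaps in the PCA-L1 style objective $F(p) = \sum_i |p^{T}x_i|$ with $\|p\| = 1$. The objective, the initialization and the renormalization step are all assumptions for illustration; only the update rule, β = 0.01 and ε = 10^-6 follow the text.

```python
import numpy as np

def l1_dispersion(p, X):
    """Stand-in objective F(p) = sum_i |p^T x_i| (PCA-L1 style, not the patent's F)."""
    return float(np.abs(p @ X).sum())

def initial_direction(X):
    """Assumed initialization: the sample with the largest 2-norm, scaled to unit length."""
    p0 = X[:, np.argmax(np.linalg.norm(X, axis=0))].astype(float)
    return p0 / np.linalg.norm(p0)

def l1_direction(X, beta=0.01, eps=1e-6, max_iter=1000):
    """Iterate p(t+1) = p(t) + beta * delta(t) until F stops increasing."""
    p = initial_direction(X)
    value = l1_dispersion(p, X)
    for _ in range(max_iter):
        signs = np.sign(p @ X)            # sign function applied to every projection
        signs[signs == 0] = 1.0
        delta = X @ signs                 # increment: a subgradient of F at p
        p_next = p + beta * delta
        p_next /= np.linalg.norm(p_next)  # keep the vector at unit length
        next_value = l1_dispersion(p_next, X)
        if next_value - value < eps:      # growth is no longer significant
            return p_next
        p, value = p_next, next_value
    return p

rng = np.random.default_rng(0)
X_demo = rng.standard_normal((2, 200)) * np.array([[5.0], [0.5]])
p_star = l1_direction(X_demo)
```

For this stand-in objective the renormalized step never decreases F, because F is convex and δ(t) is one of its subgradients, so the stopping test on the increase of F is well behaved.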
The above describes the case of reducing to 1 dimension, i.e. d = 1. The case of reducing to multiple dimensions, i.e. d > 1, is explained below.
First set $p_0 = 0$ and $(x_i)_0 = x_i$ (i = 1, 2, ..., N). Then, for each i in 1, 2, ..., N, the updated sample $(x_i)_k$ is computed.
Substituting $(x_i)_k$ into the iterative method above gives $p_k$.
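The formula that updates $(x_i)_k$ is not reproduced in this text. A common choice in greedy 1-norm projection methods such as PCA-L1 is deflation, $(x_i)_k = (x_i)_{k-1} - p_{k-1}p_{k-1}^{T}(x_i)_{k-1}$, which agrees with the initialization $p_0 = 0$, $(x_i)_0 = x_i$ and yields mutually orthogonal directions. The sketch below assumes that rule; `one_direction` is a hypothetical placeholder for the 1-D solver and does not implement the patent's objective.

```python
import numpy as np

def one_direction(X, iters=100):
    """Placeholder 1-D solver (PCA-L1 style fixed point), not the patent's objective."""
    p = X[:, np.argmax(np.linalg.norm(X, axis=0))].astype(float)
    p /= np.linalg.norm(p)
    for _ in range(iters):
        signs = np.sign(p @ X)
        signs[signs == 0] = 1.0
        new_p = X @ signs
        new_p /= np.linalg.norm(new_p)
        if np.allclose(new_p, p):
            break
        p = new_p
    return p

def greedy_projection(X, d, solver=one_direction):
    """Build an n x d matrix P one column at a time using assumed deflation."""
    Xk = np.array(X, dtype=float)
    columns = []
    for _ in range(d):
        p = solver(Xk)
        columns.append(p)
        # Deflation: remove the component along p from every sample.
        Xk = Xk - np.outer(p, p @ Xk)
    return np.column_stack(columns)

rng = np.random.default_rng(0)
X_demo = rng.standard_normal((5, 50))
P_demo = greedy_projection(X_demo, d=3)
```

Each new direction lies in the span of the deflated samples, which are orthogonal to all earlier directions, so the columns of the result satisfy $P^{T}P = I_d$ by construction.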
The complete algorithm is as follows:
Handwriting feature extraction algorithm based on a robust metric
Input: original data matrix X; control parameters γ, β, d.
Output: projection matrix $P^{*}$.
Initialization: $d_0 = 0$, $p_0 = 0$, $(x_i)_0 = x_i$, k = 0, γ = 0.2, β = 0.01, ε = $10^{-6}$.
Step 1: Solve for the reconstruction coefficient matrix M.
Step 2: Compute the weight coefficient matrices.
Step 3: While $d_0 < d$, set k ← k + 1 and compute $(x_i)_k$ for each i in 1, 2, ..., N;
otherwise output $P^{*} = P$.
Step 4: Set the initial vector p(0) ($m_i$ denotes the mean of the samples of class i) and normalize it.
Step 5: While not converged, do:
compute the sign function values;
compute the increment δ(t);
update $p(t+1) = p(t) + \beta\,\delta(t)$;
check convergence:
if the stopping criterion with tolerance ε is met, stop and set $P(:, d_0) = p(t+1)$;
otherwise set t = t + 1.
End while.
Step 6: Set $d_0 \leftarrow d_0 + 1$ and return to Step 3.
Initial values used for the iteration in this example: γ = 0.2; β = 0.01 is the initial value and is reduced progressively during the iteration.
This yields the handwritten character image feature extraction matrix P.
In the present invention, step S101, in which the projection matrix P is used to extract features from the handwriting test samples and out-of-sample images are handled mainly by mapping the test samples onto the projection matrix P, specifically includes:
The projection matrix P obtained above is used to extract the features of the handwriting test image samples and generate a new test set. Specifically, starting from the training set, the projection matrix $P \in \mathbb{R}^{n \times d}$ embeds the training samples and the test samples into the projection space, which completes handwritten character image feature extraction and produces the training set and test set after feature extraction.
The feature extraction results for a training sample $x_{train}$ and a test sample $x_{test}$ are obtained by projecting each sample with P, that is, as $P^{T}x_{train}$ and $P^{T}x_{test}$ respectively.
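Feature extraction with a learned P is a single matrix product. The sketch below assumes samples are stored as columns, matching $X \in \mathbb{R}^{n \times N}$; the random orthonormal `P_demo` and the sizes 64, 10, 30 and 5 merely stand in for a learned projection and real data.

```python
import numpy as np

def extract_features(P, X):
    """Project samples onto P.

    P: (n, d) projection matrix; X: (n, m) matrix with one sample per column.
    Returns the (d, m) matrix P^T X of extracted features.
    """
    if P.shape[0] != X.shape[0]:
        raise ValueError("P and X must share the original dimension n")
    return P.T @ X

rng = np.random.default_rng(0)
P_demo, _ = np.linalg.qr(rng.standard_normal((64, 10)))  # stand-in for a learned P
X_train = rng.standard_normal((64, 30))
X_test = rng.standard_normal((64, 5))
Y_train = extract_features(P_demo, X_train)
Y_test = extract_features(P_demo, X_test)
```

The same matrix is applied to training and test samples, which is why out-of-sample images need no retraining.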
In the present invention, step S102, in which a label propagation classifier classifies the dimension-reduced test sample features, outputs the soft class label of each test sample, and takes the position of the largest probability in that soft class label as the class of the test sample to obtain the character recognition result, specifically includes:
After the character image features of the original training set and test set have been computed in step S101, the handwriting test sample set and training set after feature extraction are readily constructed, each element being the feature corresponding to an original sample $x_i$. The test set samples are then classified with the label propagation classifier to obtain their classification results.
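The patent does not specify which label propagation classifier is used. As one possible instance, the sketch below uses the closed-form "local and global consistency" propagation $F = (I - \alpha S)^{-1}Y$ on a Gaussian similarity graph. The kernel width `sigma`, the parameter `alpha` and the toy data are assumptions chosen for illustration.

```python
import numpy as np

def propagate_labels(Y_train, labels, Y_test, num_classes, sigma=1.0, alpha=0.9):
    """Assumed label propagation over extracted features.

    Y_train: (d, l) labeled features; labels: (l,) integer class indices;
    Y_test: (d, m) unlabeled features. Returns (m, c) soft labels whose rows sum to 1.
    """
    Z = np.hstack([Y_train, Y_test]).T                  # (l + m, d), one sample per row
    sq = np.sum(Z ** 2, axis=1)
    dist = np.maximum(sq[:, None] + sq[None, :] - 2.0 * (Z @ Z.T), 0.0)
    W = np.exp(-dist / (2.0 * sigma ** 2))              # Gaussian similarity graph
    np.fill_diagonal(W, 0.0)
    deg = W.sum(axis=1)
    S = W / np.sqrt(np.outer(deg, deg))                 # symmetric normalization
    Y0 = np.zeros((Z.shape[0], num_classes))
    Y0[np.arange(len(labels)), labels] = 1.0            # one-hot rows for labeled samples
    F = np.linalg.solve(np.eye(Z.shape[0]) - alpha * S, Y0)
    F_test = np.maximum(F[len(labels):], 0.0)
    return F_test / F_test.sum(axis=1, keepdims=True)

Y_train = np.array([[0.0, 0.2, 3.0, 3.1],
                    [0.0, -0.1, 3.0, 2.9]])
labels = np.array([0, 0, 1, 1])
Y_test = np.array([[0.1, 2.9],
                   [0.1, 3.2]])
soft = propagate_labels(Y_train, labels, Y_test, num_classes=2)
predicted = np.argmax(soft, axis=1)
```

Each row of `soft` plays the role of a soft class label, and the arg-max over a row gives the predicted class of that test sample.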
Compared with the prior art, the invention discloses a handwritten character recognition method and system based on a robust metric. Similarity learning is performed on the handwriting training samples to construct a weighted similarity graph, which compacts the local within-class scatter and separates the local between-class scatter while preserving the local structure of all training samples. To make the handwriting description more robust, a robust semi-supervised handwritten character image feature learning model based on the 1-norm metric is built, and its optimization outputs a projection matrix P usable for both in-sample and out-of-sample image feature extraction. An out-of-sample image is handled by projecting the test sample onto P; the extracted features are then fed to an efficient label propagation classifier, and the position of the largest probability in the corresponding soft class label determines the class of the test sample, giving the character recognition result. In addition, formulating the model as a ratio reduces the number of model parameters, and the projection matrix P satisfies the orthogonality constraint.
Corresponding to the handwritten character recognition method based on a robust metric disclosed in the above embodiment, an embodiment of the present invention also provides a handwriting recognition system based on a robust metric. Referring to Fig. 2, the system may include the following:
A training module 201, configured to perform similarity learning on the handwriting training samples and construct a weighted similarity graph, so as to compact the local within-class scatter and separate the local between-class scatter while preserving the local structure of all training samples. It also builds a robust semi-supervised handwritten character image feature learning model based on the 1-norm metric, whose optimization outputs a projection matrix P usable for both in-sample and out-of-sample image feature extraction. In addition, formulating the model as a ratio reduces the number of model parameters, and the optimized projection matrix P satisfies an orthogonality constraint.
A test preprocessing module 202, configured to use the projection matrix P to extract features from the handwriting test samples, where out-of-sample images are handled mainly by mapping the test samples onto the projection matrix P.
A test module 203, configured to use a label propagation classifier to classify the dimension-reduced test sample features and output the soft class label of each test sample, and to take the position of the largest probability in that soft class label as the class of the test sample, which gives the character recognition result.
The values in the soft class label represent the probabilities that the test sample belongs to each class.
Table 1 compares the recognition results of the method of the present invention with those of the semi-supervised maximum margin criterion algorithm (SSMMC), the semi-supervised linear discriminant analysis algorithm (SSLDA) and the discriminant locality preserving projections algorithm based on the 1-norm (DLPP-L1), giving the average and highest recognition rates obtained in the experiments for each method. In this example, the SSMMC, SSLDA and DLPP-L1 methods used for comparison each extract test sample features with their own computed projection matrix P, and all of them use the label propagation classifier for classification.
Table 1. Comparison of the recognition results of the present invention with the SSMMC, SSLDA and DLPP-L1 methods
Experimental results on three real data sets, namely (a) CAS Offline Chinese Handwriting Digits, (b) USPS and (c) Optical Recognition of Handwritten Digits, show that the method of the present invention can be used effectively for automatic feature extraction from handwriting.
Fig. 3 shows a schematic diagram of recognition by a handwritten character recognition method based on a robust metric provided by an embodiment of the present invention.
The experimental results show that the handwritten character image feature extraction and recognition performance of the present invention is clearly better than that of the related SSMMC, SSLDA and DLPP-L1 methods, and that the method is also more stable.
In summary, the invention discloses a handwritten character recognition method and system based on a robust metric. Similarity learning is performed on the handwriting training samples to construct a weighted similarity graph, which compacts the local within-class scatter and separates the local between-class scatter while preserving the local structure of all training samples. To make the handwriting description more robust, a robust semi-supervised handwritten character image feature learning model based on the 1-norm metric is built, and its optimization outputs a projection matrix P usable for both in-sample and out-of-sample image feature extraction. An out-of-sample image is handled by projecting the test sample onto P; the extracted features are then fed to an efficient label propagation classifier, and the position of the largest probability in the corresponding soft class label determines the class of the test sample, giving the character recognition result. In addition, formulating the model as a ratio reduces the number of model parameters, and the projection matrix P satisfies the orthogonality constraint.
It should be noted that the embodiments in this specification are described in a progressive manner: each embodiment focuses on its differences from the others, and identical or similar parts of the embodiments may be cross-referenced. Because the system embodiments are basically similar to the method embodiments, they are described relatively briefly; for related details, refer to the description of the method embodiments.
The handwriting recognition method and system based on robust measurement provided by the present invention have been described in detail above. Specific examples are used herein to explain the principle and implementation of the present invention, and the description of the above embodiments is intended only to help in understanding the method of the present invention and its core idea. It should be noted that those of ordinary skill in the art may make several improvements and modifications to the present invention without departing from its principle, and such improvements and modifications also fall within the protection scope of the claims of the present invention.

Claims (4)

1. A handwriting recognition method based on robust measurement, characterized in that it is based on the idea of a 1-norm projection that exploits the discriminability of labeled data and preserves the locality of all samples, the method comprising:
performing similarity learning on handwritten training samples and constructing a weighted similarity graph, so as to compact the local within-class scatter and separate the local between-class scatter while preserving the local characteristics of all training samples; building a robust semi-supervised handwritten character image feature learning model based on the 1-norm metric, wherein optimizing the model outputs a projection matrix P usable for in-sample and out-of-sample image feature extraction; and, by formulating a ratio model, reducing the model parameters, the optimized projection matrix P satisfying the orthogonality property;
performing feature extraction on handwritten test samples using the projection matrix P, wherein out-of-sample images are handled by mapping the test sample onto the projection matrix P;
using a label propagation classifier to complete testing on the dimension-reduced test sample features and output a soft class label of the test sample, and taking the position of the maximum probability in the soft class label to determine the class of the test sample, thereby obtaining a character recognition result;
wherein the values in the soft class label represent the probabilities that the test sample belongs to each class.
2. The method according to claim 1, characterized in that building the robust semi-supervised handwritten character image feature learning model based on the 1-norm metric, wherein optimizing the model outputs a projection matrix P usable for in-sample and out-of-sample image feature extraction, comprises:
given an original training sample set containing noise [formula missing in source], where n is the dimension of the training samples and N is the number of training samples, the training sample set comprising a labeled sample set [formula missing in source] with c class labels and a sample set [formula missing in source] without any label, where c is an integer greater than 2 and the sample counts satisfy $l + u = N$; letting [formula missing in source] be the labels of the l labeled samples, the label of sample $x_i$ being $y_i$ ($i \le l$);
computing, from the original training sample set, a projection matrix [formula missing in source] having both discriminative and locality-preserving properties, which includes obtaining the projection matrix P capable of extracting out-of-sample handwritten character image features by solving the following optimization problem:
[objective function missing in source]
s.t. $P^{T}P = I_d$
where $\|\cdot\|_1$ is the 1-norm, defined as $\|S\|_1 = \sum_{i,j} |S_{i,j}|$, $S_{i,j}$ denotes the $(i, j)$-th element of the matrix $S$, and [symbol missing in source] and W are weight coefficient matrices.
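As a small illustration of the two definitions stated in claim 2, the following Python sketch evaluates the entrywise 1-norm, i.e. the sum of the absolute values of all matrix elements, and checks the orthogonality constraint on a projection matrix. The matrix values and the helper names `entrywise_l1` and `is_orthonormal` are hypothetical, and the P used here comes from a QR factorization purely to obtain orthonormal columns; it is not the solution of the claimed optimization problem.

```python
import numpy as np

def entrywise_l1(S):
    """Sum of the absolute values of all entries of S."""
    return float(np.abs(S).sum())

def is_orthonormal(P, tol=1e-10):
    """True if the columns of P are orthonormal, i.e. P^T P equals the d x d identity."""
    d = P.shape[1]
    return bool(np.allclose(P.T @ P, np.eye(d), atol=tol))

S = np.array([[1.0, -2.0],
              [0.5,  3.0]])
l1 = entrywise_l1(S)            # |1| + |-2| + |0.5| + |3| = 6.5

# Note: np.linalg.norm(S, 1) is the induced norm (maximum absolute column sum),
# which is a different quantity from the entrywise 1-norm defined above.
induced = np.linalg.norm(S, 1)  # max(1.5, 5.0) = 5.0

rng = np.random.default_rng(0)
P, _ = np.linalg.qr(rng.standard_normal((6, 2)))  # n = 6, d = 2, orthonormal columns
ok = is_orthonormal(P)
```

The distinction in the comment matters in practice: the entrywise definition given in the claim sums every element, whereas the induced matrix 1-norm offered by many libraries does not.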
3. The method according to claim 2, characterized in that performing feature extraction on handwritten test sample images using the projection matrix P, wherein out-of-sample images are handled by mapping the test sample images onto the projection matrix P, comprises:
projecting the training samples and the test samples using the projection matrix P to complete handwritten character image feature extraction.
4. A handwriting recognition system based on robust measurement, characterized by comprising:
a training module, configured to perform similarity learning on handwritten training samples and construct a weighted similarity graph, so as to compact the local within-class scatter and separate the local between-class scatter while preserving the local characteristics of all training samples; and to build a robust semi-supervised handwritten character image feature learning model based on the 1-norm metric, wherein optimizing the model outputs a projection matrix P usable for in-sample and out-of-sample image feature extraction, and wherein, by formulating a ratio model, the model parameters are reduced and the optimized projection matrix P satisfies the orthogonality property;
a test preprocessing module, configured to perform feature extraction on handwritten test samples using the projection matrix P, wherein out-of-sample images are handled by mapping the test sample onto the projection matrix P;
a test module, configured to use a label propagation classifier to complete testing on the dimension-reduced test sample features and output a soft class label of the test sample, and to take the position of the maximum probability in the soft class label to determine the class of the test sample, thereby obtaining a character recognition result;
wherein the values in the soft class label represent the probabilities that the test sample belongs to each class.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510450358.9A CN104992166B (en) 2015-07-28 2015-07-28 A kind of Manuscripted Characters Identification Method and system based on robust measurement

Publications (2)

Publication Number Publication Date
CN104992166A CN104992166A (en) 2015-10-21
CN104992166B true CN104992166B (en) 2018-09-11

Family

ID=54303979

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510450358.9A Active CN104992166B (en) 2015-07-28 2015-07-28 A kind of Manuscripted Characters Identification Method and system based on robust measurement

Country Status (1)

Country Link
CN (1) CN104992166B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105528620B (en) * 2015-12-11 2019-12-06 Suzhou University method and system for combined robust principal component feature learning and visual classification
CN106845358B (en) * 2016-12-26 2020-11-10 Suzhou University Method and system for recognizing image features of handwritten characters
CN109472370B (en) * 2018-09-30 2021-09-10 Shenzhen Launch Tech Co., Ltd. Method and device for classifying maintenance plants

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103793704A (en) * 2014-03-11 2014-05-14 Suzhou University Supervising neighborhood preserving embedding face recognition method and system and face recognizer
CN103955676A (en) * 2014-05-12 2014-07-30 Suzhou University Human face identification method and system
CN104504412A (en) * 2014-11-28 2015-04-08 Suzhou University Method and system for extracting and identifying handwriting stroke features
CN104751191A (en) * 2015-04-23 2015-07-01 Chongqing University Sparse self-adaptive semi-supervised manifold learning hyperspectral image classification method
CN104794489A (en) * 2015-04-23 2015-07-22 Suzhou University Deep label prediction based inducing type image classification method and system

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8457409B2 (en) * 2008-05-22 2013-06-04 James Ting-Ho Lo Cortex-like learning machine for temporal and hierarchical pattern recognition

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant