CN106971192A - View data categorizing system based on Universum associate(d) matrix Ho Kashyap algorithms - Google Patents

Image data classification system based on the Universum-combined matrixized Ho-Kashyap algorithm

Info

Publication number
CN106971192A
Authority
CN
China
Prior art keywords
universum
samples
model
sample
umatmhks
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201611023336.5A
Other languages
Chinese (zh)
Inventor
王喆
李冬冬
朱昱锦
崇传禹
高大启
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
East China University of Science and Technology
Original Assignee
East China University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.): 2016-11-21
Filing date: 2016-11-21
Publication date: 2017-07-21
Application filed by East China University of Science and Technology filed Critical East China University of Science and Technology
Priority to CN201611023336.5A priority Critical patent/CN106971192A/en
Publication of CN106971192A publication Critical patent/CN106971192A/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The present invention provides an image data classification system based on the Universum-combined matrixized Ho-Kashyap algorithm. First, an In-Between generation strategy produces a number of third-class sample points lying between the two classes of samples, i.e. Universum samples. The Universum sample points are then substituted into a regularization term $R_{uni}$. Next, the regularization term is introduced into the matrixized HK classification model, forming the complete matrixized HK model combined with Universum. Finally, the model is trained to obtain the optimal parameters for the current training data set and to generate the optimal classification decision surface. In the test phase, each test sample point is substituted into the decision function and a class label is output. Compared with traditional classification techniques, introducing Universum samples makes the contrast between the original two classes clearer and further improves accuracy.

Description

Image data classification system based on the Universum-combined matrixized Ho-Kashyap algorithm
Technical field
The present invention relates to the field of pattern classification, and in particular to a Universum-combined matrixized Ho-Kashyap algorithm and system for recognizing image data sets.
Background
Pattern recognition studies how computers can imitate or realize the recognition ability of humans or other animals, so that objects under study can be recognized automatically. In recent years, pattern recognition has been widely applied in many important fields, including artificial intelligence, machine learning, computer engineering, robotics, neurobiology, medicine, forensic science, archaeology, geological prospecting, astronautics and weapons technology. One classical problem in pattern recognition is the processing of two-dimensional data, i.e. data represented as matrices. In practice, matrix-represented data are common in image recognition problems such as face recognition, fingerprint recognition and spectrum recognition.
When handling image problems, traditional pattern classification methods first convert each image sample into a vector and then process the vectorized sample. Classical methods include the support vector machine (Support Vector Machine, SVM), principal component analysis (Principal Component Analysis, PCA) and the Fisher linear discriminant (Fisher Linear Discriminant, FLD). Processing vectorized images raises two main problems. First, after an image is converted into a vector, the dimension of the vector is relatively high, and many classical feature-extraction methods run into the small sample size problem, in which the number of samples is far smaller than the dimension of the data. Examples are locality preserving projection (Locality Preserving Projection, LPP), FLD and PCA. These algorithms involve eigenvalue decomposition, and the mismatch between dimension and sample count reduces the task to finding an approximate solution of an underdetermined system of linear equations. High-dimensional samples also increase computational complexity and consume more memory for parameters such as weight vectors. Second, converting an image into a vector destroys the spatial structure among the elements of the image. Unlike the elements of a vector sample, which correspond to independently defined attributes, the elements of an image sample represent the pixel information of the whole sample at a specific position. Destroying the original two-dimensional structure of the image can therefore, in theory, affect classification accuracy.
To address the problems that traditional pattern recognition methods face on two-dimensional data sets, several dedicated methods have been designed. Among them, methods that process two-dimensional samples directly have been notably successful. Typical examples are two-dimensional versions of traditional feature-processing methods, such as two-dimensional principal component analysis (2DPCA) and the two-dimensional Fisher linear discriminant (2DFLD). There are also two-dimensional versions of classical classifiers, such as the support tensor machine (Support Tensor Machine, STM).
At present, both directions have shortcomings. Methods of the first kind process the data set directly only in the feature-processing stage, mainly to reduce dimensionality and thereby avoid or alleviate the small sample size problem; the subsequent classification stage still uses conventional methods. They therefore partly solve the first problem of vectorizing two-dimensional samples described above, but not the second. Methods of the second kind are mostly nonlinear, so they tend to be complex and require many parameters to be tuned to reach their best performance. Moreover, the cost of matrix computations grows with the cube of the matrix order, and these methods perform many matrix computations in their nonlinear steps, so their time complexity is high. A method that is structurally simple, has few parameters and classifies two-dimensional data directly would further improve the ability of pattern classification techniques to handle image problems.
Summary of the invention
Existing techniques are structurally complex, inefficient and insufficiently accurate, and cannot handle image problems that demand precision or real-time response or that lack prior knowledge. To address this, the invention provides a classification method based on the Universum-combined matrixized Ho-Kashyap algorithm. For a two-class problem, Universum samples between the classes are first generated with the classical In-Between technique. A two-dimensional (matrixized) Ho-Kashyap (HK) model is then designed. Next, a regularization term that characterizes the relation between the Universum samples and the original samples is designed and substituted into the model from the second step. Finally, the optimal parameters of the whole model are solved by gradient descent. The resulting decision boundary preserves classification accuracy on image data sets while improving efficiency in both model design and model computation.
The technical solution adopted by the invention is as follows. First, according to the background of the specific image problem, the collected samples are denoised and reduced in dimension using the classical LPP, FLD or PCA method. Second, the matrix-represented data set is divided into a training data set and a test data set. In the training step, an In-Between generation strategy first generates a number of third-class sample points located between the two classes of samples, i.e. Universum samples. The Universum sample points are then substituted into the regularization term $R_{uni}$. The regularization term is next introduced into the matrixized HK classification model, forming the complete matrixized HK model combined with Universum. Finally, the model is trained to obtain the optimal parameters for the current training data set and to generate the optimal classification decision surface. Third, in the test phase, the current test sample point is substituted into the trained decision function. Finally, the determined class label is output.
The technical solution can be further refined. In the first step of the training module, the method for generating Universum samples is not limited to In-Between; any method that can quickly generate third-class samples lying between the two classes may be used. Furthermore, because a vector is a special case of a matrix, the model can also handle vector data sets. If the introduced Universum samples are ignored and the weight vector on one side of the model does not take part in the iterative optimization, the model degenerates to the traditional modified HK algorithm (Modified Ho-Kashyap Algorithm, MHKS). Like MHKS, the present method is a linear classifier, so it can determine the classification decision surface faster than nonlinear methods, which improves efficiency.
The invention has the following advantages. Because it processes image data directly, it overcomes the small sample size problem and improves efficiency, and it preserves the structural integrity of the image data, which yields higher accuracy. Introducing Universum samples makes the contrast between the original two classes clearer and further improves accuracy. Because the method is linear, training time is shortened. It can be shown, using Rademacher complexity, that the upper bound on the generalization risk of the method does not exceed that of the original MHKS method.
Brief description of the drawings
Fig. 1 shows the system framework of the invention as applied to image pattern classification;
Fig. 2 compares the experimental results of the algorithm of the invention with those of other algorithms.
Detailed description
The invention is further described below with reference to the drawings and embodiments. The method of the invention is divided into three modules.
Part I: Data acquisition
This module comprises two steps: first, digitizing the data; second, generating Universum samples.
1) Digitize the real-world image problem: a matrix-represented data set is generated so that subsequent modules can process it. The matrix data generated after collection may be further reduced in dimension using classical methods. A matrix sample is denoted A; each element of the matrix corresponds to a converted pixel value of the sample, and the sample dimension is d = m × n.
2) Generate Universum samples with the In-Between method: Universum samples are defined as samples that lie in the same domain as the problem data set but belong to none of its classes. For example, in a character classification problem where a two-class model separates the digits "5" and "8", the remaining digits "0", "1", "2", "3", "4", "6", "7" and "9" can be regarded as Universum samples. In other problems, where no ready-made Universum samples exist, they must be generated by some method. Here a typical generating algorithm, the In-Between method, is used. Its idea is to first identify the samples of the two classes that are close to the decision boundary, connect boundary samples of different classes, and then generate a new sample at a random position along each connecting line. The generated samples are the Universum samples. In the present method, to simplify computation, Universum samples are always generated at the midpoint of the line connecting two samples.
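The midpoint variant of In-Between generation can be sketched as follows. This is a minimal sketch rather than the patented implementation: the function name `in_between_universum` is hypothetical, and treating the closest cross-class pairs as the "boundary" samples is an assumption, since the text does not say how boundary samples are identified.

```python
import numpy as np

def in_between_universum(X_pos, X_neg, n_universum):
    """Return midpoints of the closest cross-class pairs.

    X_pos: (P, m, n) matrix samples of one class.
    X_neg: (Q, m, n) matrix samples of the other class.
    n_universum: number of Universum samples to generate.
    """
    P, Q = len(X_pos), len(X_neg)
    flat_pos = X_pos.reshape(P, -1)
    flat_neg = X_neg.reshape(Q, -1)
    # Squared Euclidean distance between every cross-class pair, shape (P, Q).
    d2 = ((flat_pos[:, None, :] - flat_neg[None, :, :]) ** 2).sum(axis=2)
    # Assumption: the closest pairs stand in for "samples near the boundary".
    order = np.argsort(d2, axis=None)[:n_universum]
    i, j = np.unravel_index(order, d2.shape)
    # Midpoint of the line connecting each selected pair.
    return 0.5 * (X_pos[i] + X_neg[j])
```

The generated array keeps the m × n shape of the inputs, so it can be passed to a matrix-based classifier without vectorization.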
Part II: Training the classification model
In this module, the collected data set is substituted into the core algorithm of the invention for training. The main steps are as follows:
1) Design the regularization term $R_{uni}$: the Universum samples are substituted, as third-class samples, into the initial decision function; the regularization term is generated by the following formula:
2) Matrixize the traditional MHKS to generate the new model MatMHKS: MHKS, the modified HK algorithm, is based on the minimum mean squared error criterion. The objective of the HK algorithm is:
$J_s(w, b) = \|Yw - b\|^2$
where Y is the matrix composed of the vector samples, w is the weight vector, and b is a manually set non-negative bias-correction vector. The goal of HK is to make the error Yw − b as close to 0 as possible. MHKS enlarges the margin, turning this goal into the following inequality:
$Yw \ge 1_{N \times 1}$
which gives the new objective:
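The objective itself is not reproduced in this text. In the MHKS literature it is usually written as $J(w, b) = \|Yw - 1_{N \times 1} - b\|^2 + c\,\tilde{w}^T \tilde{w}$, where $\tilde{w}$ is w without its bias component; the sketch below assumes that form. The function name `mhks_train` and the update order are illustrative, and the defaults follow the settings reported in the experimental section.

```python
import numpy as np

def mhks_train(Y, c=0.1, rho=0.99, xi=1e-4, b0=1e-6, max_iter=1000):
    """Regularized Ho-Kashyap iteration (assumed MHKS form).

    Y: (N, d+1) matrix whose i-th row is label_i * [x_i, 1].
    Returns the weight vector w (bias last) and the margin vector b.
    """
    N, D = Y.shape
    reg = np.eye(D)
    reg[-1, -1] = 0.0                 # do not penalise the bias component
    gram = Y.T @ Y + c * reg
    b = np.full(N, b0)                # small positive start, b stays >= 0
    w = np.zeros(D)
    for _ in range(max_iter):
        # Minimiser of the objective in w for the current b.
        w = np.linalg.solve(gram, Y.T @ (1.0 + b))
        # Error vector; only its positive part is allowed to enlarge b.
        e = Y @ w - 1.0 - b
        b_new = b + rho * (e + np.abs(e))
        converged = np.linalg.norm(b_new - b) <= xi
        b = b_new
        if converged:
            break
    return w, b
```

Because `e + |e|` is zero wherever the error is negative, b can only grow, which keeps it non-negative as the text requires.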
Matrixization builds on MHKS to process matrices directly. MatMHKS first splits the original weight vector w into a vector u acting on the rows of the matrix and a vector v acting on its columns, so that the basic decision function becomes:
The MatMHKS objective then becomes:
where $v = [\tilde{v}^T, v_0]^T$, $Y = [y_1, y_2, \ldots, y_N]^T$ and $y_i = \varphi_i [u^T A_i, 1]^T$, with $\varphi_i \in \{+1, -1\}$ the class label of sample $A_i$. For simplicity, $S_1$ and $S_2$ are taken to be two identity matrices.
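A form consistent with these definitions is the bilinear decision function $g(A) = u^T A \tilde{v} + v_0$ with objective $J = \sum_i (\varphi_i\, g(A_i) - 1 - b_i)^2 + c\,(u^T S_1 u + \tilde{v}^T S_2 \tilde{v})$, where $\varphi_i \in \{+1, -1\}$ is the class label. The sketch below evaluates that assumed form with $S_1 = S_2 = I$; the function names are hypothetical.

```python
import numpy as np

def decision_value(A, u, v_tilde, v0):
    """g(A) = u^T A v_tilde + v0 for one m x n matrix sample (assumed form)."""
    return float(u @ A @ v_tilde + v0)

def matmhks_objective(As, phi, u, v_tilde, v0, b, c):
    """Assumed MatMHKS objective with S1 = S2 = identity.

    As: (N, m, n) samples, phi: (N,) labels in {+1, -1}, b: (N,) margins.
    """
    # Decision values for all samples at once.
    g = np.einsum("m,imn,n->i", u, As, v_tilde) + v0
    residual = phi * g - 1.0 - b
    return float(residual @ residual + c * (u @ u + v_tilde @ v_tilde))
```

For a 32 × 32 image, u and ṽ together hold 64 weights plus a bias, against 1024 weights plus a bias for a linear model on the vectorized image.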
3) Introduce the regularization term $R_{uni}$ into MatMHKS to form UMatMHKS, the matrixized HK classification model combined with the Universum method: HK, MHKS and MatMHKS all follow the same design, namely the structural risk minimization framework:
$\min J = R_{emp} + c R_{reg}$
where $R_{emp}$ is the traditional empirical risk, i.e. the sum of squared errors between observed and theoretical values; $R_{reg}$ is the generalization risk, which generalizes the empirical risk so that the model can be applied to different data sets; and c is a penalty factor. The Universum regularization term $R_{uni}$ designed in the previous step is introduced into this conventional framework, giving the complete framework of the new method:
4) Generate the objective function under the new framework: because the new model introduces Universum samples into the matrixized HK method,
substituting the specific terms gives the final objective:
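Neither the extended framework nor the final objective is reproduced in this text. One form consistent with the surrounding description is written out below as an assumption: the Universum term is taken to be the sum of squared decision values on the Universum samples $A_j^{*}$, which pulls those samples toward the decision surface, and $c_u$ is a hypothetical name for its penalty factor.

```latex
\min J = R_{emp} + c\,R_{reg} + c_u\,R_{uni}

J(u, \tilde{v}, v_0, b) =
    \sum_{i=1}^{N} \bigl(\varphi_i\,(u^{T} A_i \tilde{v} + v_0) - 1 - b_i\bigr)^{2}
  + c\,\bigl(u^{T} S_1 u + \tilde{v}^{T} S_2 \tilde{v}\bigr)
  + c_u \sum_{j=1}^{M} \bigl(u^{T} A_j^{*} \tilde{v} + v_0\bigr)^{2}
```

Setting $c_u = 0$ removes the Universum term and leaves a MatMHKS-style objective, which matches the statement that the model reduces to simpler members of the HK family when the Universum samples are ignored.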
5) Solve for the optimal parameters by gradient descent: for the UMatMHKS objective, gradient descent is used. First, the objective is differentiated with respect to each parameter:
A parameter reaches an extremum when its derivative is 0, which gives the following formulas for the extremal value of each parameter:
The parameter b is solved differently from u and v: it is expressed through the error equation obtained from the empirical risk term of the previous step, and it also serves as the criterion for the stopping condition:
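The derivative and update formulas are not reproduced in this text. The following sketch shows one way the alternating scheme could look, assuming the objective $J = \sum_i (\varphi_i\, g(A_i) - 1 - b_i)^2 + c\,(\|u\|^2 + \|\tilde{v}\|^2) + c_u \sum_j g(A_j^{*})^2$ with $g(A) = u^T A \tilde{v} + v_0$. Setting the derivative with respect to v, and then u, to zero gives one linear system each; b follows the Ho-Kashyap rule $b \leftarrow b + \rho\,(e + |e|)$ and its change is used as the stopping test. The name `train_umatmhks`, the penalty `c_u` and the exact update order are assumptions; the defaults follow the experimental settings.

```python
import numpy as np

def train_umatmhks(As, phi, Us, c=1.0, c_u=1.0, rho=0.99, xi=1e-4,
                   b0=1e-6, max_iter=1000, seed=0):
    """Alternating updates for the assumed UMatMHKS objective.

    As: (N, m, n) training matrices, phi: (N,) labels in {+1, -1},
    Us: (M, m, n) Universum matrices. Returns (u, v_tilde, v0, b).
    """
    rng = np.random.default_rng(seed)
    N, m, n = As.shape
    u = rng.uniform(0.0, 1.0, size=m)        # random start in [0, 1)
    b = np.full(N, b0)
    S2 = np.eye(n + 1)
    S2[-1, -1] = 0.0                          # leave the bias v0 unpenalised
    v = np.zeros(n + 1)
    for _ in range(max_iter):
        # v-step: with u fixed, phi_i * g(A_i) = y_i^T v, y_i = phi_i [A_i^T u, 1].
        Y = phi[:, None] * np.hstack(
            [np.einsum("m,imn->in", u, As), np.ones((N, 1))])
        Z = np.hstack(
            [np.einsum("m,imn->in", u, Us), np.ones((len(Us), 1))])
        v = np.linalg.solve(Y.T @ Y + c * S2 + c_u * Z.T @ Z,
                            Y.T @ (1.0 + b))
        v_tilde, v0 = v[:-1], v[-1]
        # u-step: with v fixed, g(A_i) = u^T (A_i v_tilde) + v0.
        P = As @ v_tilde                      # (N, m)
        Q = Us @ v_tilde                      # (M, m)
        lhs = P.T @ P + c * np.eye(m) + c_u * Q.T @ Q
        rhs = P.T @ (phi * (1.0 + b) - v0) - c_u * v0 * Q.sum(axis=0)
        u = np.linalg.solve(lhs, rhs)
        # b-step: Ho-Kashyap rule on the error of the empirical risk term.
        e = phi * (P @ u + v0) - 1.0 - b
        b_new = b + rho * (e + np.abs(e))
        done = np.linalg.norm(b_new - b) <= xi
        b = b_new
        if done:
            break
    return u, v[:-1], v[-1], b
```

Each step lowers or preserves the assumed objective, so the loop cannot diverge, and the iteration cap from the experimental settings bounds the running time when the stopping test is not met.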
Part III: Testing unknown data
In this module, unknown data whose class label is to be determined is substituted into the trained model, which makes the decision. Let the unknown sample be $A_i$. The decision function is:
If the result of the decision function is nonzero, the sample is assigned a class according to the sign of the result; a result of 0 means the test sample is equally likely to belong to either class, and the classification model cannot decide.
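Assuming the bilinear decision function $g(A) = u^T A \tilde{v} + v_0$ (an assumption here, chosen to be consistent with the model description), the decision rule can be sketched as below. The function name and the use of 0 as the return value for the undecidable case are illustrative choices.

```python
import numpy as np

def classify(A, u, v_tilde, v0):
    """Return +1 or -1 from the sign of g(A), or 0 when g(A) is exactly 0."""
    g = float(u @ A @ v_tilde + v0)
    if g > 0:
        return 1
    if g < 0:
        return -1
    return 0  # both classes equally likely; the model cannot decide
```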
Experimental design
1) Choice of experimental data sets: four classical image data sets were selected for the experiments. The number of classes, sample dimension and size (total number of samples) of each data set are listed in the following table.
All data sets were processed with ten rounds of Monte Carlo cross-validation: each class of a data set is split into two parts and the sample order is shuffled; one part serves as test data and the other as training data, and this is repeated ten times. Sampling is with replacement. In the experiments, the performance of each classification model in practical settings is observed by varying the ratio between the two parts, for example the classification accuracy of the different models when the number of training samples is much smaller than the number of test samples.
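The ten-round splitting procedure can be sketched as follows. The function name `monte_carlo_splits` and the `train_fraction` parameter are hypothetical. Within a round the two parts are disjoint, and every round redraws from the full data set; reading "with replacement" this way is an assumption.

```python
import numpy as np

def monte_carlo_splits(labels, train_fraction, n_rounds=10, seed=0):
    """Yield (train_idx, test_idx) index arrays, split per class and shuffled."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    for _ in range(n_rounds):
        train_idx, test_idx = [], []
        for cls in np.unique(labels):
            # Shuffle the indices of this class, then cut at the chosen ratio.
            idx = rng.permutation(np.flatnonzero(labels == cls))
            k = int(round(train_fraction * len(idx)))
            train_idx.extend(idx[:k])
            test_idx.extend(idx[k:])
        yield np.array(train_idx), np.array(test_idx)
```

Lowering `train_fraction` reproduces the setting in which far fewer samples are used for training than for testing.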
2) Algorithms compared: the core algorithm of the invention is UMatMHKS. In addition, MatMHKS, MHKS, SVM (Linear) and SVM (Non-Linear) were selected as baseline algorithms. SVM (Non-Linear) uses the radial basis function (RBF) kernel. The parameters are set as follows:
For UMatMHKS, MatMHKS and MHKS, the initial value of the vector b is set to $10^{-6}$, the stopping parameter ξ to $10^{-4}$ and the learning rate p to 0.99. To guard against non-convergence, the maximum number of iterations is set to 1000. The penalty parameters c controlling the $R_{reg}$ and $R_{uni}$ terms are both selected from the set $\{10^{-2}, 10^{-1}, 10^{0}, 10^{1}, 10^{2}\}$. In particular, the initial values of the UMatMHKS weight vector u are set to random numbers greater than 0 and less than 1.
For SVM, the relaxation factor C is selected from $\{10^{-2}, 10^{-1}, 10^{0}, 10^{1}, 10^{2}\}$. For the nonlinear SVM, the kernel is computed as follows, with the kernel parameter σ set to the average distance between all pairs of samples:
$K(x_i, x_j) = \exp(-\|x_i - x_j\|^2 / \sigma)$
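The kernel and its data-driven parameter can be sketched as below. The text does not say whether the "average distance" is the mean of the distances or of the squared distances; the sketch assumes the mean squared distance over all pairs, which keeps the exponent dimensionless. The function name is hypothetical.

```python
import numpy as np

def rbf_kernel_matrix(X):
    """Return (K, sigma) with K[i, j] = exp(-||x_i - x_j||^2 / sigma).

    X: (N, d) vectorized samples. sigma is the mean squared pairwise
    distance (an assumed reading of "average distance"); it is zero, and
    the kernel undefined, only if all samples coincide.
    """
    sq = (X ** 2).sum(axis=1)
    # Squared distances via ||a||^2 + ||b||^2 - 2 a.b, clipped at 0 for safety.
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    sigma = d2.mean()
    return np.exp(-d2 / sigma), sigma
```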
3) Performance metric: classification accuracy (Classification Accuracy, Acc) is used throughout to record the classification results of the different methods on each data set. Each reported result is the one obtained when the corresponding algorithm is configured with its optimal parameters on that data set, i.e. the best result. Acc ranges from 0 to 100; the higher the value, the better the algorithm classifies the current data set.
The results of all models on each image data set are shown in Fig. 2. The four plots show the classification accuracy of the compared algorithms on the four data sets when training sets of different sizes are used. On all data sets, most models improve in accuracy as the number of training samples increases. In particular, UMatMHKS achieves the best result in the model group on all four image data sets.

Claims (4)

1. An image data classification system based on the Universum-combined matrixized Ho-Kashyap algorithm, characterized in that the specific steps are:
1) Sample collection: according to the background of the specific image problem, the collected samples are converted into a matrix form that the subsequent algorithm can process;
2) Training, generating Universum samples: an In-Between generation strategy generates a number of third-class sample points located between the two classes of samples, i.e. Universum samples;
3) Training, obtaining the Universum regularization term $R_{uni}$;
4) Training, obtaining the matrixized model MatMHKS;
5) Training, introducing the regularization term $R_{uni}$ into the matrixized model to obtain the final model UMatMHKS;
6) Finding the optimal parameters of the UMatMHKS objective function by gradient descent;
7) In the test phase, substituting the test sample into the decision function generated by the UMatMHKS model and classifying it according to the sign of the result.
2. The training to obtain the Universum regularization term $R_{uni}$ according to claim 1, characterized in that: a formula for processing Universum samples is constructed independently and is introduced as a term into the original matrixized model.
3. The introduction of the regularization term $R_{uni}$ into the matrixized model to obtain the final model UMatMHKS according to claim 1, characterized in that: $R_{uni}$ is introduced into the traditional structural risk framework so that the solution space is further constrained, with the result that the upper bound on the generalization risk of UMatMHKS is no higher than those of the MatMHKS and MHKS models.
4. The use of gradient descent to find the optimal parameters of the UMatMHKS objective function according to claim 1, characterized in that: the optimal values of the two weight vectors u and v are obtained by alternating iteration, and the stopping decision is made using the error rate for the bias vector b.
CN201611023336.5A 2016-11-21 2016-11-21 View data categorizing system based on Universum associate(d) matrix Ho Kashyap algorithms Pending CN106971192A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201611023336.5A CN106971192A (en) 2016-11-21 2016-11-21 View data categorizing system based on Universum associate(d) matrix Ho Kashyap algorithms

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201611023336.5A CN106971192A (en) 2016-11-21 2016-11-21 View data categorizing system based on Universum associate(d) matrix Ho Kashyap algorithms

Publications (1)

Publication Number Publication Date
CN106971192A true CN106971192A (en) 2017-07-21

Family

ID=59334597

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201611023336.5A Pending CN106971192A (en) 2016-11-21 2016-11-21 View data categorizing system based on Universum associate(d) matrix Ho Kashyap algorithms

Country Status (1)

Country Link
CN (1) CN106971192A (en)

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
崇传禹: "基于Universum的矩阵型分类器设计研究", 《中国优秀硕士学位论文全文数据库 信息科技辑》 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109344681A (en) * 2018-08-02 2019-02-15 长江大学 A kind of geologic objective recognition methods based on recognition of face
CN109344681B (en) * 2018-08-02 2021-09-24 长江大学 Geological target recognition method based on face recognition

Similar Documents

Publication Publication Date Title
Ewees et al. Improved artificial bee colony using sine-cosine algorithm for multi-level thresholding image segmentation
Li et al. A geometry-attentional network for ALS point cloud classification
CN111079639B (en) Method, device, equipment and storage medium for constructing garbage image classification model
Lv et al. Classification of hyperspectral remote sensing image using hierarchical local-receptive-field-based extreme learning machine
CN104573729B (en) A kind of image classification method based on core principle component analysis network
CN104573699B (en) Trypetid recognition methods based on middle equifield intensity magnetic resonance anatomy imaging
Naik et al. Improved and Accurate Face Mask Detection Using Machine Learning in the Crowded Places
Chawathe Rice disease detection by image analysis
Bora et al. Clustering approach towards image segmentation: an analytical study
CN111507297B (en) Radar signal identification method and system based on measurement information matrix
Arora et al. Geometric feature-based classification of segmented human chromosomes
Dalal et al. ETR: Enhancing transformation reduction for reducing dimensionality and classification complexity in hyperspectral images
Shrivastava et al. Dictionary-based multiple instance learning
Chakraborty et al. Hyper-spectral image segmentation using an improved PSO aided with multilevel fuzzy entropy
Shi et al. Hyperspectral image classification based on dual-branch spectral multiscale attention network
Zhang et al. Multicontext 3D residual CNN for false positive reduction of pulmonary nodule detection
Chen et al. DGCNN network architecture with densely connected point pairs in multiscale local regions for ALS point cloud classification
Dai et al. MDC-Net: A multi-directional constrained and prior assisted neural network for wood and leaf separation from terrestrial laser scanning
Nhaila et al. New wrapper method based on normalized mutual information for dimension reduction and classification of hyperspectral images
CN106971192A (en) View data categorizing system based on Universum associate(d) matrix Ho Kashyap algorithms
Mustafa et al. Palm print recognition based on harmony search algorithm.
Gritsenko et al. Deformable surface registration with extreme learning machines
Jena et al. Elitist TLBO for identification and verification of plant diseases
Jung et al. A metric to measure contribution of nodes in neural networks
Vijayakumar et al. Machine learning algorithm for improving the efficient of forgery detection

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20170721
