CN110348493A - Positive and unlabeled graph data classification method based on multi-view learning - Google Patents

Info

Publication number: CN110348493A
Application number: CN201910549488.6A
Authority: CN (China)
Other languages: Chinese (zh)
Inventors: 钟昊文, 刘波, 肖燕珊, 林志全
Current assignee: Guangdong University of Technology
Original assignee: Guangdong University of Technology
Application filed by: Guangdong University of Technology
Legal status: Pending
Priority date: 2019-06-24
Filing date: 2019-06-24
Publication date: 2019-10-18
Prior art keywords: graph data, angle, sample, score, view

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2411 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a positive and unlabeled graph data classification method based on multi-view learning. First, multiple views of the graph data are constructed: several different graph feature extraction methods convert each graph data sample into several different feature vectors. Next, an evaluation function is built from the multiple views to determine the relationship between samples and labels. The evaluation function is based on a ranking support vector machine (ranking SVM): the more likely a graph data sample is to be positive, the higher the score the evaluation function assigns to it, so that the scores of positive graph data exceed the scores of negative graph data. Finally, for a given graph data sample, its multiple views are obtained and its score is computed with the evaluation function in order to predict its label. The invention makes effective use of multiple views of graph data to classify graph data given only positive samples and unlabeled samples, and ensures classification accuracy.

Description

A positive and unlabeled graph data classification method based on multi-view learning
Technical field
The present invention relates to the field of machine learning, and in particular to a positive and unlabeled graph data classification method based on multi-view learning.
Background
With the arrival of the big data era, graph data classification has attracted growing attention because graph data is highly expressive. Graph data can describe particular relationships between things: a vertex represents a thing, and an edge connecting two vertices indicates that the two corresponding things are related. Graph data is an abstract data structure made up of vertices and edges. Many objects can be described or modeled as graph data, such as DNA, chemical compounds, and social networks. Graph data classification assigns a graph to the positive class or the negative class according to its features. Most existing graph data classification methods assume that the training samples contain both positive and negative examples. In some practical applications, however, the training samples contain only positive examples and unlabeled examples (which may be either positive or negative). In drug research, for example, it is easier for researchers to establish that certain compounds have a positive effect on a disease. In this setting, general methods for learning from positive and unlabeled samples (positive and unlabeled learning, PU learning) could be applied to graph data classification.
In existing research on the PU problem, the most common PU classification methods fall into three categories: (I) two-step strategy based methods; (II) probability estimation based methods; (III) cost-sensitive based methods. Two-step methods first select reliable negative examples or reliable positive examples from the unlabeled samples and then build a classifier from the positive and negative examples. Probability estimation methods estimate the probability that a sample belongs to the positive class and then predict from it. Cost-sensitive methods train a classifier by assigning appropriate weights to the positive samples and the unlabeled samples. Because of the structure of graph data, these general PU classification methods cannot be applied to graph data directly. Existing graph data classification methods suggest a way forward: they convert graph data into numeric vectors by extracting different types of features from it. These conversion methods fall into two categories: (I) feature mining based methods; (II) embedding learning based methods. Feature mining methods first find topological features or subgraph features that help classification; each graph can then be represented by a vector of 0s and 1s according to whether it contains the corresponding feature. Embedding learning methods aim to preserve the structural features and attributes of the graph data as far as possible while converting it into low-dimensional vectors.
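The 0/1 representation used by feature mining based methods can be illustrated with a minimal sketch. It assumes a deliberately crude graph encoding (a set of node-label pairs) and treats a pattern as "contained" when all of its label pairs occur in the graph; this is a stand-in for real subgraph matching, and the names `make_graph` and `to_binary_vector` are hypothetical.

```python
from typing import FrozenSet, List, Tuple

Edge = Tuple[str, str]  # an undirected edge between two node labels, stored sorted


def make_graph(edges: List[Tuple[str, str]]) -> FrozenSet[Edge]:
    """Store a graph as a set of label pairs (a crude stand-in for a real graph)."""
    return frozenset(tuple(sorted(edge)) for edge in edges)


def to_binary_vector(graph: FrozenSet[Edge],
                     patterns: List[FrozenSet[Edge]]) -> List[int]:
    """Entry k is 1 if every edge of pattern k occurs in the graph, else 0."""
    return [1 if pattern <= graph else 0 for pattern in patterns]


# Three hypothetical patterns assumed to have been mined beforehand.
patterns = [
    make_graph([("C", "O")]),
    make_graph([("C", "N"), ("C", "O")]),
    make_graph([("N", "S")]),
]
molecule = make_graph([("C", "C"), ("C", "O"), ("C", "N")])
vector = to_binary_vector(molecule, patterns)
```

A real feature mining method would test subgraph isomorphism rather than set containment, but the resulting vector has the same shape: one binary entry per mined feature.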
In addition, most existing graph data classification methods study only the case where graph data is described from a single angle, which is called single-view learning. In practice, an object can be described from several angles, so data can also be described from different views. Improving performance by exploiting the diversity of different views of the data is called multi-view learning. Especially when the training samples contain only positive graph data and unlabeled graph data, more information about the graph data is needed, and multi-view learning can supply richer information for graph data classification.
However, existing graph data classification methods study only the case where the training samples contain both positive and negative samples. They are not suited to the case of positive and unlabeled samples, which limits the use of graph data classification in real applications. Moreover, existing graph data classification methods describe graph data from a single view only, which limits classification performance.
Summary of the invention
The purpose of the present invention is to overcome the shortcomings and deficiencies of the prior art by providing a positive and unlabeled graph data classification method based on multi-view learning. The method makes effective use of multiple views of graph data to classify graph data given only positive samples and unlabeled samples, and ensures classification accuracy.
The purpose of the invention is achieved by the following technical solution:
A positive and unlabeled graph data classification method based on multi-view learning, comprising the following steps:
(1) Parameter setting. Set the weight $\gamma_v$ of each view, where $\gamma_v$ reflects the importance of the v-th view of the graph data; set the penalty factors, which tolerate errors during training; set the regularization parameters $\varepsilon_1$ and $\varepsilon_2$; set the non-negative slack variables, including the non-negative slack variables used to enforce consistency between view a and view b of the graph data;
(2) Multi-view construction. To construct multiple views, different feature extraction methods can be used to map the graph data, such as converting graph data into vectors (graph2vec) and mining frequent subgraphs. Each mapping method yields one view, so for a graph data sample $G_i$, m mapping methods yield m views;
(3) Determining the evaluation function. Because the model handles multi-view data, the evaluation function should follow the consistency and complementarity principles of multi-view learning. The following objective is therefore proposed: [formula missing in source], where
$R_1 = \{(p, u) : X_p \in P,\ X_u \in U\}$,
$R_2 = \{(i, j) : X_i, X_j \in P + U\}$,
$a = 1, \dots, m-1$, $b = a+1, \dots, m$,
$k = 1, \dots, n$, $v = 1, \dots, m$;
(4) Optimizing the objective. Using the method of Lagrange multipliers, the dual problem of the model's objective is derived in terms of non-negative Lagrange multipliers: [formula missing in source].
The derived dual problem can be solved with an optimization algorithm such as sequential minimal optimization (SMO), and the optimal Lagrange multipliers obtained can be used to compute $w_v$;
(5) Computing the score. For a new graph G, first construct its m view representations; its score can then be computed with the following formula: [formula missing in source];
(6) Predicting the label of the given graph data sample from the score.
Compared with the prior art, the present invention has the following beneficial effects:
(1) The invention builds on a ranking support vector machine model to address graph data with only positive and unlabeled samples, and also handles graph data classification from multiple views;
(2) By incorporating a cost-sensitive strategy, the invention extends the ranking support vector machine model and improves its otherwise poor classification performance when there are few positive graphs and many unlabeled graphs;
(3) By introducing the constraints of multi-view learning, the invention exploits multiple views of the graph data to improve classification;
(4) The invention can optimize the target model while building it, which reduces complexity.
Brief description of the drawings
Fig. 1 is a schematic flowchart of model training in the present invention;
Fig. 2 is a schematic flowchart of graph data classification in the present invention.
Detailed description of the embodiments
The present invention is described in further detail below with reference to the embodiments and the accompanying drawings, but the embodiments of the present invention are not limited to them.
The present invention proposes a positive and unlabeled graph data classification method based on multi-view learning. First, multiple views of the graph data are constructed: several different graph feature extraction methods convert each graph data sample into several different feature vectors. Next, an evaluation function is built from the multiple views to determine the relationship between samples and labels. The evaluation function is based on a ranking support vector machine (ranking SVM): the more likely a graph data sample is to be positive, the higher the score the evaluation function assigns to it, so that the scores of positive graph data exceed the scores of negative graph data. Finally, for a given graph data sample, its multiple views are obtained and its score is computed with the evaluation function in order to predict its label.
The input is a set of $n_p$ positive graphs and a set of $n_u$ unlabeled graphs. Each graph G has a class label $y \in Y = \{+1, 0, -1\}$, indicating that the graph is positive, unlabeled, or negative respectively. The method is divided into three stages: (1) different graph feature extraction methods first map each graph data sample $G_i$ to a set of feature vectors, one per extraction method, where the superscript (v) denotes the v-th extraction method; these feature vectors are the input of the multi-view ranking SVM, which consists of several different weight vectors $w_v$; (2) after the first stage, the multi-view ranking SVM produces a classification score for each graph data sample; (3) finally, the score is used to predict the label.
Specifically, as shown in Figs. 1 and 2, a positive and unlabeled graph data classification method based on multi-view learning comprises the following steps:
(1) Parameter setting;
Set the weight $\gamma_v$ of each view, where $\gamma_v$ reflects the importance of the v-th view of the graph data; set the penalty factors, which tolerate errors during training; set the regularization parameters $\varepsilon_1$ and $\varepsilon_2$; set the non-negative slack variables, including the non-negative slack variables used to enforce consistency between view a and view b of the graph data. In this method, the hyperparameter configuration affects classification performance. In practice, cross-validation on each dataset yields an empirical range for the hyperparameters, but selecting different hyperparameters for every dataset is time-consuming and laborious. For convenience, the method therefore tunes the hyperparameters on one dataset and reuses the resulting setting for the other datasets; classification performance can thus be improved through parameter tuning;
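The parameters of step (1) can be gathered into a single configuration object, sketched below. The field names `penalty_rank` and `penalty_consistency` are hypothetical (the source does not preserve the penalty factor symbols or how many there are), and the validation rules are assumptions rather than requirements stated by the invention.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class MultiViewPUConfig:
    """Hyperparameters of step (1); names other than the view weights are hypothetical."""
    view_weights: List[float]    # gamma_v, one weight per view
    penalty_rank: float          # assumed penalty on ranking errors
    penalty_consistency: float   # assumed penalty on disagreement between views
    eps1: float                  # regularization parameter epsilon_1
    eps2: float                  # regularization parameter epsilon_2

    def __post_init__(self) -> None:
        if not self.view_weights:
            raise ValueError("at least one view weight is required")
        if any(weight < 0 for weight in self.view_weights):
            raise ValueError("view weights must be non-negative")
        if min(self.penalty_rank, self.penalty_consistency) <= 0:
            raise ValueError("penalty factors must be positive")

    @property
    def num_views(self) -> int:
        return len(self.view_weights)


# One setting tuned on a single dataset and reused elsewhere, as the text suggests.
config = MultiViewPUConfig(
    view_weights=[0.5, 0.5],
    penalty_rank=1.0,
    penalty_consistency=1.0,
    eps1=0.1,
    eps2=0.1,
)
```

Freezing the dataclass keeps a tuned setting from being modified by accident when it is shared across datasets.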
(2) Multi-view construction;
To construct multiple views, different feature extraction methods can be used to map the graph data, such as converting graph data into vectors (graph2vec) and mining frequent subgraphs. Each mapping method yields one view, so for a graph data sample $G_i$, m mapping methods yield m views;
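Step (2) amounts to applying m extractors to the same graph. The sketch below assumes an adjacency-set graph encoding and uses two simple hand-written extractors (`degree_histogram` and `size_features`, both hypothetical) in place of graph2vec and frequent subgraph mining, which need dedicated libraries.

```python
from typing import Callable, Dict, List, Set

Graph = Dict[int, Set[int]]               # undirected graph as adjacency sets
Extractor = Callable[[Graph], List[float]]


def degree_histogram(graph: Graph, max_degree: int = 3) -> List[float]:
    """View 1: number of vertices per degree, with degrees capped at max_degree."""
    histogram = [0.0] * (max_degree + 1)
    for neighbours in graph.values():
        histogram[min(len(neighbours), max_degree)] += 1.0
    return histogram


def size_features(graph: Graph) -> List[float]:
    """View 2: vertex count and edge count."""
    edge_count = sum(len(neighbours) for neighbours in graph.values()) / 2
    return [float(len(graph)), float(edge_count)]


def build_views(graph: Graph, extractors: List[Extractor]) -> List[List[float]]:
    """Apply every extractor to one graph; m extractors give m views."""
    return [extract(graph) for extract in extractors]


triangle: Graph = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}}
views = build_views(triangle, [degree_histogram, size_features])
```

The views may have different dimensions, which is why the model keeps a separate weight vector for each view.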
(3) Determining the evaluation function;
In most cases only a small number of positive graphs and a large number of unlabeled graphs are available, and a general-purpose ranking model may then classify poorly. To overcome this problem, the ranking support vector machine is combined with a cost-sensitive strategy by introducing similarity weights S. The similarity weight $s_{ij}$ denotes the similarity between graph i and graph j and is computed with an existing similarity measure. The more similar two graphs are, the larger their similarity weight. A model with this strategy can concentrate on the differences between similar graphs, which improves classification performance. Because the model handles multi-view data, the evaluation function should also follow the consistency and complementarity principles of multi-view learning. The following objective is therefore proposed: [formula missing in source], where
$R_1 = \{(p, u) : X_p \in P,\ X_u \in U\}$,
$R_2 = \{(i, j) : X_i, X_j \in P + U\}$,
$a = 1, \dots, m-1$, $b = a+1, \dots, m$,
$k = 1, \dots, n$, $v = 1, \dots, m$;
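The ranking requirement over the pair set $R_1$ (every positive graph should score above every unlabeled graph) can be illustrated with a single-view pairwise hinge loss. This is a simplified sketch and not the multi-view objective of the invention: the unit margin, the linear score, and the way the optional `pair_weight` matrix enters the sum are all assumptions.

```python
from typing import List, Optional, Sequence


def dot(w: Sequence[float], x: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(w, x))


def pairwise_hinge_loss(
    w: Sequence[float],
    positives: List[Sequence[float]],
    unlabeled: List[Sequence[float]],
    pair_weight: Optional[List[List[float]]] = None,
) -> float:
    """Sum over (p, u) in R1 of weight * max(0, 1 - (w.x_p - w.x_u))."""
    total = 0.0
    for p, x_p in enumerate(positives):
        for u, x_u in enumerate(unlabeled):
            margin = dot(w, x_p) - dot(w, x_u)
            weight = 1.0 if pair_weight is None else pair_weight[p][u]
            total += weight * max(0.0, 1.0 - margin)
    return total


w = [1.0, 0.0]
positives = [[2.0, 0.0]]
unlabeled = [[0.0, 1.0], [1.5, 0.0]]
loss = pairwise_hinge_loss(w, positives, unlabeled)
```

In this toy example the first pair is ranked with a margin of 2 and costs nothing, while the second pair is ranked correctly but with a margin of only 0.5, so it contributes 0.5 to the loss.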
(4) Optimizing the objective;
Using the method of Lagrange multipliers, the dual problem of the model's objective is derived in terms of non-negative Lagrange multipliers: [formula missing in source].
The derived dual problem can be solved with an optimization algorithm such as sequential minimal optimization (SMO), and the optimal Lagrange multipliers obtained can be used to compute $w_v$;
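As a point of reference, the link between the Lagrange multipliers and the weight vector can be seen in the standard single-view ranking SVM. This is an assumption made for illustration, not the multi-view model of the invention; the single penalty factor $C$, the slack variables $\xi_{pu}$, and the multipliers $\alpha_{pu}$ are hypothetical names.

```latex
\begin{aligned}
\text{Primal:}\quad & \min_{w,\ \xi}\ \tfrac{1}{2}\lVert w\rVert^2 + C \sum_{(p,u)\in R_1} \xi_{pu}
  \quad \text{s.t.}\quad w^\top (x_p - x_u) \ge 1 - \xi_{pu},\ \ \xi_{pu} \ge 0, \\
\text{Dual:}\quad & \max_{\alpha}\ \sum_{(p,u)\in R_1} \alpha_{pu}
  - \tfrac{1}{2} \sum_{(p,u)\in R_1}\ \sum_{(p',u')\in R_1}
    \alpha_{pu}\,\alpha_{p'u'}\,(x_p - x_u)^\top (x_{p'} - x_{u'})
  \quad \text{s.t.}\quad 0 \le \alpha_{pu} \le C, \\
\text{Recovery:}\quad & w = \sum_{(p,u)\in R_1} \alpha_{pu}\,(x_p - x_u).
\end{aligned}
```

The dual is a box-constrained quadratic program, which is the kind of problem SMO-style solvers handle by updating a few multipliers at a time. In the multi-view model, the consistency constraints between views contribute further multipliers and terms that this simplified form omits.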
(5) Computing the score;
For a new graph G, first construct its m view representations; its score can then be computed with the following formula: [formula missing in source];
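One plausible reading of step (5), given the per-view weights $\gamma_v$ and weight vectors $w_v$, is a weighted sum of per-view linear scores. The sketch below implements that reading as an assumption; the function name `multiview_score` is hypothetical and the combination rule may differ from the one in the invention.

```python
from typing import List, Sequence


def multiview_score(
    views: List[Sequence[float]],
    weights_per_view: List[Sequence[float]],
    view_weights: Sequence[float],
) -> float:
    """Assumed score: sum over views v of gamma_v * (w_v . x_v)."""
    if not (len(views) == len(weights_per_view) == len(view_weights)):
        raise ValueError("views, weight vectors and view weights must align")
    return sum(
        gamma * sum(w_i * x_i for w_i, x_i in zip(w, x))
        for gamma, w, x in zip(view_weights, weights_per_view, views)
    )


score = multiview_score(
    views=[[1.0, 2.0], [3.0]],               # two views of one graph
    weights_per_view=[[0.5, 0.5], [2.0]],    # one learned weight vector per view
    view_weights=[0.4, 0.6],                 # gamma_v
)
```

Here the first view contributes 0.4 × 1.5 and the second 0.6 × 6.0, giving a score of about 4.2.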
(6) Predicting the label of the given graph data sample from the score. There are many ways to make the prediction. For example, a threshold can be set: when the score is greater than the threshold, the given graph data sample is positive, and otherwise it is negative. Alternatively, the K-nearest-neighbors (KNN) algorithm can be used to judge whether the score of the given graph is close to the scores of known positive graphs: if it is close, the sample is positive, and otherwise it is negative.
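Both prediction rules of step (6) can be sketched in a few lines. The threshold rule follows the text directly. The neighbor rule is a simplified interpretation: it averages the distance to the k closest known positive scores and compares it with a `tolerance`, which is a hypothetical parameter not named in the source.

```python
from typing import Sequence


def predict_by_threshold(score: float, threshold: float) -> int:
    """Return +1 if the score exceeds the threshold, otherwise -1."""
    return 1 if score > threshold else -1


def predict_by_nearest_positives(
    score: float,
    positive_scores: Sequence[float],
    k: int,
    tolerance: float,
) -> int:
    """Return +1 if the k closest known positive scores are near enough on average."""
    if not positive_scores or k < 1:
        raise ValueError("need at least one positive score and k >= 1")
    nearest = sorted(abs(score - s) for s in positive_scores)[:k]
    return 1 if sum(nearest) / len(nearest) <= tolerance else -1


label_a = predict_by_threshold(0.8, threshold=0.5)
label_b = predict_by_nearest_positives(0.9, [1.0, 1.2, 0.2], k=2, tolerance=0.25)
label_c = predict_by_nearest_positives(-1.0, [1.0, 1.2, 0.2], k=2, tolerance=0.25)
```

The threshold rule needs a single tuned value, while the neighbor rule adapts to how the scores of the known positive graphs are spread.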
The target model proposed by the present invention addresses the classification of graph data given positive and unlabeled samples. By incorporating a cost-sensitive strategy, it improves the poor performance of the ranking support vector machine model on this problem. The target model also handles graph data classification from multiple views: by using several feature mappings of the graph data at the same time and adding multi-view constraints, it improves generalization ability and classification performance on graph data classification problems.
The prior art includes a graph data classification method based on frequent subgraph mining, but that method addresses classification with positive and negative samples. For the positive and unlabeled setting, it is first necessary to find highly reliable negative samples among the unlabeled samples using techniques such as naive Bayes, and then to train a classifier on these reliable negatives and the existing positives. This scheme depends heavily on the reliability of the negative graphs found, and its generalization ability is relatively low. Moreover, it studies graph data from a single view only, so its classification accuracy is also limited.
The present invention builds on a ranking support vector machine model to address graph data with only positive and unlabeled samples, and also handles graph data classification from multiple views. By incorporating a cost-sensitive strategy, it extends the ranking support vector machine model and improves its otherwise poor classification performance when there are few positive graphs and many unlabeled graphs. By introducing the constraints of multi-view learning, it exploits multiple views of the graph data to improve classification. The invention can optimize the target model while building it, which reduces complexity.
The above is a preferred embodiment of the present invention, but the embodiments of the invention are not limited to it. Any other change, modification, substitution, combination, or simplification made without departing from the spirit and principle of the present invention is an equivalent replacement and falls within the scope of protection of the present invention.

Claims (1)

1. A positive and unlabeled graph data classification method based on multi-view learning, characterized by comprising the following steps:
(1) Parameter setting. Set the weight $\gamma_v$ of each view, where $\gamma_v$ reflects the importance of the v-th view of the graph data; set the penalty factors, which tolerate errors during training; set the regularization parameters $\varepsilon_1$ and $\varepsilon_2$; set the non-negative slack variables, including the non-negative slack variables used to enforce consistency between view a and view b of the graph data;
(2) Multi-view construction. To construct multiple views, different feature extraction methods can be used to map the graph data, such as converting graph data into vectors (graph2vec) and mining frequent subgraphs. Each mapping method yields one view, so for a graph data sample $G_i$, m mapping methods yield m views;
(3) Determining the evaluation function. Because the model handles multi-view data, the evaluation function should follow the consistency and complementarity principles of multi-view learning. The following objective is therefore proposed: [formula missing in source], where
$R_1 = \{(p, u) : X_p \in P,\ X_u \in U\}$,
$R_2 = \{(i, j) : X_i, X_j \in P + U\}$,
$a = 1, \dots, m-1$, $b = a+1, \dots, m$,
$k = 1, \dots, n$, $v = 1, \dots, m$;
(4) Optimizing the objective. Using the method of Lagrange multipliers, the dual problem of the model's objective is derived in terms of non-negative Lagrange multipliers: [formula missing in source].
The derived dual problem can be solved with an optimization algorithm such as sequential minimal optimization (SMO), and the optimal Lagrange multipliers obtained can be used to compute $w_v$;
(5) Computing the score. For a new graph G, first construct its m view representations; its score can then be computed with the following formula: [formula missing in source];
(6) Predicting the label of the given graph data sample from the score.
Priority Applications (1)

Application Number | Publication | Priority Date | Filing Date | Title
CN201910549488.6A | CN110348493A (en) | 2019-06-24 | 2019-06-24 | Positive and unlabeled graph data classification method based on multi-view learning

Applications Claiming Priority (1)

Application Number | Publication | Priority Date | Filing Date | Title
CN201910549488.6A | CN110348493A (en) | 2019-06-24 | 2019-06-24 | Positive and unlabeled graph data classification method based on multi-view learning

Publications (1)

Publication Number | Publication Date
CN110348493A (en) | 2019-10-18

Family

ID=68182873

Family Applications (1)

Application Number | Status | Publication | Priority Date | Filing Date | Title
CN201910549488.6A | Pending | CN110348493A (en) | 2019-06-24 | 2019-06-24 | Positive and unlabeled graph data classification method based on multi-view learning

Country Status (1)

Country | Link
CN (1) | CN110348493A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117237748A (en) * 2023-11-14 2023-12-15 南京信息工程大学 Picture identification method and device based on multi-view contrast confidence

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117237748A (en) * 2023-11-14 2023-12-15 南京信息工程大学 Picture identification method and device based on multi-view contrast confidence
CN117237748B (en) * 2023-11-14 2024-02-23 南京信息工程大学 Picture identification method and device based on multi-view contrast confidence

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20191018