CN110348493A - Positive and unlabeled graph data classification method based on multi-view learning - Google Patents
- Publication number: CN110348493A
- Application number: CN201910549488.6A
- Authority: CN (China)
- Prior art keywords: graph data, angle, sample, score, view
- Legal status: Pending (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2411—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
Landscapes
- Engineering & Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Theoretical Computer Science (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Artificial Intelligence (AREA)
- Evolutionary Biology (AREA)
- Evolutionary Computation (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a positive and unlabeled graph data classification method based on multi-view learning. First, multiple views of the graph data are constructed: several different graph feature extraction methods convert each graph data sample into several different feature vectors. Next, an evaluation function is built from the multiple views to determine the relationship between samples and labels. The evaluation function is based on the ranking support vector machine (ranking SVM): the more likely a graph data sample is to be positive, the higher the score the evaluation function assigns to it, so that positive graph data score higher than negative graph data. Finally, for a given graph data sample, its multiple views are obtained and its score is computed by the evaluation function in order to predict its label. The invention can effectively use the multiple views of graph data to classify graph data given only positive and unlabeled samples, ensuring classification accuracy.
Description
Technical field
The present invention relates to the field of machine learning, and in particular to a positive and unlabeled graph data classification method based on multi-view learning.
Background
With the development of the big data era, graph data classification has attracted growing attention because graph data are highly expressive. Graph data can describe particular relationships among things: points represent things, and a line connecting two points indicates that the two corresponding things are related. Graph data are an abstract data structure composed of vertices and edges. Many objects can be described or modeled with graph data, such as DNA, chemical compounds, and social networks. Graph data classification assigns graph data to the positive class or the negative class according to their features. Most existing graph data classification methods assume that the training samples contain both positive and negative examples, but in some practical applications the training samples contain only positive examples and unlabeled examples (which may be positive or negative). In drug research, for example, it is easier for researchers to identify compounds that have a positive effect on a disease. In such cases, general methods for learning from positive and unlabeled samples (positive and unlabeled learning, or PU learning) can be used for graph data classification.
In existing research on the PU problem, the most common PU classification methods fall into three categories: (I) two-step strategy based methods; (II) probability estimation based methods; (III) cost-sensitive based methods. Two-step strategy based methods first select reliable negative examples or reliable positive examples from the unlabeled samples and then build a classifier from the positive and negative examples. Probability estimation based methods estimate the probability that a sample belongs to the positive class and then predict accordingly. Cost-sensitive based methods train a classifier by assigning appropriate weights to the positive samples and the unlabeled samples. Because of the characteristics of graph data, these general PU classification methods cannot be applied to graph data directly.
Existing graph data classification methods suggest an approach: they convert graph data into numerical vectors by extracting different types of features from the graphs. These conversion methods fall into two categories: (I) feature mining based methods; (II) embedding learning based methods. Feature mining based methods first find topological features or subgraph features that are useful for classification; each graph can then be represented as a vector of 0s and 1s according to whether it contains the corresponding features. Embedding learning based methods aim to preserve the structural features and attributes of the graph data as far as possible while converting them into low-dimensional vectors.
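The 0/1 representation used by feature mining based methods can be illustrated with a minimal sketch. It makes strong simplifying assumptions that are not part of the source: a graph is a list of undirected edges between labelled vertices, a "feature" is a small set of such edges, and containment is tested by edge-set inclusion rather than true subgraph isomorphism. The names `normalise` and `binary_feature_vector` are hypothetical.

```python
from typing import List, Sequence, Tuple

Edge = Tuple[str, str]  # undirected edge between two labelled vertices (assumption)


def normalise(edge: Edge) -> Edge:
    """Order the endpoints so that (a, b) and (b, a) compare equal."""
    a, b = edge
    return (a, b) if a <= b else (b, a)


def binary_feature_vector(graph_edges: Sequence[Edge],
                          features: Sequence[Sequence[Edge]]) -> List[int]:
    """Return one 0/1 entry per feature: 1 if every edge of the feature
    occurs in the graph, else 0 (a stand-in for real subgraph matching)."""
    edge_set = {normalise(e) for e in graph_edges}
    return [1 if {normalise(e) for e in feat} <= edge_set else 0
            for feat in features]


# Toy graph and three candidate features (all values are illustrative).
graph = [("C", "O"), ("C", "N"), ("N", "H")]
features = [
    [("C", "O")],              # contained in the graph
    [("O", "C"), ("C", "N")],  # contained (edge direction is ignored)
    [("C", "S")],              # not contained
]
vector = binary_feature_vector(graph, features)
print(vector)
```

A real feature mining pipeline would mine the candidate features from the training graphs and use a proper subgraph matcher; the sketch only shows how containment turns a graph into a fixed-length binary vector.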
In addition, most existing graph data classification methods describe graph data from only one angle, which is known as single-view learning. In practice, an object can be described from multiple angles, so data can also be described from different views. Exploiting the diversity of different views of the data to improve performance is known as multi-view learning. Especially when the training samples contain only positive graph data and unlabeled graph data, more information about the graph data is needed, and multi-view learning can provide richer information for graph data classification.
However, existing graph data classification methods study only the case in which the training samples contain both positive and negative samples and are not suited to the case of positive and unlabeled samples, which limits the use of graph data classification in real-world applications. Moreover, existing methods describe graph data from only one angle, which limits classification performance.
Summary of the invention
The purpose of the present invention is to overcome the shortcomings and deficiencies of the prior art by providing a positive and unlabeled graph data classification method based on multi-view learning. The method can effectively use the multiple views of graph data to classify graph data given only positive and unlabeled samples, ensuring classification accuracy.
The purpose of the invention is achieved by the following technical solution:
A positive and unlabeled graph data classification method based on multi-view learning, comprising the following steps:
(1) Parameter setting. Set the weight $\gamma_v$ of each view, where $\gamma_v$ reflects the importance of the $v$-th view of the graph data; set the penalty factors, which allow for errors during training; set the regularization parameters $\varepsilon_1$, $\varepsilon_2$; set the non-negative slack variables, including the non-negative slack variables used to enforce consistency between view $a$ and view $b$ of the graph data;
(2) Multi-view construction. To construct multiple views, different feature extraction methods can be used to map the graph data, such as converting graphs to vectors (graph2vec) and frequent subgraph mining. With one mapping function denoting the $v$-th mapping method, a graph data sample $G_i$ has $m$ views when there are $m$ mapping methods;
(3) Determining the evaluation function. Because the model handles multi-view data, the evaluation function should also follow the consistency and complementarity principles of multi-view learning. The following objective is therefore proposed [formula not reproduced in this text], with the index sets and ranges:
$R_1 = \{(p, u) : X_p \in P,\ X_u \in U\}$,
$R_2 = \{(i, j) : X_i, X_j \in P \cup U\}$,
$a = 1, \dots, m-1$, $b = a+1, \dots, m$,
$k = 1, \dots, n$, $v = 1, \dots, m$;
(4) Optimizing the objective. Using the method of Lagrange multipliers, the dual problem of the model's objective is derived with respect to the non-negative Lagrange multipliers [dual formula and the definitions of its terms are not reproduced in this text]. The derived dual problem can be solved with an optimization algorithm such as the SMO algorithm, and the optimal Lagrange multipliers obtained can be used to compute $w_v$;
(5) Computing the score. For a new graph datum $G$, first construct its $m$ view representations; its score can then be computed with the score formula [formula not reproduced in this text];
(6) Predicting the label of the given graph data sample from the score.
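The pair set $R_1$ in step (3) pairs every positive sample with every unlabeled sample. Since the objective itself is not reproduced in this text, the following is only a sketch of the standard single-view ranking SVM hinge loss over such pairs, not the patent's multi-view objective. The function names, the unit margin, and the toy data are assumptions.

```python
from typing import List, Sequence, Tuple


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def build_r1(positive_idx: Sequence[int],
             unlabeled_idx: Sequence[int]) -> List[Tuple[int, int]]:
    """All (p, u) pairs with p a positive sample and u an unlabeled sample."""
    return [(p, u) for p in positive_idx for u in unlabeled_idx]


def pairwise_hinge_loss(w: Sequence[float],
                        X: Sequence[Sequence[float]],
                        pairs: Sequence[Tuple[int, int]]) -> float:
    """Sum of max(0, 1 - (score_p - score_u)) over the pairs: a pair costs
    nothing once the positive outscores the unlabeled sample by at least 1."""
    scores = [dot(w, x) for x in X]
    return sum(max(0.0, 1.0 - (scores[p] - scores[u])) for p, u in pairs)


# Toy single-view feature vectors: samples 0 and 1 are positive,
# samples 2 and 3 are unlabeled (all values are illustrative).
X = [[2.0, 0.0], [1.5, 0.5], [0.0, 1.0], [0.2, 0.1]]
positive_idx = [0, 1]
unlabeled_idx = [2, 3]
w = [1.0, -1.0]

r1 = build_r1(positive_idx, unlabeled_idx)
loss = pairwise_hinge_loss(w, X, r1)
print(r1, loss)
```

In this toy case only the pair (1, 3) violates the margin, so it alone contributes to the loss. The method described in the source additionally uses per-view weights, consistency slack variables between views, and regularization, none of which appear in this sketch.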
Compared with the prior art, the present invention has the following beneficial effects:
(1) Based on the ranking support vector machine model, the invention addresses the problem of positive and unlabeled graph data samples, and also addresses graph data classification from multiple angles;
(2) By incorporating a cost-sensitive strategy, the invention extends the ranking support vector machine model and improves its otherwise poor classification performance on problems with a small number of positive graph data and a large number of unlabeled graph data;
(3) By introducing multi-view learning constraints, the invention exploits the multiple views of graph data to improve classification;
(4) The invention can optimize the target model while building it, thereby reducing complexity.
Brief description of the drawings
Fig. 1 is a schematic flow diagram of model training according to the invention;
Fig. 2 is a schematic flow diagram of graph data classification according to the invention.
Detailed description of the embodiments
The present invention is described in further detail below with reference to embodiments and the accompanying drawings, but embodiments of the invention are not limited thereto.
The present invention proposes a positive and unlabeled graph data classification method based on multi-view learning. First, multiple views of the graph data are constructed: several different graph feature extraction methods convert each graph data sample into several different feature vectors. Next, an evaluation function is built from the multiple views to determine the relationship between samples and labels. The evaluation function is based on the ranking support vector machine (ranking SVM): the more likely a graph data sample is to be positive, the higher the score the evaluation function assigns to it, so that positive graph data score higher than negative graph data. Finally, for a given graph data sample, its multiple views are obtained and its score is computed by the evaluation function in order to predict its label.
The method is broadly divided into three stages. Given a set of $n_p$ positive graph data and a set of $n_u$ unlabeled graph data, each graph datum $G$ has a class label $y \in Y = \{+1, 0, -1\}$, indicating that the graph datum is positive, unlabeled, or negative, respectively. (1) First, different graph feature extraction methods map each graph data sample $G_i$ to feature vectors, one per extraction method, where the superscript $(v)$ denotes the $v$-th extraction method. These feature vectors are the input to the multi-view ranking SVM, which is composed of multiple different vectors $W^{(v)}$. (2) After the first stage, the multi-view ranking SVM generates a classification score for each graph data sample. (3) Finally, the label is predicted from the score.
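The second stage can be sketched under one explicit assumption: that the classification score is a weighted sum of per-view linear scores, with the view weight $\gamma_v$ multiplying the inner product of the view's weight vector and feature vector. The source's score formula is not reproduced in this text, so this form is a guess at its general shape rather than the patented formula. The name `multi_view_score` and all numbers are hypothetical, and the per-view feature vectors are assumed to have been extracted already.

```python
from typing import Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def multi_view_score(views: Sequence[Sequence[float]],
                     weight_vectors: Sequence[Sequence[float]],
                     gammas: Sequence[float]) -> float:
    """Assumed score: sum over views v of gamma_v * <W_v, x_v>.
    Each view may have its own dimensionality."""
    return sum(g * dot(w, x)
               for g, w, x in zip(gammas, weight_vectors, views))


# One graph sample described by two views of different dimension.
views = [[1.0, 0.0, 1.0], [0.5, -0.5]]
weight_vectors = [[0.2, 0.1, 0.3], [1.0, -1.0]]  # one vector per view
gammas = [0.6, 0.4]                              # view importance weights

score = multi_view_score(views, weight_vectors, gammas)
print(score)
```

Keeping one weight vector per view lets each view use its own feature dimension, which matches the description of the multi-view ranking SVM as a collection of per-view vectors.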
Specifically, as shown in Figs. 1 and 2, a positive and unlabeled graph data classification method based on multi-view learning comprises the following steps:
(1) Parameter setting;
Set the weight $\gamma_v$ of each view, where $\gamma_v$ reflects the importance of the $v$-th view of the graph data; set the penalty factors, which allow for errors during training; set the regularization parameters $\varepsilon_1$, $\varepsilon_2$; set the non-negative slack variables, including the non-negative slack variables used to enforce consistency between view $a$ and view $b$ of the graph data. In this method, the hyperparameter configuration affects classification performance. During implementation, cross-validation on each data set yields an empirical range for the hyperparameters, but selecting different hyperparameters for each data set is time-consuming and labor-intensive. For convenience, the method therefore tunes the hyperparameters on one data set and reuses that setting on the other data sets; classification performance can be further improved by parameter tuning;
(2) Multi-view construction;
To construct multiple views, different feature extraction methods can be used to map the graph data, such as converting graphs to vectors (graph2vec) and frequent subgraph mining. With one mapping function denoting the $v$-th mapping method, a graph data sample $G_i$ has $m$ views when there are $m$ mapping methods;
(3) Determining the evaluation function;
In most cases, only a small number of positive graph data and a large number of unlabeled graph data are available. A general-purpose ranking model may then classify poorly. To overcome this problem, the ranking support vector machine is combined with a cost-sensitive strategy by introducing similarity weights $S$. The similarity weight $s_{ij}$ represents the similarity between graph datum $i$ and graph datum $j$ and is computed with an existing similarity method. The more similar two graph data are, the larger their similarity weight. A model that incorporates this strategy focuses on the differences between two similar graph data, which improves classification performance. Because the model handles multi-view data, the evaluation function should also follow the consistency and complementarity principles of multi-view learning. The following objective is therefore proposed [formula not reproduced in this text], with the index sets and ranges:
$R_1 = \{(p, u) : X_p \in P,\ X_u \in U\}$,
$R_2 = \{(i, j) : X_i, X_j \in P \cup U\}$,
$a = 1, \dots, m-1$, $b = a+1, \dots, m$,
$k = 1, \dots, n$, $v = 1, \dots, m$;
(4) Optimizing the objective;
Using the method of Lagrange multipliers, the dual problem of the model's objective is derived with respect to the non-negative Lagrange multipliers [dual formula and the definitions of its terms are not reproduced in this text]. The derived dual problem can be solved with an optimization algorithm such as the SMO algorithm, and the optimal Lagrange multipliers obtained can be used to compute $W_v$;
(5) Computing the score;
For a new graph datum $G$, first construct its $m$ view representations; its score can then be computed with the score formula [formula not reproduced in this text];
(6) Predicting the label of the given graph data sample from the score. Many prediction methods are possible. For example, a threshold can be set: when the score is greater than the threshold, the given graph data sample is positive, and otherwise it is negative. Alternatively, the K-nearest-neighbor (KNN) algorithm can be used to judge whether the score of the given graph datum is close to the scores of known positive graph data; if it is close, the sample is positive, and otherwise it is negative.
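The two prediction options in step (6) can be sketched as follows. The threshold rule follows the text directly. The KNN variant needs reference scores with labels, which the text does not specify; the sketch assumes a reference list in which known positives carry the label +1 and other reference samples are treated as -1 for voting. The function names, the value of `k`, and all scores are hypothetical.

```python
from typing import List, Tuple


def predict_by_threshold(score: float, threshold: float) -> int:
    """Positive (+1) if the score exceeds the threshold, else negative (-1)."""
    return 1 if score > threshold else -1


def predict_by_knn(score: float,
                   reference: List[Tuple[float, int]],
                   k: int = 3) -> int:
    """Majority vote among the k reference scores closest to the given score.
    `reference` holds (score, label) pairs with labels +1 or -1 (assumption)."""
    nearest = sorted(reference, key=lambda item: abs(item[0] - score))[:k]
    votes = sum(label for _, label in nearest)
    return 1 if votes > 0 else -1


# Illustrative reference scores: three known positives, three others.
reference = [(2.0, 1), (1.8, 1), (1.5, 1), (0.2, -1), (0.1, -1), (-0.3, -1)]

high = predict_by_knn(1.7, reference)  # nearest scores all belong to positives
low = predict_by_knn(0.0, reference)   # nearest scores all belong to the others
print(predict_by_threshold(1.7, 1.0), predict_by_threshold(0.0, 1.0), high, low)
```

The threshold rule needs only one tuned value, while the KNN rule adapts to the score distribution of the reference samples; with an odd `k` the vote cannot tie.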
The target model proposed by the invention addresses the classification of positive and unlabeled graph data samples. By incorporating a cost-sensitive strategy, it remedies the poor performance of the ranking support vector machine model on this problem. The target model also addresses graph data classification from multiple angles: by using multiple feature mappings of the graph data simultaneously and adding multi-view constraints, it improves the model's generalization ability and classification performance on graph data classification problems.
The prior art includes graph data classification methods based on frequent subgraph mining, but these methods address classification with positive and negative samples. For positive and unlabeled samples, they must first use techniques such as naive Bayes to identify highly reliable negative samples among the unlabeled samples and then train a classifier with those reliable negatives and the existing positive samples. This scheme depends heavily on the reliability of the negative graph data identified, so its generalization ability is relatively low; it also studies graph data from only one angle, so its classification accuracy is not high.
Based on the ranking support vector machine model, the present invention addresses the problem of positive and unlabeled graph data samples, and also addresses graph data classification from multiple angles. By incorporating a cost-sensitive strategy, it extends the ranking support vector machine model and improves its otherwise poor classification performance on problems with a small number of positive graph data and a large number of unlabeled graph data. By introducing multi-view learning constraints, it exploits the multiple views of graph data to improve classification. The invention can optimize the target model while building it, thereby reducing complexity.
The above is a preferred embodiment of the invention, but embodiments of the invention are not limited by the foregoing content. Any other changes, modifications, substitutions, combinations, or simplifications made without departing from the spirit and principles of the invention shall be regarded as equivalent replacements and are included within the scope of protection of the invention.
Claims (1)
1. A positive and unlabeled graph data classification method based on multi-view learning, characterized by comprising the following steps:
(1) Parameter setting. Set the weight $\gamma_v$ of each view, where $\gamma_v$ reflects the importance of the $v$-th view of the graph data; set the penalty factors, which allow for errors during training; set the regularization parameters $\varepsilon_1$, $\varepsilon_2$; set the non-negative slack variables, including the non-negative slack variables used to enforce consistency between view $a$ and view $b$ of the graph data;
(2) Multi-view construction. To construct multiple views, different feature extraction methods can be used to map the graph data, such as converting graphs to vectors (graph2vec) and frequent subgraph mining. With one mapping function denoting the $v$-th mapping method, a graph data sample $G_i$ has $m$ views when there are $m$ mapping methods;
(3) Determining the evaluation function. Because the model handles multi-view data, the evaluation function should also follow the consistency and complementarity principles of multi-view learning. The following objective is therefore proposed [formula not reproduced in this text], with the index sets and ranges:
$R_1 = \{(p, u) : X_p \in P,\ X_u \in U\}$,
$R_2 = \{(i, j) : X_i, X_j \in P \cup U\}$,
$a = 1, \dots, m-1$, $b = a+1, \dots, m$,
$k = 1, \dots, n$, $v = 1, \dots, m$;
(4) Optimizing the objective. Using the method of Lagrange multipliers, the dual problem of the model's objective is derived with respect to the non-negative Lagrange multipliers [dual formula and the definitions of its terms are not reproduced in this text]. The derived dual problem can be solved with an optimization algorithm such as the SMO algorithm, and the optimal Lagrange multipliers obtained can be used to compute $w_v$;
(5) Computing the score. For a new graph datum $G$, first construct its $m$ view representations; its score can then be computed with the score formula [formula not reproduced in this text];
(6) Predicting the label of the given graph data sample from the score.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910549488.6A CN110348493A (en) | 2019-06-24 | 2019-06-24 | Positive and unlabeled graph data classification method based on multi-view learning |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910549488.6A CN110348493A (en) | 2019-06-24 | 2019-06-24 | Positive and unlabeled graph data classification method based on multi-view learning |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110348493A (en) | 2019-10-18 |
Family
ID=68182873
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910549488.6A Pending CN110348493A (en) | Positive and unlabeled graph data classification method based on multi-view learning | 2019-06-24 | 2019-06-24 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110348493A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117237748A (en) * | 2023-11-14 | 2023-12-15 | 南京信息工程大学 | Picture identification method and device based on multi-view contrast confidence |
2019
- 2019-06-24 CN CN201910549488.6A patent/CN110348493A/en active Pending
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117237748A (en) * | 2023-11-14 | 2023-12-15 | 南京信息工程大学 | Picture identification method and device based on multi-view contrast confidence |
CN117237748B (en) * | 2023-11-14 | 2024-02-23 | 南京信息工程大学 | Picture identification method and device based on multi-view contrast confidence |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN106845530B (en) | character detection method and device | |
CN109558942B (en) | Neural network migration method based on shallow learning | |
CN107194336B (en) | Polarized SAR image classification method based on semi-supervised depth distance measurement network | |
CN102314614B (en) | Image semantics classification method based on class-shared multiple kernel learning (MKL) | |
US9224071B2 (en) | Unsupervised object class discovery via bottom up multiple class learning | |
CN114842365B (en) | Unmanned aerial vehicle aerial photography target detection and identification method and system | |
CN109272011B (en) | Multi-task depth representation learning method for clothing image classification | |
CN110348579A (en) | A kind of domain-adaptive migration feature method and system | |
CN110457984A (en) | Pedestrian's attribute recognition approach under monitoring scene based on ResNet-50 | |
CN103745233B (en) | The hyperspectral image classification method migrated based on spatial information | |
CN111274972B (en) | Dish identification method and device based on measurement learning | |
CN104091038A (en) | Method for weighting multiple example studying features based on master space classifying criterion | |
Chen et al. | A novel long-term iterative mining scheme for video salient object detection | |
CN113283414A (en) | Pedestrian attribute identification method, related equipment and computer readable storage medium | |
CN117152788A (en) | Skeleton behavior recognition method based on knowledge distillation and multitasking self-supervision learning | |
CN116129286A (en) | Method for classifying graphic neural network remote sensing images based on knowledge graph | |
CN117611932A (en) | Image classification method and system based on double pseudo tag refinement and sample re-weighting | |
Chong et al. | Erase then grow: Generating correct class activation maps for weakly-supervised semantic segmentation | |
Kumari et al. | Deep learning techniques for remote sensing image scene classification: A comprehensive review, current challenges, and future directions | |
CN110414575A (en) | A kind of semi-supervised multiple labeling learning distance metric method merging Local Metric | |
Song et al. | Automatic CRP mapping using nonparametric machine learning approaches | |
CN108427730B (en) | Social label recommendation method based on random walk and conditional random field | |
CN110348493A (en) | A kind of positive class and unmarked diagram data classification method based on multi-angle of view study | |
CN110175500A (en) | Refer to vein comparison method, device, computer equipment and storage medium | |
CN109242039A (en) | It is a kind of based on candidates estimation Unlabeled data utilize method |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20191018 |