CN106682687A - Multi-example learning method using deep learning technology - Google Patents
- Publication number: CN106682687A
- Application number: CN201611148420.XA
- Authority: CN (China)
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G06F18/2155 — Generating training patterns; bootstrap methods characterised by the incorporation of unlabelled data, e.g. multiple instance learning [MIL], semi-supervised techniques using expectation-maximisation [EM] or naïve labelling
- G06F18/24 — Classification techniques
- G06N3/084 — Backpropagation, e.g. using gradient descent
Abstract
The invention provides a multi-instance learning method that uses deep learning technology. Through a series of data filling and splitting operations, multi-instance samples are converted into feature matrices of equal size, and a convolutional neural network then performs supervised learning and classification. The method can discover the hidden abstract concepts in a multi-instance data set, tolerates errors in the data set effectively, and generalizes well.
Description
Technical field
The present invention relates to a machine learning method, and in particular to a multi-instance learning method that applies deep learning technology.
Background technology
Multi-instance learning is a branch of machine learning that has attracted wide attention from researchers since its introduction. It was originally proposed for the activity analysis problem of drug molecules and has since been widely applied in image classification, speech recognition, text understanding, and other fields. In a multi-instance classification problem, each sample used to train the learner contains multiple examples, and only some of those examples determine the sample's class label; the other examples in the sample have no effect on the classification. For instance, an image is composed of many small local regions, and only some of those regions determine the class of the image (portrait, landscape, building, and so on). In a typical multi-instance data set, however, the class information is associated only with the sample, not with any particular example inside the sample. The main existing methods for the multi-instance learning problem include MI-SVM, DD-SVM, MI-RBF, Citation-KNN, MI-Boosting, Bayesian multi-instance learning, and methods based on Gaussian processes. The paper "Kim, Minyoung and De la Torre, Fernando. Multiple Instance Learning via Gaussian Processes. Data Min. Knowl. Discov., Kluwer Academic Publishers, 2014, 28, pp. 1078-1106" reports a Gaussian-process-based multi-instance learning method: it first builds a kernel Gram matrix over the examples to obtain a prior distribution of the examples relative to the target class labels, then derives the posterior distribution of the examples given the target labels through Bayes' formula and maximum-likelihood theory, learns this posterior and its derived predictive distribution with an iterative algorithm, and finally obtains a probability distribution from samples to labels. The main problems of this method are its huge computational cost and its rather severe overfitting when the sample size is small.
The shortcomings of existing methods are:
(1) Most existing methods are based on modifying existing single-instance machine learning algorithms so that they apply to the multi-instance learning setting. Such modification is subject to many restrictions and can increase the complexity of the algorithm, lower its efficiency, and lose information from the training data set, so the resulting algorithms do not perform very well.
(2) Most existing methods are based on supervised learning and therefore depend heavily on the quality of the features and labels in the data set. Their robustness is poor: slight errors in the data set are amplified and have a large impact on the accuracy of the model.
(3) Most existing methods are based on statistics and model the probability distribution of the input features. This is not conducive to discovering and modeling the abstract concepts hidden in multi-instance data, so both the accuracy and the generalization performance of the final classification model are severely limited.
For these reasons, the present invention proposes a multi-instance learning method based on deep learning. Through a series of data filling and splitting operations, multi-instance samples are converted into feature matrices of equal size, and a convolutional neural network performs supervised learning and classification. The invention can discover the abstract concepts hidden in a multi-instance data set, tolerates errors in the data set well, and has good generalization ability.
Content of the invention
To overcome the shortcomings of existing multi-instance learning methods (most are modifications of existing single-instance learning methods, are very sensitive to data quality, and are based on statistics and probability distributions), this patent proposes a multi-instance learning method based on deep learning. The method comprises the normalization of multi-instance sample attributes, the example expansion of multi-instance samples, a training process, and a classification process. Each process contains several steps, described as follows:
(1) Normalization of multi-instance sample attributes
Let the dimension of an example be m, comprising continuous attributes and categorical attributes.
A. The normalization method for continuous attributes is: for a continuous attribute p_i, find the maximum and minimum of p_i over all examples in the data set, denoted p_i^max and p_i^min respectively. The normalized value of the attribute is computed as:

    p_i' = (p_i - p_i^min) / (p_i^max - p_i^min)

At the same time, the p_i^max and p_i^min of every continuous attribute over all examples of the training data set are recorded; this information is later used to normalize unknown test data in the same way for the trained model.
B. Dummy-variable encoding of categorical attributes: a categorical attribute with k possible values is converted into a one-dimensional vector of k elements. When the attribute takes a given possible value, the corresponding element of the vector is set to 1 and the remaining elements are 0; in this k-dimensional vector, exactly one element is 1 for each data record and the rest are 0. Categorical attributes undergo no further normalization after dummy-variable encoding.
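As a minimal sketch of the preprocessing in point (1), the Python below applies min-max normalization to a continuous attribute and dummy-variable (one-hot) encoding to a categorical one. The function names `normalize_continuous` and `one_hot` are illustrative, not part of the patent:

```python
def normalize_continuous(value, attr_min, attr_max):
    """Min-max normalize one continuous attribute value into [0, 1]."""
    return (value - attr_min) / (attr_max - attr_min)

def one_hot(value, categories):
    """Encode a categorical value as a k-element 0/1 vector (dummy variable)."""
    return [1 if value == c else 0 for c in categories]

# Worked example from the embodiment: field f1 with minimum -3, maximum 292.
print(round(normalize_continuous(46, -3, 292), 4))  # 0.1661
print(one_hot('b', ['a', 'b', 'c']))                # [0, 1, 0]
```

The min/max values would be computed over the training set only and reused at test time, as the patent specifies.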
(2) Example expansion of multi-instance samples
The purpose of example expansion is to give every sample in the training data set the same number of examples, so that deep learning with a convolutional neural network becomes possible. Suppose the target multi-instance data set contains k classes. The expansion proceeds as follows:
A. Set the maximum example count for expansion, n_max, to the largest number of examples contained by any sample in the training set.
B. For a given target class, the training samples can be divided into the set D_P of samples belonging to that class (positive samples) and the set D_N of samples not belonging to it (negative samples). The examples of all negative samples are placed into one set and their order is shuffled; this set is denoted D_IN.
C. For each sample in the training set whose example count is less than n_max, draw examples at random from D_IN and add them to the sample until its example count equals n_max. After this step, every training sample contains n_max examples, each of dimension m (before dummy-variable encoding), so each sample is converted into a real-valued matrix of n_max rows and m columns.
D. Repeat step C q times: for each sample, examples are again drawn at random from D_IN and added to form a new sample. After q repetitions, the data set has grown to roughly q times its original size (samples whose example count is already n_max do not take part in the expansion).
E. For each expanded sample, shuffle the order of its examples p times, with p = n_max / 2; each shuffled ordering is treated as a new sample, so the data grows to p times its size.
F. The labels of the samples are copied along with the expansion: for a sample of a given class, every new sample produced from it receives the class label of the original sample.
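The expansion steps A-E above can be sketched as follows, assuming each sample (bag) is represented as a list of feature vectors and that a positive/negative split for one target class is given. `expand_bags` and its signature are hypothetical names introduced for illustration:

```python
import random

def expand_bags(pos_bags, neg_bags, q):
    """Sketch of example expansion: pad every bag to n_max examples with
    randomly drawn negative examples (the pool D_IN), repeat the padding
    q times, then shuffle each padded bag p = n_max // 2 times, treating
    each shuffled ordering as a new sample."""
    n_max = max(len(b) for b in pos_bags + neg_bags)
    neg_pool = [inst for b in neg_bags for inst in b]  # the set D_IN
    random.shuffle(neg_pool)
    expanded = []
    for bag in pos_bags + neg_bags:
        repeats = q if len(bag) < n_max else 1  # full bags are not expanded
        for _ in range(repeats):
            padded = bag + random.choices(neg_pool, k=n_max - len(bag))
            for _ in range(max(1, n_max // 2)):  # p shuffles, each a new sample
                shuffled = padded[:]
                random.shuffle(shuffled)
                expanded.append(shuffled)
    return expanded
```

Label propagation (step F) is omitted here for brevity; each output bag would simply carry its source bag's class label.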
(3) Design of the convolutional neural network
A. Input: a real-valued matrix of n_max rows and m columns, with 1 channel.
B. The convolutional layers use 5*5 convolution kernels, with no zero padding and no scaling. Each convolutional layer is followed by an activation layer, using one of the following two activation functions:
a) ReLU: y = max(x, 0), where x is the output of the previous layer and y is the output of this activation layer;
b) Sigmoid: y = 1 / (1 + e^(-x)), where x is the output of the previous layer and y is the output of this activation layer.
C. The channel count of the convolutional layers starts at 64 and increases by 64 channels per layer, up to a maximum of 512 channels; once the channel count of a convolutional layer reaches 512, subsequent convolutional layers keep 512 channels.
D. When one dimension of the feature map output by a convolutional layer and its activation layer reaches 1, no further convolution is performed, and the feature map is fed into the fully connected layers. Eight fully connected layers are used in total; if the output dimension of the last convolution-activation pair is 1*w, the dimensions of these 8 fully connected layers take random values in the interval [w, 8w].
E. A dropout layer of 20% is inserted between every two fully connected layers, i.e., 20% of the output units of the preceding fully connected layer are masked at random.
F. The output layer is fully connected to the last fully connected layer, and its dimension equals the number of classes in the data set.
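Under the stated design (5*5 unpadded convolutions shrink each spatial dimension by 4 per layer; channels start at 64, grow by 64 per layer, and are capped at 512), the shape schedule of the convolution stack can be computed as in this illustrative sketch. `conv_stack_schedule` is a hypothetical helper, and it stops once another 5*5 kernel no longer fits, a loose reading of "one dimension reaches 1":

```python
def conv_stack_schedule(n_max, m, kernel=5, c_step=64, c_cap=512):
    """Return a list of (channels, height, width) tuples, one per
    conv+activation layer, for an n_max x m single-channel input."""
    h, w, channels = n_max, m, c_step
    layers = []
    while h >= kernel and w >= kernel:
        h, w = h - (kernel - 1), w - (kernel - 1)  # valid 5x5 conv shrinks by 4
        layers.append((channels, h, w))
        channels = min(channels + c_step, c_cap)   # grow by 64, cap at 512
    return layers

# For the Musk2 embodiment (n_max = 51 examples, m = 168 attributes):
print(conv_stack_schedule(51, 168)[-1])  # (512, 3, 120)
```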
(4) Model training
The weights of the network are adjusted with the error back-propagation learning algorithm for convolutional neural networks; the adjustment is driven by the difference between the network's output for a given input and the desired output. Specifically, for each sample fed into the model, the output of the network is a vector whose dimension equals the number of classes. The error is computed as the Hamming distance between the network output vector and the true class vector, divided by the vector dimension: when the output vector is identical to the true class vector, their Hamming distance is 0 and the error is 0; when the output vector differs from the true class vector in every position, their Hamming distance equals the dimension and the error is 1.
The weights of the network are initialized with random numbers in [0, 1], and training runs for many rounds; feeding all training samples through the network and completing the corresponding weight adjustments constitutes one round. Training continues until the output error of the network no longer decreases.
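The error measure described above, Hamming distance divided by vector dimension, can be sketched as a small helper; `hamming_error` is an illustrative name, not one from the patent, and it assumes the network output has already been binarized to a 0/1 vector:

```python
def hamming_error(output_vec, target_vec):
    """Hamming distance between output and true class vector, divided by
    the vector dimension, giving an error in [0, 1]."""
    assert len(output_vec) == len(target_vec)
    dist = sum(o != t for o, t in zip(output_vec, target_vec))
    return dist / len(output_vec)

print(hamming_error([1, 0, 0], [1, 0, 0]))  # 0.0 -- identical vectors
print(hamming_error([0, 1, 1], [1, 0, 0]))  # 1.0 -- entirely different
```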
(5) Sample classification
When a multi-instance sample needs to be classified, its attribute values are first normalized, using for each attribute the maximum and minimum values recorded from the training set. Example expansion is then carried out, with two possible cases:
A. The sample to be classified contains at most n_max examples. Following point (2) of the content of the invention, the sample is expanded by randomly drawing examples from the negative example set until its example count equals n_max. Each sample to be classified is expanded v times, where v is odd; the trained network then classifies each of the v expanded samples, the v classification results are put to a vote, and the result with the most votes becomes the final class label of the sample to be classified.
B. The sample to be classified contains more than n_max examples. Examples are drawn at random from the sample, n_max at a time, for a total of v draws, where v is odd; the trained network then classifies each drawn sample, the v classification results are put to a vote, and the result with the most votes becomes the final classification of the sample to be classified.
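A minimal sketch of the voting scheme in point (5), covering both cases A and B. `classify_bag` is a hypothetical name, and the `predict` callback is a stand-in for the trained network:

```python
import random
from collections import Counter

def classify_bag(bag, neg_pool, n_max, v, predict):
    """Classify one bag by majority vote over v expanded/subsampled copies.
    v must be odd so a two-class vote cannot tie."""
    assert v % 2 == 1
    votes = []
    for _ in range(v):
        if len(bag) <= n_max:  # case A: pad with random negative examples
            expanded = bag + random.choices(neg_pool, k=n_max - len(bag))
        else:                  # case B: subsample n_max examples
            expanded = random.sample(bag, n_max)
        votes.append(predict(expanded))
    return Counter(votes).most_common(1)[0][0]
```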
Specific embodiment
The present invention was tested on UCI data sets (http://archive.ics.uci.edu/ml/) and achieved good results. One embodiment is given below, using the Musk2 data set from UCI (http://archive.ics.uci.edu/ml/datasets/Musk+%28Version+2%29) as the test data set. Musk2 is a multi-instance data set with 6598 examples and 168 data attributes, all continuous; the smallest number of examples contained in a sample is 13 and the largest is 51.
(1) Data preprocessing
For each continuous attribute, the maximum and minimum values of the attribute over the data set are found, and the attribute is processed with the preprocessing method for continuous attributes of the present invention. For example: for the continuous field f1, the maximum and minimum values over all data are 292 and -3 respectively; then for that field in the 1st record of the data set, the value after normalization is (46 - (-3)) / (292 - (-3)) = 0.1661. In this embodiment, the normalized data are rounded to 4 decimal places.
(2) Example expansion
First, the examples contained in all negative samples of the data set are placed into one set, the negative example set, which contains 5581 examples in total. Then example expansion is performed on the samples of the whole data set. The largest number of examples contained in a sample of the data set is 51, so whenever a sample contains fewer than 51 examples, examples are drawn at random from the negative example set and added to the sample until its example count equals 51. For each sample this process is repeated 10 times, i.e., each original sample yields 10 samples after example expansion. Afterwards, the order of the 51 examples inside each sample is shuffled at random, 10 times in total, so each sample produced by example expansion yields 10 samples containing the same examples in shuffled order. In this way the scale of the data set grows to 100 times the original; each sample contains 51 examples, each of dimension 168.
(3) Network design
A convolutional neural network deep learning model is used; the design of the network is shown in Table 1.
Table 1. The network design table.
(4) Network training
The network structure of Table 1 is implemented through a configuration file in MatConvNet, the data set is converted into Matlab data files (.mat format), and training is carried out with the training script cnn_train.m provided by MatConvNet. Training runs for 30 rounds, with a learning rate of 0.05 for the first 10 rounds, 0.005 for rounds 11-20, and 0.0005 for rounds 21-30. The loss function used for training is the zero-one loss. After the 30 rounds of training, the system generates 30 .mat files, each saving the model parameters at the end of one round; each of these .mat files is the model of one training round and can be used to classify unknown multi-instance samples.
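The three-stage learning-rate schedule of the embodiment can be restated as a small helper. This is only an illustration of the values above; in practice MatConvNet's cnn_train.m would typically receive such a schedule as a per-epoch array:

```python
def learning_rate(epoch):
    """Learning rate for a 1-indexed training round: 0.05 for rounds 1-10,
    0.005 for rounds 11-20, 0.0005 for rounds 21-30."""
    if epoch <= 10:
        return 0.05
    if epoch <= 20:
        return 0.005
    return 0.0005

print([learning_rate(e) for e in (1, 10, 11, 20, 21, 30)])
# [0.05, 0.05, 0.005, 0.005, 0.0005, 0.0005]
```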
(5) Sample classification
The network model at the end of the 30th training round is used as the classification model. For a multi-instance sample to be classified, its attribute values are first normalized, using the corresponding attribute maxima and minima from the training set. Example expansion is then carried out; the two cases, samples with at most 51 examples and samples with more than 51 examples, are handled according to point (5) of the "Content of the invention", with v = 11. The 11 samples produced by example expansion are fed into the classification network separately, yielding 11 class labels, which are then put to a vote; the class label with the most votes is the model's final classification of the sample to be classified.
Claims (6)
1. A multi-instance learning method based on deep learning, the method being usable for the classification of multi-instance samples, characterised in that the method comprises, in sequence: normalization of multi-instance sample attributes, example expansion of multi-instance samples, design of a convolutional neural network, a training process, and a classification process.
2. the method for claim 1, it is characterised in that the normalization of described many example sample attributes includes following step
Suddenly,
(1) it is many example samples of m to obtain dimension, and many example samples are provided with connection attribute and Category Attributes;
(2) the connection attribute p of many example samples is obtainedi, and seek piMaximum and minimum value, be designated as respectivelyWithAnd the connection attribute is normalized is calculated its normalized value:
(3) Category Attributes to the data carry out dummy variable, and the dummy variable includes, will include k probable value
Category Attributes are recorded as an one-dimensional vector for having k element, when the value of Category Attributes is certain probable value, in one-dimensional vector
Corresponding element is set to 1, and remaining element is 0.
3. The method of claim 2, characterised in that the example expansion of the multi-instance samples comprises:
A. setting the maximum example count for expansion, n_max, to the largest number of examples contained by the multi-instance samples;
B. classifying the multi-instance samples according to different targets: for a given target class, dividing the multi-instance samples into a set D_P of samples belonging to that class and a set D_N of samples not belonging to it, placing the examples of all samples in D_N into one set and shuffling their order, this set being denoted D_IN;
C. for any one of the multi-instance samples whose example count is less than n_max, drawing examples at random from D_IN and adding them to the sample until its example count equals n_max, whereby each multi-instance sample is extended into a real-valued matrix of n_max rows and m columns;
D. repeating the expansion of step C on the multi-instance samples q times;
E. for each expanded multi-instance sample, shuffling the order of its examples p times, each shuffled ordering being taken as a new multi-instance sample.
4. The method of claim 3, characterised in that the design of the convolutional neural network comprises: the input of the convolutional neural network is a real-valued matrix of n_max rows and m columns, with the channel count set to 1;
A. the convolutional layers of the convolutional neural network use 5*5 convolution kernels, with no zero padding and no scaling, and each convolutional layer is followed by an activation layer chosen freely from the following two activation functions:
a) ReLU: y = max(x, 0), where x is the output of the previous layer and y is the output of this activation layer;
b) Sigmoid: y = 1 / (1 + e^(-x)), where x is the output of the previous layer and y is the output of this activation layer;
B. the channel count of the convolutional layers increases: starting from a convolutional layer with 64 channels, each layer has 64 more channels than the previous layer, with the maximum channel count not exceeding 512 channels;
C. when one dimension of the feature map output by a convolutional layer and its activation layer reaches 1, no further convolution is performed, and the feature map is fed into fully connected layers; 8 fully connected layers are provided in total, and if the output dimension of the last convolution-activation pair is 1*w, the dimensions of the 8 fully connected layers take random values in the interval [w, 8w];
D. a dropout layer of 20% is added between every two fully connected layers, i.e., 20% of the output units of the preceding fully connected layer are masked;
E. the output layer is connected to the last fully connected layer in a fully connected manner, the dimension of the output layer being the number of classes of the multi-instance samples.
5. The method of claim 4, characterised in that the model training comprises: adjusting the convolutional neural network with the error back-propagation learning algorithm for convolutional neural networks until its output error no longer decreases.
6. The method of claim 5, characterised in that the sample classification comprises: drawing examples at random from the multi-instance sample, n_max at a time, for a total of v draws, where v is odd; then classifying each drawn sample with the convolutional neural network and counting the v classification results, the classification with the highest count being the final classification of the multi-instance sample.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201611148420.XA CN106682687A (en) | 2016-12-13 | 2016-12-13 | Multi-example learning method using deep learning technology |
Publications (1)
Publication Number | Publication Date |
---|---|
CN106682687A true CN106682687A (en) | 2017-05-17 |
Family
ID=58869589
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108197638A (en) * | 2017-12-12 | 2018-06-22 | 阿里巴巴集团控股有限公司 | The method and device classified to sample to be assessed |
WO2019114305A1 (en) * | 2017-12-12 | 2019-06-20 | 阿里巴巴集团控股有限公司 | Method and device for classifying samples to be assessed |
CN108197638B (en) * | 2017-12-12 | 2020-03-20 | 阿里巴巴集团控股有限公司 | Method and device for classifying sample to be evaluated |
CN114633774A (en) * | 2022-03-30 | 2022-06-17 | 东莞理工学院 | Rail transit fault detection system based on artificial intelligence |
Legal Events
Date | Code | Title | Description
---|---|---|---
| PB01 | Publication |
| SE01 | Entry into force of request for substantive examination |
| RJ01 | Rejection of invention patent application after publication | Application publication date: 20170517