CN106682687A - Multi-example learning method using deep learning technology - Google Patents
- Publication number: CN106682687A
- Application number: CN201611148420.XA
- Authority: CN (China)
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G06F18/2155 — Generating training patterns; bootstrap methods characterised by the incorporation of unlabelled data, e.g. multiple instance learning [MIL], semi-supervised techniques using expectation-maximisation [EM] or naïve labelling
- G06F18/24 — Classification techniques
- G06N3/084 — Backpropagation, e.g. using gradient descent
Abstract
The invention provides a multi-instance learning method that uses deep learning technology. Through a series of data filling and splitting operations, multi-instance samples are converted into feature matrices of equal size, and a convolutional neural network then performs supervised learning and classification. The method can discover the hidden abstract concepts in a multi-instance data set, tolerates errors in the data set effectively, and generalizes well.
Description
Technical field
The present invention relates to a machine learning method, and in particular to a multi-instance learning method that applies deep learning technology.
Background technology
Multi-instance learning is a branch of machine learning that has attracted wide attention from researchers since its introduction. It was originally proposed for the activity analysis problem of drug molecules and has since been widely applied in image classification, speech recognition, text understanding, and other fields. In a multi-instance classification problem, each sample used to train the learner contains multiple examples, and only some of those examples determine the sample's class label; the other examples in the sample have no effect on the classification. For instance, an image is composed of many small local regions, and only some of those regions determine the class of the image (portrait, landscape, building, and so on). In a typical multi-instance data set, however, the class information is associated only with the sample, not with any particular example inside the sample. The main existing methods for the multi-instance learning problem include MI-SVM, DD-SVM, MI-RBF, Citation-KNN, MI-Boosting, Bayesian multi-instance learning, and methods based on Gaussian processes. The paper "Kim, Minyoung and De la Torre, Fernando. Multiple Instance Learning via Gaussian Processes. Data Min. Knowl. Discov., Kluwer Academic Publishers, 2014, 28, pp. 1078-1106" reports a Gaussian-process-based multi-instance learning method: it first builds a kernel Gram matrix over the examples to obtain a prior distribution of the examples relative to the target class labels, then derives the posterior distribution of the examples given the target labels through Bayes' formula and maximum-likelihood theory, learns this posterior and its derived predictive distribution with an iterative algorithm, and finally obtains a probability distribution from samples to labels. The main problems of this method are its huge computational cost and its rather severe overfitting when the sample size is small.
The shortcomings of existing methods are:
(1) Most existing methods are based on modifying existing single-instance machine learning algorithms so that they apply to the multi-instance learning setting. Such modification is subject to many restrictions and can increase the complexity of the algorithm, lower its efficiency, and lose information from the training data set, so the resulting algorithms do not perform very well.
(2) Most existing methods are based on supervised learning and therefore depend heavily on the quality of the features and labels in the data set. Their robustness is poor: slight errors in the data set are amplified and have a large impact on the accuracy of the model.
(3) Most existing methods are based on statistics and model the probability distribution of the input features. This is not conducive to discovering and modeling the abstract concepts hidden in multi-instance data, so both the accuracy and the generalization performance of the final classification model are severely limited.
For these reasons, the present invention proposes a multi-instance learning method based on deep learning. Through a series of data filling and splitting operations, multi-instance samples are converted into feature matrices of equal size, and a convolutional neural network performs supervised learning and classification. The invention can discover the abstract concepts hidden in a multi-instance data set, tolerates errors in the data set well, and has good generalization ability.
Content of the invention
To overcome the shortcomings of existing multi-instance learning methods (most are modifications of existing single-instance learning methods, are very sensitive to data quality, and are based on statistics and probability distributions), this patent proposes a multi-instance learning method based on deep learning. The method comprises the normalization of multi-instance sample attributes, the example expansion of multi-instance samples, a training process, and a classification process. Each process contains several steps, described as follows:
(1) Normalization of multi-instance sample attributes
Let the dimension of an example be m, comprising continuous attributes and categorical attributes.
A. The normalization method for continuous attributes is: for a continuous attribute p_i, find the maximum and minimum of p_i over all examples in the data set, denoted p_i^max and p_i^min respectively. The normalized value of the attribute is computed as:

    p_i' = (p_i - p_i^min) / (p_i^max - p_i^min)

At the same time, the p_i^max and p_i^min of every continuous attribute over all examples of the training data set are recorded; this information is later used to normalize unknown test data in the same way for the trained model.
B. Dummy-variable encoding of categorical attributes: a categorical attribute with k possible values is converted into a one-dimensional vector of k elements. When the attribute takes a given possible value, the corresponding element of the vector is set to 1 and the remaining elements are 0; in this k-dimensional vector, exactly one element is 1 for each data record and the rest are 0. Categorical attributes undergo no further normalization after dummy-variable encoding.
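As a minimal sketch of the preprocessing in point (1), the Python below applies min-max normalization to a continuous attribute and dummy-variable (one-hot) encoding to a categorical one. The function names `normalize_continuous` and `one_hot` are illustrative, not part of the patent:

```python
def normalize_continuous(value, attr_min, attr_max):
    """Min-max normalize one continuous attribute value into [0, 1]."""
    return (value - attr_min) / (attr_max - attr_min)

def one_hot(value, categories):
    """Encode a categorical value as a k-element 0/1 vector (dummy variable)."""
    return [1 if value == c else 0 for c in categories]

# Worked example from the embodiment: field f1 with minimum -3, maximum 292.
print(round(normalize_continuous(46, -3, 292), 4))  # 0.1661
print(one_hot('b', ['a', 'b', 'c']))                # [0, 1, 0]
```

The min/max values would be computed over the training set only and reused at test time, as the patent specifies.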
(2) Example expansion of multi-instance samples
The purpose of example expansion is to give every sample in the training data set the same number of examples, so that deep learning with a convolutional neural network becomes possible. Suppose the target multi-instance data set contains k classes. The expansion proceeds as follows:
A. Set the maximum example count for expansion, n_max, to the largest number of examples contained by any sample in the training set.
B. For a given target class, the training samples can be divided into the set D_P of samples belonging to that class (positive samples) and the set D_N of samples not belonging to it (negative samples). The examples of all negative samples are placed into one set and their order is shuffled; this set is denoted D_IN.
C. For each sample in the training set whose example count is less than n_max, draw examples at random from D_IN and add them to the sample until its example count equals n_max. After this step, every training sample contains n_max examples, each of dimension m (before dummy-variable encoding), so each sample is converted into a real-valued matrix of n_max rows and m columns.
D. Repeat step C q times: for each sample, examples are again drawn at random from D_IN and added to form a new sample. After q repetitions, the data set has grown to roughly q times its original size (samples whose example count is already n_max do not take part in the expansion).
E. For each expanded sample, shuffle the order of its examples p times, with p = n_max / 2; each shuffled ordering is treated as a new sample, so the data grows to p times its size.
F. The labels of the samples are copied along with the expansion: for a sample of a given class, every new sample produced from it receives the class label of the original sample.
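The expansion steps A-E above can be sketched as follows, assuming each sample (bag) is represented as a list of feature vectors and that a positive/negative split for one target class is given. `expand_bags` and its signature are hypothetical names introduced for illustration:

```python
import random

def expand_bags(pos_bags, neg_bags, q):
    """Sketch of example expansion: pad every bag to n_max examples with
    randomly drawn negative examples (the pool D_IN), repeat the padding
    q times, then shuffle each padded bag p = n_max // 2 times, treating
    each shuffled ordering as a new sample."""
    n_max = max(len(b) for b in pos_bags + neg_bags)
    neg_pool = [inst for b in neg_bags for inst in b]  # the set D_IN
    random.shuffle(neg_pool)
    expanded = []
    for bag in pos_bags + neg_bags:
        repeats = q if len(bag) < n_max else 1  # full bags are not expanded
        for _ in range(repeats):
            padded = bag + random.choices(neg_pool, k=n_max - len(bag))
            for _ in range(max(1, n_max // 2)):  # p shuffles, each a new sample
                shuffled = padded[:]
                random.shuffle(shuffled)
                expanded.append(shuffled)
    return expanded
```

Label propagation (step F) is omitted here for brevity; each output bag would simply carry its source bag's class label.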
(3) Design of the convolutional neural network
A. Input: a real-valued matrix of n_max rows and m columns, with 1 channel.
B. The convolutional layers use 5*5 convolution kernels, with no zero padding and no scaling. Each convolutional layer is followed by an activation layer, using one of the following two activation functions:
a) ReLU: y = max(x, 0), where x is the output of the previous layer and y is the output of this activation layer;
b) Sigmoid: y = 1 / (1 + e^(-x)), where x is the output of the previous layer and y is the output of this activation layer.
C. The channel count of the convolutional layers starts at 64 and increases by 64 channels per layer, up to a maximum of 512 channels; once the channel count of a convolutional layer reaches 512, subsequent convolutional layers keep 512 channels.
D. When one dimension of the feature map output by a convolutional layer and its activation layer reaches 1, no further convolution is performed, and the feature map is fed into the fully connected layers. Eight fully connected layers are used in total; if the output dimension of the last convolution-activation pair is 1*w, the dimensions of these 8 fully connected layers take random values in the interval [w, 8w].
E. A dropout layer of 20% is inserted between every two fully connected layers, i.e., 20% of the output units of the preceding fully connected layer are masked at random.
F. The output layer is fully connected to the last fully connected layer, and its dimension equals the number of classes in the data set.
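Under the stated design (5*5 unpadded convolutions shrink each spatial dimension by 4 per layer; channels start at 64, grow by 64 per layer, and are capped at 512), the shape schedule of the convolution stack can be computed as in this illustrative sketch. `conv_stack_schedule` is a hypothetical helper, and it stops once another 5*5 kernel no longer fits, a loose reading of "one dimension reaches 1":

```python
def conv_stack_schedule(n_max, m, kernel=5, c_step=64, c_cap=512):
    """Return a list of (channels, height, width) tuples, one per
    conv+activation layer, for an n_max x m single-channel input."""
    h, w, channels = n_max, m, c_step
    layers = []
    while h >= kernel and w >= kernel:
        h, w = h - (kernel - 1), w - (kernel - 1)  # valid 5x5 conv shrinks by 4
        layers.append((channels, h, w))
        channels = min(channels + c_step, c_cap)   # grow by 64, cap at 512
    return layers

# For the Musk2 embodiment (n_max = 51 examples, m = 168 attributes):
print(conv_stack_schedule(51, 168)[-1])  # (512, 3, 120)
```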
(4) Model training
The weights of the network are adjusted with the error back-propagation learning algorithm for convolutional neural networks; the adjustment is driven by the difference between the network's output for a given input and the desired output. Specifically, for each sample fed into the model, the output of the network is a vector whose dimension equals the number of classes. The error is computed as the Hamming distance between the network output vector and the true class vector, divided by the vector dimension: when the output vector is identical to the true class vector, their Hamming distance is 0 and the error is 0; when the output vector differs from the true class vector in every position, their Hamming distance equals the dimension and the error is 1.
The weights of the network are initialized with random numbers in [0, 1], and training runs for many rounds; feeding all training samples through the network and completing the corresponding weight adjustments constitutes one round. Training continues until the output error of the network no longer decreases.
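The error measure described above, Hamming distance divided by vector dimension, can be sketched as a small helper; `hamming_error` is an illustrative name, not one from the patent, and it assumes the network output has already been binarized to a 0/1 vector:

```python
def hamming_error(output_vec, target_vec):
    """Hamming distance between output and true class vector, divided by
    the vector dimension, giving an error in [0, 1]."""
    assert len(output_vec) == len(target_vec)
    dist = sum(o != t for o, t in zip(output_vec, target_vec))
    return dist / len(output_vec)

print(hamming_error([1, 0, 0], [1, 0, 0]))  # 0.0 -- identical vectors
print(hamming_error([0, 1, 1], [1, 0, 0]))  # 1.0 -- entirely different
```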
(5) Sample classification
When a multi-instance sample needs to be classified, its attribute values are first normalized, using for each attribute the maximum and minimum values recorded from the training set. Example expansion is then carried out, with two possible cases:
A. The sample to be classified contains at most n_max examples. Following point (2) of the content of the invention, the sample is expanded by randomly drawing examples from the negative example set until its example count equals n_max. Each sample to be classified is expanded v times, where v is odd; the trained network then classifies each of the v expanded samples, the v classification results are put to a vote, and the result with the most votes becomes the final class label of the sample to be classified.
B. The sample to be classified contains more than n_max examples. Examples are drawn at random from the sample, n_max at a time, for a total of v draws, where v is odd; the trained network then classifies each drawn sample, the v classification results are put to a vote, and the result with the most votes becomes the final classification of the sample to be classified.
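A minimal sketch of the voting scheme in point (5), covering both cases A and B. `classify_bag` is a hypothetical name, and the `predict` callback is a stand-in for the trained network:

```python
import random
from collections import Counter

def classify_bag(bag, neg_pool, n_max, v, predict):
    """Classify one bag by majority vote over v expanded/subsampled copies.
    v must be odd so a two-class vote cannot tie."""
    assert v % 2 == 1
    votes = []
    for _ in range(v):
        if len(bag) <= n_max:  # case A: pad with random negative examples
            expanded = bag + random.choices(neg_pool, k=n_max - len(bag))
        else:                  # case B: subsample n_max examples
            expanded = random.sample(bag, n_max)
        votes.append(predict(expanded))
    return Counter(votes).most_common(1)[0][0]
```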
Specific embodiment
The present invention was tested on UCI data sets (http://archive.ics.uci.edu/ml/) and achieved good results. One embodiment is given below, using the Musk2 data set from UCI (http://archive.ics.uci.edu/ml/datasets/Musk+%28Version+2%29) as the test data set. Musk2 is a multi-instance data set with 6598 examples and 168 data attributes, all continuous; the smallest number of examples contained in a sample is 13 and the largest is 51.
(1) Data preprocessing
For each continuous attribute, the maximum and minimum values of the attribute over the data set are found, and the attribute is processed with the preprocessing method for continuous attributes of the present invention. For example: for the continuous field f1, the maximum and minimum values over all data are 292 and -3 respectively; then for that field in the 1st record of the data set, the value after normalization is (46 - (-3)) / (292 - (-3)) = 0.1661. In this embodiment, the normalized data are rounded to 4 decimal places.
(2) Example expansion
First, the examples contained in all negative samples of the data set are placed into one set, the negative example set, which contains 5581 examples in total. Then example expansion is performed on the samples of the whole data set. The largest number of examples contained in a sample of the data set is 51, so whenever a sample contains fewer than 51 examples, examples are drawn at random from the negative example set and added to the sample until its example count equals 51. For each sample this process is repeated 10 times, i.e., each original sample yields 10 samples after example expansion. Afterwards, the order of the 51 examples inside each sample is shuffled at random, 10 times in total, so each sample produced by example expansion yields 10 samples containing the same examples in shuffled order. In this way the scale of the data set grows to 100 times the original; each sample contains 51 examples, each of dimension 168.
(3) Network design
A convolutional neural network deep learning model is used; the design of the network is shown in Table 1.
Table 1. The network design table.
(4) Network training
The network structure of Table 1 is implemented through a configuration file in MatConvNet, the data set is converted into Matlab data files (.mat format), and training is carried out with the training script cnn_train.m provided by MatConvNet. Training runs for 30 rounds, with a learning rate of 0.05 for the first 10 rounds, 0.005 for rounds 11-20, and 0.0005 for rounds 21-30. The loss function used for training is the zero-one loss. After the 30 rounds of training, the system generates 30 .mat files, each saving the model parameters at the end of one round; each of these .mat files is the model of one training round and can be used to classify unknown multi-instance samples.
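The three-stage learning-rate schedule of the embodiment can be restated as a small helper. This is only an illustration of the values above; in practice MatConvNet's cnn_train.m would typically receive such a schedule as a per-epoch array:

```python
def learning_rate(epoch):
    """Learning rate for a 1-indexed training round: 0.05 for rounds 1-10,
    0.005 for rounds 11-20, 0.0005 for rounds 21-30."""
    if epoch <= 10:
        return 0.05
    if epoch <= 20:
        return 0.005
    return 0.0005

print([learning_rate(e) for e in (1, 10, 11, 20, 21, 30)])
# [0.05, 0.05, 0.005, 0.005, 0.0005, 0.0005]
```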
(5) Sample classification
The network model at the end of the 30th training round is used as the classification model. For a multi-instance sample to be classified, its attribute values are first normalized, using the corresponding attribute maxima and minima from the training set. Example expansion is then carried out; the two cases, samples with at most 51 examples and samples with more than 51 examples, are handled according to point (5) of the "Content of the invention", with v = 11. The 11 samples produced by example expansion are fed into the classification network separately, yielding 11 class labels, which are then put to a vote; the class label with the most votes is the model's final classification of the sample to be classified.
Claims (6)
1. A multi-instance learning method based on deep learning, the method being usable for the classification of multi-instance samples, characterised in that the method comprises, in sequence: normalization of multi-instance sample attributes, example expansion of multi-instance samples, design of a convolutional neural network, a training process, and a classification process.
2. the method for claim 1, it is characterised in that the normalization of described many example sample attributes includes following step
Suddenly,
(1) it is many example samples of m to obtain dimension, and many example samples are provided with connection attribute and Category Attributes;
(2) the connection attribute p of many example samples is obtainedi, and seek piMaximum and minimum value, be designated as respectivelyWithAnd the connection attribute is normalized is calculated its normalized value:
(3) Category Attributes to the data carry out dummy variable, and the dummy variable includes, will include k probable value
Category Attributes are recorded as an one-dimensional vector for having k element, when the value of Category Attributes is certain probable value, in one-dimensional vector
Corresponding element is set to 1, and remaining element is 0.
3. The method of claim 2, characterised in that the example expansion of the multi-instance samples comprises:
A. setting the maximum example count for expansion, n_max, to the largest number of examples contained by the multi-instance samples;
B. classifying the multi-instance samples according to different targets: for a given target class, dividing the multi-instance samples into a set D_P of samples belonging to that class and a set D_N of samples not belonging to it, placing the examples of all samples in D_N into one set and shuffling their order, this set being denoted D_IN;
C. for any one of the multi-instance samples whose example count is less than n_max, drawing examples at random from D_IN and adding them to the sample until its example count equals n_max, whereby each multi-instance sample is extended into a real-valued matrix of n_max rows and m columns;
D. repeating the expansion of step C on the multi-instance samples q times;
E. for each expanded multi-instance sample, shuffling the order of its examples p times, each shuffled ordering being taken as a new multi-instance sample.
4. The method of claim 3, characterised in that the design of the convolutional neural network comprises: the input of the convolutional neural network is a real-valued matrix of n_max rows and m columns, with the channel count set to 1;
A. the convolutional layers of the convolutional neural network use 5*5 convolution kernels, with no zero padding and no scaling, and each convolutional layer is followed by an activation layer chosen freely from the following two activation functions:
a) ReLU: y = max(x, 0), where x is the output of the previous layer and y is the output of this activation layer;
b) Sigmoid: y = 1 / (1 + e^(-x)), where x is the output of the previous layer and y is the output of this activation layer;
B. the channel count of the convolutional layers increases: starting from a convolutional layer with 64 channels, each layer has 64 more channels than the previous layer, with the maximum channel count not exceeding 512 channels;
C. when one dimension of the feature map output by a convolutional layer and its activation layer reaches 1, no further convolution is performed, and the feature map is fed into fully connected layers; 8 fully connected layers are provided in total, and if the output dimension of the last convolution-activation pair is 1*w, the dimensions of the 8 fully connected layers take random values in the interval [w, 8w];
D. a dropout layer of 20% is added between every two fully connected layers, i.e., 20% of the output units of the preceding fully connected layer are masked;
E. the output layer is connected to the last fully connected layer in a fully connected manner, the dimension of the output layer being the number of classes of the multi-instance samples.
5. The method of claim 4, characterised in that the model training comprises: adjusting the convolutional neural network with the error back-propagation learning algorithm for convolutional neural networks until its output error no longer decreases.
6. The method of claim 5, characterised in that the sample classification comprises: drawing examples at random from the multi-instance sample, n_max at a time, for a total of v draws, where v is odd; then classifying each drawn sample with the convolutional neural network and counting the v classification results, the classification with the highest count being the final classification of the multi-instance sample.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201611148420.XA CN106682687A (en) | 2016-12-13 | 2016-12-13 | Multi-example learning method using deep learning technology |
Publications (1)
Publication Number | Publication Date |
---|---|
CN106682687A true CN106682687A (en) | 2017-05-17 |
Family
ID=58869589
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108197638A (en) * | 2017-12-12 | 2018-06-22 | 阿里巴巴集团控股有限公司 | The method and device classified to sample to be assessed |
WO2019114305A1 (en) * | 2017-12-12 | 2019-06-20 | 阿里巴巴集团控股有限公司 | Method and device for classifying samples to be assessed |
CN108197638B (en) * | 2017-12-12 | 2020-03-20 | 阿里巴巴集团控股有限公司 | Method and device for classifying sample to be evaluated |
CN114633774A (en) * | 2022-03-30 | 2022-06-17 | 东莞理工学院 | Rail transit fault detection system based on artificial intelligence |
Legal Events
Date | Code | Title | Description
---|---|---|---
| PB01 | Publication |
| SE01 | Entry into force of request for substantive examination |
| RJ01 | Rejection of invention patent application after publication | Application publication date: 20170517