CN106682687A - Multi-example learning method using deep learning technology - Google Patents

Multi-example learning method using deep learning technology

Info

Publication number
CN106682687A
CN106682687A
Authority
CN
China
Prior art keywords
sample
multi-example
example samples
output
max
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201611148420.XA
Other languages
Chinese (zh)
Inventor
张钢
毕志升
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong University of Technology
Original Assignee
Guangdong University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong University of Technology filed Critical Guangdong University of Technology
Priority to CN201611148420.XA priority Critical patent/CN106682687A/en
Publication of CN106682687A publication Critical patent/CN106682687A/en
Pending legal-status Critical Current

Links

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/21: Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214: Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F18/2155: Generating training patterns; Bootstrap methods, e.g. bagging or boosting, characterised by the incorporation of unlabelled data, e.g. multiple instance learning [MIL], semi-supervised techniques using expectation-maximisation [EM] or naïve labelling
    • G06F18/24: Classification techniques
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/08: Learning methods
    • G06N3/084: Backpropagation, e.g. using gradient descent

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a multi-instance learning method using deep learning technology. A series of data filling and partitioning operations converts multi-instance samples into feature matrices of equal size, and a convolutional neural network is used to carry out supervised learning and classification. The method can discover hidden abstract concepts in a multi-instance data set, tolerates errors in the data set effectively, and has strong generalization capability.

Description

A multi-instance learning method applying deep learning technology
Technical field
The present invention relates to a machine learning method, and in particular to a multi-instance learning method applying deep learning technology.
Background technology
Multi-instance learning is a branch of machine learning that has attracted wide attention from machine learning researchers since its introduction. It was first proposed for the activity analysis of drug molecules, and has since been widely applied in fields such as image classification, speech recognition, and text understanding. In a multi-instance classification problem, each training sample (bag) contains multiple instances, but only some of those instances determine the bag's class label, while the remaining instances have no effect on the classification. For example, an image is composed of many small local regions, and only some of those regions determine the category of the image (such as portrait, landscape, or building). In typical multi-instance data sets, however, the class information is associated only with the sample, not with any particular instance inside the sample. Methods currently used to solve multi-instance learning problems mainly include MI-SVM, DD-SVM, MI-RBF, Citation-KNN, MI-Boosting, Bayesian multi-instance learning, and Gaussian-process-based multi-instance learning. The paper "Kim, Minyoung and Torre, Fernando. Multiple Instance Learning via Gaussian Processes. Data Min. Knowl. Discov., Kluwer Academic Publishers, 2014, 28, pp. 1078-1106" reports a multi-instance learning method based on Gaussian processes: it first builds a kernel Gram matrix over the instances to obtain a prior distribution of the instances relative to the target class label, then derives the posterior distribution of the instances given the target label via Bayes' formula and maximum likelihood, learns this posterior and its derived predictive distribution with an iterative algorithm, and finally obtains a probability distribution from sample to label. The main problems of this method are its huge computational cost and its rather severe overfitting when the sample size is small.
The weak points of existing methods are:
(1) Most existing methods are based on modifying an existing single-instance machine learning algorithm so that it applies to the multi-instance learning setting. Such modifications are subject to many limitations and can increase the complexity of the algorithm, lower its efficiency, and lose information from the training data set, so the resulting algorithms do not perform very well;
(2) Most existing methods are based on supervised learning and therefore depend heavily on the quality of the features and labels in the data set; the algorithms are not robust, slight errors in the data set can be amplified, and model accuracy is strongly affected;
(3) Most existing methods are statistical, modeling the probability distribution of the input features. This is not conducive to discovering and modeling the implicit abstract concepts in multi-instance data, so the accuracy and generalization performance of the final classification model are both severely limited.
Based on this, the present invention proposes a multi-instance learning method based on deep learning. Through a series of data filling and partitioning operations, multi-instance samples are converted into feature matrices of equal size, and supervised learning and classification are carried out with a convolutional neural network. The invention can discover the implicit abstract concepts in a multi-instance data set, tolerates errors in the data set well, and has good generalization ability.
The content of the invention
In order to overcome the weaknesses of existing multi-instance learning methods, namely that most are modifications of existing single-instance learning methods, are very sensitive to data quality, and are based on statistics and probability distributions, the present invention proposes a multi-instance learning method based on deep learning. The invention comprises the normalization of multi-instance sample attributes, the instance expansion of multi-instance samples, a training process, and a classification process; each process comprises several steps, described in turn as follows:
(1) Normalization of multi-instance sample attributes
Let the dimension of each instance be m, comprising continuous attributes and categorical attributes.
A. Normalization of continuous attributes: for a continuous attribute p_i, find its maximum and minimum values over all instances in the data set, denoted p_i^max and p_i^min respectively. The normalized value of the attribute is computed as p_i' = (p_i - p_i^min) / (p_i^max - p_i^min). At the same time, p_i^max and p_i^min of every continuous attribute in the training data set are recorded; this information is used after model training to normalize unknown test data in the same way;
B. Dummy-variable encoding of categorical attributes: a categorical attribute with k possible values is converted into a one-dimensional vector of k elements. When the attribute takes a given value, the corresponding element of the vector is set to 1 and the remaining elements to 0; in this k-dimensional vector, exactly one element is 1 for each data record. Categorical attributes undergo no further normalization after dummy-variable encoding;
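The two preprocessing steps above can be sketched as follows (an illustrative sketch, not part of the patented disclosure; the function names and sample values are our own):

```python
def min_max_normalize(values):
    """Min-max normalize a continuous attribute: p' = (p - p_min) / (p_max - p_min)."""
    p_min, p_max = min(values), max(values)
    return [(v - p_min) / (p_max - p_min) for v in values], (p_min, p_max)

def one_hot(value, categories):
    """Dummy-variable encode a categorical attribute with k possible values."""
    return [1 if value == c else 0 for c in categories]

normalized, (lo, hi) = min_max_normalize([-3, 46, 292])
print(round(normalized[1], 4))        # value 46 with min -3, max 292 -> 0.1661
print(one_hot("b", ["a", "b", "c"]))  # -> [0, 1, 0]
```

The recorded `(p_min, p_max)` pair is what allows unknown test data to be normalized with the training-set extremes, as required by step A.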
(2) Instance expansion of multi-instance samples
The purpose of instance expansion is to give every sample in the training set the same number of instances, so that deep learning with a convolutional neural network becomes possible. Suppose the target multi-instance data set contains k classes. The expansion procedure is as follows:
A. Set the maximum instance count n_max of the expansion to the largest number of instances contained in any sample of the training set;
B. For a given target class, the training samples can be divided into the set of samples belonging to the class (positive bags), denoted D_P, and the set of samples not belonging to it (negative bags), denoted D_N. The instances of all negative bags are placed into one set and shuffled; this set is denoted D_IN;
C. For each sample in the training set, if its instance count is less than n_max, instances are drawn at random from D_IN and added to the sample until its instance count equals n_max. After this step every training sample contains n_max instances, each of dimension m (before dummy-variable encoding), and each sample is converted into a real-valued matrix of n_max rows and m columns;
D. Step C is repeated q times, i.e., for each sample, instances are randomly drawn from D_IN to form a new sample; after q repetitions the data set grows to roughly q times its original size (samples that already contain n_max instances take no part in the expansion);
E. For each expanded sample, its instance order is shuffled p times, with p = n_max / 2; each shuffle of the instance order yields a new sample, so the data set grows a further p times;
F. Sample labels expand along with the instances: every new sample produced from a given bag inherits the class label of the original bag;
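Steps A to F can be sketched as follows (a simplified illustration with hypothetical function names; unlike step D's note, this sketch also duplicates bags that already contain n_max instances):

```python
import random

def expand_bags(bags, labels, q=2):
    """Pad every bag to n_max instances with random negative instances (steps A-D),
    then duplicate each padded bag with shuffled instance order (step E).
    Each new bag inherits the label of its original bag (step F)."""
    n_max = max(len(b) for b in bags)                  # step A
    negative_pool = [inst for b, y in zip(bags, labels)
                     if y == 0 for inst in b]          # step B: instances of negative bags
    random.shuffle(negative_pool)
    p = n_max // 2                                     # step E: shuffle count
    out_bags, out_labels = [], []
    for bag, y in zip(bags, labels):
        for _ in range(q):                             # step D: q random paddings
            padded = bag + random.choices(negative_pool, k=n_max - len(bag))  # step C
            for _ in range(max(p, 1)):                 # step E: p order shuffles
                shuffled = padded[:]
                random.shuffle(shuffled)
                out_bags.append(shuffled)
                out_labels.append(y)                   # step F: label inherited
    return out_bags, out_labels
```

Every output bag has exactly n_max instances, so all bags can be stacked into equal-size n_max-by-m matrices for the network input.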
(3) Design of the convolutional neural network
A. Input: a real-valued matrix of n_max rows and m columns, with channel count 1;
B. Each convolutional layer uses 5*5 kernels with no zero padding and no scaling, and is followed by an activation layer chosen from the following two activation functions:
a) ReLU: y = max(x, 0), where x is the output of the previous layer and y is the output of the activation layer;
b) Sigmoid: y = 1 / (1 + e^(-x)), where x is the output of the previous layer and y is the output of the activation layer;
C. The channel count of the convolutional layers starts at 64 and increases by 64 per layer relative to the previous layer, up to a maximum of 512 channels; once a convolutional layer reaches 512 channels, subsequent convolutional layers keep 512 channels.
D. When one dimension of the feature map output by a convolutional layer and its activation layer reaches 1, no further convolution is performed and the feature map is fed into fully connected layers. Eight fully connected layers are used in total; if the output dimension of the last convolution-activation pair is 1*w, the dimensions of these eight fully connected layers take random values in the interval [w, 8w];
E. A dropout layer with rate 20% is inserted between every two fully connected layers, i.e., 20% of the output units of the preceding fully connected layer are randomly masked;
F. The output layer is connected to the last fully connected layer in a fully connected manner, and its dimension equals the number of classes in the data set.
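The shrinking feature map and growing channel schedule of items B to D can be computed with a short sketch (an illustration under our own stopping rule: we stop as soon as another 5*5 valid convolution no longer fits, whereas the patent stops when one feature-map dimension reaches 1):

```python
def conv_plan(n_rows, n_cols, kernel=5, c_start=64, c_step=64, c_max=512):
    """Plan the convolutional stack described above: 5x5 kernels with no zero
    padding (each valid convolution shrinks both dimensions by kernel-1),
    channels growing by 64 per layer and capped at 512."""
    plan, channels = [], c_start
    while n_rows >= kernel and n_cols >= kernel and min(n_rows, n_cols) > 1:
        n_rows -= kernel - 1
        n_cols -= kernel - 1
        plan.append((n_rows, n_cols, channels))
        channels = min(channels + c_step, c_max)
    return plan

layers = conv_plan(51, 168)       # Musk2 embodiment: 51 instances x 168 attributes
print(len(layers), layers[-1])    # -> 12 (3, 120, 512)
```

With the Musk2 dimensions of the embodiment, twelve 5*5 valid convolutions fit, the channel count saturates at 512 from the eighth layer, and the final feature map is 3*120.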
(4) Model training
The network weights are adjusted with the error back-propagation learning algorithm for convolutional neural networks, the adjustment being based on the difference between the network's output and the target. Specifically, for each sample input to the model, the network output is a vector whose dimension equals the number of classes. The error is computed as the Hamming distance between the network output vector and the true class vector divided by the vector dimension: when the output vector is identical to the true class vector, their Hamming distance is 0 and the error is 0; when the output vector differs from the true class vector in every position, their Hamming distance equals the dimension and the error is 1.
The network weights are initialized with random numbers in [0, 1], and training proceeds for multiple epochs, one epoch consisting of feeding all training samples through the network and completing one round of weight adjustment, until the output error of the network no longer decreases.
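The error measure described above can be sketched as follows (assuming, for illustration, binary output vectors; the function name is our own):

```python
def hamming_error(output_vec, true_vec):
    """Error = Hamming distance between the network output vector and the true
    class vector, divided by the vector dimension (0 when identical, 1 when
    entirely different)."""
    assert len(output_vec) == len(true_vec)
    mismatches = sum(o != t for o, t in zip(output_vec, true_vec))
    return mismatches / len(output_vec)

print(hamming_error([1, 0, 0], [1, 0, 0]))  # identical vectors -> 0.0
print(hamming_error([0, 1, 1], [1, 0, 0]))  # entirely different -> 1.0
```

Dividing by the dimension normalizes the error into [0, 1] regardless of the number of classes, matching the two boundary cases stated in the text.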
(5) Sample classification
When a multi-instance sample needs to be classified, its attribute values are first normalized, using the per-attribute maximum and minimum values recorded from the training set, and instance expansion is then performed. Two cases arise:
A. The sample to be classified contains at most n_max instances. Following point (2) of the content of the invention, the sample is expanded by drawing instances at random from the negative-instance set until its instance count equals n_max. Each sample to be classified is expanded v times, where v is odd; the trained network then classifies each of the v expanded samples, the v classification results are voted on, and the label with the most votes is taken as the final class label of the sample to be classified.
B. The sample to be classified contains more than n_max instances. Instances are drawn from it at random, n_max at a time, v times in total, where v is odd; the trained network then classifies each of the v drawn samples, the v classification results are voted on, and the label with the most votes is the final classification of the sample to be classified.
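The voting scheme shared by cases A and B can be sketched as follows (`classify_by_vote` and the toy classifier are our own illustrative names, not the patented implementation):

```python
from collections import Counter

def classify_by_vote(predict, expansions):
    """Classify one multi-instance sample: `predict` labels each of the v
    expanded or sub-sampled versions, and the majority label wins. v should
    be odd so that a two-class vote cannot tie."""
    votes = [predict(x) for x in expansions]
    return Counter(votes).most_common(1)[0][0]

# Hypothetical classifier that labels an expansion by the sign of its mean value.
toy_predict = lambda xs: int(sum(xs) / len(xs) > 0)
print(classify_by_vote(toy_predict, [[1, 2], [-5, 1], [3, 3], [2, -1], [4, 0]]))  # -> 1
```

An odd v guarantees a strict majority in the two-class case, which is why both cases A and B require v to be odd.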
Specific embodiment
The present invention was tested on UCI data sets (http://archive.ics.uci.edu/ml/) with good results. An embodiment is given below, using the Musk2 data set of the UCI repository (http://archive.ics.uci.edu/ml/datasets/Musk+%28Version+2%29) as the test data set. It is a multi-instance data set with 6,598 instances and 168 data attributes, all continuous; the minimum number of instances contained in a sample is 13 and the maximum is 51.
(1) Data preprocessing
For continuous attributes, the maximum and minimum values of each attribute in the data set are found, and the preprocessing method for continuous attributes of the present invention is applied. For example: for continuous field f1, its maximum and minimum values over all data are 292 and -3 respectively, so for this field in the 1st record of the data set, the value after normalization is (46 - (-3)) / (292 - (-3)) = 0.1661. In this embodiment, normalized values are rounded to 4 decimal places.
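The arithmetic of this example can be checked directly:

```python
f1_min, f1_max = -3, 292     # extremes of field f1 over the whole data set
value = 46                   # f1 value in the 1st record
normalized = (value - f1_min) / (f1_max - f1_min)
print(round(normalized, 4))  # -> 0.1661, matching the embodiment
```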
(2) Instance expansion
First, all instances contained in the negative bags of the data set are placed into one set, the negative-instance set, which contains 5,581 instances in total; instance expansion is then performed on every sample of the whole data set. The maximum instance count of any sample in the data set is 51, so if a sample contains fewer than 51 instances, instances are drawn at random from the negative-instance set and added to the sample until its count equals 51. This process is repeated 10 times for each sample, so each original sample yields 10 samples after instance expansion. Afterwards, the order of the 51 instances inside each sample is shuffled at random, 10 times in total, so each sample produced by instance expansion yields 10 further samples containing the same instances in a different order. In total, the scale of the data set grows to 100 times its original size; each sample contains 51 instances, each of dimension 168.
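The overall growth factor of the data set in this embodiment follows from the two expansion counts:

```python
q = 10          # random paddings per original sample
p = 10          # order shuffles per padded sample
growth = q * p  # each original sample yields q * p samples
print(growth)   # -> 100, the hundredfold growth stated above
```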
(3) Network design
A convolutional neural network deep-learning model is used; the network design is shown in Table 1:
Table 1. Network design.
(4) Network training
The network structure of Table 1 is implemented via a configuration file in MatConvNet. The data set is converted into Matlab data files (.mat format), and training is performed with the cnn_train.m training script provided by MatConvNet. Training runs for 30 epochs, with learning rates of 0.05 for epochs 1-10, 0.005 for epochs 11-20, and 0.0005 for epochs 21-30. The training loss function is the zero-one loss. After the 30 training epochs the system generates 30 .mat files, each saving the model parameters at the end of one epoch; these .mat files are the per-epoch trained models and can be used to classify unknown multi-instance samples.
(5) Sample classification
The network model at the end of the 30th training epoch is used as the classification model. For each multi-instance sample to be classified, its attribute values are first normalized, using the corresponding per-attribute maximum and minimum values from the training set; instance expansion is then performed, handling the two cases of at most 51 and more than 51 instances according to point (5) of the "Content of the invention", with v = 11. The 11 samples produced by instance expansion are input to the classification network one by one, yielding 11 class labels; these are then voted on, and the class label with the most votes is the model's final classification of the sample to be classified.

Claims (6)

1. A multi-instance learning method based on deep learning, the method being usable for classifying multi-instance samples, characterized in that the method comprises, in order: normalization of multi-instance sample attributes, instance expansion of multi-instance samples, design of a convolutional neural network, a training process, and a classification process.
2. the method for claim 1, it is characterised in that the normalization of described many example sample attributes includes following step Suddenly,
(1) it is many example samples of m to obtain dimension, and many example samples are provided with connection attribute and Category Attributes;
(2) the connection attribute p of many example samples is obtainedi, and seek piMaximum and minimum value, be designated as respectivelyWithAnd the connection attribute is normalized is calculated its normalized value:
(3) Category Attributes to the data carry out dummy variable, and the dummy variable includes, will include k probable value Category Attributes are recorded as an one-dimensional vector for having k element, when the value of Category Attributes is certain probable value, in one-dimensional vector Corresponding element is set to 1, and remaining element is 0.
3. The method of claim 2, characterized in that the instance expansion of multi-instance samples comprises:
A. setting the maximum instance count n_max of the expansion to the largest number of instances contained in the multi-instance samples;
B. classifying the multi-instance samples according to different targets: for a given target class, dividing the multi-instance samples into the set D_P of samples belonging to the class and the set D_N of samples not belonging to the class, and placing the instances of all samples in D_N into one set and shuffling it, the set being denoted D_IN;
C. for any one of the multi-instance samples, if its instance count is less than n_max, drawing instances at random from D_IN and adding them to it until its instance count equals n_max, each of the multi-instance samples thereby being extended into a real-valued matrix of n_max rows and m columns;
D. repeating the expansion of step C on the multi-instance samples q times;
E. for each expanded multi-instance sample, shuffling its instance order p times, each order-shuffled multi-instance sample serving as a new multi-instance sample.
4. The method of claim 3, characterized in that the design of the convolutional neural network comprises: the input of the convolutional neural network is a real-valued matrix of n_max rows and m columns, with the channel count set to 1;
A. the convolutional layers of the convolutional neural network use 5*5 kernels with no zero padding and no scaling, each convolutional layer being followed by an activation layer whose activation function is freely chosen from the following two:
a) ReLU: y = max(x, 0), where x is the output of the previous layer and y is the output of the activation layer;
b) Sigmoid: y = 1 / (1 + e^(-x)), where x is the output of the previous layer and y is the output of the activation layer;
B. increasing the channel count of the convolutional layers: starting from a convolutional layer with 64 channels, each layer has 64 more channels than the previous one, the maximum channel count not exceeding 512;
C. when one dimension of the feature map output by a convolutional layer and its activation layer reaches 1, performing no further convolution and feeding the feature map into fully connected layers, eight fully connected layers being provided in total; the output dimension of the last convolution-activation pair being 1*w, the dimensions of the eight fully connected layers take random values in the interval [w, 8w];
D. inserting a dropout layer with rate 20% between every two fully connected layers, i.e., masking 20% of the output units of the preceding fully connected layer;
E. connecting the output layer to the last fully connected layer in a fully connected manner, the dimension of the output layer being the number of classes of the multi-instance samples.
5. The method of claim 4, characterized in that the model training comprises:
adjusting the convolutional neural network with the error back-propagation learning algorithm for convolutional neural networks until its output error no longer decreases.
6. The method of claim 5, characterized in that the sample classification comprises:
randomly drawing instances from the multi-instance sample, n_max at a time, v times in total, where v is odd; then classifying each drawn sample with the convolutional neural network and counting the v classification results, the classification with the highest count serving as the final classification of the multi-instance sample.
CN201611148420.XA 2016-12-13 2016-12-13 Multi-example learning method using deep learning technology Pending CN106682687A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201611148420.XA CN106682687A (en) 2016-12-13 2016-12-13 Multi-example learning method using deep learning technology


Publications (1)

Publication Number Publication Date
CN106682687A true CN106682687A (en) 2017-05-17

Family

ID=58869589

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201611148420.XA Pending CN106682687A (en) 2016-12-13 2016-12-13 Multi-example learning method using deep learning technology

Country Status (1)

Country Link
CN (1) CN106682687A (en)


Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108197638A (en) * 2017-12-12 2018-06-22 阿里巴巴集团控股有限公司 The method and device classified to sample to be assessed
WO2019114305A1 (en) * 2017-12-12 2019-06-20 阿里巴巴集团控股有限公司 Method and device for classifying samples to be assessed
CN108197638B (en) * 2017-12-12 2020-03-20 阿里巴巴集团控股有限公司 Method and device for classifying sample to be evaluated
CN114633774A (en) * 2022-03-30 2022-06-17 东莞理工学院 Rail transit fault detection system based on artificial intelligence

Similar Documents

Publication Publication Date Title
CN109344736B (en) Static image crowd counting method based on joint learning
CN103793718B (en) Deep study-based facial expression recognition method
CN103955702B (en) SAR image terrain classification method based on depth RBF network
CN104376326B (en) A kind of feature extracting method for image scene identification
CN110135267A (en) A kind of subtle object detection method of large scene SAR image
CN109190665A (en) A kind of general image classification method and device based on semi-supervised generation confrontation network
CN108564129A (en) A kind of track data sorting technique based on generation confrontation network
CN104866810A (en) Face recognition method of deep convolutional neural network
CN106295507B (en) A kind of gender identification method based on integrated convolutional neural networks
CN104850845A (en) Traffic sign recognition method based on asymmetric convolution neural network
CN110502988A (en) Group positioning and anomaly detection method in video
CN108920445A (en) A kind of name entity recognition method and device based on Bi-LSTM-CRF model
CN106845528A (en) A kind of image classification algorithms based on K means Yu deep learning
CN106056134A (en) Semi-supervised random forests classification method based on Spark
CN107368807A (en) A kind of monitor video vehicle type classification method of view-based access control model bag of words
CN1656472A (en) Plausible neural network with supervised and unsupervised cluster analysis
CN114067368B (en) Power grid harmful bird species classification and identification method based on deep convolution characteristics
CN109299741A (en) A kind of network attack kind identification method based on multilayer detection
CN110097123B (en) Express mail logistics process state detection multi-classification system
CN106991296A (en) Ensemble classifier method based on the greedy feature selecting of randomization
CN110188654A (en) A kind of video behavior recognition methods not cutting network based on movement
CN108596264A (en) A kind of community discovery method based on deep learning
CN107633293A (en) A kind of domain-adaptive method and device
CN112766283B (en) Two-phase flow pattern identification method based on multi-scale convolution network
CN107679550A (en) A kind of appraisal procedure of data set classification availability

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20170517