CN112200255A - Information redundancy removing method for sample set - Google Patents

Information redundancy removing method for sample set

Info

Publication number
CN112200255A
CN112200255A (application CN202011110339.9A)
Authority
CN
China
Prior art keywords
sample
model
sample set
training
preset
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202011110339.9A
Other languages
Chinese (zh)
Other versions
CN112200255B (en)
Inventor
程战战
许昀璐
吴飞
浦世亮
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University ZJU
Original Assignee
Zhejiang University ZJU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang University ZJU filed Critical Zhejiang University ZJU
Priority to CN202011110339.9A priority Critical patent/CN112200255B/en
Publication of CN112200255A publication Critical patent/CN112200255A/en
Application granted granted Critical
Publication of CN112200255B publication Critical patent/CN112200255B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00Machine learning
    • G06N20/20Ensemble learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computing Systems (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Probability & Statistics with Applications (AREA)
  • Medical Informatics (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The invention provides an information redundancy removing method for a sample set, which comprises the following steps: obtaining samples to be processed and their corresponding trainable labels to form an original sample set; performing feature extraction on each sample with a pre-trained machine learning model to obtain a feature vector set for the original sample set; inputting the feature vector set into a learnable sample selector model, which performs sample selection on the feature vector set and returns a representative feature vector subset according to a preset threshold; and retrieving the original samples corresponding to the feature vector subset as the sub-sample set with redundant information removed. With this technical scheme, the original sample set can be efficiently reduced: redundant information is removed, samples carrying valuable information are retained, and the efficiency of training algorithms on the sample set is improved.

Description

Information redundancy removing method for sample set
Technical Field
The invention relates to the technical field of data processing, in particular to an information redundancy removing method for a sample set.
Background
With the development of deep learning, machine learning methods based on large-scale data sets are continuously being proposed. In practice, however, large-scale data sets often contain a large amount of redundant information, for example an excessive number of samples of a single class, or duplicate and near-duplicate samples. At the same time, training a machine learning model on a large-scale data set requires more computing power and computing time and consumes a great deal of resources. Large-scale training tasks in different scenarios make the problem pressing: a very-large-scale computer vision classification task is often trained with tens of millions of image samples, and a very-large-scale natural language processing task with hundreds of millions of language samples, so a method for removing information redundancy from a sample set is urgently needed. Because such data sets are large, the relationships between samples are complex, and pairwise comparison and analysis of samples is computationally expensive, there is currently no directly usable technical scheme for removing information redundancy from large-scale data sets.
Disclosure of Invention
The present invention aims to solve the above problems in the prior art and provides an information redundancy removing method for a sample set, so as to achieve information redundancy removal for a data set.
The technical scheme adopted by the invention is as follows:
a method of information de-redundancy for a sample set, the method comprising:
obtaining samples to be processed and their corresponding trainable labels to form an original sample set;
performing feature extraction on each acquired original sample with a pre-prepared feature extraction model to obtain a feature vector set of the original sample set;
performing sample selection on the feature vector set with a pre-prepared learnable sample selector model, and obtaining a representative feature vector subset according to a preset threshold;
and acquiring the original samples corresponding to the feature vector subset as the sub-sample set with redundant information removed.
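A minimal end-to-end sketch of these four steps, assuming a PyTorch-style feature extractor and selector; the function name, the module interfaces, and the threshold value 0.8 are illustrative assumptions rather than part of the patent:

```python
import torch

def remove_redundancy(samples, labels, feature_extractor, selector, threshold=0.8):
    """Reduce an original sample set to a representative sub-sample set.

    samples:  tensor of original samples (e.g. images), shape (N, ...)
    labels:   corresponding trainable labels, shape (N,)
    feature_extractor: pre-trained model mapping samples to feature vectors
    selector: learnable sample selector returning one activation value per sample
    threshold: first preset threshold on the selector activation
    """
    with torch.no_grad():
        features = feature_extractor(samples)         # feature vector set (second sample set)
        activations = selector(features).squeeze(-1)  # one activation value per sample
    keep = activations > threshold                    # representative samples only
    return samples[keep], labels[keep]                # sub-sample set with redundancy removed
```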
Preferably, the step of preparing the feature extraction model in advance includes:
acquiring the sample set to be processed, recording it as a first sample set, and acquiring the trainable labels corresponding to the samples;
inputting the samples in the first sample set and their corresponding labels into a preset first machine learning model for training, to obtain a preset first model, namely the pre-prepared feature extraction model; the preset first machine learning model comprises a feature extraction part and a model constraint convergence part; the feature extraction part is used for obtaining the feature vector of each sample, yielding the feature vector set of the original sample set, recorded as a second sample set, and the model constraint convergence part is used for controlling the training of the feature extraction model until convergence.
Preferably, the step of performing sample selection on the feature vector set through the sample selector model comprises:
acquiring the second sample set, and taking the training label of the original sample corresponding to each feature vector as its trainable label; inputting the second sample set into the sample selector model, and obtaining a representative feature vector subset and the corresponding trainable labels according to a first preset threshold, recorded as a third sample set;
the sample selector model comprises a neural network and an activation function; a sample is considered representative when the activation value obtained after the sample is input into the sample selector model is larger than the first preset threshold.
Preferably, the sample selector model is optimized through a teacher-student model structure, and training stops when the convergence index of the whole training process reaches a second preset threshold, yielding the sample selector.
Preferably, the step of determining the teacher-student model comprises:
inputting the second sample set and the labels corresponding to the samples in that set into a preset second machine learning model for training, to obtain a teacher model, called the second model; the second machine learning model comprises a feature extraction part and a Loss constraint part, wherein the feature extraction part is used for obtaining high-level abstract features of the samples, and the Loss constraint part is used for optimizing the teacher model during training;
inputting the samples in the third sample set and the corresponding labels into a preset third machine learning model for training, to obtain a student model, namely the third model; the preset third machine learning model comprises a feature extraction part and a Loss constraint part, wherein the feature extraction part is used for obtaining high-level abstract features of the samples, and the Loss constraint part is used for optimizing the student model during training;
the teacher model performs knowledge distillation on the student model, transferring more valuable information to the student model and helping the student model achieve the best possible performance on the selected third sample set.
Preferably, the knowledge distillation method is layer-by-layer feature distillation, yielding a distillation Loss.
Preferably, the first preset threshold can be determined in a plurality of ways, including a) or b):
a) a preset minimum threshold is given; when the activation value obtained after a sample is input into the sample selector model is larger than this minimum threshold, the sample is considered representative and is selected;
b) the activation values obtained after all samples are input into the sample selector model are sorted, and the minimum threshold is determined from the capacity of the third sample set so that the number of samples exceeding the threshold equals that capacity.
Preferably, the learning optimization process of the sample selector model includes:
a complete sample selector training process comprises at least one forward operation and at least one backward operation;
in the forward operation, the samples in the second sample set are input into the sample selector model, the activation value corresponding to each sample is output, and the third sample set is obtained according to the first preset threshold; the third sample set is then input into the third model, which outputs the Loss of each sample;
in the backward operation, the gradient of the distillation Loss output by the third model is propagated back to the network parameters of the sample selector model; during the backward gradient operation only the Loss results generated by the third sample set, i.e. the samples exceeding the first preset threshold, are computed, and the network weights are updated; and the training Losses of the student model and the teacher model are propagated back to their respective feature extraction networks for parameter updating.
In the information redundancy removing method for a sample set provided by the invention, the sample set to be processed is first obtained; the corresponding feature vector set is then obtained through the feature extraction model; the obtained feature vector set is filtered through the sample selector model to obtain a screened feature vector subset; and finally the original samples corresponding to the feature vector subset are retrieved as the sub-sample set with redundant information removed. The method can effectively remove redundant information from a large-scale sample set and substantially reduces the scale of the original sample set while preserving the performance of models trained on the reduced sample set; in addition, training models on the reduced data set greatly reduces training cost. The method meets the practical application requirements of users and has strong applicability, covering common machine learning fields such as speech recognition, image recognition, and natural language processing. Of course, not every advantage described above needs to be achieved at the same time by any one product or method practicing the invention.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.
FIG. 1 is a diagram illustrating an information redundancy removing method for a sample set according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of feature extraction in an information redundancy removal method for a sample set according to an embodiment of the present invention;
fig. 3 is a schematic diagram of a teacher-student model in an information redundancy removing method for a sample set according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
First, the DNN (Deep Neural Network) used in the present invention is explained: it is a multilayer feed-forward artificial neural network whose neurons respond to surrounding units within a preset coverage range and which, through weight sharing and feature aggregation, can effectively extract feature information from samples.
Teacher-Student (Teacher Student Model): a distillation-based neural network structure in which the teacher network transfers useful information to the student network by distillation, thereby improving the student network's ability to construct information.
In order to implement redundancy removal of sample set information, an embodiment of the present invention provides an information redundancy removal method for a sample set, and referring to fig. 1, the method includes:
s101, obtaining a sample to be processed (marked as an original sample) and a corresponding trainable label to obtain an original sample set.
The samples can be any data to be processed, for example data for speech recognition, image recognition, or natural language processing.
S102, after the original sample set is obtained, feature extraction is carried out on each obtained original sample through a pre-prepared feature extraction model, and a feature vector set of the original sample set is obtained.
The feature extraction model is used for extracting features from the original sample set to obtain the corresponding feature vector set, and it ensures that the feature vectors in the feature vector set correspond one-to-one to the samples in the original sample set. In the invention the original sample set can consist of image samples, voice samples, natural language processing samples, and the like; a feature map is also regarded as a kind of feature vector. In one possible implementation of the embodiment of the present invention, only image classification samples are used as the example.
The preset feature extraction model can be a pre-trained self-encoding (autoencoder) neural network. The pre-training method consists of inputting the original image samples into the self-encoding neural network for training; after convergence, the encoder part of the trained network serves as the feature extraction module.
In a possible embodiment, the step of predetermining the preset feature extraction model includes:
step one, acquiring a plurality of original image samples from a sample set to be processed, wherein the original image samples are called a first sample set, and acquiring trainable labels corresponding to the samples.
Inputting the obtained image sample and the corresponding label into a preset self-coding neural network model for training to obtain a preset self-coding neural network model, namely the pre-prepared feature extraction model, wherein the self-coding neural network model comprises a feature coding part and a feature decoding part, the feature coding part is used for obtaining the depth high-level features of the image to obtain a basic feature vector, and the feature decoding part is used for decoding the basic feature vector to obtain an original image sample. And compared with the characteristic coding part, the characteristic decoding part can be regarded as a characteristic coding model constraint convergence part, and the normal training of the characteristic extraction model can be controlled until convergence.
And step three, when an image sample to be processed is obtained, inputting the image sample to the feature extraction model to obtain a corresponding feature vector set, which is called a second sample set.
Specifically, as shown in fig. 2, the embodiment of the present invention adopts a self-encoding neural network as the framework of the feature extraction model. The self-encoding model includes a feature encoding (Encode) part and a feature decoding (Decode) part:
Optionally, the feature encoding part is configured to compress the features of the input image sample into a feature vector (Base Vector) of the image. The feature encoding sub-module comprises a multilayer convolutional neural network; for example, a residual feature extraction network (ResNet-18, an 18-layer residual network) can be used as the neural network model of the basic feature vector extraction sub-module to perform the down-sampling from the original image to the feature vector.
Optionally, the feature decoding part is configured to restore the extracted feature vector to the original image sample. The feature decoding sub-module comprises a multilayer convolutional neural network, generally the inverse of the encoding network, performing the up-sampling from the feature vector back to the original image.
Optionally, the first sample set is constructed as the first training data set, and an objective function is constructed so that the recovered image $I'_{i,j}$ output by the neural network stays consistent with the original image $I_{i,j}$. The loss function defined over a training sample is

$$L_{rec} = \frac{1}{HW}\sum_{i=1}^{H}\sum_{j=1}^{W}\left(I'_{i,j}-I_{i,j}\right)^{2}$$

where H and W are the pixel height and width of the image, respectively.
The original image samples are input into the preset self-encoding neural network model for training; when the model converges or the number of training iterations reaches the preset number, the trained self-encoding neural network model is obtained, and its feature encoding part (Encode) serves as the feature extraction model.
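A minimal sketch of such an autoencoder feature extractor, assuming PyTorch with a torchvision ResNet-18 backbone as the encoder; the decoder layout, latent size, input resolution, and training hyper-parameters are illustrative assumptions, not taken from the patent:

```python
import torch
import torch.nn as nn
from torchvision.models import resnet18

class AutoEncoder(nn.Module):
    def __init__(self, latent_dim=512):
        super().__init__()
        backbone = resnet18(weights=None)
        # Encode: ResNet-18 without its classification head -> basic feature vector
        self.encoder = nn.Sequential(*list(backbone.children())[:-1], nn.Flatten())
        # Decode: upsample the feature vector back to a 3x224x224 image (illustrative layout)
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 128 * 7 * 7), nn.ReLU(),
            nn.Unflatten(1, (128, 7, 7)),
            nn.ConvTranspose2d(128, 64, 4, stride=4), nn.ReLU(),   # 7 -> 28
            nn.ConvTranspose2d(64, 32, 4, stride=4), nn.ReLU(),    # 28 -> 112
            nn.ConvTranspose2d(32, 3, 2, stride=2), nn.Sigmoid(),  # 112 -> 224
        )

    def forward(self, x):
        z = self.encoder(x)          # feature vector (Base Vector)
        return self.decoder(z), z    # recovered image and feature vector

# Reconstruction training: keep the recovered image consistent with the original image.
model = AutoEncoder()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
images = torch.rand(8, 3, 224, 224)             # stand-in batch of image samples
recon, _ = model(images)
loss = nn.functional.mse_loss(recon, images)    # per-pixel reconstruction loss
loss.backward()
optimizer.step()
# After convergence, model.encoder alone is kept as the feature extraction model.
```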
S103, after the feature vector set is obtained, sample selection is carried out on the feature vector set through a pre-prepared learnable sample selector model, and a representative feature vector subset is obtained according to a preset threshold.
The preset sample selector model is used for selecting from the feature vector set the feature vector subset carrying high-value information.
The preset sample selector model can be a pre-trained neural network. The pre-training method consists of inputting the feature vector set and the corresponding trainable labels into the selector neural network for training and obtaining the pre-trained network after convergence. The training process can be jointly optimized through a teacher-student model structure.
In a possible embodiment, the step of predetermining the preset sample selector model comprises:
step one, a second sample set is obtained, and a training label of an original sample corresponding to the feature vector is used as a trainable label.
And step two, inputting the second sample set into a sample selector, and obtaining a representative feature vector subset and a corresponding trainable label according to a first preset threshold value, wherein the representative feature vector subset is called as a third sample set. The sample selector model comprises a multilayer neural network and an activation function, and when the activation value obtained after the sample is input into the sample selector model is larger than a first preset threshold value, the sample selector model is regarded as representative and selected.
The sample selector can perform learning optimization through a teacher-student model structure, and stops training when the convergence index of the whole training process reaches a second preset threshold value, so as to obtain the sample selector. The teacher-student model comprises a teacher network and a student network, the teacher network is trained on the basis of a second sample set to obtain a teacher model, the student network is trained on the basis of a third sample set to obtain a student model, the teacher model is used for distilling the student model to improve the knowledge extraction capability of the student model, the convergence index of the training process can be the Loss or the accuracy of the model, and a second preset threshold value can be set to be the Loss or the accuracy of the model when the Loss does not decrease.
Specifically, as shown in fig. 3, the optimization of the sample selector model by using the teacher-student model structure in the embodiment of the present invention includes:
When the second sample set is obtained, a second training data set $(V_1, V_2, \ldots, V_N)$ is constructed based on the second sample set, with corresponding trainable category labels $(Y_1, Y_2, \ldots, Y_N)$, where N is the size of the sample set.
Further, the second sample set is fed into the sample selector, which comprises a multilayer neural network and a non-linear activation function, such as sigmoid, and performs sample selection according to the first preset threshold. Specifically, there are various ways to determine the first preset threshold, including but not limited to: a) a preset minimum threshold is given, and a sample whose activation value after being input into the sample selector model exceeds this minimum threshold is considered representative and is selected; b) the activation values obtained after all samples are input into the sample selector model are sorted, and the minimum threshold is determined from the capacity of the third sample set so that the number of samples above the threshold equals that capacity. For example, the first threshold can be set on the sigmoid activation value: if the sigmoid activation is greater than 0.8, the sample is considered a high-value sample and is selected. Alternatively, the threshold can be derived from the agreed size of the selected subset: for example, if the selected subset is agreed to be 50% of the second sample set, the samples are sorted by sigmoid activation value and the top 50% constitute the third sample set.
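A minimal sketch of the selector and of both thresholding strategies, assuming PyTorch; the two-layer MLP dimensions and the 0.8 / 50% values mirror the examples above and are otherwise illustrative assumptions:

```python
import torch
import torch.nn as nn

class SampleSelector(nn.Module):
    """Multilayer network + sigmoid: one activation value per feature vector."""
    def __init__(self, feat_dim=512, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feat_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1), nn.Sigmoid(),
        )

    def forward(self, features):
        return self.net(features).squeeze(-1)

selector = SampleSelector()
features = torch.randn(1000, 512)            # second sample set (feature vectors)
activations = selector(features)

# Strategy a): fixed minimum threshold on the activation value.
mask_a = activations > 0.8

# Strategy b): keep the top-K activations so the third sample set has a fixed capacity.
capacity = features.shape[0] // 2            # e.g. 50% of the second sample set
topk = torch.topk(activations, capacity).indices
mask_b = torch.zeros_like(activations, dtype=torch.bool)
mask_b[topk] = True

third_sample_set = features[mask_b]          # representative feature vector subset
```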
Further, when the third sample set is acquired, a third training data set $(V'_1, V'_2, \ldots, V'_M)$ is constructed based on the third sample set, with corresponding trainable category labels $(Y'_1, Y'_2, \ldots, Y'_M)$, where M is the size of that sample set.
Further, the second training data set and the labels corresponding to its samples are fed into the teacher neural network for training. The teacher neural network comprises a feature extraction part and a Loss constraint part, wherein the feature extraction part is used for obtaining high-level abstract features of the samples, and the Loss constraint part is used for optimizing the teacher model during training. Specifically, the teacher neural network includes a multilayer neural network for deriving the predicted label values. At the same time, an objective function is constructed so that the label value predicted by the network is consistent with the true label of the corresponding image. The loss function defined over the training samples is
$$L_{T} = -\sum_{i=1}^{N} Y_i \log y_i$$

where $Y_i$ is the true label value and $y_i$ is the predicted probability value.
Further, the third training data set is fed into the student neural network for training. The student neural network comprises a feature extraction part and a Loss constraint part, where the feature extraction part is used for obtaining high-level abstract features of the samples and the Loss constraint part is used for optimizing the student model during training. Specifically, the student neural network comprises a multilayer neural network used for obtaining the predicted label values, and an objective function is constructed so that the label value predicted by the network is consistent with the true label of the corresponding image. The loss function defined over the training samples is
$$L_{S} = -\sum_{i=1}^{M} Y'_i \log y'_i$$

where $Y'_i$ is the true label value and $y'_i$ is the predicted probability value.
Optionally, the teacher model performs knowledge distillation on the student model, transferring more valuable information to the student model, improving the student model's knowledge extraction capability and helping the student model achieve the best possible performance on the selected third sample set:
$$L_{distill} = \frac{1}{H'W'}\sum_{i=1}^{H'}\sum_{j=1}^{W'}\left(F^{T}_{i,j}-F^{S}_{i,j}\right)^{2}$$

where H' and W' are, respectively, the height and width of the corresponding feature layer of the teacher and student networks, $F^{T}_{i,j}$ is the feature value of the corresponding teacher model (serving as the target), and $F^{S}_{i,j}$ is the feature value of the corresponding student model.
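A minimal sketch of this layer-by-layer feature distillation Loss, assuming PyTorch and that the distilled teacher and student feature maps match in shape; which layers are distilled is an illustrative assumption:

```python
import torch
import torch.nn.functional as F

def feature_distillation_loss(teacher_feats, student_feats):
    """Layer-by-layer feature distillation Loss.

    teacher_feats / student_feats: lists of feature maps, one per distilled layer,
    each of shape (batch, channels, H', W') and assumed to match in shape.
    The teacher features are treated as fixed targets (detached), so only the
    student (and, through it, the sample selector) receives gradients.
    """
    loss = 0.0
    for t, s in zip(teacher_feats, student_feats):
        loss = loss + F.mse_loss(s, t.detach())   # averaged over H' x W' (and batch, channels)
    return loss
```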
Optionally, the sample selector is trained within the teacher-student model structure; a complete sample selector training process comprises at least one forward operation and at least one backward operation:
In the forward operation, the second training data set is input into the sample selector model, the activation value corresponding to each sample is output, and the third training data set is obtained according to the preset threshold; the third training data set is then input into the student model, the Loss of each sample is output, and the student model is trained.
In the backward operation, the gradient of the distillation loss $L_{distill}$, generated when the teacher model distills knowledge into the student model, is propagated back to update the network parameters of the sample selector model; during the backward gradient operation only the Loss results generated by the third training data set, i.e. the samples exceeding the first preset threshold, are computed, and the network weights are updated. The training losses $L_{T}$ and $L_{S}$, generated by training the teacher and student networks, are propagated back to their respective feature extraction networks for parameter updating.
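A minimal sketch of one such forward/backward round, assuming PyTorch, a SampleSelector as above, and teacher/student modules that return a (logits, feature) pair; weighting the distillation term by the selector's own activations is an illustrative way to let its gradient reach the selector, since the patent does not spell out that mechanism:

```python
import torch
import torch.nn.functional as F

def training_round(selector, teacher, student, feats, labels,
                   opt_selector, opt_teacher, opt_student, threshold=0.8):
    # Forward: selector activations over the second training data set.
    activations = selector(feats)                      # shape (N,)
    mask = activations > threshold                     # third training data set
    sel_feats, sel_labels = feats[mask], labels[mask]
    sel_weights = activations[mask]                    # keeps the selector in the graph

    # Teacher trains on the full second training data set (cross-entropy L_T).
    t_logits, t_feat = teacher(feats)
    loss_teacher = F.cross_entropy(t_logits, labels)

    # Student trains on the selected third training data set (cross-entropy L_S).
    s_logits, s_feat = student(sel_feats)
    loss_student = F.cross_entropy(s_logits, sel_labels)

    # Distillation loss on the selected samples only; weighting by the selector
    # activation lets its gradient flow back to the selector parameters.
    distill = ((s_feat - t_feat[mask].detach()) ** 2).mean(dim=1)
    loss_distill = (sel_weights * distill).mean()

    opt_selector.zero_grad(); opt_teacher.zero_grad(); opt_student.zero_grad()
    (loss_teacher + loss_student + loss_distill).backward()
    opt_selector.step(); opt_teacher.step(); opt_student.step()
    return loss_teacher.item(), loss_student.item(), loss_distill.item()
```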
The second training data set and the third training data set are input into the preset neural network model for training; when the preset neural network model converges or the number of training iterations reaches the preset number, the preset sample selector model is obtained.
S104, the original image samples corresponding to the obtained feature vector subset are acquired as the selected sub-sample set with redundant information removed.
The embodiment of the invention thus realizes an efficient and robust information redundancy removing method for a sample set. The method can effectively remove redundant information from a large-scale sample set and substantially reduces the scale of the original sample set while preserving the performance of models trained on the reduced sample set; in addition, training models on the reduced data set greatly reduces training cost. The method meets the practical application requirements of users and has strong applicability.
It should be noted that, in this document, the technical features in the various alternatives can be combined to form the scheme as long as the technical features are not contradictory, and the scheme is within the scope of the disclosure. Relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
All the embodiments in the present specification are described in a related manner, and the same and similar parts among the embodiments may be referred to each other, and each embodiment focuses on the differences from the other embodiments.
The above description is only for the preferred embodiment of the present invention, and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention shall fall within the protection scope of the present invention.

Claims (8)

1. A method of information de-redundancy for a sample set, the method comprising:
obtaining a sample to be processed and a corresponding trainable label to obtain an original sample set;
performing feature extraction on each acquired original sample with a pre-prepared feature extraction model to obtain a feature vector set of the original sample set;
performing sample selection on the feature vector set with a pre-prepared learnable sample selector model, and obtaining a representative feature vector subset according to a preset threshold;
and acquiring original samples corresponding to the feature vector subsets as a sub-sample set after redundant information is removed.
2. The method of claim 1, wherein the step of preparing the feature extraction model in advance comprises:
acquiring a sample set to be processed, recording the sample set as a first sample set, and acquiring a trainable label corresponding to the sample;
inputting the samples and the corresponding labels in the first sample set into a preset first machine learning model for training to obtain a preset first model, namely a pre-prepared feature extraction model; the preset first machine learning model comprises a feature extraction part and a model constraint convergence part; the characteristic extraction part is used for obtaining characteristic vectors of the samples, obtaining a characteristic vector set of an original sample set and recording the characteristic vector set as a second sample set, and the model constraint convergence part is used for controlling normal training of the characteristic extraction model until convergence.
3. The method of claim 1, wherein the step of sample selecting the set of feature vectors by the sample selector model comprises:
acquiring a second sample set, and acquiring a training label of an original sample corresponding to the feature vector as a trainable label; inputting the second sample set into a sample selector model, obtaining a representative feature vector subset and a corresponding trainable label according to a first preset threshold value, and recording as a third sample set;
the sample selector model comprises a neural network and an activation function; a sample is considered representative when the activation value obtained after the sample is input into the sample selector model is larger than the first preset threshold.
4. The method of claim 3, wherein the sample selector model is optimized for learning by a teacher-student model structure, and the training is stopped when the convergence index of the whole training process reaches a second preset threshold, so as to obtain the sample selector.
5. The method of claim 4, wherein the step of determining the teacher-student model comprises:
inputting the second sample set and the labels corresponding to the samples in the set into a preset second machine learning model for training to obtain a teacher model, which is called a second model; the second machine learning model comprises a feature extraction part and a Loss constraint part, wherein the feature extraction part is used for obtaining high-level abstract features of the sample, and the Loss constraint part is used for optimizing the teacher model to realize training;
inputting the samples in the third sample set and the corresponding labels into a preset third machine learning model for training to obtain a student model, namely a third model; the preset third machine learning model comprises a feature extraction part and a Loss constraint part, wherein the feature extraction part is used for obtaining high-level abstract features of the sample, and the Loss constraint part is used for optimizing the student model to realize training;
the teacher model performs knowledge distillation on the student model, transferring more valuable information to the student model and helping the student model achieve the best possible performance on the selected third sample set.
6. The method of claim 5, wherein the knowledge distillation method is layer-by-layer feature distillation, yielding a distillation Loss.
7. The method of claim 3, wherein the first preset threshold can be determined in a plurality of ways, including a) or b):
a) a preset minimum threshold is given; when the activation value obtained after a sample is input into the sample selector model is larger than this minimum threshold, the sample is considered representative and is selected;
b) the activation values obtained after all samples are input into the sample selector model are sorted, and the minimum threshold is determined from the capacity of the third sample set so that the number of samples exceeding the threshold equals that capacity.
8. The method of claim 4, wherein the learning optimization process of the sample selector model comprises:
a complete sample selector training process should include at least one forward operation and at least one reverse operation;
in the forward operation, the samples in the second sample set are input into a sample selector model, the corresponding activation value of each sample is output, and a third sample set is obtained according to a first preset threshold; inputting the third sample set into a third model, and outputting the Loss of each sample;
in the backward operation, the gradient of the distillation Loss output by the third model is propagated back to the network parameters of the sample selector model; during the backward gradient operation only the Loss results generated by the third sample set, i.e. the samples exceeding the first preset threshold, are computed, and the network weights are updated; and the training Losses of the student model and the teacher model are propagated back to their respective feature extraction networks for parameter updating.
CN202011110339.9A 2020-10-16 2020-10-16 Information redundancy removing method for sample set Active CN112200255B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011110339.9A CN112200255B (en) 2020-10-16 2020-10-16 Information redundancy removing method for sample set

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011110339.9A CN112200255B (en) 2020-10-16 2020-10-16 Information redundancy removing method for sample set

Publications (2)

Publication Number Publication Date
CN112200255A true CN112200255A (en) 2021-01-08
CN112200255B CN112200255B (en) 2021-09-14

Family

ID=74009216

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011110339.9A Active CN112200255B (en) 2020-10-16 2020-10-16 Information redundancy removing method for sample set

Country Status (1)

Country Link
CN (1) CN112200255B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116580847A (en) * 2023-07-14 2023-08-11 天津医科大学总医院 Modeling method and system for prognosis prediction of septic shock

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108710907A (en) * 2018-05-15 2018-10-26 苏州大学 Handwritten form data classification method, model training method, device, equipment and medium
CN110991473A (en) * 2019-10-11 2020-04-10 平安信托有限责任公司 Feature selection method and device for image sample, computer equipment and storage medium
CN111259917A (en) * 2020-02-20 2020-06-09 西北工业大学 Image feature extraction method based on local neighbor component analysis
CN111768457A (en) * 2020-05-14 2020-10-13 北京航空航天大学 Image data compression method, device, electronic equipment and storage medium

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108710907A (en) * 2018-05-15 2018-10-26 苏州大学 Handwritten form data classification method, model training method, device, equipment and medium
CN110991473A (en) * 2019-10-11 2020-04-10 平安信托有限责任公司 Feature selection method and device for image sample, computer equipment and storage medium
CN111259917A (en) * 2020-02-20 2020-06-09 西北工业大学 Image feature extraction method based on local neighbor component analysis
CN111768457A (en) * 2020-05-14 2020-10-13 北京航空航天大学 Image data compression method, device, electronic equipment and storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
张国东 et al., "Research on Hyperspectral Remote Sensing Image Classification Based on Stacked Autoencoder Neural Networks", Infrared Technology *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116580847A (en) * 2023-07-14 2023-08-11 天津医科大学总医院 Modeling method and system for prognosis prediction of septic shock
CN116580847B (en) * 2023-07-14 2023-11-28 天津医科大学总医院 Method and system for predicting prognosis of septic shock

Also Published As

Publication number Publication date
CN112200255B (en) 2021-09-14

Similar Documents

Publication Publication Date Title
CN110490946B (en) Text image generation method based on cross-modal similarity and antagonism network generation
CN110083692B (en) Text interactive matching method and device for financial knowledge question answering
CN113656570B (en) Visual question-answering method and device based on deep learning model, medium and equipment
CN112699247B (en) Knowledge representation learning method based on multi-class cross entropy contrast complement coding
CN109977250B (en) Deep hash image retrieval method fusing semantic information and multilevel similarity
CN111639240A (en) Cross-modal Hash retrieval method and system based on attention awareness mechanism
CN113516133B (en) Multi-modal image classification method and system
CN113204633B (en) Semantic matching distillation method and device
CN113822776B (en) Course recommendation method, device, equipment and storage medium
CN113094534B (en) Multi-mode image-text recommendation method and device based on deep learning
Dai et al. Hybrid deep model for human behavior understanding on industrial internet of video things
CN113239897A (en) Human body action evaluation method based on space-time feature combination regression
CN112200255B (en) Information redundancy removing method for sample set
Jiang et al. An intelligent recommendation approach for online advertising based on hybrid deep neural network and parallel computing
CN114373092A (en) Progressive training fine-grained vision classification method based on jigsaw arrangement learning
CN116663523A (en) Semantic text similarity calculation method for multi-angle enhanced network
US20240037335A1 (en) Methods, systems, and media for bi-modal generation of natural languages and neural architectures
CN116542080A (en) Condition generation countermeasure network topology optimization method and system based on contrast learning
CN115795035A (en) Science and technology service resource classification method and system based on evolutionary neural network and computer readable storage medium thereof
CN115455162A (en) Answer sentence selection method and device based on hierarchical capsule and multi-view information fusion
CN113344060B (en) Text classification model training method, litigation state classification method and device
Wang et al. Hierarchical multimodal fusion network with dynamic multi-task learning
CN113989566A (en) Image classification method and device, computer equipment and storage medium
CN110659962B (en) Commodity information output method and related device
Rathod et al. Leveraging CNNs and Ensemble Learning for Automated Disaster Image Classification

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant