CN109934281B - Unsupervised training method of two-class network - Google Patents
- Publication number: CN109934281B
- Application number: CN201910175530.2A
- Authority: CN (China)
- Legal status: Active
Abstract
The invention provides an unsupervised training method for a two-class network, belonging to the fields of image processing, deep learning and pattern recognition. Through a series of processing steps on the sample data, a clustering network is trained in several stages, and the trained clustering network is used to classify images. This avoids the lack of supervised samples, the excessive difficulty of data acquisition and the excessive cost of data annotation, making the training of the clustering network and the realization of the classification results simpler and more efficient.
Description
Technical Field
The invention belongs to the field of image processing, deep learning and pattern recognition, and particularly relates to an unsupervised training method for a two-class network.
Background
With the development of deep learning, classification networks have been widely applied in fields such as face recognition, image retrieval, public surveillance, biometric recognition, intelligent vehicles, medical assistance and remote sensing. Owing to the excellent feature-representation capability of deep neural networks, deep classification networks can achieve classification performance close to that of humans.
Most current classification networks are supervised learning networks: they are trained with images and their corresponding labels (i.e., in a supervised training mode) and therefore require a large amount of labeled data. The data-annotation process typically demands considerable labor and time. Consequently, in application fields where data acquisition is difficult and costly, the lack of supervised samples has become a main factor limiting the performance of deep classification networks.
Unlike supervised networks, clustering methods do not require supervised samples and can achieve classification through analysis of a set of samples. Therefore, the research on how to realize the unsupervised training of the classification network has important significance.
Disclosure of Invention
The invention aims to solve the problem that a supervised learning network in the prior art needs a large number of labeled samples, and provides an unsupervised training method for a two-class network.
An unsupervised training method of a two-class network comprises the following steps:
S1, collecting images to construct a data set S, wherein the data set comprises the images to be clustered, the number of images in the data set is M, and an arbitrary image is denoted I;
S2, constructing a clustering network, wherein the clustering network comprises convolutional layers, pooling layers and fully-connected layers, and the output of the fully-connected layer is used as the output of the clustering network;
S3, training the clustering network: label sets corresponding to the images participating in training are generated in several training stages, and the clustering network is trained on these images and label sets to obtain the final network model parameters;
S4, inputting the data set into the trained clustering network to obtain the classification output of the images in the data set.
Further, the step S2 includes:
constructing a clustering network, wherein the clustering network comprises a convolutional layer, a pooling layer and a fully-connected layer, and the output of the fully-connected layer is used as the output of the clustering network;
unifying the sizes of the images in the data set S to (H, W, C), randomly shuffling them and inputting them into the clustering network, wherein H denotes the image height, W the image width and C the number of image channels; the input of the clustering network is a four-dimensional tensor (M, H, W, C), the number of output classes is 2, and the output of the clustering network is a two-class output matrix V represented by a two-dimensional tensor (M, 2); a prediction class vector L represented by a one-dimensional vector (M, 1) can be obtained from the matrix V, wherein an element L_i of L represents a cluster class, given by
L_i = argmax(V(i,1), V(i,2))
where V(i,1) and V(i,2) denote the element values at the corresponding indexes in the matrix V.
Further, the step S3 includes:
S31, the labels of all images in the data set S are set to (0, 0) and, together with the original images, form a first training sample set dataset_0; the clustering network is trained on dataset_0, where the training loss function is the two-norm loss
Loss = Σ_{i=1}^{M} ‖V(i) − G(i)‖₂²
wherein G is the matrix of image labels and V(i) denotes the output vector of the i-th image in the matrix V;
the network weights are trained by stochastic gradient descent, and after s_1 iteration steps a first network weight ω_1 is obtained;
S32, one image in the data set S is randomly selected as a positive sample, its label is set to (1, 0) and its category to A, the labels of the remaining images are set to (0, 0); the positive sample is copied M times and, together with the data set, forms a second training sample set dataset_1;
the data in the second training sample set dataset_1 are randomly shuffled and used as training samples; the network is trained with the first network weight ω_1 as the initial weight, and after s_2 iteration steps a second network weight ω_2 is obtained;
S33, the network trained in step S32 is applied to dataset_0, and the predicted values corresponding to category A are sorted by magnitude to obtain the Q images with the largest Q predicted values; the labels of these Q images are set to (1, 0), the labels of the remaining images are set to (0, 0), and a third training sample set dataset_2 is obtained;
the data in the third training sample set dataset_2 are randomly shuffled and used as training samples; the network is trained with the second network weight ω_2 as the initial weight, and after s_3 iteration steps a third network weight ω_3 is obtained;
S34, the network trained in step S33 is applied to dataset_2, and the predicted values corresponding to category B are sorted by magnitude to obtain the 2×Q images with the largest 2×Q predicted values; Q of these images are randomly extracted and their sample labels set to (0, 1), while the images and labels corresponding to category A in dataset_2 are preserved, yielding a fourth training sample set dataset_3 consisting of Q category-A labels, Q category-B labels and the corresponding images;
the fourth training sample set dataset_3 is used as training samples; the network is trained with the first network weight ω_1 as the initial weight, and after s_4 iteration steps a fourth network weight ω_4 is obtained;
S35, output analysis is performed on the network trained in step S34 to generate target labels, the generated target labels are exchanged to obtain a label set, and the network is trained with this label set as the category labels; after s_5 iteration steps, the final network weight ω_5 is obtained.
Further, the step S35 includes:
S351, performing output analysis on the network trained in step S34 to generate target labels;
S352, exchanging the generated target labels to obtain a label set;
S353, training the network with the label set as class labels; after s_5 iteration steps, the final network weight ω_5 is obtained.
Further, the step S351 includes:
searching the matrix V to obtain its maximum value p_max, whose position is (m, n);
searching along the dimension containing n, i.e. V(:, n), to obtain the samples corresponding to the largest T values in that dimension and setting their labels to n, wherein ':' denotes all elements along that index and T is a hyper-parameter;
removing the T labeled data from the data set S and updating the set of remaining unlabeled data to S_new; removing the vectors corresponding to the T labeled data from the matrix V to obtain a matrix V_new; if n = 0, the labels of the samples in S_new are set to B; if n = 1, the labels of the samples in S_new are set to A;
obtaining a fifth training sample set (S, G_0) comprising the original images and the corresponding labels, denoted dataset_4, wherein G_0 is a one-dimensional tensor (M, 1) representing the label set generated for the data set S.
Further, the step S352 includes:
dividing the matrix V into two sub-matrices V_1 and V_2 according to the corresponding prediction class vector L, wherein V_1 denotes the matrix composed of all vectors corresponding to (1, 0) and V_2 the matrix composed of all vectors corresponding to (0, 1);
extracting from V_1 and V_2 the predicted values at the indexes corresponding to their respective categories to obtain vectors v_1 and v_2; sorting the elements of v_1 and v_2 to obtain the smallest M×r predicted values and their corresponding indexes i_1 and i_2 in the respective vectors, wherein r is a hyper-parameter;
exchanging the labels corresponding to indexes i_1 and i_2 to obtain the label set.
The beneficial effects of the invention are as follows: the invention provides an unsupervised training method for a two-class network that trains a clustering network in several stages through a series of processing steps on the sample data. It avoids the lack of supervised samples, the excessive difficulty of data acquisition and the excessive cost of data annotation, and makes the training of the clustering network and the realization of the classification results simpler and more efficient.
Drawings
Fig. 1 is a flow chart provided by an embodiment of the present invention.
Fig. 2 is a flowchart of step S3 in fig. 1.
Detailed Description
Before describing the present invention, the following definitions of terms are made:
Definition 1, convolutional layer
The convolutional layer performs a convolution operation between its input and a convolution kernel represented by a four-dimensional tensor (N, K, K, C_1), extracting different features of the input to obtain an output represented by a four-dimensional tensor (N, H_0, W_0, C_2). Here N, K and C_1 respectively denote the number of input feature maps, the convolution-kernel size and the number of feature-map channels; H_0 and W_0 denote the height and width of the output feature map, and C_2 denotes the number of convolution kernels. The input of a convolutional layer can be the network input or a feature map as in Definition 3.
Definition 2, pooling layer
The pooling layer is an operation that downsamples the feature map of Definition 3. Its input is a feature map represented by a four-dimensional tensor (N, H_f, W_f, C_f); its output is a feature map with reduced H_f and W_f.
Define 3, activate function
The activation function is a nonlinear function after convolution. We call the four-dimensional tensor (N, H) output after activating the functionf,Wf,Cf) Is a characteristic diagram. The activation functions in the network may be chosen to be different functions, as defined by f (x) max (x,0), f (x) tanh (x),
define 4, Softmax layer
The Softmax layer is a Softmax function used at the output layer of a classification network, defined as
softmax(x_i) = exp(x_i) / Σ_j exp(x_j)
Definition 5, fully connected layer
The fully-connected layer is a network structure in which each neuron in the neural network is connected with each neuron in the previous layer.
Definition 6, learning Rate
The learning rate is a coefficient before the parameter updating amount when the model training is carried out by adopting a back propagation algorithm, and is used for controlling the parameter updating amount each time.
Definition 7, image normalization
Image normalization refers to adjusting the gray-value range of each channel in the image to a specific range, and is defined as
I_norm = (I − μ) / σ
wherein μ = (1/N_p) Σ_p I(p) is the mean gray value, σ denotes the standard deviation, and N_p denotes the number of pixels of the image I.
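As an illustration, the per-channel normalization of Definition 7 can be sketched in a few lines of numpy; the (I − μ)/σ form follows the definition above, while the epsilon guard against constant channels is an added assumption.

```python
import numpy as np

def normalize_image(image):
    """Normalize each channel of an (H, W, C) image to zero mean, unit std.

    Sketch of Definition 7: subtract the per-channel mean and divide by the
    per-channel standard deviation; a small epsilon (an assumption, not in
    the patent) guards against division by zero on constant channels.
    """
    image = image.astype(np.float64)
    mu = image.mean(axis=(0, 1), keepdims=True)      # per-channel mean
    sigma = image.std(axis=(0, 1), keepdims=True)    # per-channel std
    return (image - mu) / np.maximum(sigma, 1e-8)

# Example: a 28x28 single-channel image, as in the MNIST embodiment.
img = np.random.rand(28, 28, 1) * 255.0
out = normalize_image(img)
```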
The embodiments of the present invention will be further described with reference to the accompanying drawings.
Referring to fig. 1, the present invention provides an unsupervised training method for a two-class network. The implementation uses the TensorFlow framework and proceeds through the following steps:
S1, acquiring images to construct a data set S, wherein the data set acquired in the invention comprises the images to be clustered, the number of images in the data set is M, and an arbitrary image is denoted I.
In this embodiment, the data set S is drawn from the MNIST data set: 400 images of 2 classes (200 images per class) are selected, i.e. M = 400, and N, H, W and C are set to 400, 28, 28 and 1, respectively. The MNIST data set comes from the U.S. National Institute of Standards and Technology; its training set consists of digits handwritten by 250 different people, 50% high-school students and 50% staff of the Census Bureau, and its test set is handwritten digit data of the same proportions. The invention uses only the image data in the MNIST data set; the corresponding class-label files are not used.
S2, constructing a clustering network, wherein the clustering network comprises a convolution layer, a pooling layer and a full-connection layer, and the output of the full-connection layer is used as the output of the clustering network.
In this embodiment, the constructed clustering network comprises 3 convolutional layers, 2 pooling layers and 2 fully-connected layers, with the output of the last fully-connected layer used as the output of the clustering network. The activation function is the relu function, defined as relu(x) = max(x, 0); the pooling stride is 2; and the numbers of neurons in the two fully-connected layers are 1024 and 2, respectively. The network omits the softmax layer commonly used in classification networks and directly uses the fully-connected layer's output as the clustering network's output. The images undergo the normalization preprocessing operation before being used as network input.
The sizes of the images in the data set S are unified to (H, W, C), and the images are randomly shuffled and input into the clustering network, where H denotes the image height, W the image width and C the number of image channels. The input of the clustering network is a four-dimensional tensor (M, H, W, C), the number of output classes is 2, and the output of the clustering network is a two-class output matrix V represented by a two-dimensional tensor (M, 2). A prediction class vector L represented by a one-dimensional vector (M, 1) can be obtained from the matrix V, where an element L_i of L represents a cluster class, given by
L_i = argmax(V(i,1), V(i,2))
where V(i,1) and V(i,2) denote the element values at the corresponding indexes in the matrix V.
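The mapping from the output matrix V to the prediction class vector L described above can be sketched as follows; the small matrix V here is a hypothetical stand-in for real network output.

```python
import numpy as np

# V: the (M, 2) two-class output matrix of the clustering network.
# This toy V stands in for actual network predictions.
V = np.array([[0.9, 0.1],
              [0.2, 0.7],
              [0.4, 0.6]])

# L_i = argmax(V(i,1), V(i,2)): index of the larger of the two outputs,
# i.e. the predicted cluster class for each image.
L = np.argmax(V, axis=1)    # prediction class vector, shape (M,)
```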
And S3, training a clustering network, generating corresponding label sets of the images participating in training in several stages of training, and training the clustering network based on the images participating in training and the corresponding label sets to obtain final network model training parameters.
Referring to fig. 2, in the present embodiment, step S3 is implemented by the following sub-steps:
s31, first-stage training
The labels of all images in the data set S are set to (0, 0) and, together with the original images, form a first training sample set dataset_0; the clustering network is trained on dataset_0, where the training loss function is the two-norm loss
Loss = Σ_{i=1}^{M} ‖V(i) − G(i)‖₂²
wherein G is the matrix of image labels and V(i) denotes the output vector of the i-th image in the matrix V;
the network weights are trained by stochastic gradient descent, and after s_1 iteration steps a first network weight ω_1 is obtained;
the subsequent steps likewise use stochastic gradient descent and the two-norm loss for training.
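A minimal sketch of the two-norm loss used in the first training stage, assuming the loss is the sum of squared differences between the network outputs V and the labels G (the patent names the loss but does not spell out the formula):

```python
import numpy as np

def two_norm_loss(V, G):
    """Two-norm (squared L2) loss between network outputs V and labels G.

    V, G: (M, 2) arrays.  The summed-squared-difference form is an
    assumption consistent with the 'two-norm loss function' of step S31.
    """
    return float(np.sum(np.square(V - G)))

# First-stage targets: every label set to (0, 0).
M = 4
G = np.zeros((M, 2))
V = np.array([[1.0, 0.0],   # toy network outputs
              [0.0, 1.0],
              [0.5, 0.5],
              [0.0, 0.0]])
loss = two_norm_loss(V, G)
```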
S32 second-stage training
An image in the data set S is randomly selected as a positive sample, its label is set to (1, 0) and its corresponding category to A; the labels of the remaining images are set to (0, 0), and a new training set is constructed. A sample-copying step is included to address sample imbalance, as follows: the 1 randomly selected positive sample is copied M times and, together with the original data set, forms a new training set, namely the second training sample set dataset_1;
the data in the second training sample set dataset_1 are randomly shuffled and used as training samples; the network is trained with the first network weight ω_1 as the initial weight, and after s_2 iteration steps a second network weight ω_2 is obtained.
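The second-stage sample-copying step described above can be sketched as follows; the array shapes and random seed are illustrative, not from the patent.

```python
import numpy as np

rng = np.random.default_rng(0)

M = 8
images = rng.random((M, 28, 28, 1))   # stand-in for the data set S
labels = np.zeros((M, 2))             # all labels start as (0, 0)

# Randomly pick one positive sample and label it (1, 0) (category A),
# then copy it M times to balance the training set (step S32).
pos = rng.integers(M)
labels[pos] = (1.0, 0.0)
pos_images = np.repeat(images[pos:pos + 1], M, axis=0)
pos_labels = np.tile([1.0, 0.0], (M, 1))

dataset1_x = np.concatenate([images, pos_images], axis=0)
dataset1_y = np.concatenate([labels, pos_labels], axis=0)

# Random shuffle before training, as the embodiment describes.
perm = rng.permutation(len(dataset1_x))
dataset1_x, dataset1_y = dataset1_x[perm], dataset1_y[perm]
```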
S33, training in the third stage: obtaining more positive samples
The network trained in step S32 is applied to dataset_0, and the predicted values corresponding to category A are sorted by magnitude to obtain the Q images with the largest Q predicted values; the labels of these Q images are set to (1, 0), the labels of the remaining images are set to (0, 0), and a third training sample set dataset_2 is obtained;
the data in the third training sample set dataset_2 are randomly shuffled and used as training samples; the network is trained with the second network weight ω_2 as the initial weight, and after s_3 iteration steps a third network weight ω_3 is obtained.
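Selecting the Q most confident category-A images for dataset_2 amounts to a sort over the first output column; the matrix V0 below is a hypothetical stand-in for the network's predictions on dataset_0.

```python
import numpy as np

# V0: hypothetical (M, 2) network outputs on dataset_0;
# column 0 holds the predicted value for category A (label (1, 0)).
V0 = np.array([[0.9, 0.1],
               [0.1, 0.8],
               [0.7, 0.2],
               [0.3, 0.6],
               [0.8, 0.1],
               [0.2, 0.9]])
Q = 2

# Indices of the Q images with the largest category-A predictions (step S33).
top_q = np.argsort(V0[:, 0])[::-1][:Q]

labels = np.zeros((len(V0), 2))
labels[top_q] = (1.0, 0.0)   # top-Q get label (1, 0); the rest stay (0, 0)
```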
S34, fourth-stage training: randomly extracting Q negative samples
The network trained in step S33 is applied to dataset_2, and the predicted values corresponding to category B are sorted by magnitude to obtain the 2×Q images with the largest 2×Q predicted values; Q of these images are randomly extracted and their sample labels set to (0, 1), while the images and labels corresponding to category A in dataset_2 are preserved, yielding a new training set consisting of Q category-A labels, Q category-B labels and the corresponding images, namely the fourth training sample set dataset_3;
the fourth training sample set dataset_3 is used as training samples; the network is trained with the first network weight ω_1 as the initial weight and a smaller learning rate, and after s_4 iteration steps a fourth network weight ω_4 is obtained.
S35, fifth stage training
Output analysis is performed on the network trained in step S34 to generate target labels, the generated target labels are exchanged to obtain a label set, and the network is trained with this label set as the category labels; after s_5 iteration steps, the final network weight ω_5 is obtained.
In this embodiment, step S35 is implemented by the following sub-steps:
S351, output analysis is performed on the network trained in step S34 to generate target labels, as follows:
searching the matrix V to obtain its maximum value p_max, whose position is (m, n);
searching along the dimension containing n, i.e. V(:, n), to obtain the samples corresponding to the largest T values in that dimension and setting their labels to n, wherein ':' denotes all elements along that index and T is a hyper-parameter;
removing the T labeled data from the data set S and updating the set of remaining unlabeled data to S_new; removing the vectors corresponding to the T labeled data from the matrix V to obtain a matrix V_new. If n = 0 (corresponding to category A), the labels of the samples in S_new are set to B; if n = 1 (corresponding to category B), the labels of the samples in S_new are set to A.
The process in step S351 described above is a single-class label generation process.
In the two-class problem, after the single-class label generation process is completed, the class labels of the remaining data can be set to the other class label, so the above process only needs to be executed once.
After the above process is completed, a fifth training sample set (S, G_0) containing the original images and the corresponding labels is obtained, denoted dataset_4, where G_0 is a one-dimensional tensor (M, 1) representing the label set generated for the data set S.
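The single-class label generation of step S351 can be sketched as follows, assuming labels 0 and 1 encode categories A and B; for brevity, the removal of the T samples from S and V is folded into a single label vector rather than materializing S_new and V_new.

```python
import numpy as np

def generate_labels(V, T):
    """Sketch of the single-class label generation of step S351.

    V: (M, 2) output matrix; T: hyper-parameter (number of most
    confident samples).  Returns an (M,) label vector G0 with entries
    0 (category A) or 1 (category B).
    """
    M = V.shape[0]
    # Position (m, n) of the global maximum of V.
    m, n = np.unravel_index(np.argmax(V), V.shape)
    # The T most confident samples in column n get label n ...
    top_t = np.argsort(V[:, n])[::-1][:T]
    G0 = np.full(M, 1 - n)   # ... all remaining samples get the other label
    G0[top_t] = n
    return G0

# Toy output matrix standing in for real predictions.
V = np.array([[0.9, 0.1],
              [0.8, 0.3],
              [0.1, 0.7],
              [0.2, 0.6]])
G0 = generate_labels(V, T=2)
```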
S352, a proportion r of the labels in the label set G_0 obtained in step S351 are exchanged to generate a new label set, where r is a hyper-parameter set to different values at different stages of training; the number of samples whose labels are exchanged is M×r. During the training iterations, the ratio r is gradually reduced to 0. The label-exchange process is as follows:
For the two-class problem, the matrix V is divided into two sub-matrices V_1 and V_2 according to the corresponding prediction class vector L, where V_1 denotes the matrix composed of all vectors corresponding to (1, 0) and V_2 the matrix composed of all vectors corresponding to (0, 1);
the predicted values at the indexes corresponding to their respective categories are extracted from V_1 and V_2 to obtain vectors v_1 and v_2; the elements of v_1 and v_2 are sorted to obtain the smallest M×r predicted values and their corresponding indexes i_1 and i_2 in the respective vectors;
the labels corresponding to indexes i_1 and i_2 are exchanged to obtain the label set.
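A sketch of the label-exchange step: for each class, the M×r samples with the lowest own-class confidence move to the other class (with binary labels, exchanging the two index sets is equivalent to flipping both). The toy V and G0 are illustrative stand-ins.

```python
import numpy as np

def exchange_labels(V, G0, r):
    """Sketch of the label exchange of step S352.

    V: (M, 2) outputs; G0: (M,) current labels (0 or 1); r: exchange ratio.
    For each class, the M*r samples with the smallest predicted value for
    their own class have their labels swapped to the other class.
    """
    M = len(G0)
    k = int(M * r)
    G = G0.copy()
    for cls in (0, 1):
        idx = np.where(G0 == cls)[0]          # samples currently in this class
        conf = V[idx, cls]                    # own-class confidence
        worst = idx[np.argsort(conf)[:k]]     # k least confident samples
        G[worst] = 1 - cls                    # exchange to the other class
    return G

V = np.array([[0.9, 0.1],
              [0.4, 0.3],
              [0.2, 0.8],
              [0.5, 0.6]])
G0 = np.array([0, 0, 1, 1])
G1 = exchange_labels(V, G0, r=0.25)
```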
S353, the network is trained with the label set as the class labels required for classification-network training; after s_5 iteration steps, the final network weight ω_5 is obtained.
In this embodiment, steps S352 and S353 need to be executed multiple times; the number of executions equals the number of distinct values in the setting of the hyper-parameter r.
In this embodiment, when the clustering network is trained in step S3, the GradientDescentOptimizer is adopted; the learning rates in the different stages are set to 0.05, 0.1, 0.01, 0.05 and 0.01, and the training step sizes (s_1 to s_5) in the different stages are set to 30, 20, 200, 400, respectively. Step S3 comprises five stages of training; Q is set to 5 in the third-stage and fourth-stage training. In the fifth-stage training, the parameter T is set to 200 and the parameter r is set to the piecewise constants {0.3, 0.2, 0.1, 0}, changing every 100 training steps.
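The piecewise-constant schedule for r described above can be sketched as a small helper; the function and argument names are illustrative, not from the patent.

```python
def r_schedule(step, values=(0.3, 0.2, 0.1, 0.0), interval=100):
    """Piecewise-constant schedule for the exchange ratio r.

    Follows the embodiment: r steps through {0.3, 0.2, 0.1, 0},
    changing every 100 training steps and staying at its last value.
    """
    idx = min(step // interval, len(values) - 1)
    return values[idx]

# r stays at 0.3 for steps 0-99, 0.2 for 100-199, and so on.
```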
S4, inputting the data set into the trained clustering network to obtain the classification output of the images in the data set.
In this embodiment, the data set S in step S1 is input into the network model constructed in step S2 and trained in step S3, and classification output is obtained.
In this embodiment, when the clustering network is tested, the images in the data set S are classified. In the test, the inputs are set to N = 1, H = 28, W = 28 and C = 1. The input images are preprocessed by the image normalization operation and then fed into the clustering network, whose output gives the classification result.
It will be appreciated by those of ordinary skill in the art that the examples provided herein are intended to assist the reader in understanding the principles of the invention and are to be construed as being without limitation to such specifically recited examples and embodiments. Those skilled in the art can make various other specific changes and combinations based on the teachings of the present invention without departing from the spirit of the invention, and these changes and combinations are within the scope of the invention.
Claims (4)
1. An unsupervised training method of a two-class network is characterized by comprising the following steps:
s1, collecting images to construct a data set S, wherein the data set comprises images to be clustered, the number of the images in the data set is M, and any image is I;
s2, constructing a clustering network, wherein the clustering network comprises a convolution layer, a pooling layer and a full-connection layer, and the output of the full-connection layer is used as the output of the clustering network; the method comprises the following steps:
constructing a clustering network, wherein the clustering network comprises a convolutional layer, a pooling layer and a fully-connected layer, and the output of the fully-connected layer is used as the output of the clustering network;
unifying the sizes of the images in the data set S to (H, W, C), randomly shuffling them and inputting them into the clustering network, wherein H denotes the image height, W the image width and C the number of image channels; the input of the clustering network is a four-dimensional tensor (M, H, W, C), the number of output classes is 2, and the output of the clustering network is a two-class output matrix V represented by a two-dimensional tensor (M, 2); a prediction class vector L represented by a one-dimensional vector (M, 1) can be obtained from the matrix V, wherein an element L_i of L represents a cluster class, given by
L_i = argmax(V(i,1), V(i,2))
wherein V(i,1) and V(i,2) denote the element values at the corresponding indexes in the matrix V;
s3, training a clustering network, generating corresponding label sets of images participating in training in several stages of training, and training the clustering network based on the images participating in training and the corresponding label sets to obtain final network model training parameters; the method comprises the following steps:
S31, the labels of all images in the data set S are set to (0, 0) and, together with the original images, form a first training sample set dataset_0; the clustering network is trained on dataset_0, where the training loss function is the two-norm loss
Loss = Σ_{i=1}^{M} ‖V(i) − G(i)‖₂²
wherein G is the matrix of image labels and V(i) denotes the output vector of the i-th image in the matrix V;
the network weights are trained by stochastic gradient descent, and after s_1 iteration steps a first network weight ω_1 is obtained;
S32, one image in the data set S is randomly selected as a positive sample, its label is set to (1, 0) and its corresponding category to A, the labels of the remaining images are set to (0, 0); the positive sample is copied M times and, together with the data set, forms a second training sample set dataset_1;
the data in the second training sample set dataset_1 are randomly shuffled and used as training samples; the network is trained with the first network weight ω_1 as the initial weight, and after s_2 iteration steps a second network weight ω_2 is obtained;
S33, the network trained in step S32 is applied to dataset_0, and the predicted values corresponding to category A are sorted by magnitude to obtain the Q images with the largest Q predicted values; the labels of these Q images are set to (1, 0), the labels of the remaining images are set to (0, 0), and a third training sample set dataset_2 is obtained;
the data in the third training sample set dataset_2 are randomly shuffled and used as training samples; the network is trained with the second network weight ω_2 as the initial weight, and after s_3 iteration steps a third network weight ω_3 is obtained;
S34, the network trained in step S33 is applied to dataset_2, and the predicted values corresponding to category B are sorted by magnitude to obtain the 2×Q images with the largest 2×Q predicted values; Q of these images have their sample labels set to (0, 1), while the images and labels corresponding to category A in dataset_2 are preserved, yielding a fourth training sample set dataset_3 consisting of Q category-A labels, Q category-B labels and the corresponding images;
the fourth training sample set dataset_3 is used as training samples; the network is trained with the first network weight ω_1 as the initial weight, and after s_4 iteration steps a fourth network weight ω_4 is obtained;
S35, output analysis is performed on the network trained in step S34 to generate target labels, the generated target labels are exchanged to obtain a label set, and the network is trained with this label set as the category labels; after s_5 iteration steps, the final network weight ω_5 is obtained;
S4, inputting the data set into the trained clustering network to obtain the classification output of the images in the data set.
2. The method for unsupervised training of a classification network of claim 1, wherein the step S35 comprises:
S351, performing output analysis on the network trained in step S34 to generate target labels;
S352, exchanging the generated target labels to obtain a label set;
S353, training the network with the label set as class labels; after s_5 iteration steps, the final network weight ω_5 is obtained.
3. The method for unsupervised training of a classification network of claim 2, wherein the step S351 comprises:
searching the matrix V for its maximum value pmax, the position corresponding to the maximum value being (m, n);
searching along the column V(:, n) in which n is located, obtaining the samples corresponding to the largest T predicted values in that column, and setting the labels of these samples to n, where ":" denotes all elements along that index and T is a hyperparameter;
removing the T labelled data from the data set S and updating the remaining unlabelled data set to Snew; removing the vectors corresponding to the T labelled data from the matrix V to obtain a matrix Vnew; if n is 0, the labels of the samples in Snew are set to B; if n is 1, the labels of the samples in Snew are set to A;
obtaining a fifth training sample set (S, G0) comprising the original images and the corresponding labels, denoted dataset4, where G0 is a one-dimensional tensor of shape (M, 1) representing the label set generated for the data set S.
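The maximum search and top-T labelling of step S351 can be sketched as follows, assuming V is an (M, 2) numpy array of predicted values; the function name `label_top_t` and the toy matrix are hypothetical illustrations of the claimed procedure, not the patent's implementation.

```python
import numpy as np

def label_top_t(V, T):
    """Find the global maximum p_max of V at position (m, n), then
    return the indexes of the T samples with the largest predicted
    values in column n, together with the column index n."""
    m, n = np.unravel_index(np.argmax(V), V.shape)  # position (m, n) of p_max
    top_t = np.argsort(V[:, n])[-T:]                # T largest entries in column n
    return top_t, n

# Hypothetical 4-sample prediction matrix: p_max = 0.99 at (3, 1),
# so the T = 2 most confident samples in column 1 receive label n = 1.
V = np.array([[0.90, 0.10],
              [0.30, 0.70],
              [0.80, 0.20],
              [0.10, 0.99]])
idx, col = label_top_t(V, T=2)
```

The claimed step then removes these T rows from S and V (giving Snew and Vnew) and assigns the opposite category to the remaining samples.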
4. The method for unsupervised training of a classification network of claim 3, wherein the step S352 comprises:
dividing the matrix V into two sub-matrices V1 and V2 according to the corresponding prediction class vector L, where V1 denotes the matrix composed of all vectors corresponding to (1,0) and V2 denotes the matrix composed of all vectors corresponding to (0,1);
extracting from V1 and V2 the predicted values at the index corresponding to their respective categories to obtain vectors v1 and v2; sorting the elements of v1 and v2 respectively to obtain the smallest M×r predicted values in each vector and their corresponding indexes i1 and i2, where r is a hyperparameter;
exchanging the labels corresponding to the indexes i1 and i2 to obtain the label set.
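The label exchange of step S352 can be sketched as follows: within each predicted class, the least confident samples have their labels flipped to the opposite class. This is a hedged illustration assuming one-hot (1,0)/(0,1) label rows and an (M, 2) prediction array; `exchange_least_confident`, the parameter `k` (standing in for M×r), and the toy data are hypothetical.

```python
import numpy as np

def exchange_least_confident(preds, labels, k):
    """For each class, find the k samples with the lowest predicted
    value for their own class and swap their labels to the other
    class, mirroring the i1/i2 exchange of step S352."""
    out = labels.copy()
    for cls in (0, 1):
        members = np.where(labels[:, cls] == 1)[0]            # samples labelled cls
        worst = members[np.argsort(preds[members, cls])[:k]]  # k least confident
        out[worst] = 0
        out[worst, 1 - cls] = 1                               # flip to the other class
    return out

# Hypothetical usage: one least-confident sample per class is exchanged.
preds = np.array([[0.90, 0.10],
                  [0.55, 0.45],
                  [0.20, 0.80],
                  [0.48, 0.52]])
labels = np.array([[1., 0.],
                   [1., 0.],
                   [0., 1.],
                   [0., 1.]])
out = exchange_least_confident(preds, labels, k=1)
```

Here sample 1 (weakest category-A prediction) and sample 3 (weakest category-B prediction) swap classes, while confidently predicted samples keep their labels.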
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910175530.2A CN109934281B (en) | 2019-03-08 | 2019-03-08 | Unsupervised training method of two-class network |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109934281A CN109934281A (en) | 2019-06-25 |
CN109934281B true CN109934281B (en) | 2021-01-26 |
Family
ID=66986748
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910175530.2A Active CN109934281B (en) | 2019-03-08 | 2019-03-08 | Unsupervised training method of two-class network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109934281B (en) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112232439B (en) * | 2020-11-06 | 2024-04-05 | 四川云从天府人工智能科技有限公司 | Pseudo tag updating method and system in unsupervised ReID |
CN113298145A (en) * | 2021-05-24 | 2021-08-24 | 中国邮政储蓄银行股份有限公司 | Label filling method and device |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101615150A (en) * | 2009-07-24 | 2009-12-30 | 中兴通讯股份有限公司 | Data processing method and device based on embedded database |
CN108009559A (en) * | 2016-11-02 | 2018-05-08 | 哈尔滨工业大学 | A kind of Hyperspectral data classification method based on empty spectrum united information |
CN108197666A (en) * | 2018-01-30 | 2018-06-22 | 咪咕文化科技有限公司 | Image classification model processing method and device and storage medium |
CN108710948A (en) * | 2018-04-25 | 2018-10-26 | 佛山科学技术学院 | A kind of transfer learning method based on cluster equilibrium and weight matrix optimization |
CN109272040A (en) * | 2018-09-20 | 2019-01-25 | 中国科学院电子学研究所苏州研究院 | A kind of radar operation mode generation method |
Family Cites Families (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104850864A (en) * | 2015-06-01 | 2015-08-19 | 深圳英智源智能系统有限公司 | Unsupervised image recognition method based on convolutional neural network |
CN105894046B (en) * | 2016-06-16 | 2019-07-02 | 北京市商汤科技开发有限公司 | Method and system, the computer equipment of convolutional neural networks training and image procossing |
CN107545271B (en) * | 2016-06-29 | 2021-04-09 | 阿里巴巴集团控股有限公司 | Image recognition method, device and system |
CN106897390B (en) * | 2017-01-24 | 2019-10-15 | 北京大学 | Target precise search method based on depth measure study |
CN108765465B (en) * | 2018-05-31 | 2020-07-10 | 西安电子科技大学 | Unsupervised SAR image change detection method |
CN108764366A (en) * | 2018-06-07 | 2018-11-06 | 南京信息职业技术学院 | Feature selection and clustering sampling integration two-classification method for unbalanced data |
CN109145937A (en) * | 2018-06-25 | 2019-01-04 | 北京达佳互联信息技术有限公司 | A kind of method and device of model training |
CN109035169B (en) * | 2018-07-19 | 2020-06-12 | 西安交通大学 | Unsupervised/semi-supervised CT image reconstruction depth network training method |
CN109299270B (en) * | 2018-10-30 | 2021-09-28 | 云南电网有限责任公司信息中心 | Text data unsupervised clustering method based on convolutional neural network |
CN109426813B (en) * | 2018-11-02 | 2022-06-24 | 中电科新型智慧城市研究院有限公司 | Remote sensing image user-defined interest point extraction method based on fuzzy clustering and neural network model |
CN109342862A (en) * | 2018-12-14 | 2019-02-15 | 国网山东省电力公司电力科学研究院 | Based on Non-surveillance clustering with and svm classifier Diagnosis Method of Transformer Faults |
- 2019-03-08: application CN201910175530.2A filed in China; granted as CN109934281B; status Active
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Mascarenhas et al. | A comparison between VGG16, VGG19 and ResNet50 architecture frameworks for Image Classification | |
Shao et al. | Feature learning for image classification via multiobjective genetic programming | |
CN110399821B (en) | Customer satisfaction acquisition method based on facial expression recognition | |
Zhang et al. | Chromosome classification with convolutional neural network based deep learning | |
CN104517122A (en) | Image target recognition method based on optimized convolution architecture | |
CN111090764B (en) | Image classification method and device based on multitask learning and graph convolution neural network | |
CN103177265B (en) | High-definition image classification method based on kernel function Yu sparse coding | |
CN110837570B (en) | Method for unbiased classification of image data | |
CN112818764A (en) | Low-resolution image facial expression recognition method based on feature reconstruction model | |
CN108090472A (en) | Pedestrian based on multichannel uniformity feature recognition methods and its system again | |
CN112766283A (en) | Two-phase flow pattern identification method based on multi-scale convolution network | |
CN109934281B (en) | Unsupervised training method of two-class network | |
CN112786160A (en) | Multi-image input multi-label gastroscope image classification method based on graph neural network | |
CN109508640A (en) | Crowd emotion analysis method and device and storage medium | |
Pratama et al. | Deep convolutional neural network for hand sign language recognition using model E | |
CN114299304A (en) | Image processing method and related equipment | |
CN111898614A (en) | Neural network system, image signal and data processing method | |
CN117611838A (en) | Multi-label image classification method based on self-adaptive hypergraph convolutional network | |
Goyal et al. | Morphological classification of galaxies using Conv-nets | |
CN110569889A (en) | Convolutional neural network image classification method based on L2 normalization | |
CN113723456B (en) | Automatic astronomical image classification method and system based on unsupervised machine learning | |
CN115457366A (en) | Chinese herbal medicine multi-label recognition model based on graph convolution neural network | |
CN115063374A (en) | Model training method, face image quality scoring method, electronic device and storage medium | |
Chu et al. | A genetic programming approach to integrate multilayer cnn features for image classification | |
CN114548291A (en) | Ecological biological identification method based on MR-CNN algorithm |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||