CN112200307A - Recognizer processing method based on picture data expansion - Google Patents

Recognizer processing method based on picture data expansion

Info

Publication number
CN112200307A
CN112200307A (application CN202011111459.0A)
Authority
CN
China
Prior art keywords
data
recognizer
data expansion
expansion
network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011111459.0A
Other languages
Chinese (zh)
Inventor
纪雪飞
王珏
李业
孙强
丁瑞
徐晨
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nantong University
Original Assignee
Nantong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nantong University filed Critical Nantong University
Priority to CN202011111459.0A priority Critical patent/CN112200307A/en
Publication of CN112200307A publication Critical patent/CN112200307A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • Evolutionary Computation (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Medical Informatics (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a recognizer processing method based on picture data expansion, belonging to the field of machine learning, and comprising the following steps. Data set preparation: make the existing multiple types of picture data into a data set and attach labels. Data expansion: input the data set consisting of pictures and labels into a generating network, which completes the data expansion; the expanded data is then sent, together with the original data, to train the weight parameters of a recognizer. Feedback: feed the recognizer's test result on the original data back to the data expansion module, and dynamically adjust the weight parameters of the data expansion module. Taking the complete training of one batch of data as a period, repeat the expansion and feedback steps until the preset number of training periods is reached. By using data expansion, a limited number of samples is expanded, and the original data and the expanded data together train the weight parameters of the recognizer; the method improves the recognition accuracy of the recognizer in few-sample scenarios.

Description

Recognizer processing method based on picture data expansion
Technical Field
The invention relates to the technical field of machine learning, in particular to a recognizer processing method based on picture data expansion.
Background
Deep learning, a branch of machine learning, is a class of algorithms that has received much attention in recent years. Image recognition is an active research topic; compared with traditional recognition methods based on hand-crafted feature extraction, deep learning achieves higher accuracy and faster recognition. In deep learning, the amount of data usually determines the recognition accuracy of the network: a network trained on massive data has better generalization ability, more mature weight parameters, and stronger recognition of unseen data. In some cases, however, the number of available samples is very limited, and how to train a better network with fewer training samples is an important open question. On the other hand, to extract more feature information the depth of the network is increased, which brings the problem of overfitting; the most direct remedy for overfitting is to increase the size of the training data set.
There are two common data expansion algorithms: data regeneration based on the generative adversarial network (GAN) and data expansion based on the variational autoencoder (VAE). The GAN, motivated by game theory, sets up a generator and a discriminator: the discriminator tries to distinguish data produced by the generator from real data, while the generator tries to produce data that the discriminator cannot tell apart; the two are trained adversarially and optimized together. The VAE takes a probabilistic view: it extracts the mean and variance of the original data and synthesizes new data from them. Both methods operate on image data, and their main purpose, from the standpoint of the generated data, is to produce data that is close to, yet different from, the original data. Two kinds of indexes are currently used to assess the quality of generated data: diversity and clarity. Diversity reflects the differences among generated images, i.e., whether identical data is being generated; clarity reflects the sharpness of the generated images.
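As a rough illustration of the VAE idea described above (extracting a mean and variance from the original data and synthesizing new data around them), the following numpy sketch works on a toy data set; all names and values here are illustrative and are not from the patent:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "original data": 200 samples of a 4-dimensional feature vector.
x = rng.normal(loc=2.0, scale=0.5, size=(200, 4))

# A real VAE learns these statistics with an encoder network;
# here they are simply estimated from the data.
mu = x.mean(axis=0)
sigma = x.std(axis=0)

# Synthesize new data by sampling around the extracted statistics.
eps = rng.standard_normal(size=(100, 4))
x_new = mu + sigma * eps

print(x_new.shape)  # (100, 4)
```

The synthesized samples follow the same first- and second-order statistics as the originals while differing sample by sample, which is the "close to, yet different from" property described above.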
Disclosure of Invention
The invention aims to provide a recognizer processing method based on picture data expansion, which trains the recognizer network with added generated data and improves the generalization ability of the recognizer.
To achieve this purpose, the invention adopts the following technical scheme. The recognizer processing method based on picture data expansion comprises the following steps:
s1, data set preparation: making the existing multiple types of picture data into a data set and attaching labels;
s2, data expansion: inputting the data set consisting of pictures and labels into a generating network, which completes the data expansion; the expanded data set is then sent, together with the original data, to train the weight parameters of the recognizer;
s3, feedback: feeding the test result of the recognizer on the original data back to the data expansion module, and dynamically adjusting the weight parameters of the data expansion module; taking the complete training of one batch of data as a period, repeating the expansion and feedback steps until the preset number of training periods is reached;
wherein, a feedback structure is formed between the data expansion module and the recognizer module.
Furthermore, the data expansion module adopts a deep learning method, and its deep learning network is a conditional variational autoencoder (CVAE).
Further, the recognizer adopts a deep learning method, and the deep learning network is a convolutional neural network.
Further, the feedback structure calculates a loss value between the output distribution of the identifier and the data tag distribution, and uses the loss value as a part of a loss function of the data expansion module.
Furthermore, the data expansion module comprises an encoding unit and a decoding unit; the encoding unit samples the posterior distribution of the original data to obtain an intermediate variable, and the decoding unit reconstructs the original data from the distribution of the intermediate variable to obtain expanded data.
Furthermore, a feedback structure is formed between the decoding unit and the identifier, the weight parameters of the decoding unit are adjusted, and the generated data reconstructed by the decoder is optimized.
Further, the recognizer trains the weight parameters of its network with the expanded data together with the raw data.
The invention has the beneficial effects that:
the invention utilizes a data expansion method to expand the existing data, then trains the recognizer network together with the expanded data, feeds back the recognition result to the data expansion module, and dynamically adjusts the weight parameters of the data expansion module. Compared with the method that after thousands of training, the generated countermeasure network trains the recognizer by using the expanded data, the efficiency is lower. The method can obviously improve the generalization ability of the recognizer only by hundreds of times of training, improves the recognition accuracy of the recognizer, and is simple and convenient and high in efficiency.
Drawings
FIG. 1 is a flow chart of a method of the present invention;
FIG. 2 is a diagram of a conventional recognizer trained without the expanded data;
FIG. 3 is a diagram of recognizers, as trained by other researchers, with data expanded by a generative adversarial network;
FIG. 4 shows the test accuracy of the network trained only with the original data in scenario 1;
FIG. 5 shows the test accuracy of the network trained with expanded data generated by a generative adversarial network in scenario 1;
FIG. 6 shows the test accuracy of the network trained with expanded data generated by the present method in scenario 1;
FIG. 7 shows the test accuracy of the network trained with the original data plus expanded data from the present method in scenario 2.
Detailed Description
The invention will be further described with reference to the accompanying drawings.
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be described in detail with reference to the accompanying drawings and the detailed description. It should be understood that the detailed description and specific examples, while indicating the invention, are intended for purposes of illustration only and are not intended to limit the scope of the invention.
Fig. 1 is a block diagram of an algorithm, the functions of the different modules being as follows:
The encoder obtains the mean and variance from the original data, recorded as μ_z and σ_z; the parameter to be updated by the encoder is φ.
The decoder uses the intermediate variables to generate the extended data. The parameter of the decoder is θ.
The recognizer uses a convolutional neural network to extract the feature information of the signal, and then uses a logistic regression layer to output probability values for the different types of labels. θ_C denotes the parameters of the recognizer.
The feedback structure informs the decoder of the decision result of the identifier so as to adjust the weight parameters of the decoder.
The algorithm has two steps: first, train the recognizer with the generated data and the original data; second, recognize newly regenerated data with the recognizer and feed the result back to the decoder to adjust the decoder's parameters. The algorithm flow is as follows:
A batch of data is randomly extracted from the data set, where x and y are the pictures and labels respectively; the labels are preprocessed with one-hot coding. This is recorded as {x, y} ~ P_data, i.e., the distribution of x, y follows the distribution of the original data set.
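The one-hot preprocessing of labels mentioned in this step can be sketched as follows (a minimal numpy illustration, not the patent's implementation):

```python
import numpy as np

def one_hot(labels, num_classes):
    """Convert integer class labels to one-hot row vectors."""
    out = np.zeros((len(labels), num_classes))
    out[np.arange(len(labels)), labels] = 1.0
    return out

y = np.array([0, 2, 1, 2])   # e.g. modulation-type indices (illustrative)
print(one_hot(y, 3))         # each row has a single 1 in its class column
```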
The posterior probability of the intermediate variable z is q_φ(z|x, y); a prior probability p_θ(z) is assumed for z, and p_θ(z) follows a standard normal distribution.
Sampling from the probability distribution q_φ(z|x, y) yields z:

z = g_φ(x, y, ε) = μ_z + σ_z · ε (1)

where g_φ(x, y, ε) represents sampling from the distribution q_φ(z|x, y), and ε follows a standard normal distribution.
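Equation (1) is the standard reparameterization trick; a minimal numpy sketch, with illustrative values for μ_z and σ_z, is:

```python
import numpy as np

def sample_z(mu_z, sigma_z, rng):
    """Reparameterized sampling of Eq. (1): z = mu_z + sigma_z * eps, eps ~ N(0, I)."""
    eps = rng.standard_normal(mu_z.shape)
    return mu_z + sigma_z * eps

rng = np.random.default_rng(42)
mu_z = np.array([0.5, -1.0])     # illustrative encoder output mean
sigma_z = np.array([0.1, 0.2])   # illustrative encoder output std
z = sample_z(mu_z, sigma_z, rng)
print(z.shape)  # (2,)
```

Writing the sample as a deterministic function of (μ_z, σ_z, ε) is what makes the sampling step differentiable with respect to the encoder parameters φ.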
The loss function of the encoder can be written as:

L_KL = KL(q_φ(z|x, y) ‖ p_θ(z)) (2)

where KL(·) denotes the KL divergence, i.e., the relative entropy of q_φ(z|x, y) with respect to p_θ(z); the aim is for the posterior distribution of the intermediate variable to approximate the standard normal distribution.
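For a diagonal Gaussian posterior, the KL divergence in Eq. (2) has the closed form 0.5 · Σ(σ² + μ² − 1 − log σ²); a small numpy sketch (illustrative values only):

```python
import numpy as np

def kl_to_standard_normal(mu, sigma):
    """Closed-form KL(N(mu, diag(sigma^2)) || N(0, I)) for a diagonal Gaussian,
    as in the encoder loss of Eq. (2)."""
    return 0.5 * np.sum(sigma**2 + mu**2 - 1.0 - np.log(sigma**2))

# The KL term vanishes exactly when the posterior already matches N(0, I) ...
print(kl_to_standard_normal(np.zeros(2), np.ones(2)))   # 0.0
# ... and is positive otherwise.
print(kl_to_standard_normal(np.array([0.5, -1.0]), np.array([0.1, 0.2])) > 0)  # True
```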
Sampling from the probability distribution p_θ(x̂|z, y), the generated data is recorded as x̂ and is produced by the decoder:

x̂ = D(z, y) (3)

where D(·) denotes the decoder.
The newly generated data is then identified with the recognizer, whose loss function is:

L_C = E_{q_φ(z|x,y)}[ E( p_{θ_C}(ŷ|x̂), y ) ] (4)

where ŷ represents the category output by the recognizer, p_{θ_C}(ŷ|x̂) represents the probabilities that the recognizer outputs for the different types of labels given the input x̂, and E(·) is the cross-entropy loss function, computed under the distribution q_φ(z|x, y). The aim is for the generated data x̂ to approach the distribution of x.
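The cross-entropy E(·) appearing in Eq. (4) can be sketched in numpy as follows (the probabilities and labels below are hypothetical):

```python
import numpy as np

def cross_entropy(probs, y_onehot, eps=1e-12):
    """Mean cross-entropy between predicted label probabilities and one-hot
    targets, playing the role of E(.) in Eq. (4)."""
    return -np.mean(np.sum(y_onehot * np.log(probs + eps), axis=1))

probs = np.array([[0.7, 0.2, 0.1],    # hypothetical recognizer outputs
                  [0.1, 0.8, 0.1]])
y = np.array([[1.0, 0.0, 0.0],        # one-hot labels
              [0.0, 1.0, 0.0]])
print(round(cross_entropy(probs, y), 4))  # 0.2899
```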
Accordingly, the loss function of the decoder can be written as the expected reconstruction error of the original data under the posterior:

L_D = E_{q_φ(z|x,y)}[ −log p_θ(x|z, y) ] (5)
then, the loss function of the encoder is reversely differentiated by using a gradient descent algorithm, and the weight parameters of the encoder are updated:
Figure BDA00027287320500000511
where λ is an empirical parameter, it may be taken to be 0.5.
The recognizer is updated with the gradient descent algorithm:

θ_C ← θ_C − η ∇_{θ_C} L_C (7)
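The updates in Eqs. (6) and (7) are ordinary gradient-descent steps. The following toy numpy sketch shows the update rule θ ← θ − η·∇L on a quadratic loss chosen purely for illustration:

```python
import numpy as np

def gd_step(theta, grad, lr=0.1):
    """One gradient-descent update: theta <- theta - lr * grad."""
    return theta - lr * grad

# Toy loss L(theta) = ||theta||^2 / 2, whose gradient is theta itself.
theta = np.array([4.0, -2.0])
for _ in range(100):
    theta = gd_step(theta, grad=theta)
print(np.allclose(theta, 0.0, atol=1e-2))  # the parameters approach the minimum
```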
the encoder again generates a new batch of data (step two), which is recorded as
Figure BDA00027287320500000513
Figure BDA00027287320500000514
The loss function of the feedback structure is:

L_DC = E( p_{θ_C}(ỹ|x̃), y ) (9)

where ỹ represents the recognition result for the input data x̃, and p_{θ_C}(ỹ|x̃) represents the probabilities that the recognizer outputs for the different types of labels.
The gradient descent algorithm is then used to update the parameters of the decoder:

θ ← θ − η ∇_θ L_DC (10)
the complete loss function is as follows:
L=LCVAE+LC+LDC (11)
repeating the steps (1) to (11) until a preset training period is reached.
And saving the weight parameter of the last training as the final training weight.
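The overall procedure of steps (1)-(11) can be outlined as a training-loop skeleton. Every module below is a deliberately trivial stub standing in for the real encoder, decoder, and recognizer networks, so this is scaffolding under stated assumptions rather than the patent's implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, num_classes, epochs = 64, 8, 3, 5

# Toy data set: "pictures" flattened to d features, with one-hot labels.
x = rng.normal(size=(n, d))
y = np.eye(num_classes)[rng.integers(0, num_classes, size=n)]

def encode(batch):
    """Stub encoder: per-feature mean/std of the batch (stands in for mu_z, sigma_z)."""
    return batch.mean(axis=0), batch.std(axis=0)

def decode(z, labels):
    """Stub decoder: perturb z to 'reconstruct' expanded data (labels unused here)."""
    return z + 0.1 * rng.standard_normal(z.shape)

def recognize(batch):
    """Stub recognizer: uniform label probabilities (a real one would be a CNN)."""
    return np.full((len(batch), num_classes), 1.0 / num_classes)

losses = []
for epoch in range(epochs):
    mu_z, sigma_z = encode(x)                          # posterior statistics
    z = mu_z + sigma_z * rng.standard_normal((n, d))   # sampling, cf. Eq. (1)
    x_aug = decode(z, y)                               # expanded data, cf. Eq. (3)
    probs = recognize(np.vstack([x, x_aug]))           # recognizer sees both sets
    y_all = np.vstack([y, y])
    l_c = -np.mean(np.sum(y_all * np.log(probs), axis=1))   # cf. Eq. (4)
    probs_fb = recognize(decode(z, y))                 # feedback pass, cf. Eq. (9)
    l_dc = -np.mean(np.sum(y * np.log(probs_fb), axis=1))
    losses.append(l_c + l_dc)                          # cf. Eq. (11)

print(len(losses))  # one total-loss value per training period
```

In a real implementation the stubs would be trainable networks and each loss would drive a gradient update as in Eqs. (6), (7), and (10); the skeleton only shows how the expansion, recognition, and feedback passes interleave within one period.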
For comparison, this embodiment also trains a recognizer without data expansion, as shown in FIG. 2, and considers recognizers trained by other researchers with data expanded by a generative adversarial network, as in FIG. 3.
Test cases:
To test the effectiveness of the algorithm provided by this patent, actual picture recognition is taken as an example.
A transmitted signal is modulated with common modulation types to obtain received signals under different channel environments; the signals can be represented as pictures, and the purpose of this case is to recover the corresponding modulation type from the received signal.
The collected data is divided into a training set and a test set in different proportions; the training set is used to train the model, and the test set is used to measure the model's recognition accuracy. The recognizer adopts a very simple deep learning network with one convolutional layer and one fully connected layer.
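The split by proportion described above can be sketched as follows (an illustrative helper, not the patent's code):

```python
import numpy as np

def train_test_split(x, y, train_ratio, rng):
    """Shuffle a data set and split it by the given training-set proportion."""
    idx = rng.permutation(len(x))
    cut = int(train_ratio * len(x))
    return x[idx[:cut]], y[idx[:cut]], x[idx[cut:]], y[idx[cut:]]

rng = np.random.default_rng(1)
pictures = np.arange(300).reshape(300, 1)   # e.g. 300 pictures of one modulation type
labels = np.arange(300) % 5                 # hypothetical label indices
x_tr, y_tr, x_te, y_te = train_test_split(pictures, labels, 0.8, rng)
print(len(x_tr), len(x_te))  # 240 60
```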
The test scenarios are divided into the following two types:
the recognition accuracy under the condition of additive white Gaussian noise simulation is that the sum of training data and test data of each modulation type is 300 pictures.
And actually measuring the identification accuracy under the indoor channel condition, wherein the sum of the training data and the test data of each modulation type is 500 pictures.
And (3) testing results:
Under the additive white Gaussian noise channel, FIG. 4, FIG. 5 and FIG. 6 show the relationship between test accuracy and picture quality when using only the original data, when adding data generated by a generative adversarial network, and when adding data generated by the present method, respectively. The legends give the split ratio of the training and test sets. Test accuracy increases as the training set grows; accuracy varies with picture quality and reaches 100% when the picture quality is good; and adding expanded data further improves accuracy, especially when the picture quality is poor. Compared with expansion by a generative adversarial network, the present method performs better.
Under measured indoor channel conditions, the test results are shown in FIG. 7. The recognizer provided by this patent again outperforms the traditional recognizer trained only with the raw data.
The above description is only for the purpose of illustrating the technical solutions of the present invention and not for the purpose of limiting the same, and other modifications or equivalent substitutions made by those skilled in the art to the technical solutions of the present invention should be covered within the scope of the claims of the present invention without departing from the spirit and scope of the technical solutions of the present invention.

Claims (7)

1. A recognizer processing method based on picture data expansion is characterized by comprising the following steps:
s1, data set preparation: making the existing multiple types of picture data into a data set and attaching labels;
s2, data expansion: inputting the data set consisting of pictures and labels into a generating network, which completes the data expansion; the expanded data set is then sent, together with the original data, to train the weight parameters of the recognizer;
s3, feedback: feeding the test result of the recognizer on the original data back to the data expansion module, and dynamically adjusting the weight parameters of the data expansion module; taking the complete training of one batch of data as a period, repeating the expansion and feedback steps until the preset number of training periods is reached;
wherein, a feedback structure is formed between the data expansion module and the recognizer module.
2. The picture data expansion-based recognizer processing method according to claim 1, wherein: the data expansion module adopts a deep learning method, and its deep learning network is a conditional variational autoencoder (CVAE).
3. The picture data expansion-based recognizer processing method according to claim 1, wherein: the recognizer adopts a deep learning method, and the deep learning network is a convolutional neural network.
4. The picture data expansion-based recognizer processing method according to claim 1, wherein: the feedback structure calculates a loss value between the output distribution of the recognizer and the data tag distribution, and uses the loss value as a part of a loss function of the data expansion module.
5. The picture data expansion-based recognizer processing method according to claim 1, wherein: the data expansion module comprises an encoding unit and a decoding unit; the encoding unit samples the posterior distribution of the original data to obtain an intermediate variable, and the decoding unit reconstructs the original data from the distribution of the intermediate variable to obtain expanded data.
6. The picture data expansion-based recognizer processing method according to claim 5, wherein: and a feedback structure is formed between the decoding unit and the identifier, the weight parameter of the decoding unit is adjusted, and the generated data reconstructed by the decoder is optimized.
7. The picture data expansion-based recognizer processing method according to claim 6, wherein: the recognizer trains the weight parameters of its network with the expanded data together with the raw data.
CN202011111459.0A 2020-10-16 2020-10-16 Recognizer processing method based on picture data expansion Pending CN112200307A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011111459.0A CN112200307A (en) 2020-10-16 2020-10-16 Recognizer processing method based on picture data expansion

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011111459.0A CN112200307A (en) 2020-10-16 2020-10-16 Recognizer processing method based on picture data expansion

Publications (1)

Publication Number Publication Date
CN112200307A true CN112200307A (en) 2021-01-08

Family

ID=74010383

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011111459.0A Pending CN112200307A (en) 2020-10-16 2020-10-16 Recognizer processing method based on picture data expansion

Country Status (1)

Country Link
CN (1) CN112200307A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116881723A (en) * 2023-09-06 2023-10-13 北京城建设计发展集团股份有限公司 Data expansion method and system for existing structure response prediction

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111258992A (en) * 2020-01-09 2020-06-09 电子科技大学 Seismic data expansion method based on variational self-encoder

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111258992A (en) * 2020-01-09 2020-06-09 电子科技大学 Seismic data expansion method based on variational self-encoder

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
X. JI 等: "Data-Limited Modulation Classification With a CVAE-Enhanced Learning Model", 《IEEE COMMUNICATIONS LETTERS》, vol. 24, no. 10, pages 2191 - 2195, XP011813680, DOI: 10.1109/LCOMM.2020.3004877 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116881723A (en) * 2023-09-06 2023-10-13 北京城建设计发展集团股份有限公司 Data expansion method and system for existing structure response prediction
CN116881723B (en) * 2023-09-06 2024-02-20 北京城建设计发展集团股份有限公司 Data expansion method and system for existing structure response prediction

Similar Documents

Publication Publication Date Title
CN110287374B (en) Self-attention video abstraction method based on distribution consistency
CN111565318A (en) Video compression method based on sparse samples
CN109890043B (en) Wireless signal noise reduction method based on generative countermeasure network
CN108921285B (en) Bidirectional gate control cyclic neural network-based classification method for power quality disturbance
CN109743275B (en) Signal modulation identification method based on under-complete self-encoder
CN110135386B (en) Human body action recognition method and system based on deep learning
CN111652233B (en) Text verification code automatic identification method aiming at complex background
CN114092964A (en) Cross-domain pedestrian re-identification method based on attention guidance and multi-scale label generation
CN109581339B (en) Sonar identification method based on automatic adjustment self-coding network of brainstorming storm
CN111935042B (en) Probability shaping recognition system and method based on machine learning and receiving end
CN109800768B (en) Hash feature representation learning method of semi-supervised GAN
CN114006870A (en) Network flow identification method based on self-supervision convolution subspace clustering network
CN113627266A (en) Video pedestrian re-identification method based on Transformer space-time modeling
CN116939320B (en) Method for generating multimode mutually-friendly enhanced video semantic communication
CN115311605B (en) Semi-supervised video classification method and system based on neighbor consistency and contrast learning
Huang et al. A parallel architecture of age adversarial convolutional neural network for cross-age face recognition
CN111291705B (en) Pedestrian re-identification method crossing multiple target domains
CN112507778A (en) Loop detection method of improved bag-of-words model based on line characteristics
CN111967358A (en) Neural network gait recognition method based on attention mechanism
CN112200307A (en) Recognizer processing method based on picture data expansion
CN114972904A (en) Zero sample knowledge distillation method and system based on triple loss resistance
CN116434759B (en) Speaker identification method based on SRS-CL network
CN103295007B (en) A kind of Feature Dimension Reduction optimization method for Chinese Character Recognition
Aziz et al. Multi-level refinement feature pyramid network for scale imbalance object detection
CN111401263A (en) Expert knowledge fused optimal effect combined modulation identification method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20210108