CN114842233A - Sequence random network image classification method - Google Patents
Sequence random network image classification method
- Publication number
- CN114842233A (Application CN202110131810.0A)
- Authority
- CN
- China
- Prior art keywords
- layer
- image
- block
- bilstm
- resnet
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/049—Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Life Sciences & Earth Sciences (AREA)
- Artificial Intelligence (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Evolutionary Computation (AREA)
- Molecular Biology (AREA)
- Computational Linguistics (AREA)
- Software Systems (AREA)
- Mathematical Physics (AREA)
- Health & Medical Sciences (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Computing Systems (AREA)
- General Health & Medical Sciences (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Evolutionary Biology (AREA)
- Bioinformatics & Computational Biology (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Image Analysis (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention relates to the field of image classification and recognition, and in particular to an image classification method using a sequence random network. To address the shortcomings of existing image classification and retrieval methods, the invention provides a random-network image classification model, called BiLSTM-TDN, that captures one-dimensional deep feature sequences of images. The BiLSTM-TDN image classification model consists of a bidirectional long short-term memory module (BiLSTM) and several Tanh-Dropout blocks. If the target training database consists of airplane images, the method can accurately and automatically identify the airplane type; if it consists of animal images, the method can accurately and automatically identify the animal type.
Description
Technical Field
The invention relates to the field of image classification and identification, in particular to an image classification method using a sequence random network.
Background
Image recognition and classification are used very widely: for example, identifying the type of a car or an animal, or identifying a person from a face image. Today, most systems use deep convolutional neural networks (such as ResNet and DenseNet) to extract image features and then recognize the image content. In recent years, Transformer image recognition models based on the attention mechanism have also appeared. Although these models greatly improve recognition performance, application requirements keep rising, and these methods still struggle to meet practical needs.
Making networks deeper and wider places ever-higher demands on hardware, and training a deep model often takes several days or even tens of days; yet the performance improvement obtained in return is only marginal, or at least falls short of expectations. One fundamental reason the performance of existing network models is limited is that they ignore the correlation between image sequences. A class in a training set usually contains many sample images that are highly similar to one another; the features extracted by a deep network therefore also have strong correlation, and exploiting this correlation can effectively improve classification.
Disclosure of Invention
To address the shortcomings of existing image classification and retrieval methods, the invention provides a random-network image classification model, called BiLSTM-TDN, that captures one-dimensional deep feature sequences of images. The BiLSTM-TDN image classification model consists of a bidirectional long short-term memory module (BiLSTM) and several Tanh-Dropout blocks (TD blocks for short). BiLSTM is a recurrent neural network: at each time step t, the input is fed simultaneously to two long short-term memory (LSTM) networks running in opposite directions, and the output is determined jointly by the LSTMs of both directions. BiLSTM is used here to learn long-term dependencies among the feature sequences. A TD block combines a hyperbolic-tangent (Tanh) activation layer with a Dropout layer. The hyperbolic tangent activation function tanh is expressed as follows:

tanh(x) = sinh(x) / cosh(x) = (e^x − e^(−x)) / (e^x + e^(−x))
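The tanh identity just stated can be checked numerically (a minimal pure-Python sketch, not part of the patent; the function name is ours):

```python
import math

def tanh_from_ratio(x: float) -> float:
    """Compute tanh(x) as sinh(x) / cosh(x), i.e. (e^x - e^-x) / (e^x + e^-x)."""
    return math.sinh(x) / math.cosh(x)

# The ratio form agrees with the standard library's tanh.
for x in (-2.0, -0.5, 0.0, 0.5, 2.0):
    assert abs(tanh_from_ratio(x) - math.tanh(x)) < 1e-12
```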
where sinh and cosh are the hyperbolic sine and hyperbolic cosine functions, respectively. The Dropout layer is commonly used in deep convolutional networks: during training it randomly retains a subset of the weight set W with retention probability p, i.e., each weight is discarded with probability 1 − p. This is expressed as follows:

y = W|_p * x

where x is the input of the layer, y is its output, and W|_p denotes the randomly retained weights. At classification (inference) time, no weights are dropped and the output is instead scaled by the retention probability p:

y = p * W * x
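The two formulas above can be sketched in plain Python (a minimal illustration with our own function names; the training-mode output is random, while the inference-mode output is the deterministic expectation p * W . x):

```python
import random

def dropout_train(W, x, p, rng):
    """Training mode: keep each weight with retention probability p
    (discard it with probability 1 - p), then apply the kept weights to x."""
    kept = [w if rng.random() < p else 0.0 for w in W]
    return sum(w * xi for w, xi in zip(kept, x))

def dropout_infer(W, x, p):
    """Inference mode: no sampling; scale the full output by p so it
    matches the expected value of the training-mode output."""
    return p * sum(w * xi for w, xi in zip(W, x))

W = [0.5, -1.0, 2.0]
x = [1.0, 1.0, 1.0]
# Inference output is deterministic: p * (W . x) = 0.6 * 1.5 = 0.9
assert abs(dropout_infer(W, x, 0.6) - 0.9) < 1e-12
```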
For classification, a ResNet-101 network model pre-trained on ImageNet is first transfer-learned on the target database, and the features of all images in the database are then extracted. The image features are one-dimensional; the BiLSTM-TDN is trained on these features, and the trained BiLSTM-TDN is finally used for image classification. If the target training database consists of airplane images, the method can accurately and automatically identify the airplane type; if it consists of animal images, the method can accurately and automatically identify the animal type.
Drawings
FIG. 1 is a diagram of a structure of a sequential stochastic network model according to the present invention.
Fig. 2 is a schematic diagram of a TD block structure in a sequential random network.
FIG. 3 is a flow chart of the classification implemented by the method of the present invention.
Detailed Description
FIG. 1 shows the sequence random network BiLSTM-TDN of the present invention. From left to right, the first layer is the sequence data input layer (1, the feature sequence input layer); the second layer is the Dropout layer (2), with retention probability p set to 0.6; the third layer is the BiLSTM layer; the fourth through seventh layers are intermediate layers composed of TD blocks, namely 4-TD block-1, 4-TD block-2, 4-TD block-3, and 4-TD block-4, where the retention probability p in 4-TD block-1 and 4-TD block-2 is set to 0.6, and in 4-TD block-3 and 4-TD block-4 is set to 0.5; the eighth layer is the fully connected layer (5), whose number of nodes equals the number of categories in the training data set; and the ninth layer is the Softmax output layer (6), which performs the image classification. FIG. 2 shows the structure of a TD block in the BiLSTM-TDN; a TD block consists of one Tanh layer and one Dropout layer.
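The nine-layer structure described above can be summarized as a simple configuration table (a hedged plain-Python sketch; the dictionary keys are ours, and `num_classes` is a placeholder for the number of categories in the training set):

```python
num_classes = 10  # placeholder: equals the number of categories in the training set

# Layer-by-layer summary of BiLSTM-TDN as described for FIG. 1.
bilstm_tdn_layers = [
    {"layer": 1, "type": "feature-sequence input"},
    {"layer": 2, "type": "Dropout", "p": 0.6},
    {"layer": 3, "type": "BiLSTM"},
    {"layer": 4, "type": "TD block-1", "p": 0.6},  # each TD block = Tanh + Dropout
    {"layer": 5, "type": "TD block-2", "p": 0.6},
    {"layer": 6, "type": "TD block-3", "p": 0.5},
    {"layer": 7, "type": "TD block-4", "p": 0.5},
    {"layer": 8, "type": "fully connected", "nodes": num_classes},
    {"layer": 9, "type": "Softmax output"},
]

assert len(bilstm_tdn_layers) == 9
```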
The method of the invention is further explained below with reference to the drawings; the specific implementation steps are as follows:
step 1, performing transfer learning on a ResNet-101 model pre-trained on ImageNet: first initializing the ResNet-101 model with the network parameters trained on the large ImageNet database, and then performing transfer learning on the specific target database to obtain a ResNet-101 transfer model;
step 2, extracting one-dimensional features of the images using the ResNet-101 transfer model and combining them into a feature vector database: computing the features of each image in the target database with the ResNet-101 transfer model, each image feature being a 2048-dimensional feature vector; and assembling the image features of the target database into a feature vector database;
step 3, training the BiLSTM-TDN network on the feature vector database;
and step 4, classifying and recognizing images with the trained BiLSTM-TDN network: given an image, extracting its 2048-dimensional feature vector with the ResNet-101 transfer model and inputting the feature vector into the network to identify the category of the image.
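The inference step above can be sketched end-to-end in pure Python (a hedged illustration: the 2048-dimensional feature would come from the ResNet-101 transfer model, which is mocked here with random values, and the classifier weights are random placeholders rather than trained BiLSTM-TDN parameters):

```python
import math
import random

FEATURE_DIM = 2048  # dimension of the ResNet-101 feature vector

def extract_feature(image_id, rng):
    """Placeholder for the ResNet-101 transfer model's 2048-d feature extractor."""
    return [rng.random() for _ in range(FEATURE_DIM)]

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    s = sum(exps)
    return [e / s for e in exps]

def classify(feature, weights):
    """Linear layer followed by softmax; returns the predicted class index."""
    logits = [sum(w * f for w, f in zip(row, feature)) for row in weights]
    probs = softmax(logits)
    return max(range(len(probs)), key=probs.__getitem__)

rng = random.Random(0)
num_classes = 3
weights = [[rng.uniform(-1, 1) for _ in range(FEATURE_DIM)] for _ in range(num_classes)]
feature = extract_feature("img-001", rng)
pred = classify(feature, weights)
assert 0 <= pred < num_classes
```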
Claims (2)
1. An image recognition and classification method, characterized in that it comprises the following steps:
step 1, performing transfer learning on a ResNet-101 model pre-trained on ImageNet: first initializing the ResNet-101 model with the network parameters trained on the large ImageNet database, and then performing transfer learning on a specific target database to obtain a ResNet-101 transfer model;
step 2, extracting one-dimensional features of the images using the ResNet-101 transfer model and combining them into a feature vector database: computing the features of each image in the target database with the ResNet-101 transfer model, each image feature being a 2048-dimensional feature vector; and assembling the image features of the target database into a feature vector database;
step 3, training a sequence random network BiLSTM-TDN on the feature vector database;
and step 4, classifying and recognizing images with the trained sequence random network BiLSTM-TDN: given an image, extracting its 2048-dimensional feature vector with the ResNet-101 transfer model and inputting the feature vector into the network to identify the category of the image.
2. The image recognition and classification method according to claim 1, characterized in that a sequence random network BiLSTM-TDN is designed, in which the first layer is the sequence data input layer (1, the feature sequence input layer); the second layer is the Dropout layer (2), with retention probability p set to 0.6; the third layer is the BiLSTM layer; the fourth through seventh layers are intermediate layers composed of TD blocks, namely 4-TD block-1, 4-TD block-2, 4-TD block-3, and 4-TD block-4, where the retention probability p in 4-TD block-1 and 4-TD block-2 is set to 0.6, and in 4-TD block-3 and 4-TD block-4 is set to 0.5; the eighth layer is the fully connected layer (5), whose number of nodes equals the number of categories in the training data set; the ninth layer is the Softmax output layer (6), which performs the image classification; and the TD block consists of one Tanh layer and one Dropout layer.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110131810.0A CN114842233A (en) | 2021-02-01 | 2021-02-01 | Sequence random network image classification method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110131810.0A CN114842233A (en) | 2021-02-01 | 2021-02-01 | Sequence random network image classification method |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114842233A true CN114842233A (en) | 2022-08-02 |
Family
ID=82561077
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110131810.0A Pending CN114842233A (en) | 2021-02-01 | 2021-02-01 | Sequence random network image classification method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114842233A (en) |
- 2021-02-01: CN202110131810.0A patent/CN114842233A/en, active, Pending
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
WD01 | Invention patent application deemed withdrawn after publication | ||
Application publication date: 20220802 |