CN113469224A - Rice classification method based on fusion of convolutional neural network and feature description operator

Info

Publication number
CN113469224A
CN113469224A
Authority
CN
China
Prior art keywords
rice
image
gradient
rice image
neural network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110663578.5A
Other languages
Chinese (zh)
Inventor
周梦秋 (Zhou Mengqiu)
胡浩基 (Hu Haoji)
焦书迪 (Jiao Shudi)
王思宇 (Wang Siyu)
阳雷 (Yang Lei)
龙永文 (Long Yongwen)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University ZJU
Foshan Shunde Midea Electrical Heating Appliances Manufacturing Co Ltd
Original Assignee
Zhejiang University ZJU
Foshan Shunde Midea Electrical Heating Appliances Manufacturing Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang University ZJU, Foshan Shunde Midea Electrical Heating Appliances Manufacturing Co Ltd filed Critical Zhejiang University ZJU
Priority to CN202110663578.5A
Publication of CN113469224A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/24: Classification techniques
    • G06F18/241: Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/25: Fusion techniques
    • G06F18/253: Fusion techniques of extracted features
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/045: Combinations of networks
    • G06N3/047: Probabilistic or stochastic networks
    • G06N3/08: Learning methods

Abstract

The invention discloses a rice classification method based on the fusion of a convolutional neural network and a feature description operator. The method first constructs a rice data set; global features of each rice image are then extracted with a convolutional neural network, while local features are extracted with an HOG feature description operator; the global features extracted by the convolutional neural network and the local features extracted by the HOG feature description operator are fused through a fully connected layer; the fused rice image features are passed to a trained classification neural network, and a Softmax function assigns the rice variety. Starting from the requirements of practical engineering projects and facing the rice classification task, the invention fuses a convolutional neural network with a traditional feature description operator and thereby obtains a better rice image classification result.

Description

Rice classification method based on fusion of convolutional neural network and feature description operator
Technical Field
The invention relates to the fields of deep learning, traditional machine learning, image classification and multi-feature fusion, and in particular to a rice classification method based on the fusion of a convolutional neural network and a traditional feature description operator.
Background
Since the AlexNet architecture demonstrated its decisive advantage in the ILSVRC competition, Convolutional Neural Networks (CNNs) have drawn wide attention in the field of machine learning. With the continuous growth of computing power, more and more large convolutional neural networks have succeeded in Artificial Intelligence (AI) tasks such as image recognition, object detection and natural language processing by virtue of their complex network structures and strong feature expression capability.
Image classification, one of the earliest core problems of machine vision, aims to assign an image a label from a given category set and thereby obtain its category information: the input is an image and the output is the label assigned to it. Image classification is the basis and key of other visual tasks such as object detection, instance segmentation and pose estimation.
Image classification algorithms fall mainly into two classes: traditional algorithms based on feature description operators and deep learning algorithms based on convolutional neural networks. Before convolutional neural networks were widely adopted, traditional algorithms were generally used for image classification and recognition; a traditional pipeline typically comprises four stages: feature extraction, feature encoding, spatial constraints and classification. Since the birth of deep learning image classification based on convolutional neural networks, classification performance has greatly surpassed the traditional algorithms, a historic breakthrough. With ever deeper networks and more refined architectures, a series of convolutional neural network models has continually set new records on the ImageNet data set, with the TOP-5 error rate falling to around 3.5%. Since the error rate of human recognition is about 5%, this fully demonstrates that deep-learning-based convolutional classification networks have exceeded the recognition capability of the human eye.
The AlexNet model won the ILSVRC 2012 competition; besides several max pooling layers its architecture contains 7 hidden layers, and each hidden layer uses a rectified linear activation function, giving it faster training and better performance than logistic units. In addition, AlexNet uses competitive (local response) normalization to suppress hidden units when nearby units show stronger activity, which helps the network cope with variations in intensity. The VGG-16 model was proposed by the University of Oxford in 2014; compared with earlier deep convolutional models, it further widens the network structure and deepens the number of layers. GoogLeNet, first proposed in 2014, consists of multiple Inception modules and won that year's ILSVRC competition; it borrows the design idea of the NIN (Network in Network) and replaces the multi-layer fully connected head of traditional convolutional networks with an average pooling layer. ResNet (residual network), first proposed in 2015, won that year's ILSVRC image classification, object localization and object detection tasks; to address the accuracy degradation that accompanies deeper training, ResNet introduces residual learning to ease the training of deep networks.
Although deep learning image algorithms are relatively mature, in practical application scenes interference factors such as complex backgrounds, blurred object contours and inconspicuous texture differences between classes mean that the global features extracted by a convolutional neural network are not sufficient to express the image information well, leading to wrong or failed discrimination. Even as computing power and network depth approach saturation, the global features extracted by a convolutional neural network can express the overall information of an image effectively and completely but neglect fine-grained local features such as shape and texture, so image classification problems with small feature differences are still not well solved.
Disclosure of Invention
In view of the shortcomings of the prior art, the invention aims to solve the problem that deep global features from a convolutional neural network are not sufficient to represent the images well, and provides, for the rice classification task and starting from the requirements of practical engineering projects, a rice classification method based on the fusion of a convolutional neural network and a traditional feature description operator, thereby obtaining a better rice image classification result. The method mainly comprises two parts: (1) a rice image classification algorithm based on a convolutional neural network, with VGG-16 selected as the backbone network to extract the deep global features of the rice image; in this process, in line with practical engineering conditions, methods such as automatic white balance are used to reduce background interference in the rice images. (2) On top of the global features extracted by the convolutional neural network, traditional local features of the rice image are extracted with the classical HOG (Histogram of Oriented Gradients) feature description operator and fused with the deep global features extracted in (1), thereby achieving a good representation of the rice image features.
The purpose of the invention is achieved by the following technical solution: a rice classification method based on the fusion of a convolutional neural network and a feature description operator, comprising the following steps:
(1) constructing a rice data set; collecting a plurality of types of rice image data, and constructing a rice data set for training a convolutional neural network and extracting features by using a feature description operator;
(2) extracting the global features of the rice image; the input sizes of the three-channel color (RGB) rice images are unified, and the global features of the rice image are extracted with a VGG-16 model comprising a plurality of convolutional modules; the model attaches a max pooling layer at the end of each convolutional module to reduce the feature-map size;
(3) extracting local features of the rice image; an HOG feature description operator is adopted, which forms the features by computing and accumulating histograms of oriented gradients over local regions of the rice image, in the following specific steps:
(3.1) normalizing the rice image: the input three-channel color (RGB) rice image is converted into a grayscale rice image, and Gamma correction is applied to the grayscale image;
(3.2) computing the rice image gradient: a gradient operator is used to compute the horizontal and vertical gradient values of every pixel of the rice image, capturing texture information of the rice image; from the computed horizontal and vertical gradient components at each pixel position, the total gradient magnitude and gradient direction at each pixel position are calculated;
(3.3) constructing a histogram of oriented gradients for each cell: the cell size is set manually and the pixels are divided into a number of cells with no overlap between adjacent cells; the gradient directions in the range [0°, 180°] are divided uniformly into n bins; the angular intervals of the n bins form the horizontal axis of each cell's gradient direction histogram, and the accumulated gradient magnitude within each angular interval forms the vertical axis, yielding a feature vector of dimension n per cell;
(3.4) acquiring features in units of blocks formed by combining cells, each block being square and containing the same number of cells, and normalizing the feature vector of each block;
(3.5) collecting the HOG local features: the feature vectors of all blocks are concatenated into the local feature vector of the rice image extracted by the HOG feature description operator, denoted the HOG local features, and PCA (principal component analysis) dimension reduction is applied to them to match the dimension of the global image features extracted by the VGG-16 network;
(4) fusing the rice image features; the global features extracted by the VGG-16 convolutional neural network in step (2) and the local features extracted by the HOG feature description operator in step (3) are fused through a fully connected layer;
(5) classifying the rice image; the fused rice image features are passed to a trained classification neural network, and a Softmax function assigns the rice variety (a schematic code sketch of the whole pipeline follows the steps above).
Further, step (2) is specifically as follows: the rice image input size is unified to 224 × 224 pixels, and the global features of the rice image are extracted with a VGG-16 model pre-trained on the ImageNet data set. The VGG-16 network consists of five convolutional modules, each containing 2 or 3 convolutional layers.
Further, in step (3.1), the grayscale conversion (Gray) formula for the rice image is:
Gray = 0.3 × R + 0.59 × G + 0.11 × B
The Gamma correction formula is as follows, where I(x, y) denotes the rice image after grayscale conversion, Y(x, y) denotes the Gamma-corrected grayscale rice image, and γ = 0.5:
Y(x, y) = I(x, y)^γ
further, the step (3.2) comprises the following specific processes: first, use [ -1, 0, 1 [ -1]Performing convolution operation on rice image pixel points by gradient operators to obtain gradient components G of the pixel points in the horizontal directionx(x, y); then, use [1, 0, -1]Performing convolution operation on rice image pixel points by gradient operators to obtain gradient components G in the vertical direction of the pixel pointsy(x, y). The calculation formula is as follows:
Gx(x,y)=H(x+1,y)-H(x-1,y)
Gy(x,y)=H(x,y+1)-H(x,y-1)
wherein G isx(x, y) represents the horizontal square of the pixel (x, y)Gradient in direction, Gy(x, y) represents the vertical gradient of the pixel point (x, y), and H (x, y) represents the pixel value of the pixel point.
Finally, based on the calculated horizontal gradient component G of each pixel point position of the rice imagex(x, y) and the vertical gradient component G of each pixel point position of the rice imagey(x, y), calculating the total direction gradient amplitude G (x, y) and gradient direction theta of each pixel point position in the rice image according to the following formula:
Figure BDA0003116229830000041
Figure BDA0003116229830000042
here, since the absolute value of the gradient direction is set, the gradient direction range here is [0 °, 180 ° ].
Further, in step (3.3), the cell size is manually set to 8 × 8 pixels, so each cell yields 128 values in total (a magnitude and a direction for each of its 64 pixels); the gradient directions in the range [0°, 180°] are divided uniformly into 9 bins, i.e. a 9-dimensional feature vector, and each pixel of the rice image contributes to the bin whose angular interval contains its gradient direction.
Further, in step (3.4), because illumination and background vary across the rice image, every 4 cells are grouped into a block, and each block is normalized with the L2-norm formula. Since each cell has a 9-dimensional feature vector, a block has a 36-dimensional feature vector, so normalizing each block means normalizing this 36-dimensional vector. With v denoting the feature vector, the normalization formula is:
v ← v / sqrt(||v||_2^2 + ε^2)
where ε is a constant whose purpose is to prevent the denominator from being 0; here ε is set to 0.001.
The beneficial effects of the invention are:
(1) Starting from the requirements of practical engineering projects and facing the rice classification task, where different rice varieties differ only very slightly and the same variety varies across manufacturer batches, a multi-feature fusion classification network is provided that fuses the global rice image features extracted by a VGG-16 convolutional neural network with the local rice image features extracted by the traditional HOG feature description operator.
(2) Experiments on the rice data set used by the invention compare the multi-feature fusion rice classification network with the original VGG-16 convolutional neural network and with the HOG feature description operator alone; its top-1 and top-5 accuracies are higher than both, which demonstrates the superiority and effectiveness of the multi-feature fusion rice classification network.
Drawings
FIG. 1 is an overall flow chart of the present invention.
FIG. 2 is a diagram of a rice classification network based on the fusion of a convolutional neural network and a conventional feature description operator.
Detailed Description
The following describes embodiments of the present invention in further detail with reference to the accompanying drawings.
As shown in fig. 1 and 2, the rice classification method based on the fusion of the convolutional neural network and the feature description operator provided by the invention specifically includes the following steps:
(1) Constructing the rice data set. Starting from actual project requirements, 20761 rice images covering 14 rice varieties were collected for the rice classification task, about 1500 images per variety. The collected data were divided into a rice training set and a rice test set at a ratio of 7:3, forming the rice data set required by the invention.
(2) Extracting the global features of the rice image. The input sizes of the three-channel color (RGB) rice images are unified to 224 × 224 pixels, and the global features of the rice image are extracted with a VGG-16 model pre-trained on the ImageNet data set. The VGG-16 network consists of five convolutional modules, each containing 2 or 3 convolutional layers, and a Max Pooling layer is attached at the end of each convolutional module to reduce the feature-map size. A minimal sketch of this feature extraction is given below.
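As an illustration only, such 4096-dimensional global features can be read out of a torchvision VGG-16 by dropping its final classification layer; this particular library usage is an assumption, not something prescribed by the patent:

    # Sketch: 4096-dim global features from a pre-trained VGG-16 (illustrative).
    import torch
    from torchvision import models

    vgg16 = models.vgg16(pretrained=True)         # weights pre-trained on ImageNet
    vgg16.eval()

    def global_features(batch):                   # batch: (N, 3, 224, 224) float tensor
        with torch.no_grad():
            x = vgg16.features(batch)             # five convolutional modules + max pooling
            x = vgg16.avgpool(x)
            x = torch.flatten(x, 1)
            return vgg16.classifier[:-1](x)       # drop last layer: 4096-dim features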
(3) Extracting the local features of the rice image. The traditional HOG (Histogram of Oriented Gradients) feature description operator is adopted; it forms the features by computing and accumulating histograms of oriented gradients over local regions of the rice image. The feature extraction process comprises five steps: normalizing the rice image, computing the rice image gradient, constructing a histogram of oriented gradients for each cell, acquiring features in units of blocks formed by combining cells, and collecting the HOG features.
In the step of normalizing the rice image, the input three-channel color (RGB) rice image is converted into a grayscale rice image, and Gamma correction is applied to the grayscale image. The grayscale conversion (Gray) formula for the rice image is:
Gray = 0.3 × R + 0.59 × G + 0.11 × B
The Gamma correction formula is as follows, where I(x, y) denotes the rice image after grayscale conversion, Y(x, y) denotes the Gamma-corrected grayscale rice image, and γ = 0.5:
Y(x, y) = I(x, y)^γ
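A minimal NumPy sketch of this normalization step, assuming an 8-bit RGB input that is scaled to [0, 1] before the power law (the scaling is an assumption made here):

    import numpy as np

    def normalize_image(rgb, gamma=0.5):
        # rgb: (H, W, 3) uint8 array; scaling to [0, 1] is an assumption, not stated in the text
        r = rgb[..., 0].astype(float)
        g = rgb[..., 1].astype(float)
        b = rgb[..., 2].astype(float)
        gray = 0.3 * r + 0.59 * g + 0.11 * b   # Gray = 0.3R + 0.59G + 0.11B
        return (gray / 255.0) ** gamma         # Y(x, y) = I(x, y)^gamma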
in the step of calculating the gradient of the rice image, gradient operators are used for calculating the gradient value of each pixel point of the rice image in the horizontal direction and the vertical direction, so that the texture information related to the rice image can be well captured. First, use [ -1, 0, 1 [ -1]Pair of gradient operatorsPerforming convolution operation on rice image pixel points to obtain gradient components G of the pixel points in the horizontal directionx(x, y); then, use [1, 0, -1]Performing convolution operation on rice image pixel points by gradient operators to obtain gradient components G in the vertical direction of the pixel pointsy(x, y). The calculation formula is as follows:
Gx(x,y)=H(x+1,y)-H(x-1,y)
Gy(x,y)=H(x,y+1)-H(x,y-1)
wherein G isx(x, y) represents the horizontal gradient of the pixel point (x, y), Gy(x, y) represents the vertical gradient of the pixel point (x, y), and H (x, y) represents the pixel value of the pixel point.
Finally, based on the calculated horizontal gradient component G of each pixel point position of the rice imagex(x, y) and the vertical gradient component G of each pixel point position of the rice imagey(x, y), calculating the total direction gradient amplitude G (x, y) and gradient direction theta of each pixel point position in the rice image according to the following formula:
Figure BDA0003116229830000061
Figure BDA0003116229830000062
here, since the absolute value of the gradient direction is set, the gradient direction range here is [0 °, 180 ° ]
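A NumPy sketch of this gradient computation; leaving border pixels at zero gradient is an implementation choice assumed here, not specified in the text:

    import numpy as np

    def image_gradients(img):
        # img: 2-D float grayscale array (output of the normalization step)
        gx = np.zeros_like(img)
        gy = np.zeros_like(img)
        gx[:, 1:-1] = img[:, 2:] - img[:, :-2]   # G_x(x, y) = H(x+1, y) - H(x-1, y)
        gy[1:-1, :] = img[2:, :] - img[:-2, :]   # G_y(x, y) = H(x, y+1) - H(x, y-1)
        mag = np.hypot(gx, gy)                   # G = sqrt(G_x^2 + G_y^2)
        theta = np.degrees(np.arctan2(gy, gx)) % 180.0   # unsigned direction in [0, 180)
        return mag, theta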
After the two steps of normalization and gradient computation, every pixel of the rice image carries two values: a gradient magnitude G(x, y) and a gradient direction θ. In the step of constructing a histogram of oriented gradients for each cell, the cell size is manually set to 8 × 8 pixels and the pixels are divided into cells with no overlap between adjacent cells, so each cell yields 128 values in total (a magnitude and a direction for each of its 64 pixels). On this basis, the gradient directions in the range [0°, 180°] are divided into 9 bins (i.e. a 9-dimensional feature vector), and for every pixel of the rice image the bin whose angular interval contains its gradient direction is determined. Through this procedure each bin accumulates a value: the angular intervals of the 9 bins form the horizontal axis of each cell's gradient direction histogram, and the accumulated gradient magnitude within each interval forms the vertical axis, yielding an array of size 9 per cell. A sketch of this binning follows.
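The sketch below uses hard assignment of each pixel to a single bin for brevity; practical HOG implementations often interpolate between neighbouring bins, a refinement the text does not address:

    import numpy as np

    def cell_histograms(mag, theta, cell=8, bins=9):
        # mag, theta: 2-D arrays from image_gradients; theta in [0, 180)
        h, w = mag.shape
        ch, cw = h // cell, w // cell
        hist = np.zeros((ch, cw, bins))
        bin_idx = np.minimum((theta / (180.0 / bins)).astype(int), bins - 1)
        for i in range(ch):
            for j in range(cw):
                m = mag[i*cell:(i+1)*cell, j*cell:(j+1)*cell]
                b = bin_idx[i*cell:(i+1)*cell, j*cell:(j+1)*cell]
                # each pixel votes for its direction bin with its magnitude
                hist[i, j] = np.bincount(b.ravel(), weights=m.ravel(), minlength=bins)
        return hist                              # shape: (cells_y, cells_x, 9)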
In the step of acquiring features in units of blocks formed by combining cells, because illumination and background vary across the rice image, every 4 cells are grouped into a block, and each block is normalized with the L2-norm formula. Since each cell has a 9-dimensional feature vector, a block has a 36-dimensional feature vector, so normalizing each block means normalizing this 36-dimensional vector. With v denoting the feature vector, the normalization formula is:
v ← v / sqrt(||v||_2^2 + ε^2)
where ε is a small constant whose purpose is to prevent the denominator from being 0; we set ε to 0.001.
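A sketch of the block normalization; the text does not state whether blocks overlap, so a stride of one cell, as in standard HOG, is assumed here:

    import numpy as np

    def block_normalize(hist, eps=1e-3):
        # hist: (cells_y, cells_x, 9) array of per-cell histograms
        ch, cw, bins = hist.shape
        blocks = []
        for i in range(ch - 1):                  # 2x2-cell blocks, stride of one cell (assumed)
            for j in range(cw - 1):
                v = hist[i:i+2, j:j+2].ravel()   # 4 cells x 9 bins = 36 dimensions
                blocks.append(v / np.sqrt(np.sum(v**2) + eps**2))   # L2-norm formula
        return np.concatenate(blocks)            # concatenated HOG local feature vector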
In the step of collecting the HOG features, the feature vectors of all blocks are concatenated into the local feature vector of the rice image extracted by the HOG feature description operator. Because the dimension of this vector is very large, PCA dimension reduction is applied to the HOG local features to obtain a 4096-dimensional feature vector, matching the dimension of the global image features extracted by the VGG-16 network, so that the HOG local features can be fused well with the global rice image features extracted in step (2).
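A sketch of the PCA reduction, shown with scikit-learn purely for illustration; note that n_components = 4096 requires at least 4096 samples, which the data set of step (1) satisfies:

    import numpy as np
    from sklearn.decomposition import PCA

    def reduce_hog(hog_matrix, dim=4096):
        # hog_matrix: (num_images, hog_dim) array of concatenated block vectors
        pca = PCA(n_components=dim)
        reduced = pca.fit_transform(hog_matrix)  # (num_images, 4096)
        return reduced, pca                      # keep pca to transform test images later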
(4) Fusing the rice image features. The global features of the rice image extracted by the VGG-16 convolutional neural network in step (2) and the local features extracted by the traditional HOG feature description operator in step (3) are fused in a fully connected layer.
(5) Classifying the rice image. The fused rice image features are passed to the classification layer of the network, and a Softmax function assigns the rice variety.
The embodiment of the invention is as follows:
(1) preparation work
(1.1) Data set preparation. Starting from actual project requirements, 20761 rice images covering 14 rice varieties were collected for the rice classification task, about 1500 images per variety. The rice data set used by the invention was constructed by dividing it into a rice training set and a rice test set at a ratio of 7:3.
(1.2) Network structure configuration. The convolutional neural network adopted in the invention is a VGG-16 model pre-trained on the ImageNet data set (data set official site: http://www.image-net.org/); the pre-trained weights are the standard torchvision checkpoint (download link: https://download.pytorch.org/models/vgg16-397923af.pth).
(2) Rice identification method based on fusion of convolutional neural network and traditional feature description operator
(2.1) After the data set preparation is complete, data preprocessing is performed with random cropping and color jittering for data augmentation, and an Adam optimizer is used to optimize the overall objective. A sketch of such a preprocessing pipeline follows.
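A sketch of the augmentation with torchvision transforms; the jitter strengths and the use of RandomResizedCrop are assumptions, since the text names the operations but not their parameters:

    from torchvision import transforms

    train_transform = transforms.Compose([
        transforms.ColorJitter(brightness=0.2, contrast=0.2, saturation=0.2),  # color jittering
        transforms.RandomResizedCrop(224),       # random cropping, resized to 224 x 224
        transforms.ToTensor(),
    ])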
(2.2) In the model training stage, the training part of the self-built rice data set from step (1.1) is used to train the rice classification network based on the fusion of the convolutional neural network and the traditional feature description operator, with the VGG-16 parameters pre-trained on the ImageNet data set as initial parameters.
(2.3) To compare and analyze the experimental effect and performance of the invention, an experimental control group was added: the original VGG-16 model pre-trained on the ImageNet data set was fine-tuned on the training part of the rice data set.
(2.4) The experiment in (2.2) and the control experiment in (2.3) use the same experimental settings: the batch size is set to 16 and the initial learning rate to 1 × 10^-2. A step learning-rate schedule is adopted, in which the learning rate decays by a factor of 10 every 1 × 10^5 iterations until the model has iterated 4 × 10^5 times, at which point training stops. A sketch of this configuration follows.
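A sketch of this training configuration; model and train_iter stand for the fusion network and a batch iterator from the earlier sketches (both assumed), and pairing the Adam optimizer mentioned in (2.1) with the step schedule follows the text as written:

    import torch
    import torch.nn.functional as F

    # model: the FusionClassifier sketched earlier; train_iter: an iterator yielding
    # (image, hog_feature, label) batches of size 16, assumed to cycle over the data.
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)
    scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=100000, gamma=0.1)

    for iteration in range(400000):              # stop after 4 x 10^5 iterations
        images, hog_feats, labels = next(train_iter)
        loss = F.cross_entropy(model(images, hog_feats), labels)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        scheduler.step()                         # lr divided by 10 every 10^5 iterations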
(3) In an environment configured with the PyTorch deep learning framework and CUDA 10.2, on 2 Nvidia GeForce GTX 1080Ti graphics cards with 24 GB of video memory and an input image resolution of 224 × 224 × 3, the original VGG-16 model and the VGG-16 convolutional neural network fused with the traditional feature description operator were trained with the same experimental configuration and then verified on the rice test set. The original VGG-16 model achieves an average top-1 accuracy of 71.49% and an average top-5 accuracy of 95.78% on the rice test set; the VGG-16 convolutional neural network fused with the traditional feature description operator achieves an average top-1 accuracy of 72.81% and an average top-5 accuracy of 96.41%.
The above-described embodiments are intended to illustrate rather than to limit the invention, and any modifications and variations of the present invention are within the spirit of the invention and the scope of the appended claims.

Claims (6)

1. A rice classification method based on the fusion of a convolutional neural network and a feature description operator is characterized by comprising the following steps:
(1) constructing a rice data set; collecting a plurality of types of rice image data, and constructing a rice data set for training a convolutional neural network and extracting features by using a feature description operator;
(2) extracting the global features of the rice image; the input sizes of the three-channel color (RGB) rice images are unified, and the global features of the rice image are extracted with a VGG-16 model comprising a plurality of convolutional modules; the model attaches a max pooling layer at the end of each convolutional module to reduce the feature-map size;
(3) extracting local features of the rice image; an HOG feature description operator is adopted, which forms the features by computing and accumulating histograms of oriented gradients over local regions of the rice image, in the following specific steps:
(3.1) normalizing the rice image: the input three-channel color (RGB) rice image is converted into a grayscale rice image, and Gamma correction is applied to the grayscale image;
(3.2) computing the rice image gradient: a gradient operator is used to compute the horizontal and vertical gradient values of every pixel of the rice image, capturing texture information of the rice image; from the computed horizontal and vertical gradient components at each pixel position, the total gradient magnitude and gradient direction at each pixel position are calculated;
(3.3) constructing a histogram of oriented gradients for each cell: the cell size is set manually and the pixels are divided into a number of cells with no overlap between adjacent cells; the gradient directions in the range [0°, 180°] are divided uniformly into n bins; the angular intervals of the n bins form the horizontal axis of each cell's gradient direction histogram, and the accumulated gradient magnitude within each angular interval forms the vertical axis, yielding a feature vector of dimension n per cell;
(3.4) acquiring features in units of blocks formed by combining cells, each block being square and containing the same number of cells, and normalizing the feature vector of each block;
(3.5) collecting the HOG local features: the feature vectors of all blocks are concatenated into the local feature vector of the rice image extracted by the HOG feature description operator, denoted the HOG local features, and PCA (principal component analysis) dimension reduction is applied to them to match the dimension of the global image features extracted by the VGG-16 network;
(4) fusing the rice image features; the global features extracted by the VGG-16 convolutional neural network in step (2) and the local features extracted by the HOG feature description operator in step (3) are fused through a fully connected layer;
(5) classifying the rice image; the fused rice image features are passed to a trained classification neural network, and a Softmax function assigns the rice variety.
2. The rice classification method based on the fusion of a convolutional neural network and a feature description operator according to claim 1, wherein step (2) is specifically: the rice image input size is unified to 224 × 224 pixels, and the global features of the rice image are extracted with a VGG-16 model pre-trained on the ImageNet data set; the VGG-16 network consists of five convolutional modules, each containing 2 or 3 convolutional layers.
3. The rice classification method based on the fusion of a convolutional neural network and a feature description operator according to claim 1, wherein in step (3.1) the grayscale conversion (Gray) formula for the rice image is:
Gray = 0.3 × R + 0.59 × G + 0.11 × B
and the Gamma correction formula is as follows, where I(x, y) denotes the rice image after grayscale conversion, Y(x, y) denotes the Gamma-corrected grayscale rice image, and γ = 0.5:
Y(x, y) = I(x, y)^γ
4. The rice classification method based on the fusion of a convolutional neural network and a feature description operator according to claim 1, wherein the specific process of step (3.2) is: first, the gradient operator [-1, 0, 1] is convolved with the rice image pixels to obtain the horizontal gradient component G_x(x, y) of each pixel; then the gradient operator [1, 0, -1]^T is convolved with the rice image pixels to obtain the vertical gradient component G_y(x, y); the calculation formulas are:
G_x(x, y) = H(x+1, y) - H(x-1, y)
G_y(x, y) = H(x, y+1) - H(x, y-1)
where G_x(x, y) denotes the horizontal gradient of pixel (x, y), G_y(x, y) denotes its vertical gradient, and H(x, y) denotes the pixel value at (x, y);
finally, from the computed horizontal gradient component G_x(x, y) and vertical gradient component G_y(x, y) at each pixel position of the rice image, the total gradient magnitude G(x, y) and gradient direction θ(x, y) at each pixel position are calculated as:
G(x, y) = sqrt(G_x(x, y)^2 + G_y(x, y)^2)
θ(x, y) = arctan(G_y(x, y) / G_x(x, y))
since the gradient direction is taken as unsigned (in absolute value), its range here is [0°, 180°].
5. The rice classification method based on the fusion of a convolutional neural network and a feature description operator according to claim 1, wherein in step (3.3) the cell size is manually set to 8 × 8 pixels, so each cell yields 128 values in total; the gradient directions in the range [0°, 180°] are divided uniformly into 9 bins, i.e. a 9-dimensional feature vector, and each pixel of the rice image contributes to the bin whose angular interval contains its gradient direction.
6. The rice classification method based on the fusion of a convolutional neural network and a feature description operator according to claim 5, wherein in step (3.4), because illumination and background vary across the rice image, every 4 cells are grouped into a block, and each block is normalized with the L2-norm formula; since each cell has a 9-dimensional feature vector, a block has a 36-dimensional feature vector, so normalizing each block means normalizing this 36-dimensional vector; with v denoting the feature vector, the normalization formula is:
v ← v / sqrt(||v||_2^2 + ε^2)
where ε is a constant whose purpose is to prevent the denominator from being 0; here ε is set to 0.001.
CN202110663578.5A 2021-06-16 2021-06-16 Rice classification method based on fusion of convolutional neural network and feature description operator Pending CN113469224A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110663578.5A CN113469224A (en) 2021-06-16 2021-06-16 Rice classification method based on fusion of convolutional neural network and feature description operator

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110663578.5A CN113469224A (en) 2021-06-16 2021-06-16 Rice classification method based on fusion of convolutional neural network and feature description operator

Publications (1)

Publication Number Publication Date
CN113469224A true CN113469224A (en) 2021-10-01

Family

ID=77869947

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110663578.5A Pending CN113469224A (en) 2021-06-16 2021-06-16 Rice classification method based on fusion of convolutional neural network and feature description operator

Country Status (1)

Country Link
CN (1) CN113469224A (en)


Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105139023A (en) * 2015-07-24 2015-12-09 福州大学 Seed identification method based on multi-scale feature fusion and extreme learning machine
WO2019104767A1 (en) * 2017-11-28 2019-06-06 河海大学常州校区 Fabric defect detection method based on deep convolutional neural network and visual saliency
CN108334830A (en) * 2018-01-25 2018-07-27 南京邮电大学 A kind of scene recognition method based on target semanteme and appearance of depth Fusion Features
CN110414571A (en) * 2019-07-05 2019-11-05 浙江网新数字技术有限公司 A kind of website based on Fusion Features reports an error screenshot classification method
CN111179262A (en) * 2020-01-02 2020-05-19 国家电网有限公司 Electric power inspection image hardware fitting detection method combined with shape attribute

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114612718A (en) * 2022-03-10 2022-06-10 西北工业大学 Small sample image classification method based on graph structure feature fusion
CN114612718B (en) * 2022-03-10 2024-03-01 西北工业大学 Small sample image classification method based on graph structural feature fusion


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination