CN109858495B - Feature extraction method and device based on improved convolution block and storage medium thereof - Google Patents



Publication number
CN109858495B
Authority
CN
China
Prior art keywords
convolution, layer, convolution layer, block, compressed
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910039450.4A
Other languages
Chinese (zh)
Other versions
CN109858495A (en)
Inventor
应自炉
甄俊杰
陈俊娟
甘俊英
龙祥
黄尚安
赵毅鸿
宣晨
Current Assignee
Wuyi University
Original Assignee
Wuyi University
Priority date
Filing date
Publication date
Application filed by Wuyi University
Priority to CN201910039450.4A
Publication of CN109858495A
Application granted
Publication of CN109858495B


Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The application discloses a feature extraction method, a device and a storage medium based on an improved convolution block. A compressed convolution block is placed in the VGG16 network. After the compressed convolution block receives the image features output by the previous pooling layer, its first convolution layer compresses the channel count of the input image features; the resulting first-convolution-layer output features serve as the input to both the second and third convolution layers, and the output features of the second and third convolution layers are concatenated to form the output features of the compressed convolution block. Because the channel count of the first convolution layer is smaller than that of the previous pooling layer, the dimensionality of the input image features is compressed, which effectively reduces the dimensionality of the feature maps, cuts the computation required for network training, shortens feature-map training time, and greatly improves feature extraction efficiency.

Description

Feature extraction method and device based on improved convolution block and storage medium thereof
Technical Field
The application relates to the field of neural networks, in particular to a feature extraction method and device based on an improved convolution block for image recognition and a storage medium thereof.
Background
As image recognition technology becomes more widely used, accurately retrieving images from massive databases requires extracting image features from the original images with a convolutional neural network, quantizing and encoding those features, and then outputting a recognition result according to similarity computed from the Hamming distance. During feature extraction, the depth of the network determines its nonlinear capacity, which is critical to extracting good image features. To obtain stronger nonlinear capacity, most convolutional neural networks adopted in the prior art are VGG16 networks. VGG16 does provide strong nonlinear capacity, but a conventional VGG16 network contains several convolution blocks, each with many convolution layers; after repeated convolutions the feature dimensionality is large, so training the feature maps takes a long time. Strong nonlinear capacity therefore cannot be obtained in a short time, which hurts image recognition efficiency.
Disclosure of Invention
To overcome the above defects in the prior art, the application aims to provide a feature extraction method and device based on an improved convolution block, and a storage medium, which can reduce the computation performed in a VGG16 network in practical applications and speed up feature extraction while preserving the original feature extraction quality.
The application solves the problems by adopting the following technical scheme. In a first aspect, the present application provides a feature extraction method based on an improved convolution block, comprising the steps of:
obtaining the input image features that a pooling layer inputs to a compressed convolution block, wherein the compressed convolution block comprises a first convolution layer, a second convolution layer and a third convolution layer;
obtaining the convolution parameters of each convolution layer in the compressed convolution block, wherein the convolution parameters comprise a convolution kernel and a channel count, the channel count of the first convolution layer is smaller than that of the previous pooling layer, and the sum of the channel counts of the second and third convolution layers equals the channel count of the subsequent pooling layer;
sending the input image features to the first convolution layer for convolution to obtain the first-convolution-layer output features;
and inputting the first-convolution-layer output features into the second and third convolution layers respectively for convolution, concatenating the resulting second- and third-convolution-layer output features to obtain the output features of the compressed convolution block, and sending those output features to the subsequent pooling layer.
Further, the channel number of the first convolution layer is 3/4 of the channel number of the previous pooling layer; the number of channels of the second convolution layer and the third convolution layer is equal and is 1/2 of the number of channels of the subsequent pooling layer.
Further, the convolution kernels of the first convolution layer and the second convolution layer are 1×1, and the convolution kernel of the third convolution layer is 3×3.
Further, the compressed convolution blocks are the 2nd to 5th convolution blocks in the VGG16 network.
Further, the convolution kernel of the 1 st convolution block in the VGG16 network is 3×3, and the number of channels is 1.
Further, the convolution kernel of the pooling layer is 2×2, and the channel number of the pooling layer is 2.
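The three-layer compressed convolution block and the channel ratios given above can be sketched as pure shape bookkeeping. This is an illustrative simulation only; the concrete values used in the example call (a 112×112 input, a 64-channel previous pooling layer and a 128-channel subsequent pooling layer) are assumptions taken from the standard VGG16 architecture around the second block, not values fixed by this text.

```python
# Shape-level sketch of the compressed convolution block described above.
# Only channel bookkeeping is modeled; the convolutions themselves are omitted.
def compressed_block_shapes(h, w, prev_pool_channels, next_pool_channels):
    """Return the output shapes (H, W, C) of the three layers and of the block."""
    e1 = prev_pool_channels * 3 // 4   # first layer: 1x1 kernel, compresses channels to 3/4
    e2 = next_pool_channels // 2       # second layer: 1x1 branch, half the next pooling width
    e3 = next_pool_channels // 2       # third layer: 3x3 branch (padded, so H and W unchanged)
    x1 = (h, w, e1)                    # first-layer output, fed to both branches
    x2 = (h, w, e2)
    x3 = (h, w, e3)
    out = (h, w, e2 + e3)              # channel-wise concatenation of X2 and X3
    return x1, x2, x3, out

x1, x2, x3, out = compressed_block_shapes(112, 112, 64, 128)
print(x1, x2, x3, out)  # (112, 112, 48) (112, 112, 64) (112, 112, 64) (112, 112, 128)
```

Note how the block's output channel count equals the subsequent pooling layer's, so the surrounding network sees the same interface as a standard VGG16 block.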
In a second aspect, the present application provides a feature extraction device based on an improved convolution block, comprising:
an input image feature obtaining unit, configured to obtain the input image features that a pooling layer inputs to a compressed convolution block, wherein the compressed convolution block includes a first convolution layer, a second convolution layer, and a third convolution layer;
a convolution parameter obtaining unit, configured to obtain the convolution parameters of each convolution layer in the compressed convolution block, wherein the convolution parameters include a convolution kernel and a channel count, the channel count of the first convolution layer is smaller than that of the previous pooling layer, and the sum of the channel counts of the second and third convolution layers equals the channel count of the subsequent pooling layer;
a first-convolution-layer output feature obtaining unit, configured to send the input image features to the first convolution layer for convolution to obtain the first-convolution-layer output features;
a compressed-convolution-block output feature obtaining unit, configured to input the first-convolution-layer output features into the second and third convolution layers respectively for convolution, concatenate the resulting second- and third-convolution-layer output features to obtain the compressed-convolution-block output features, and send them to the subsequent pooling layer.
In a third aspect, the present application provides an improved convolution block-based feature extraction device comprising at least one control processor and a memory for communication connection with the at least one control processor; the memory stores instructions executable by the at least one control processor to enable the at least one control processor to perform the improved convolution block-based feature extraction method described above.
In a fourth aspect, the present application provides a computer-readable storage medium storing computer-executable instructions for causing a computer to perform the improved convolution block-based feature extraction method described above.
In a fifth aspect, the present application also provides a computer program product comprising a computer program stored on a computer readable storage medium, the computer program comprising program instructions which, when executed by a computer, cause the computer to perform the improved convolution block-based feature extraction method as described above.
One or more technical solutions provided in the embodiments of the present application have at least the following beneficial effects. The application adopts a feature extraction method, device and storage medium based on an improved convolution block. A compressed convolution block is placed in the VGG16 network. After the compressed convolution block receives the image features output by the previous pooling layer, its first convolution layer compresses the channel count of the input image features; the resulting first-convolution-layer output features serve as the input to both the second and third convolution layers, and the output features of the second and third convolution layers are concatenated to form the output features of the compressed convolution block. Compared with the prior art, the channel count of the first convolution layer is smaller than that of the previous pooling layer, so the dimensionality of the input image features is compressed, which effectively reduces the dimensionality of the feature maps, cuts the computation required for network training, shortens feature-map training time, and greatly improves feature extraction efficiency.
Drawings
The application is further described below with reference to the drawings and examples.
FIG. 1 is a flow chart of a feature extraction method based on an improved convolution block according to a first embodiment of the present application;
FIG. 2 is a schematic view of feature extraction according to a feature extraction method based on an improved convolution block according to a first embodiment of the present application;
FIG. 3 is a complete step diagram of a feature extraction method based on an improved convolution block according to a first embodiment of the present application;
FIG. 4 is a schematic diagram of a feature extraction device based on an improved convolution block according to a second embodiment of the present application;
fig. 5 is a schematic structural diagram of a feature extraction device based on an improved convolution block according to a third embodiment of the present application.
Detailed Description
As image recognition technology becomes more widely used, accurately retrieving images from massive databases requires extracting image features from the original images with a convolutional neural network, quantizing and encoding those features, and then outputting a recognition result according to similarity computed from the Hamming distance. During feature extraction, the depth of the network determines its nonlinear capacity, which is critical to extracting good image features. To obtain stronger nonlinear capacity, most convolutional neural networks adopted in the prior art are VGG16 networks. VGG16 does provide strong nonlinear capacity, but a conventional VGG16 network contains several convolution blocks, each with many convolution layers; after repeated convolutions the feature dimensionality is large, so training consumes considerable time. Strong nonlinear capacity therefore cannot be obtained in a short time, which hurts image recognition efficiency.
Based on the above, the application adopts a feature extraction method, device and storage medium based on an improved convolution block. A compressed convolution block is placed in the VGG16 network. After the compressed convolution block receives the image features output by the previous pooling layer, its first convolution layer compresses the channel count of the input image features; the resulting first-convolution-layer output features serve as the input to both the second and third convolution layers, and the output features of the second and third convolution layers are concatenated to form the output features of the compressed convolution block. Compared with the prior art, the channel count of the first convolution layer is smaller than that of the previous pooling layer, so the dimensionality of the input image features is compressed, which effectively reduces the dimensionality of the feature maps, cuts the computation required for network training, shortens feature-map training time, and greatly improves feature extraction efficiency.
The present application will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present application more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the application.
It should be noted that, provided there is no conflict, the features of the embodiments of the present application may be combined with each other, and such combinations fall within the protection scope of the present application. In addition, although functional modules are divided in the device diagrams and a logical order is shown in the flowchart, in some cases the steps shown or described may be performed with a different module division or in a different order than shown.
Referring to fig. 1, the present application provides a feature extraction method based on an improved convolution block, comprising the steps of:
Step S1, obtaining the input image features that a pooling layer inputs to a compressed convolution block, wherein the compressed convolution block comprises a first convolution layer, a second convolution layer and a third convolution layer;
Step S2, obtaining the convolution parameters of each convolution layer in the compressed convolution block, wherein the convolution parameters comprise a convolution kernel and a channel count, the channel count of the first convolution layer is smaller than that of the previous pooling layer, and the sum of the channel counts of the second and third convolution layers equals the channel count of the subsequent pooling layer;
Step S3, sending the input image features to the first convolution layer for convolution to obtain the first-convolution-layer output features;
Step S4, inputting the first-convolution-layer output features into the second and third convolution layers respectively for convolution, concatenating the resulting second- and third-convolution-layer output features to obtain the output features of the compressed convolution block, and sending those output features to the subsequent pooling layer.
In this embodiment, the VGG16 network used may be trained in any manner; a network pre-trained on the ImageNet image database is preferred, since the huge scale of the ImageNet database makes the VGG16 network more general. The preferred hyperparameters of the VGG16 network in this embodiment are: batch size (number of input images) 256, momentum 0.9, weight decay coefficient 0.0005, fully connected layer dropout ratio 0.5, and initial learning rate 0.01.
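For reference, the training hyperparameters listed above can be gathered into a single configuration sketch. The key names below are our own choice for illustration, not an API defined by the patent.

```python
# Hyperparameters stated in the embodiment, collected in one place.
# Key names are illustrative, not part of any specified interface.
vgg16_hyperparams = {
    "batch_size": 256,      # "number of input images"
    "momentum": 0.9,
    "weight_decay": 0.0005,
    "fc_dropout": 0.5,      # dropout ratio of the fully connected layers
    "initial_lr": 0.01,
}
print(vgg16_hyperparams["initial_lr"])  # 0.01
```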
In this embodiment, the compressed convolution block may have any number of layers; three layers are preferred, with the channel count of the first convolution layer smaller than that of the previous pooling layer. Because the parameter count of each layer in the VGG16 network equals the channel count of the previous layer, multiplied by the convolution kernel size, multiplied by the channel count of the next layer, making the first convolution layer's channel count smaller than the pooling layer's reduces the channel count of the input image features and thus effectively reduces the computation.
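The parameter-count rule stated above (previous-layer channels × kernel area × next-layer channels) makes the saving easy to quantify. The comparison below is a hedged illustration: the channel counts (64-channel input, 128-channel output, matching VGG16's second block) and the two-layer standard block are assumptions drawn from the standard VGG16 architecture, not figures given in this text.

```python
def conv_params(c_in, k, c_out):
    # Parameter count of one convolution layer: c_in * (k*k) * c_out (biases ignored),
    # per the rule stated in the surrounding text.
    return c_in * k * k * c_out

# Assumed standard VGG16 second block: two 3x3 convolutions, 64 -> 128 -> 128 channels.
standard = conv_params(64, 3, 128) + conv_params(128, 3, 128)

# Compressed block with the ratios given in the text: a 1x1 first layer with
# 3/4 * 64 = 48 channels, then a 1x1 branch and a 3x3 branch with 64 channels
# each (half of the assumed 128-channel subsequent pooling layer).
compressed = (conv_params(64, 1, 48)
              + conv_params(48, 1, 64)
              + conv_params(48, 3, 64))

print(standard, compressed)  # 221184 33792
```

Under these assumed channel counts the compressed block uses roughly one sixth of the parameters of the standard block, which is the computational saving the text claims.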
Referring to fig. 2, in this embodiment, the second- and third-convolution-layer output features are concatenated to form the dimensions of the output feature. For example, let the channel counts of the first, second, and third convolution layers be E1, E2, and E3, and let the input image feature T have shape H×W×M, where H×W is the resolution and M is the feature-map dimensionality. The first-convolution-layer output feature X1 is then H×W×E1, the second-convolution-layer output feature X2 is H×W×E2, the third-convolution-layer output feature X3 is H×W×E3, and the compressed-convolution-block output feature X is H×W×(E2+E3). The channel count is thus compressed without changing the resolution of the input image features.
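The splicing step can be demonstrated concretely: two feature maps with the same H×W are joined along the channel axis, so the resolution is unchanged and the channel counts add to E2+E3. This minimal sketch uses toy shape values of our own choosing and plain nested lists in place of real feature maps.

```python
# Toy concatenation demo for the splicing of X2 and X3 described above.
H, W, E2, E3 = 2, 2, 3, 5
x2 = [[[0.0] * E2 for _ in range(W)] for _ in range(H)]  # stands in for X2: H x W x E2
x3 = [[[1.0] * E3 for _ in range(W)] for _ in range(H)]  # stands in for X3: H x W x E3

# Splice per spatial position: the channels of x3 are appended after those of x2.
x = [[x2[i][j] + x3[i][j] for j in range(W)] for i in range(H)]

print(len(x), len(x[0]), len(x[0][0]))  # 2 2 8
```

The result has shape H×W×(E2+E3), matching the X described in the example above.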
Further, in another embodiment provided by the present application, the number of channels of the first convolution layer is 3/4 of the number of channels of the previous pooling layer; the number of channels of the second convolution layer and the third convolution layer is equal and is 1/2 of the number of channels of the subsequent pooling layer.
In this embodiment, the channel count of the first convolution layer may be any proportion, smaller than 1, of the previous pooling layer's channel count; 3/4 is preferred, giving a preliminary compression of the input image features' channel count.
In this embodiment, the channel counts of the second and third convolution layers may be any values whose sum equals the channel count of the subsequent pooling layer. Half each is preferred, so that the two layers' output features have the same dimensionality and the resulting image features are more general.
Further, in another embodiment provided by the present application, the convolution kernels of the first convolution layer and the second convolution layer are 1×1, and the convolution kernel of the third convolution layer is 3×3.
In this embodiment, the convolution kernels may be of any size; kernels of 1×1 for the first and second convolution layers and 3×3 for the third are preferred. The first convolution layer performs the preliminary compression of the input image features, and since the reduction in computation comes mainly from compressing the channel count, its kernel is 1×1. Giving the second and third convolution layers kernels of 1×1 and 3×3 respectively lets the concatenated output features of the compressed convolution block combine different receptive fields, and because both kernels are computationally cheap, feature quality is improved while computation is reduced.
Further, in another embodiment provided by the present application, the compressed convolution block is the 2 nd to 5 th convolution block in the VGG16 network.
In this embodiment, the compressed convolution block may be any convolution block in the VGG16 network; starting from the second block is preferred, because the first block, which directly receives the original RGB image, has not yet extracted any features and therefore needs no compression. Using compressed convolution blocks from the second block onward makes the network structure more reasonable.
Further, in another embodiment of the present application, the convolution kernel of the 1 st convolution block in the VGG16 network is 3×3, and the number of channels is 1.
Further, in another embodiment provided by the present application, the convolution kernel of the pooling layer is 2×2, and the number of channels of the pooling layer is 2.
Referring to fig. 3, another embodiment of the present application further provides a feature extraction method based on an improved convolution block, including the steps of:
Step S3100, inputting an original RGB image into a VGG16 network, obtaining the input image features through the 1st convolution block and the first pooling layer of the VGG16 network, and inputting those features into the first compressed convolution block;
Step S3200, obtaining the convolution parameters of the compressed convolution block, wherein the convolution parameters include a convolution kernel and a channel count: the first convolution layer has a 1×1 kernel and a channel count 3/4 that of the previous pooling layer; the second convolution layer has a 1×1 kernel and a channel count 1/2 that of the subsequent pooling layer; and the third convolution layer has a 3×3 kernel and a channel count 1/2 that of the subsequent pooling layer;
Step S3300, inputting the input image features into the first convolution layer and obtaining the first-convolution-layer output;
Step S3400, inputting the first-convolution-layer output into the second and third convolution layers simultaneously, and obtaining the second- and third-convolution-layer output features;
Step S3500, concatenating the second- and third-convolution-layer output features to obtain the compressed-convolution-block output features;
Step S3600, sending the compressed-convolution-block output features to the pooling layer, using the pooling layer's output image features as the input image features of the next convolution block, and repeating from step S3200 until the layer after the pooling layer is a fully connected layer.
In this embodiment, compressed convolution blocks are placed in the VGG16 network. After a compressed convolution block receives the image features output by the previous pooling layer, its first convolution layer compresses the channel count of the input image features to 3/4, reducing computation; the first-convolution-layer output features then serve as the input to both the second and third convolution layers, whose output features are concatenated to obtain the output features of the compressed convolution block, so that the block's output channel count matches that of the next pooling layer and the network's applicability is maintained. Compared with the prior art, the channel count of the first convolution layer is smaller than that of the pooling layer, so the dimensionality of the input image features is compressed, which effectively reduces the dimensionality of the feature maps, cuts the computation required for network training, shortens feature-map training time, and greatly improves feature extraction efficiency.
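The S3100-S3600 loop can be sketched as shape bookkeeping through the whole network. The channel schedule (64, 128, 256, 512, 512) carried by the five pooling layers and the 224×224 input are assumptions taken from the standard VGG16 architecture, not values stated explicitly in this text.

```python
# Shape-level sketch of the repeated-block loop (steps S3100-S3600).
POOL_CHANNELS = [64, 128, 256, 512, 512]  # assumed channels at each pooling layer

def run_compressed_blocks(h, w):
    # Block 1 plus the first pooling layer (step S3100): resolution halves.
    h, w, c = h // 2, w // 2, POOL_CHANNELS[0]
    trace = [(h, w, c)]
    for prev_c, next_c in zip(POOL_CHANNELS, POOL_CHANNELS[1:]):  # blocks 2..5
        _e1 = prev_c * 3 // 4   # step S3200: first-layer width (intermediate only)
        c = 2 * (next_c // 2)   # steps S3400-S3500: concat of the two equal branches
        h, w = h // 2, w // 2   # step S3600: 2x2 pooling halves the resolution
        trace.append((h, w, c))
    return trace

print(run_compressed_blocks(224, 224))
# [(112, 112, 64), (56, 56, 128), (28, 28, 256), (14, 14, 512), (7, 7, 512)]
```

Each block's output channel count matches the next pooling layer, so the final 7×7×512 feature map is what the fully connected layers of a standard VGG16 would expect.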
Referring to fig. 4, a second embodiment of the present application further provides a feature extraction device based on an improved convolution block, where the feature extraction device 1000 based on an improved convolution block includes, but is not limited to: an input image feature acquisition unit 1100, a convolution parameter acquisition unit 1200, a first convolution layer output feature acquisition unit 1300, and a compressed convolution block output feature acquisition unit 1400.
The input image feature obtaining unit 1100 is configured to obtain an input image feature of the pooled layer input to a compressed convolution block, where the compressed convolution block includes a first convolution layer, a second convolution layer, and a third convolution layer;
the convolution parameter obtaining unit 1200 is configured to obtain a convolution parameter of each convolution layer in the compressed convolution block, where the convolution parameter includes a convolution kernel and a channel number, the channel number of the first convolution layer is smaller than the channel number of the pooling layer, and a sum of the channel numbers of the second convolution layer and the third convolution layer is equal to the channel number of the pooling layer of the subsequent layer;
the first convolution layer output feature obtaining unit 1300 is configured to send the input image feature to a first convolution layer for convolution, to obtain a first convolution layer output feature;
the compressed convolution block output feature obtaining unit 1400 is configured to input the first convolution layer output feature to the second convolution layer and the third convolution layer to perform convolution, splice the obtained second convolution layer output feature and the obtained third convolution layer output feature, obtain a compressed convolution block output feature, and send the compressed convolution block output feature to a later pooling layer.
It should be noted that, since the feature extraction device based on the improved convolution block in the present embodiment and the feature extraction method based on the improved convolution block described above are based on the same inventive concept, the corresponding content in the method embodiment is also applicable to the embodiment of the present device, and will not be described in detail herein.
Referring to fig. 5, a third embodiment of the present application further provides a feature extraction device based on an improved convolution block, where the feature extraction device 6000 based on the improved convolution block may be any type of intelligent terminal, such as a mobile phone, a tablet computer, a personal computer, etc.
Specifically, the modified convolution block-based feature extraction device 6000 includes: one or more control processors 6001 and memory 6002, one control processor 6001 being illustrated in fig. 5.
The control processor 6001 and memory 6002 may be connected by a bus or otherwise, for example in fig. 5.
The memory 6002 serves as a non-transitory computer readable storage medium, and is operable to store a non-transitory software program, a non-transitory computer-executable program, and modules, such as program instructions/modules corresponding to the improved convolution block-based feature extraction device in the embodiment of the present application, for example, the input image feature acquisition unit 1100 and the convolution parameter acquisition unit 1200 shown in fig. 5. The control processor 6001 executes various functional applications and data processing of the modified-convolution-block-based feature extraction device 1000 by running non-transitory software programs, instructions, and modules stored in the memory 6002, that is, implements the modified-convolution-block-based feature extraction method of the above-described method embodiments.
The memory 6002 may include a storage program area that may store an operating system, at least one application program required for functions, and a storage data area; the storage data area may store data created according to the use of the improved convolution block-based feature extraction device 1000, or the like. In addition, memory 6002 may include high speed random access memory, and may include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid state storage device. In some implementations, memory 6002 optionally includes memory located remotely from control processor 6001, which may be connected to the modified convolution block-based feature extraction device 6000 through a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The one or more modules are stored in the memory 6002, which when executed by the one or more control processors 6001, perform the improved convolution block-based feature extraction method of the method embodiments described above, e.g., perform method steps S1-S4 of fig. 1 described above, to implement the functions of the apparatus 1100-1400 of fig. 4.
Embodiments of the present application also provide a computer-readable storage medium storing computer-executable instructions for execution by one or more control processors, e.g., one of control processors 6001 in fig. 5, which cause the one or more control processors 6001 to perform the improved convolutional block-based feature extraction method of the method embodiments described above, e.g., to perform method steps S1 through S4 of fig. 1 described above, to implement the functions of apparatus 1100-1400 of fig. 4.
The device embodiments described above are merely illustrative; the units described as separate components may or may not be physically separate, that is, they may be located in one place or distributed over multiple network devices. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
From the above description of the embodiments, those skilled in the art will clearly understand that the embodiments may be implemented by means of software plus a general-purpose hardware platform. Those skilled in the art will also appreciate that all or part of the processes for implementing the methods of the above embodiments may be completed by a computer program instructing relevant hardware; the program may be stored in a computer-readable storage medium and, when executed, may include the processes of the above method embodiments. The storage medium may be a magnetic disk, an optical disk, a read-only memory (ROM), a random access memory (RAM), or the like.
While the preferred embodiment of the present application has been described in detail, the present application is not limited to the above embodiment, and various equivalent modifications and substitutions can be made by those skilled in the art without departing from the spirit of the present application, and these equivalent modifications and substitutions are intended to be included in the scope of the present application as defined in the appended claims.

Claims (6)

1. A feature extraction method based on an improved convolution block, comprising the steps of: obtaining input image features input from a pooling layer to a compressed convolution block, wherein the compressed convolution block comprises a first convolution layer, a second convolution layer and a third convolution layer;
obtaining convolution parameters of each convolution layer in the compressed convolution block, wherein the convolution parameters comprise a convolution kernel and a channel number, the channel number of the first convolution layer is smaller than the channel number of the preceding pooling layer, and the sum of the channel numbers of the second convolution layer and the third convolution layer is equal to the channel number of the subsequent pooling layer;
sending the input image features to the first convolution layer for convolution to obtain output features of the first convolution layer;
inputting the output features of the first convolution layer into the second convolution layer and the third convolution layer respectively for convolution, splicing the obtained output features of the second convolution layer and the third convolution layer to obtain output features of the compressed convolution block, and sending the output features of the compressed convolution block to the subsequent pooling layer;
wherein the channel number of the first convolution layer is 3/4 of the channel number of the preceding pooling layer, and the channel numbers of the second convolution layer and the third convolution layer are equal, each being 1/2 of the channel number of the subsequent pooling layer;
wherein the convolution kernels of the first convolution layer and the second convolution layer are 1×1, and the convolution kernel of the third convolution layer is 3×3;
the compressed convolution blocks are the 2nd to 5th convolution blocks in the VGG16 network.
2. A feature extraction method based on an improved convolution block as claimed in claim 1, characterised in that: the convolution kernel of the 1st convolution block in the VGG16 network is 3×3, and the stride is 1.
3. A feature extraction method based on an improved convolution block as claimed in claim 1, characterised in that: the pooling kernel of the pooling layer is 2×2, and the stride of the pooling layer is 2.
4. A feature extraction device based on an improved convolution block, comprising: an input image feature acquisition unit, configured to obtain input image features input from a pooling layer to a compressed convolution block, wherein the compressed convolution block comprises a first convolution layer, a second convolution layer and a third convolution layer; a convolution parameter acquisition unit, configured to obtain convolution parameters of each convolution layer in the compressed convolution block, wherein the convolution parameters comprise a convolution kernel and a channel number, the channel number of the first convolution layer is smaller than the channel number of the preceding pooling layer, and the sum of the channel numbers of the second convolution layer and the third convolution layer is equal to the channel number of the subsequent pooling layer;
a first convolution layer output feature acquisition unit, configured to send the input image features to the first convolution layer for convolution to obtain output features of the first convolution layer;
a compressed convolution block output feature acquisition unit, configured to input the output features of the first convolution layer into the second convolution layer and the third convolution layer respectively for convolution, splice the obtained output features of the second convolution layer and the third convolution layer to obtain output features of the compressed convolution block, and send the output features of the compressed convolution block to the subsequent pooling layer;
wherein the channel number of the first convolution layer is 3/4 of the channel number of the preceding pooling layer, and the channel numbers of the second convolution layer and the third convolution layer are equal, each being 1/2 of the channel number of the subsequent pooling layer;
wherein the convolution kernels of the first convolution layer and the second convolution layer are 1×1, and the convolution kernel of the third convolution layer is 3×3;
the compressed convolution blocks are the 2nd to 5th convolution blocks in the VGG16 network.
5. A feature extraction device based on an improved convolution block, characterized by: comprising at least one control processor and a memory communicatively connected to the at least one control processor; the memory stores instructions executable by the at least one control processor to enable the at least one control processor to perform the improved convolution block-based feature extraction method according to any one of claims 1 to 3.
6. A computer-readable storage medium, characterized by: the computer-readable storage medium stores computer-executable instructions for causing a computer to perform a modified convolution block-based feature extraction method according to any one of claims 1 to 3.
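The channel arithmetic of claim 1 can be illustrated with a short sketch. This is not part of the patent; it is a minimal Python example, assuming the standard VGG16 widths for block 2 (64 channels entering from the preceding pooling layer, 128 channels expected by the subsequent pooling layer) and ignoring bias terms:

```python
# Parameter-count sketch of the compressed convolution block from claim 1,
# applied to VGG16 block 2. Channel widths (64 in, 128 out) are the standard
# VGG16 values, assumed here for illustration; biases are ignored.

def conv_params(in_ch, out_ch, k):
    """Weight count of a k x k convolution layer, ignoring the bias term."""
    return in_ch * out_ch * k * k

prev_pool_ch, next_pool_ch = 64, 128

# Compressed block (claim 1):
c1 = prev_pool_ch * 3 // 4       # first conv (1x1): 3/4 of preceding pool -> 48
c2 = c3 = next_pool_ch // 2      # second (1x1) and third (3x3): 1/2 of next pool -> 64

compressed = (conv_params(prev_pool_ch, c1, 1)   # 64 -> 48, 1x1
              + conv_params(c1, c2, 1)           # 48 -> 64, 1x1
              + conv_params(c1, c3, 3))          # 48 -> 64, 3x3
# Splicing (concatenating) the 1x1 and 3x3 outputs gives c2 + c3 = 128
# channels, matching the subsequent pooling layer as the claim requires.

# Original VGG16 block 2: two 3x3 convolutions, 64 -> 128 and 128 -> 128.
original = conv_params(64, 128, 3) + conv_params(128, 128, 3)

print(compressed, original, round(original / compressed, 1))  # 33792 221184 6.5
```

Under these assumed widths, the compressed block is a drop-in replacement for the original block (same output channel count) while using roughly 6.5× fewer convolution weights, which is the compression effect the claims describe.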
CN201910039450.4A 2019-01-16 2019-01-16 Feature extraction method and device based on improved convolution block and storage medium thereof Active CN109858495B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910039450.4A CN109858495B (en) 2019-01-16 2019-01-16 Feature extraction method and device based on improved convolution block and storage medium thereof

Publications (2)

Publication Number Publication Date
CN109858495A CN109858495A (en) 2019-06-07
CN109858495B true CN109858495B (en) 2023-09-22

Family

ID=66894888

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910039450.4A Active CN109858495B (en) 2019-01-16 2019-01-16 Feature extraction method and device based on improved convolution block and storage medium thereof

Country Status (1)

Country Link
CN (1) CN109858495B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110503088A (en) * 2019-07-03 2019-11-26 平安科技(深圳)有限公司 Object detection method and electronic device based on deep learning
CN112101117A (en) * 2020-08-18 2020-12-18 长安大学 Expressway congestion identification model construction method and device and identification method
CN112949831B (en) * 2021-03-24 2021-10-01 中国科学院自动化研究所 Depth-first data scheduling method, system and equipment based on block convolution

Citations (4)

Publication number Priority date Publication date Assignee Title
WO2017031630A1 (en) * 2015-08-21 2017-03-02 中国科学院自动化研究所 Deep convolutional neural network acceleration and compression method based on parameter quantification
CN107895192A (en) * 2017-12-06 2018-04-10 广州华多网络科技有限公司 Depth convolutional network compression method, storage medium and terminal
CN108334843A (en) * 2018-02-02 2018-07-27 成都国铁电气设备有限公司 A kind of arcing recognition methods based on improvement AlexNet
CN108710906A (en) * 2018-05-11 2018-10-26 北方民族大学 Real-time point cloud model sorting technique based on lightweight network LightPointNet

Family Cites Families (1)

Publication number Priority date Publication date Assignee Title
CN107145939B (en) * 2017-06-21 2020-11-24 北京图森智途科技有限公司 Computer vision processing method and device of low-computing-capacity processing equipment

Non-Patent Citations (1)

Title
基于特征复用的卷积神经网络模型压缩方法 [Convolutional neural network model compression method based on feature reuse]; 冀树伟 et al.; 《计算机应用》 (Journal of Computer Applications); 2019-01-07 (No. 06); pp. 57-63 *


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant