CN112686320B - Image classification method, device, computer equipment and storage medium - Google Patents


Publication number
CN112686320B
Authority
CN
China
Prior art keywords
convolution
dynamic
convolution kernel
input feature
feature map
Prior art date
Legal status
Active
Application number
CN202011638500.XA
Other languages
Chinese (zh)
Other versions
CN112686320A (en)
Inventor
王东
程骏
张惊涛
胡淑萍
郭渺辰
顾在旺
刘业鹏
Current Assignee
Ubtech Robotics Corp
Original Assignee
Ubtech Robotics Corp
Priority date
Filing date
Publication date
Application filed by Ubtech Robotics Corp
Priority to CN202011638500.XA
Publication of CN112686320A
Application granted
Publication of CN112686320B


Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Image Analysis (AREA)

Abstract

The application discloses an image classification method, comprising: acquiring a target image to be classified; and taking the target image as input of an image classification model to obtain a classification result output by the image classification model, wherein the image classification model comprises a convolution layer, the convolution layer comprises a dynamic convolution module, and the dynamic convolution module comprises a convolution kernel weight factor generator and a convolution kernel generator. The convolution kernel weight factor generator is used for generating N convolution kernel weight factors according to the input feature map; the convolution kernel generator is used for generating N convolution kernels according to the input feature map, where N is a positive integer. The dynamic convolution module is further used for carrying out nonlinear aggregation of the N convolution kernel weight factors and the N convolution kernels to obtain a dynamic convolution kernel, and the convolution layer processes the input feature map using the dynamic convolution kernel to obtain an output feature map. The image classification method greatly improves classification accuracy. In addition, an image classification device, a computer device and a storage medium are also provided.

Description

Image classification method, device, computer equipment and storage medium
Technical Field
The present application relates to the field of computer technologies, and in particular, to an image classification method, an image classification device, a computer device, and a storage medium.
Background
The convolution kernels of a traditional convolutional neural network for image classification are static, that is, their parameter values do not change with the input image features at the inference stage. The usual way to improve the classification accuracy of such a static convolutional neural network is to increase the width and depth of the network; however, the heavyweight network obtained this way is unsuitable for deployment on mobile terminals and embedded devices.
A dynamic convolutional neural network sets up several groups of convolution kernels and applies an attention mechanism (Squeeze-and-Excitation) over the feature map to assign each group of kernels a different weight factor, making the kernels data-dependent; dynamic convolution is finally realized through linear weighted aggregation of the kernels.
However, in the conventional kernel-weighting approach of dynamic convolutional neural networks such as CondConv, only the weight factors depend on the input feature map, which limits the accuracy of image classification; the accuracy of image classification therefore still needs further improvement.
Disclosure of Invention
In view of the foregoing, it is necessary to provide an image classification method, apparatus, computer device, and storage medium that can improve the accuracy of image classification.
An image classification method, comprising:
acquiring a target image to be classified;
taking the target image as input of an image classification model, and obtaining a classification result output by the image classification model, wherein the image classification model comprises a convolution layer, the convolution layer comprises a dynamic convolution module, and the dynamic convolution module comprises: a convolution kernel generator and a convolution kernel weight factor generator; the convolution kernel generator is used for generating N convolution kernels according to the input feature map, the convolution kernel weight factor generator is used for generating N convolution kernel weight factors according to the input feature map, and N is a positive integer; the dynamic convolution module is further used for carrying out nonlinear aggregation according to the N convolution kernels and the N convolution kernel weight factors to obtain a dynamic convolution kernel; and the convolution layer is used for processing the input feature map using the dynamic convolution kernel to obtain an output feature map.
An image classification apparatus comprising:
the acquisition module is used for acquiring target images to be classified;
the classification module is used for taking the target image as the input of an image classification model and obtaining a classification result output by the image classification model, wherein the image classification model comprises a convolution layer, the convolution layer comprises a dynamic convolution module, and the dynamic convolution module comprises: a convolution kernel weight factor generator and a convolution kernel generator; the convolution kernel weight factor generator is used for generating N convolution kernel weight factors according to the input feature map; the convolution kernel generator is used for generating N convolution kernels according to the input feature map, wherein N is a positive integer; the dynamic convolution module is further used for carrying out nonlinear aggregation according to the N convolution kernel weight factors and the N convolution kernels to obtain a dynamic convolution kernel; and the convolution layer is used for processing the input feature map using the dynamic convolution kernel to obtain an output feature map.
A computer device comprising a memory and a processor, the memory storing a computer program which, when executed by the processor, causes the processor to perform the steps of:
acquiring a target image to be classified;
taking the target image as input of an image classification model, and obtaining a classification result output by the image classification model, wherein the image classification model comprises a convolution layer, the convolution layer comprises a dynamic convolution module, and the dynamic convolution module comprises: a convolution kernel generator and a convolution kernel weight factor generator; the convolution kernel generator is used for generating N convolution kernels according to the input feature map, the convolution kernel weight factor generator is used for generating N convolution kernel weight factors according to the input feature map, and N is a positive integer; the dynamic convolution module is further used for carrying out nonlinear aggregation according to the N convolution kernels and the N convolution kernel weight factors to obtain a dynamic convolution kernel; and the convolution layer is used for processing the input feature map using the dynamic convolution kernel to obtain an output feature map.
A computer readable storage medium storing a computer program which, when executed by a processor, causes the processor to perform the steps of:
acquiring a target image to be classified;
taking the target image as input of an image classification model, and obtaining a classification result output by the image classification model, wherein the image classification model comprises a convolution layer, the convolution layer comprises a dynamic convolution module, and the dynamic convolution module comprises: a convolution kernel generator and a convolution kernel weight factor generator; the convolution kernel generator is used for generating N convolution kernels according to the input feature map, the convolution kernel weight factor generator is used for generating N convolution kernel weight factors according to the input feature map, and N is a positive integer; the dynamic convolution module is further used for carrying out nonlinear aggregation according to the N convolution kernels and the N convolution kernel weight factors to obtain a dynamic convolution kernel; and the convolution layer is used for processing the input feature map using the dynamic convolution kernel to obtain an output feature map.
In the image classification method, apparatus, computer device and storage medium above, the convolution layer of the image classification model comprises a dynamic convolution module, and the dynamic convolution module comprises a convolution kernel generator and a convolution kernel weight factor generator: the convolution kernel generator generates N convolution kernels according to the input feature map, the convolution kernel weight factor generator generates N convolution kernel weight factors according to the input feature map, a dynamic convolution kernel is then calculated from the N convolution kernels and the N convolution kernel weight factors, and the convolution layer processes the input feature map with the dynamic convolution kernel to obtain an output feature map. By embedding the dynamic convolution module in the image classification model, the accuracy of data classification can be greatly improved at the cost of only a small increase in computation.
Drawings
In order to more clearly illustrate the embodiments of the application or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described, it being obvious that the drawings in the following description are only some embodiments of the application, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Wherein:
FIG. 1 is a flow diagram of a method of image classification in one embodiment;
FIG. 2 is a flow diagram of a method of generating convolution weight factors in one embodiment;
FIG. 3 is a flow diagram of a method of generating a convolution kernel in one embodiment;
FIG. 4 is a schematic diagram of the structure of a dynamic convolution module in one embodiment;
FIG. 5 is a block diagram of an image classification apparatus in one embodiment;
FIG. 6 is an internal block diagram of a computer device in one embodiment.
Detailed Description
The following description of the embodiments of the present application will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present application, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
As shown in fig. 1, an image classification method is proposed, which can be applied to a terminal or a server, and this embodiment is exemplified as being applied to a terminal. The image classification method specifically comprises the following steps:
step 102, obtaining a target image to be classified.
The target image may be an image obtained by direct shooting, or may be a stored image obtained from an album.
Step 104, taking the target image as input of an image classification model, and obtaining a classification result output by the image classification model, wherein the image classification model comprises a convolution layer, the convolution layer comprises a dynamic convolution module, and the dynamic convolution module comprises: a convolution kernel weight factor generator and a convolution kernel generator; the convolution kernel weight factor generator is used for generating N convolution kernel weight factors according to the input feature map; the convolution kernel generator is used for generating N convolution kernels according to the input feature map, wherein N is a positive integer; the dynamic convolution module is also used for carrying out nonlinear aggregation according to the N convolution kernel weight factors and the N convolution kernels to obtain a dynamic convolution kernel; and the convolution layer processes the input feature map using the dynamic convolution kernel to obtain an output feature map.
The dynamic convolution module is a dynamic convolution network embedded in the convolution layer, used to dynamically generate convolution kernels and their weight factors according to the input feature map. Specifically, the dynamic convolution module comprises a convolution kernel generator and a convolution kernel weight factor generator, both implemented as network structures. The generated convolution kernels and convolution kernel weight factors are associated with the input feature map, i.e. their parameter values depend on the input feature map. The N convolution kernels and the N convolution kernel weight factors correspond one to one, i.e. each convolution kernel has its own weight factor. After the N convolution kernel weight factors and N convolution kernels are obtained, the dynamic convolution kernel is calculated from them.
In one embodiment, the dynamic convolution kernel is calculated as follows: W = Σ_{l=1}^{N} α_l · W_l, where α_l denotes the l-th convolution kernel weight factor, W_l denotes the l-th convolution kernel, and both α_l and W_l are functions of the input feature map G. Finally, a standard convolution of the dynamic convolution kernel with the input feature map yields the output feature map.
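As a concrete illustration of this aggregation formula (a minimal numerical sketch, not the patented implementation), the weighted sum of N candidate kernels can be computed as:

```python
import numpy as np

def aggregate_dynamic_kernel(weight_factors, kernels):
    """Aggregate N candidate kernels into one dynamic kernel:
    W = sum_l alpha_l * W_l. In the method described above, both
    alpha_l and W_l would be generated from the same input feature
    map G; here they are supplied directly for illustration."""
    assert len(weight_factors) == len(kernels)
    dyn = np.zeros_like(kernels[0], dtype=float)
    for alpha, w in zip(weight_factors, kernels):
        dyn += alpha * w
    return dyn

# N = 3 candidate 3x3 kernels with weight factors summing to 1
kernels = [np.ones((3, 3)) * v for v in (1.0, 2.0, 3.0)]
alphas = [0.2, 0.3, 0.5]
dyn = aggregate_dynamic_kernel(alphas, kernels)
# every entry equals 0.2*1 + 0.3*2 + 0.5*3 = 2.3
```

The resulting single kernel is then used in a standard convolution, so the per-image cost of dynamic convolution is one aggregation plus one ordinary convolution.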
In one embodiment, the image classification method can be applied to image classification on ImageNet, which is currently the largest image recognition database in the world.
In this image classification method, the convolution layer of the image classification model comprises a dynamic convolution module, and the dynamic convolution module comprises a convolution kernel generator and a convolution kernel weight factor generator: the convolution kernel generator generates N convolution kernels according to the input feature map, the convolution kernel weight factor generator generates N convolution kernel weight factors according to the input feature map, a dynamic convolution kernel is then calculated from the N convolution kernels and the N convolution kernel weight factors, and the convolution layer processes the input feature map with the dynamic convolution kernel to obtain an output feature map. By embedding the dynamic convolution module in the image classification model, the accuracy of data classification can be greatly improved at the cost of only a small increase in computation.
In one embodiment, the N convolution kernel weight factors are determined from a channel dimension of the input feature map, and the N convolution kernels are determined from a spatial dimension of the input feature map.
The input feature map has a channel dimension and a spatial dimension. For example, for an input feature map of size m×m×C, m×m spans the spatial dimension and C spans the channel dimension; that is, the convolution kernel weight factors are calculated by applying an attention mechanism to the channel dimension of the input feature map, and the convolution kernels are obtained by applying 1×1 convolutions to the spatial dimension of the input feature map.
As shown in fig. 2, in one embodiment, the convolution kernel weight factor generator is configured to generate N convolution kernel weight factors from an input feature map, including:
step 202, carrying out global average pooling processing on the input feature map to obtain a one-dimensional vector.
The input feature map is composed of a plurality of channel feature maps; for example, an input feature map of size k×k×C consists of C channel feature maps of size k×k. Global average pooling reduces each whole channel feature map to a single value, i.e. one channel feature map corresponds to one value after global average pooling. After global average pooling of the input feature map, exactly C values (one per channel) are output, and these C values form a one-dimensional vector of length C. Only channel information survives global average pooling, so the generation of the convolution kernel weight factors is related only to the channel dimension of the input feature map.
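A small sketch makes the channel-only property concrete: pooling a (k, k, C) feature map over its spatial axes leaves a length-C vector, one value per channel.

```python
import numpy as np

def global_average_pool(feature_map):
    """Collapse each k x k channel plane of a (k, k, C) feature map
    to its mean, yielding a length-C vector: only channel
    information survives the pooling."""
    return feature_map.mean(axis=(0, 1))

# (k, k, C) = (2, 2, 3): values 0..11 laid out channel-last
fmap = np.arange(2 * 2 * 3, dtype=float).reshape(2, 2, 3)
vec = global_average_pool(fmap)
# channel 0 holds 0, 3, 6, 9 -> mean 4.5; channel 2 -> mean 6.5
```

Whatever the spatial size k, the output length is always C, which is why the subsequent fully connected layers see only channel information.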
Step 204, convert the one-dimensional vector into a one-dimensional vector of length N using two fully connected layers.
Each node of a fully connected layer is connected to all nodes of the previous layer and serves to integrate the features extracted by the preceding layers: the fully connected layers synthesize the previously extracted image features before classification.
Denote the two fully connected layers as the first and second fully connected layers: the one-dimensional vector is taken as input of the first fully connected layer, the output of the first fully connected layer is taken as input of the second fully connected layer, and the second fully connected layer finally outputs a one-dimensional vector of length N. Note that the input and output sizes of a fully connected layer can be preset; here the output of the second fully connected layer is preset to N, i.e. the convolution kernel weight factors are divided into N groups in advance and the weight of each convolution kernel is output.
Step 206, normalize the one-dimensional vector of length N to obtain N convolution kernel weight factors.
To constrain the output convolution kernel weight factors to the interval [0, 1] and ensure that the weight factors sum to 1, the one-dimensional vector of length N is normalized after it is obtained, for example with a Sigmoid function. The normalized convolution kernel weight factors satisfy 0 ≤ α_l ≤ 1 and Σ_{l=1}^{N} α_l = 1.
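The whole weight-factor branch (steps 202 to 206) can be sketched end to end. This is an illustrative stand-in, not the patented network: the weights are random, the hidden width C/2 is an assumption, and a softmax is used in place of the per-element Sigmoid so that the factors provably sum to 1.

```python
import numpy as np

def route_function_generator(feature_map, n_kernels, rng):
    """Sketch of the weight-factor branch: global average pooling,
    two fully connected layers (random weights, ReLU between them),
    then a softmax-style normalisation so the N factors lie in
    [0, 1] and sum to 1. Hidden width C//2 and the softmax are
    assumptions of this sketch."""
    c = feature_map.shape[-1]
    v = feature_map.mean(axis=(0, 1))                          # step 202: length-C vector
    h = np.maximum(rng.standard_normal((c // 2, c)) @ v, 0.0)  # step 204: FC layer 1 + ReLU
    logits = rng.standard_normal((n_kernels, c // 2)) @ h      # step 204: FC layer 2 -> N values
    e = np.exp(logits - logits.max())                          # step 206: normalise
    return e / e.sum()

rng = np.random.default_rng(0)
fmap = rng.standard_normal((4, 4, 8))
alphas = route_function_generator(fmap, n_kernels=3, rng=rng)
# alphas: 3 factors in [0, 1] summing to 1
```

Because the vector v depends only on per-channel means, a different input image produces different factors, which is exactly the data dependence the method relies on.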
In the above embodiment, the convolution kernel weight factor generator generates the weight factors from the channel dimension of the input feature map, i.e. the weight factors depend on the input feature map; this allows features to be extracted in a manner adapted to the input, which improves the accuracy of feature extraction.
As shown in fig. 3, in one embodiment, the convolution kernel generator is configured to generate N convolution kernels from an input feature map, including:
step 302, performing adaptive average pooling conversion on the input feature map into a three-dimensional tensor.
Adaptive average pooling only requires the size of the pooled output feature map to be given; the pooling operation is then carried out automatically according to that size, and the number of channels is unchanged by the pooling. For example, if the input feature map is 1248×1248×100 and the output size is set to 512×512, adaptive average pooling yields 512×512×100.
A three-dimensional tensor can be understood as having three dimensions, height, width and channels; for example, 512×512×100 is a three-dimensional tensor. If the output size is preset to k×k, the resulting three-dimensional tensor is denoted k×k×C, where C is the number of channels.
Since adaptive average pooling operates only on the spatial dimension of the image, the generation of the convolution kernels is related only to the spatial dimension of the input feature map.
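Adaptive average pooling can be sketched in a few lines; this is a generic implementation of the operation, not code from the patent. The windows are derived from the requested output size, and the channel count passes through untouched.

```python
import numpy as np

def adaptive_average_pool(feature_map, out_size):
    """Pool an (H, W, C) map down to (out_size, out_size, C).
    Window boundaries are computed from the requested output size,
    so only the output size needs to be given; channels are
    unchanged."""
    h, w, c = feature_map.shape
    out = np.empty((out_size, out_size, c))
    for i in range(out_size):
        for j in range(out_size):
            hs, he = i * h // out_size, (i + 1) * h // out_size
            ws, we = j * w // out_size, (j + 1) * w // out_size
            out[i, j] = feature_map[hs:he, ws:we].mean(axis=(0, 1))
    return out

fmap = np.ones((8, 8, 5))
pooled = adaptive_average_pool(fmap, 2)
# shape (2, 2, 5): spatial size reduced to 2x2, channel count preserved
```

Each output cell averages one spatial window, so the result depends only on where values sit spatially, matching the observation that this branch uses the spatial dimension of the input.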
Step 304, convert the three-dimensional tensor into a three-dimensional tensor containing N convolution kernels using two 1×1 convolutions.
The number of 1×1 convolution kernels determines the number of output channels; for example, using 8 kernels of size 1×1 yields a feature map with 8 channels, so a 1×1 convolution can reduce or increase the channel dimension of an image. Specifically, the first 1×1 convolution expands the channels of the pooled tensor and the second 1×1 convolution produces the final channel width. For example, given an input tensor of k×k×C1, the first 1×1 convolution yields k×k×(N×C1) and the second outputs k×k×(N×(C1×C2)), where C1 is the number of input channels, C2 is the number of output channels, and N is the number of kernel groups, i.e. N groups of k×k×C1×C2 kernels.
Step 306, dividing the three-dimensional tensor containing the N convolution kernels into N parts to obtain N convolution kernels.
The finally obtained three-dimensional tensor is divided into N parts, giving N convolution kernels correspondingly: the tensor containing the N convolution kernels, denoted k×k×(N×(C1×C2)), is divided into N parts, each denoted k×k×(C1×C2).
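The kernel branch (steps 302 to 306) can be sketched as follows. This is an illustrative sketch with random weights, not the patented network: the 1×1 convolutions are written as per-pixel matrix multiplies, the intermediate width N×C1 is an assumption, and the pooling assumes the spatial size divides evenly by k.

```python
import numpy as np

def convolution_kernel_generator(feature_map, n, k, c2, rng):
    """Sketch of the kernel branch: adaptively average-pool the
    (H, W, C1) input to (k, k, C1), apply two 1x1 convolutions
    (per-pixel matrix multiplies with random weights here) to reach
    N*(C1*C2) channels, then split into N kernels of shape
    (k, k, C1, C2)."""
    h, w, c1 = feature_map.shape
    # step 302: adaptive average pooling to k x k (assumes h, w divisible by k)
    pooled = feature_map.reshape(k, h // k, k, w // k, c1).mean(axis=(1, 3))
    w1 = rng.standard_normal((c1, n * c1))           # first 1x1 convolution
    w2 = rng.standard_normal((n * c1, n * c1 * c2))  # second 1x1 convolution
    t = np.maximum(pooled @ w1, 0.0) @ w2            # step 304: (k, k, N*C1*C2)
    # step 306: split into N kernels
    return t.reshape(k, k, n, c1, c2).transpose(2, 0, 1, 3, 4)

rng = np.random.default_rng(1)
fmap = rng.standard_normal((8, 8, 4))
kernels = convolution_kernel_generator(fmap, n=3, k=2, c2=6, rng=rng)
# kernels.shape == (3, 2, 2, 4, 6): N kernels of size k x k x C1 x C2
```

Because the pooled tensor depends on the spatial layout of the input, each input image yields a different set of N candidate kernels, which is the data dependence this branch contributes.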
In the embodiment, the convolution kernel generator generates the convolution kernel according to the input feature map, that is, the convolution kernel depends on the input feature map, and feature extraction is performed according to the generated convolution kernel, so that the input feature map is extracted in a targeted manner, and the accuracy of feature extraction is improved.
In one embodiment, the N convolution kernels are the same size; the dynamic convolution module is further configured to perform nonlinear aggregation according to the N convolution kernel weight factors and the N convolution kernels to obtain a dynamic convolution kernel, and includes: the dynamic convolution module is further configured to perform nonlinear weighted average according to the N convolution kernel weight factors and the N convolution kernels to obtain the dynamic convolution kernels.
The N generated convolution kernels have the same size but different parameters. A nonlinear weighted average is computed from each convolution kernel and its corresponding weight factor to obtain the dynamic convolution kernel. The weighted average is nonlinear because both the convolution kernels and the weight factors are functions of the input feature map, so their weighted combination constitutes nonlinear aggregation.
In one embodiment, the image classification model is a lightweight neural network, and the dynamic convolution module is embedded in a convolution layer in the lightweight neural network.
The image classification model adopts a lightweight neural network, such as MobileNet V1-V3. The dynamic convolution module is embedded in the convolution layer to replace some or all of the convolution kernels of a conventional convolution layer. The dynamic convolution module can dynamically generate, from the features of the input feature map, a plurality of convolution kernels matched to the input feature map together with their corresponding weight factors, so that more accurate image features can be extracted and the accuracy of image classification improved.
As shown in fig. 4, which illustrates the internal structure of the dynamic convolution module, the module is divided into 3 branches: the convolution kernel weight factor generator, the convolution kernel generator, and a branch that convolves the input feature map with the generated dynamic convolution kernel. The convolution kernel weight factor generator (Route Function Generator, RFG) comprises 4 steps: global average pooling, fully connected layer 1 (S-Fully-Connected Layer), fully connected layer 2 (E-Fully-Connected Layer), and normalization (Sigmoid function). The convolution kernel generator (Convolution Kernel Generator, CKG) comprises 3 steps: adaptive average pooling, a first 1×1 convolution layer, and a second 1×1 convolution layer. The RFG outputs N convolution kernel weight factors, the CKG outputs N convolution kernels, and a weighted average of the two yields the dynamic convolution kernel, calculated as W = Σ_{l=1}^{N} α_l · W_l. The input feature map G is then convolved with the dynamic convolution kernel to obtain the output feature map.
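To make the three-branch flow concrete, here is a minimal end-to-end sketch for the special case of 1×1 dynamic kernels, where the final convolution reduces to a per-pixel matrix multiply. All weights are random and the softmax normalisation stands in for the Sigmoid; this illustrates the data flow of fig. 4, not the patented implementation.

```python
import numpy as np

def dynamic_conv_1x1(feature_map, n, c2, rng):
    """End-to-end sketch of the three branches for 1x1 kernels:
    branch 1 produces N weight factors from the channel dimension,
    branch 2 produces N 1x1 kernels (C1 x C2 matrices, random here),
    and branch 3 convolves the input with the aggregated kernel
    W = sum_l alpha_l * W_l."""
    h, w, c1 = feature_map.shape
    # branch 1 (RFG): weight factors from per-channel means
    v = feature_map.mean(axis=(0, 1))
    logits = rng.standard_normal((n, c1)) @ v
    e = np.exp(logits - logits.max())
    alphas = e / e.sum()
    # branch 2 (CKG): N candidate 1x1 kernels
    kernels = rng.standard_normal((n, c1, c2))
    # aggregation + branch 3: 1x1 convolution = per-pixel matmul
    w_dyn = np.tensordot(alphas, kernels, axes=1)   # (C1, C2)
    return feature_map @ w_dyn                      # (H, W, C2)

rng = np.random.default_rng(2)
out = dynamic_conv_1x1(rng.standard_normal((5, 5, 4)), n=3, c2=7, rng=rng)
# out.shape == (5, 5, 7)
```

For k×k kernels the aggregation step is identical; only the final convolution changes from a per-pixel matmul to a standard spatial convolution.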
As shown in fig. 5, in one embodiment, an image classification apparatus is provided, including:
an acquisition module 502, configured to acquire a target image to be classified;
the classification module 504 is configured to take the target image as an input of an image classification model, and obtain a classification result output by the image classification model, where the image classification model includes a convolution layer, and the convolution layer includes a dynamic convolution module, where the dynamic convolution module includes: a convolution kernel weight factor generator and a convolution kernel generator; the convolution kernel weight factor generator is used for generating N convolution kernel weight factors according to the input feature graph; the convolution kernel generator is used for generating N convolution kernels according to the input feature map, wherein N is a positive integer; the dynamic convolution module is further used for carrying out nonlinear aggregation according to the N convolution kernel weight factors and the N convolution kernels to obtain dynamic convolution kernels; and the convolution layer is used for processing the input feature map based on the dynamic convolution check to obtain an output feature map.
In one embodiment, the N convolution kernel weight factors are determined from a channel dimension of the input feature map, and the N convolution kernels are determined from a spatial dimension of the input feature map.
In one embodiment, the convolution kernel weight factor generator is further configured to perform global average pooling on the input feature map to obtain a one-dimensional vector; convert the one-dimensional vector into a one-dimensional vector of length N using two fully connected layers; and normalize the one-dimensional vector of length N to obtain N convolution kernel weight factors.
In one embodiment, the convolution kernel generator is further configured to convert the input feature map into a three-dimensional tensor by adaptive average pooling; convert the three-dimensional tensor into a three-dimensional tensor containing N convolution kernels using two 1×1 convolutions; and divide the three-dimensional tensor containing the N convolution kernels into N parts to obtain N convolution kernels.
In one embodiment, the N convolution kernels are the same size; the dynamic convolution module is further configured to perform nonlinear weighted average according to the N convolution kernel weight factors and the N convolution kernels to obtain the dynamic convolution kernels.
In one embodiment, the image classification model is a lightweight neural network, and the dynamic convolution module is embedded in a convolution layer in the lightweight neural network.
In one embodiment, the image classification model has a plurality of convolution layers; the classification module is further used for taking the target image as the input feature map of the first convolution layer, where the first convolution layer calculates a dynamic convolution kernel from the input feature map via the dynamic convolution module and processes the input feature map with the dynamic convolution kernel to obtain an output feature map; the output feature map serves as the input feature map of the next convolution layer, and so on until the output feature map of the last convolution layer is obtained; and the classification result of the target image is determined from the output feature map of the last convolution layer.
FIG. 6 illustrates an internal block diagram of a computer device in one embodiment. As shown in fig. 6, the computer device includes a processor, a memory, a camera, and a network interface connected by a system bus. The memory includes a nonvolatile storage medium and an internal memory. The non-volatile storage medium of the computer device stores an operating system, and may also store a computer program that, when executed by a processor, causes the processor to implement the image classification method described above. The internal memory may also store a computer program which, when executed by the processor, causes the processor to perform the image classification method described above. It will be appreciated by those skilled in the art that the structure shown in FIG. 6 is merely a block diagram of some of the structures associated with the present inventive arrangements and is not limiting of the computer device to which the present inventive arrangements may be applied, and that a particular computer device may include more or fewer components than shown, or may combine some of the components, or have a different arrangement of components.
In one embodiment, a computer device is provided, comprising a memory and a processor, the memory storing a computer program that, when executed by the processor, causes the processor to perform the steps of: acquiring a target image to be classified; and taking the target image as input to an image classification model and obtaining a classification result output by the image classification model, wherein the image classification model comprises a convolution layer, the convolution layer comprises a dynamic convolution module, and the dynamic convolution module comprises: a convolution kernel weight factor generator and a convolution kernel generator; the convolution kernel weight factor generator is configured to generate N convolution kernel weight factors according to the input feature map; the convolution kernel generator is configured to generate N convolution kernels according to the input feature map, where N is a positive integer; the dynamic convolution module is further configured to perform nonlinear aggregation on the N convolution kernel weight factors and the N convolution kernels to obtain a dynamic convolution kernel; and the convolution layer is configured to process the input feature map based on the dynamic convolution kernel to obtain an output feature map.
In one embodiment, the N convolution kernel weight factors are determined from a channel dimension of the input feature map, and the N convolution kernels are determined from a spatial dimension of the input feature map.
In one embodiment, the convolution kernel weight factor generator being configured to generate N convolution kernel weight factors according to the input feature map comprises: performing global average pooling on the input feature map to obtain a one-dimensional vector; converting the one-dimensional vector into a one-dimensional vector of length N using two fully connected layers; and normalizing the one-dimensional vector of length N to obtain the N convolution kernel weight factors.
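The three steps above (global average pooling, two fully connected layers, normalization) can be sketched as follows. This is an illustrative, non-limiting sketch in Python with NumPy; the choice of softmax as the normalization, the ReLU between the two fully connected layers, and all layer shapes and names are assumptions, not taken from the claims.

```python
import numpy as np

def softmax(x):
    # Subtract the max before exponentiating for numerical stability.
    e = np.exp(x - np.max(x))
    return e / e.sum()

def weight_factor_generator(feature_map, w1, b1, w2, b2):
    """Sketch of the convolution kernel weight factor generator.

    feature_map: (C, H, W) input feature map.
    w1/b1, w2/b2: parameters of the two fully connected layers
    (assumed shapes: w1 is (C, hidden), w2 is (hidden, N)).
    Returns N weight factors that are non-negative and sum to 1.
    """
    # Global average pooling over the spatial dims -> 1-D vector of length C.
    pooled = feature_map.mean(axis=(1, 2))
    # Two fully connected layers map the vector to length N.
    hidden = np.maximum(pooled @ w1 + b1, 0.0)   # ReLU between the layers (assumed)
    logits = hidden @ w2 + b2
    # Normalization (softmax assumed) yields the N weight factors.
    return softmax(logits)
```

Because of the softmax, the factors form a convex combination over the N kernels, which is what later makes the aggregated dynamic kernel input-dependent.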
In one embodiment, the convolution kernel generator being configured to generate N convolution kernels according to the input feature map comprises: performing adaptive average pooling on the input feature map to obtain a three-dimensional tensor; converting the three-dimensional tensor into a three-dimensional tensor containing N convolution kernels using two 1×1 convolutions; and dividing the three-dimensional tensor containing N convolution kernels into N parts to obtain the N convolution kernels.
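A minimal sketch of the kernel generator's three steps follows (Python with NumPy). Everything not stated in the text is an assumption: the pooling bin boundaries mimic the usual adaptive-average-pooling convention, a ReLU is assumed between the two 1×1 convolutions, and the split is along the channel axis into N equal parts.

```python
import numpy as np

def adaptive_avg_pool(x, out_size):
    """Adaptive average pooling of a (C, H, W) tensor to (C, out_size, out_size)."""
    c, h, w = x.shape
    out = np.empty((c, out_size, out_size))
    for i in range(out_size):
        for j in range(out_size):
            # Bin boundaries follow the common adaptive-pooling convention.
            h0, h1 = (i * h) // out_size, ((i + 1) * h + out_size - 1) // out_size
            w0, w1 = (j * w) // out_size, ((j + 1) * w + out_size - 1) // out_size
            out[:, i, j] = x[:, h0:h1, w0:w1].mean(axis=(1, 2))
    return out

def conv1x1(x, weight):
    """A 1x1 convolution is a per-position linear map over channels.
    x: (C_in, k, k); weight: (C_out, C_in) -> returns (C_out, k, k)."""
    return np.tensordot(weight, x, axes=([1], [0]))

def kernel_generator(feature_map, w_a, w_b, n, k):
    """Sketch of the convolution kernel generator (shapes are assumptions).

    1. Adaptive average pooling shrinks the input feature map to a
       three-dimensional tensor of spatial size k (the kernel size).
    2. Two 1x1 convolutions reshape the channel dimension to n * c_k.
    3. The channel axis is split into n parts -> n kernels of (c_k, k, k).
    """
    pooled = adaptive_avg_pool(feature_map, k)     # (C_in, k, k)
    t = np.maximum(conv1x1(pooled, w_a), 0.0)      # first 1x1 conv + ReLU (assumed)
    t = conv1x1(t, w_b)                            # second 1x1 conv -> (n*c_k, k, k)
    return np.split(t, n, axis=0)                  # n kernels, each (c_k, k, k)
```

Because the kernels are derived by pooling the feature map itself, they vary with the input's spatial content, complementing the channel-derived weight factors.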
In one embodiment, the N convolution kernels are of the same size; the dynamic convolution module being further configured to perform nonlinear aggregation on the N convolution kernel weight factors and the N convolution kernels to obtain the dynamic convolution kernel comprises: performing, by the dynamic convolution module, a nonlinear weighted average of the N convolution kernels using the N convolution kernel weight factors to obtain the dynamic convolution kernel.
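The aggregation step can be sketched as below (Python with NumPy; illustrative only). The sum itself is a plain weighted average; the "nonlinear" character comes from the weight factors, which are assumed here to depend nonlinearly (e.g. via a softmax) on the input feature map.

```python
import numpy as np

def aggregate_dynamic_kernel(weight_factors, kernels):
    """Aggregate N same-size kernels into one dynamic convolution kernel.

    weight_factors: sequence of N scalars (assumed to sum to 1).
    kernels: sequence of N arrays of identical shape.
    """
    assert len(weight_factors) == len(kernels)
    dynamic = np.zeros_like(kernels[0])
    for w, k in zip(weight_factors, kernels):
        dynamic += w * k       # weighted sum, element-wise over each kernel
    return dynamic
```

The resulting kernel has the same shape as any single kernel, so the subsequent convolution costs no more than an ordinary static convolution of that size.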
In one embodiment, the image classification model is a lightweight neural network, and the dynamic convolution module is embedded in a convolution layer in the lightweight neural network.
In one embodiment, the image classification model has a plurality of convolution layers; taking the target image as input to the image classification model and obtaining the classification result output by the image classification model comprises: taking the target image as the input feature map of the first convolution layer, where the first convolution layer calculates a dynamic convolution kernel from the input feature map by means of the dynamic convolution module and processes the input feature map based on the dynamic convolution kernel to obtain an output feature map; taking the output feature map as the input feature map of the next convolution layer, and so on, until the output feature map of the last convolution layer is obtained; and determining the classification result of the target image according to the output feature map of the last convolution layer.
In one embodiment, a computer-readable storage medium is provided, storing a computer program that, when executed by a processor, causes the processor to perform the steps of: acquiring a target image to be classified; and taking the target image as input to an image classification model and obtaining a classification result output by the image classification model, wherein the image classification model comprises a convolution layer, the convolution layer comprises a dynamic convolution module, and the dynamic convolution module comprises: a convolution kernel weight factor generator and a convolution kernel generator; the convolution kernel weight factor generator is configured to generate N convolution kernel weight factors according to the input feature map; the convolution kernel generator is configured to generate N convolution kernels according to the input feature map, where N is a positive integer; the dynamic convolution module is further configured to perform nonlinear aggregation on the N convolution kernel weight factors and the N convolution kernels to obtain a dynamic convolution kernel; and the convolution layer is configured to process the input feature map based on the dynamic convolution kernel to obtain an output feature map.
In one embodiment, the N convolution kernel weight factors are determined from a channel dimension of the input feature map, and the N convolution kernels are determined from a spatial dimension of the input feature map.
In one embodiment, the convolution kernel weight factor generator being configured to generate N convolution kernel weight factors according to the input feature map comprises: performing global average pooling on the input feature map to obtain a one-dimensional vector; converting the one-dimensional vector into a one-dimensional vector of length N using two fully connected layers; and normalizing the one-dimensional vector of length N to obtain the N convolution kernel weight factors.
In one embodiment, the convolution kernel generator being configured to generate N convolution kernels according to the input feature map comprises: performing adaptive average pooling on the input feature map to obtain a three-dimensional tensor; converting the three-dimensional tensor into a three-dimensional tensor containing N convolution kernels using two 1×1 convolutions; and dividing the three-dimensional tensor containing N convolution kernels into N parts to obtain the N convolution kernels.
In one embodiment, the N convolution kernels are of the same size; the dynamic convolution module being further configured to perform nonlinear aggregation on the N convolution kernel weight factors and the N convolution kernels to obtain the dynamic convolution kernel comprises: performing, by the dynamic convolution module, a nonlinear weighted average of the N convolution kernels using the N convolution kernel weight factors to obtain the dynamic convolution kernel.
In one embodiment, the image classification model is a lightweight neural network, and the dynamic convolution module is embedded in a convolution layer in the lightweight neural network.
In one embodiment, the image classification model has a plurality of convolution layers; taking the target image as input to the image classification model and obtaining the classification result output by the image classification model comprises: taking the target image as the input feature map of the first convolution layer, where the first convolution layer calculates a dynamic convolution kernel from the input feature map by means of the dynamic convolution module and processes the input feature map based on the dynamic convolution kernel to obtain an output feature map; taking the output feature map as the input feature map of the next convolution layer, and so on, until the output feature map of the last convolution layer is obtained; and determining the classification result of the target image according to the output feature map of the last convolution layer.
Those skilled in the art will appreciate that all or part of the processes in the methods of the above embodiments may be implemented by a computer program instructing relevant hardware, where the program may be stored in a non-volatile computer-readable storage medium and, when executed, may include the processes of the above method embodiments. Any reference to memory, storage, a database, or another medium used in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchronous link DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM), among others.
The technical features of the above embodiments may be combined in any manner. For brevity, not all possible combinations of these technical features are described; however, any combination of these technical features that involves no contradiction should be considered within the scope of this description.
The foregoing examples illustrate only a few embodiments of the application and are described in detail, but they are not to be construed as limiting the scope of the application. It should be noted that those skilled in the art can make several variations and modifications without departing from the spirit of the application, all of which fall within the scope of protection of the application. Accordingly, the scope of protection of the present application is to be determined by the appended claims.

Claims (9)

1. An image classification method, comprising:
acquiring a target image to be classified;
taking the target image as input to an image classification model and obtaining a classification result output by the image classification model, wherein the image classification model comprises a convolution layer, the convolution layer comprises a dynamic convolution module, and the dynamic convolution module comprises: a convolution kernel weight factor generator and a convolution kernel generator; the convolution kernel weight factor generator is configured to generate N convolution kernel weight factors according to an input feature map; the convolution kernel generator is configured to generate N convolution kernels according to the input feature map, where N is a positive integer;
the dynamic convolution module is further configured to perform nonlinear aggregation on the N convolution kernel weight factors and the N convolution kernels to obtain a dynamic convolution kernel;
the convolution layer processes the input feature map based on the dynamic convolution kernel to obtain an output feature map;
wherein the convolution kernel generator being configured to generate N convolution kernels according to the input feature map comprises:
performing adaptive average pooling on the input feature map to obtain a three-dimensional tensor;
converting the three-dimensional tensor into a three-dimensional tensor containing N convolution kernels using two 1×1 convolutions; and
dividing the three-dimensional tensor containing N convolution kernels into N parts to obtain the N convolution kernels.
2. The method of claim 1, wherein the N convolution kernel weight factors are determined from a channel dimension of the input feature map and the N convolution kernels are determined from a spatial dimension of the input feature map.
3. The method of claim 1, wherein the convolution kernel weight factor generator being configured to generate N convolution kernel weight factors according to the input feature map comprises:
performing global average pooling on the input feature map to obtain a one-dimensional vector;
converting the one-dimensional vector into a one-dimensional vector of length N using two fully connected layers; and
normalizing the one-dimensional vector of length N to obtain the N convolution kernel weight factors.
4. The method of claim 1, wherein the N convolution kernels are of the same size;
the dynamic convolution module being further configured to perform nonlinear aggregation on the N convolution kernel weight factors and the N convolution kernels to obtain the dynamic convolution kernel comprises:
performing, by the dynamic convolution module, a nonlinear weighted average of the N convolution kernels using the N convolution kernel weight factors to obtain the dynamic convolution kernel.
5. The method of claim 1, wherein the image classification model is a lightweight neural network, and the dynamic convolution module is embedded in a convolution layer in the lightweight neural network.
6. The method of claim 1, wherein there are a plurality of convolution layers in the image classification model;
taking the target image as input to the image classification model and obtaining the classification result output by the image classification model comprises:
taking the target image as an input feature map of a first convolution layer, wherein the first convolution layer calculates a dynamic convolution kernel from the input feature map by means of the dynamic convolution module and processes the input feature map based on the dynamic convolution kernel to obtain an output feature map;
taking the output feature map as an input feature map of a next convolution layer, and so on, until an output feature map of a last convolution layer is obtained; and
determining the classification result of the target image according to the output feature map of the last convolution layer.
7. An image classification apparatus, comprising:
an acquisition module configured to acquire a target image to be classified;
a classification module configured to take the target image as input to an image classification model and obtain a classification result output by the image classification model, wherein the image classification model comprises a convolution layer, the convolution layer comprises a dynamic convolution module, and the dynamic convolution module comprises: a convolution kernel weight factor generator and a convolution kernel generator; the convolution kernel weight factor generator is configured to generate N convolution kernel weight factors according to an input feature map; the convolution kernel generator is configured to generate N convolution kernels according to the input feature map, where N is a positive integer; the dynamic convolution module is further configured to perform nonlinear aggregation on the N convolution kernel weight factors and the N convolution kernels to obtain a dynamic convolution kernel; and the convolution layer processes the input feature map based on the dynamic convolution kernel to obtain an output feature map;
wherein the convolution kernel generator being configured to generate N convolution kernels according to the input feature map comprises:
performing adaptive average pooling on the input feature map to obtain a three-dimensional tensor;
converting the three-dimensional tensor into a three-dimensional tensor containing N convolution kernels using two 1×1 convolutions; and
dividing the three-dimensional tensor containing N convolution kernels into N parts to obtain the N convolution kernels.
8. A computer readable storage medium storing a computer program which, when executed by a processor, causes the processor to perform the steps of the image classification method according to any one of claims 1 to 6.
9. A computer device comprising a memory and a processor, the memory storing a computer program that, when executed by the processor, causes the processor to perform the steps of the image classification method of any of claims 1 to 6.
CN202011638500.XA 2020-12-31 2020-12-31 Image classification method, device, computer equipment and storage medium Active CN112686320B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011638500.XA CN112686320B (en) 2020-12-31 2020-12-31 Image classification method, device, computer equipment and storage medium


Publications (2)

Publication Number Publication Date
CN112686320A CN112686320A (en) 2021-04-20
CN112686320B true CN112686320B (en) 2023-10-13

Family

ID=75456608

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011638500.XA Active CN112686320B (en) 2020-12-31 2020-12-31 Image classification method, device, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN112686320B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113743459B (en) * 2021-07-29 2024-04-02 深圳云天励飞技术股份有限公司 Target detection method, target detection device, electronic equipment and storage medium

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110188795A (en) * 2019-04-24 2019-08-30 华为技术有限公司 Image classification method, data processing method and device
CN111860582A (en) * 2020-06-11 2020-10-30 北京市威富安防科技有限公司 Image classification model construction method and device, computer equipment and storage medium



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant