CN117456268A - Image multi-label classification method, device, equipment and storage medium

Image multi-label classification method, device, equipment and storage medium

Info

Publication number
CN117456268A
Authority
CN
China
Prior art keywords
network
layer
attention
image
point
Prior art date
Legal status
Pending
Application number
CN202311544865.XA
Other languages
Chinese (zh)
Inventor
Tian Yao
Zhang Liwen
Xie Zhiqiang
Jin Zijie
Current Assignee
Tianyi Shilian Technology Co., Ltd.
Original Assignee
Tianyi Digital Life Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Tianyi Digital Life Technology Co., Ltd.
Priority to CN202311544865.XA
Publication of CN117456268A

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/0464Convolutional networks [CNN, ConvNet]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/44Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G06V10/443Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components by matching or filtering
    • G06V10/449Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters
    • G06V10/451Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters with interaction between the filter responses, e.g. cortical complex cells
    • G06V10/454Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks

Abstract

The application discloses an image multi-label classification method, device, equipment and storage medium, wherein the method comprises the following steps: inputting a target image into a preset attention EfficientNet-B0 network; after a weight parameter matrix is obtained by forced orthogonalization before back propagation, carrying out feature extraction operation on the input target image according to the weight parameter matrix through a point-by-point convolution layer and a full connection layer to obtain a target feature vector; and carrying out predictive analysis on the target feature vector through an output layer of the preset attention EfficientNet-B0 network to obtain a multi-classification result comprising a plurality of prediction labels. The global receptive field network layer based on the non-local attention response mechanism and the forced orthogonalization processing improve both the accuracy of the prediction results and the prediction efficiency. The method and the device can solve the technical problems that existing neural networks for image multi-label classification carry substantial redundant information and a large calculation amount, resulting in poor model efficiency and low accuracy.

Description

Image multi-label classification method, device, equipment and storage medium
Technical Field
The present disclosure relates to the field of computer vision, and in particular, to a method, an apparatus, a device, and a storage medium for classifying multiple labels of an image.
Background
Image multi-label classification is one of the key problems in the field of computer vision. Its goal is to use the semantic information of images to accurately distinguish images of different categories, thereby minimizing classification error. The task is to assign one or more appropriate labels to a given target image: the input of the algorithm or model is an image, and the output is the set of category labels corresponding to that image.
Existing image multi-label classification is more difficult than single-label classification, so its accuracy is lower. Network parameters and feature channels in deep-learning-based image multi-label classification are highly redundant, so intermediate feature layers easily become ineffective and it is difficult to exploit the feature layers and network parameters fully. In addition, such neural networks enlarge the receptive field by adding convolutional layers, which increases the complexity of the model and leads to a larger calculation amount and longer calculation time, thereby reducing prediction efficiency.
Disclosure of Invention
The application provides an image multi-label classification method, device, equipment and storage medium, which are used for solving the technical problems that existing neural networks for image multi-label classification carry substantial redundant information and a large calculation amount, resulting in poor model efficiency and low accuracy.
In view of this, a first aspect of the present application provides an image multi-label classification method, including:
inputting a target image into a preset attention EfficientNet-B0 network, wherein the preset attention EfficientNet-B0 network constructs a global receptive field network layer based on a non-local attention response mechanism;
after a weight parameter matrix is obtained by forced orthogonalization before back propagation, carrying out feature extraction operation on the input target image through a point-by-point convolution layer and a full connection layer according to the weight parameter matrix to obtain a target feature vector;
and carrying out predictive analysis on the target feature vector through an output layer of the preset attention EfficientNet-B0 network to obtain a multi-classification result, wherein the multi-classification result comprises a plurality of prediction labels.
Preferably, before the inputting of the target image into the preset attention EfficientNet-B0 network, wherein the preset attention EfficientNet-B0 network constructs a global receptive field network layer based on a non-local attention response mechanism, the method further comprises the following steps:
constructing a basic EfficientNet-B0 network based on an inverted bottleneck residual block of MobileNetV2;
and adding a non-local attention response mechanism layer in a plurality of deep network layers of the basic EfficientNet-B0 network to generate a preset attention EfficientNet-B0 network.
Preferably, before the inputting of the target image into the preset attention EfficientNet-B0 network, wherein the preset attention EfficientNet-B0 network constructs a global receptive field network layer based on a non-local attention response mechanism, the method further comprises the following steps:
and carrying out optimization pre-training on the preset attention EfficientNet-B0 network through a preset training data set to obtain optimized model parameters.
Preferably, the step of performing forced orthogonalization before back propagation to obtain a weight parameter matrix and then performing feature extraction operation on the input target image according to the weight parameter matrix through a point-by-point convolution layer and a full connection layer to obtain a target feature vector comprises:
before back propagation, performing singular value decomposition on initial parameter matrixes of a point-by-point convolution layer and a full connection layer through a singular value decomposition algorithm to obtain a decomposition matrix;
performing forced orthogonal calculation based on the decomposition matrix to obtain a weight parameter matrix;
and performing feature extraction operation on the input target image according to the weight parameter matrix through a point-by-point convolution layer and a full connection layer to obtain a target feature vector.
A second aspect of the present application provides an image multi-label classification device, comprising:
the image input unit is used for inputting the target image into a preset attention EfficientNet-B0 network, and the preset attention EfficientNet-B0 network constructs a global receptive field network layer based on a non-local attention response mechanism;
the convolution optimization unit is used for carrying out forced orthogonalization before back propagation to obtain a weight parameter matrix, and then carrying out feature extraction operation on the input target image through a point-by-point convolution layer and a full connection layer according to the weight parameter matrix to obtain a target feature vector;
the classification prediction unit is used for performing prediction analysis on the target feature vector through the output layer of the preset attention EfficientNet-B0 network to obtain a multi-classification result, wherein the multi-classification result comprises a plurality of prediction labels.
Preferably, the method further comprises:
the model building unit is used for building a basic EfficientNet-B0 network based on the inverted bottleneck residual block of MobileNetV2;
and the global optimization unit is used for adding a non-local attention response mechanism layer into a plurality of deep network layers of the basic EfficientNet-B0 network to generate a preset attention EfficientNet-B0 network.
Preferably, the method further comprises:
the pre-training unit is used for carrying out optimization pre-training on the preset attention EfficientNet-B0 network through the preset training data set to obtain optimized model parameters.
Preferably, the convolution optimization unit is specifically configured to:
before back propagation, performing singular value decomposition on initial parameter matrixes of a point-by-point convolution layer and a full connection layer through a singular value decomposition algorithm to obtain a decomposition matrix;
performing forced orthogonal calculation based on the decomposition matrix to obtain a weight parameter matrix;
and performing feature extraction operation on the input target image according to the weight parameter matrix through a point-by-point convolution layer and a full connection layer to obtain a target feature vector.
A third aspect of the present application provides an image multi-label classification device, the device comprising a processor and a memory;
the memory is used for storing program codes and transmitting the program codes to the processor;
the processor is configured to execute the image multi-label classification method according to the first aspect according to instructions in the program code.
A fourth aspect of the present application provides a computer readable storage medium for storing program code for performing the image multi-label classification method of the first aspect.
From the above technical solutions, the embodiments of the present application have the following advantages:
in the present application, an image multi-label classification method is provided, including: inputting a target image into a preset attention EfficientNet-B0 network, the preset attention EfficientNet-B0 network constructing a global receptive field network layer based on a non-local attention response mechanism; after a weight parameter matrix is obtained by forced orthogonalization before back propagation, carrying out feature extraction operation on the input target image according to the weight parameter matrix through a point-by-point convolution layer and a full connection layer to obtain a target feature vector; and carrying out predictive analysis on the target feature vector through an output layer of the preset attention EfficientNet-B0 network to obtain a multi-classification result, wherein the multi-classification result comprises a plurality of prediction labels.
According to the image multi-label classification method, the global receptive field network layer is built into the EfficientNet-B0 network architecture through the non-local attention response mechanism, so the receptive field can be expanded without adding extra convolution layers, which reduces the complexity of the model to a certain extent and reduces the calculation amount. In addition, optimizing the model parameter matrix through forced orthogonalization can ensure that there is no linear correlation between output features, which reduces the redundancy of feature channels, improves the efficiency of the feature layer, and thereby improves the accuracy of the prediction results. Moreover, the forced orthogonalization acts as a regularization constraint on the parameters, which can also reduce over-fitting during model training. Therefore, the method and the device can solve the technical problems that existing neural networks for image multi-label classification carry substantial redundant information and a large calculation amount, resulting in poor model efficiency and low accuracy.
Drawings
Fig. 1 is a flow chart of an image multi-label classification method according to an embodiment of the present application;
Fig. 2 is a schematic structural diagram of an image multi-label classification device according to an embodiment of the present application;
Fig. 3 is a schematic diagram of an EfficientNet-B0 network structure based on the inverted bottleneck residual block of MobileNetV2 according to an embodiment of the present application;
Fig. 4 is a schematic diagram of EfficientNet-B0 network connections based on the inverted bottleneck residual block of MobileNetV2 according to an embodiment of the present application;
Fig. 5 is a schematic diagram of a global receptive field network structure based on a non-local attention response mechanism according to an embodiment of the present application;
Fig. 6 is a schematic diagram of calculation and analysis of an input vector by a network layer based on a weight parameter matrix according to an embodiment of the present application;
Fig. 7 is a schematic diagram of an EfficientNet-B0 network structure with fused weight parameter orthogonalization according to an embodiment of the present application;
Fig. 8 is a schematic diagram of image multi-label classification performed by a preset attention EfficientNet-B0 network according to an embodiment of the present application.
Detailed Description
In order to make the present application solution better understood by those skilled in the art, the following description will clearly and completely describe the technical solution in the embodiments of the present application with reference to the accompanying drawings in the embodiments of the present application, and it is apparent that the described embodiments are only some embodiments of the present application, not all embodiments. All other embodiments, which can be made by one of ordinary skill in the art without undue burden from the present disclosure, are within the scope of the present disclosure.
For ease of understanding, referring to fig. 1, an embodiment of an image multi-label classification method provided in the present application includes:
and step 101, inputting the target image into a preset attention EfficientNet-B0 network, and constructing a global receptive field network layer by the preset attention EfficientNet-B0 network based on a non-local attention response mechanism.
Further, before step 101, the method further includes:
constructing a basic EfficientNet-B0 network based on an inverted bottleneck residual block of MobileNetV2;
and adding a non-local attention response mechanism layer in a plurality of deep network layers of the basic EfficientNet-B0 network to generate a preset attention EfficientNet-B0 network.
Further, before step 101, the method further includes:
and carrying out optimization pre-training on the preset attention EfficientNet-B0 network through a preset training data set to obtain optimized model parameters.
It should be noted that, referring to fig. 3 and fig. 4, the EfficientNet base architecture of the preset attention EfficientNet-B0 network of this embodiment is a convolutional neural network architecture and scaling method that uses a compound coefficient to uniformly scale all dimensions of depth, width and resolution. Furthermore, on the basis of the EfficientNet network, a basic EfficientNet-B0 network is generated from the inverted bottleneck residual blocks and squeeze-and-excitation blocks of MobileNetV2; this network transfers well across numerous data sets and achieves state-of-the-art accuracy with an order of magnitude fewer parameters.
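As an illustration only (the patent does not prescribe a particular implementation), such a basic EfficientNet-B0 backbone could be instantiated with PyTorch/torchvision, whose efficientnet_b0 is built from the same MBConv (inverted bottleneck residual) blocks with squeeze-and-excitation; the label count num_labels is a hypothetical value:

```python
# A minimal sketch, assuming PyTorch/torchvision; not the patent's own code.
import torch
from torchvision import models

# torchvision's EfficientNet-B0 is composed of MBConv (inverted bottleneck
# residual) blocks with squeeze-and-excitation, as described above.
base = models.efficientnet_b0(weights=None)

# Swap the 1000-way ImageNet head for a multi-label head.
num_labels = 20  # hypothetical number of labels
base.classifier[1] = torch.nn.Linear(base.classifier[1].in_features, num_labels)
```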
In order to increase the receptive field of the model without adding network convolution layers, in this embodiment a global receptive field network layer based on a non-local attention response mechanism is added to a plurality of deep network layers of the basic EfficientNet-B0 network; the specific structure is shown in fig. 5, and the number and insertion positions of the global receptive field network layers can be changed according to the actual situation, which is not limited herein. For example, global receptive field network layers based on the non-local attention response mechanism may be added at layers 6, 12 and 18 of the basic EfficientNet-B0 network, respectively. The layer captures long-range dependencies in an image by introducing a global attention mechanism: its non-local operation computes the response at a given position as a weighted sum of the features at all positions. The receptive field over the whole image can thus be obtained through a single network layer, expanding the representational capability of the network.
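For concreteness, one plausible form of such a non-local attention response layer (the embedded-Gaussian non-local block) is sketched below; the channel reduction factor, the residual connection and the class name are illustrative assumptions rather than details fixed by the patent:

```python
import torch
import torch.nn as nn

class NonLocalBlock(nn.Module):
    """Global receptive field layer: the response at each position is computed
    as a weighted sum of the features at all positions (a sketch of one
    plausible form of the non-local attention response mechanism)."""
    def __init__(self, channels: int, reduction: int = 2):
        super().__init__()
        inter = max(channels // reduction, 1)
        self.theta = nn.Conv2d(channels, inter, kernel_size=1)  # query embedding
        self.phi = nn.Conv2d(channels, inter, kernel_size=1)    # key embedding
        self.g = nn.Conv2d(channels, inter, kernel_size=1)      # value embedding
        self.out = nn.Conv2d(inter, channels, kernel_size=1)    # restore channels

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        q = self.theta(x).flatten(2).transpose(1, 2)  # (b, h*w, inter)
        k = self.phi(x).flatten(2)                    # (b, inter, h*w)
        v = self.g(x).flatten(2).transpose(1, 2)      # (b, h*w, inter)
        attn = torch.softmax(q @ k, dim=-1)           # all-pairs position weights
        y = (attn @ v).transpose(1, 2).reshape(b, -1, h, w)
        return x + self.out(y)                        # residual connection
```

Such a block would be inserted after the chosen deep MBConv stages, e.g. NonLocalBlock(112) after a stage with 112 output channels.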
It can be understood that the preset training data set may be a standard training database obtained by preprocessing and organizing historical image data. The preset training data set used for pre-training may mix single-label and multi-label image classification samples, or consist entirely of multi-label samples; this is not specifically limited, does not affect the pre-training, and the specific processes are not repeated here. The obtained optimized model parameters can be used to update the preset attention EfficientNet-B0 network, so that it can analyze the target image more efficiently.
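Continuing the sketch above, the optimization pre-training could look as follows; multi-label training commonly uses one binary cross-entropy term per label, and the random tensors standing in for the preset training data set, the batch size and the learning rate are placeholders only:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Placeholder data standing in for the preset training data set.
images = torch.randn(8, 3, 224, 224)
labels = torch.randint(0, 2, (8, num_labels)).float()  # multi-hot label vectors
train_loader = DataLoader(TensorDataset(images, labels), batch_size=4)

criterion = torch.nn.BCEWithLogitsLoss()  # independent sigmoid/BCE per label
optimizer = torch.optim.Adam(base.parameters(), lr=1e-4)

for batch_images, batch_labels in train_loader:
    optimizer.zero_grad()
    loss = criterion(base(batch_images), batch_labels)
    loss.backward()   # back propagation
    optimizer.step()  # updates yield the optimized model parameters
```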
Step 102, after a weight parameter matrix is obtained by forced orthogonalization before back propagation, feature extraction operation is carried out on an input target image according to the weight parameter matrix through a point-by-point convolution layer and a full connection layer, and a target feature vector is obtained.
Further, step 102 includes:
before back propagation, performing singular value decomposition on initial parameter matrixes of a point-by-point convolution layer and a full connection layer through a singular value decomposition algorithm to obtain a decomposition matrix;
forced orthogonal calculation is carried out based on the decomposition matrix, so that a weight parameter matrix is obtained;
and performing feature extraction operation on the input target image according to the weight parameter matrix through the point-by-point convolution layer and the full connection layer to obtain a target feature vector.
It should be noted that, in this embodiment, before back propagation, singular value decomposition is performed on the initial parameter matrices of the point-by-point convolution layer and the full connection layer. Let the initial parameter matrix be W, with entries in the real or complex field; then the singular value decomposition

W = U × Σ × V*

may be performed, where U and V* are unitary matrices and Σ is a diagonal matrix; these are referred to as the decomposition matrices. Forced orthogonal calculation is then performed based on the decomposition matrices to obtain the weight parameter matrix W':

W' = U × V*

The obtained weight parameter matrix W' can be applied to the convolution layer for feature extraction; likewise, the parameter matrices in the point-by-point convolution layer and the full connection layer can be subjected to the forced orthogonalization constraint using the above decomposition. Back propagation is performed afterwards.
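A minimal sketch of this forced orthogonalization, assuming PyTorch (the helper name force_orthogonal and the layer shapes are illustrative, not from the patent):

```python
import torch

@torch.no_grad()
def force_orthogonal(weight: torch.Tensor) -> torch.Tensor:
    """Compute W' = U x V* from the SVD W = U x Sigma x V*."""
    w2d = weight.reshape(weight.shape[0], -1)  # 1x1 conv kernels flatten to a matrix
    u, _, vh = torch.linalg.svd(w2d, full_matrices=False)
    return (u @ vh).reshape(weight.shape)

# Apply the constraint to a point-by-point (1x1) convolution layer and a
# full connection layer before back propagation.
pw = torch.nn.Conv2d(64, 128, kernel_size=1)
fc = torch.nn.Linear(128, 10)
with torch.no_grad():
    pw.weight.copy_(force_orthogonal(pw.weight))
    fc.weight.copy_(force_orthogonal(fc.weight))
```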
Further, the point-by-point convolution layer and the full connection layer of this embodiment are linear layers connecting the input channels to the output channels. Therefore, feature extraction based on the weight parameter matrix is equivalent to a linear transformation of the input vector, i.e., y = W' × x, where x and y are the input and output vectors, respectively, and × denotes matrix multiplication; see fig. 6. The training process of the preset attention EfficientNet-B0 network with this feature extraction is shown in fig. 7.
After the parameter matrices in the point-by-point convolution layer and the full connection layer of this embodiment are forcibly converted into orthogonal matrices, the rows of the resulting weight matrix are mutually orthogonal, so there is no linear correlation between the channels of the corresponding network layer; the obtained feature vectors y do not overlap, which increases the diversity of the output channel expression and enriches the variety of feature vectors. Moreover, since the L2 norm of an orthogonal matrix is 1, the L2 norm of a vector remains unchanged after multiplication by the weight parameter matrix, i.e., ‖x‖₂ = ‖y‖₂. In the back propagation training stage, keeping the norms of the vectors unchanged ensures that the gradients remain consistent during back propagation, effectively avoiding gradient vanishing or gradient explosion. In addition, the forced orthogonalization also acts as a regularization constraint on the parameters in the network layer, in turn reducing overfitting.
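The norm-preservation property can be checked numerically with the force_orthogonal helper sketched above; the snippet is illustrative only (exact preservation holds in the square case shown):

```python
w_prime = force_orthogonal(torch.randn(128, 128))  # orthogonalized square matrix
x = torch.randn(128)
y = w_prime @ x
# An orthogonal map preserves the L2 norm, so ||x||2 == ||y||2 up to float error.
print(torch.allclose(x.norm(), y.norm(), atol=1e-5))  # True
```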
And step 103, performing predictive analysis on the target feature vector through an output layer of the preset attention EfficientNet-B0 network to obtain a multi-classification result, wherein the multi-classification result comprises a plurality of prediction labels.
Referring to fig. 8, after feature extraction is performed on the target image by the deep convolutional neural network inside the preset attention EfficientNet-B0 network, a plurality of prediction labels can be given by the prediction of the output layer. For example, after label prediction is performed on the input target image in fig. 8, labels including dog, grassland and flower are obtained, thereby realizing multi-label classification of the image.
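For illustration, the output-layer prediction can be read as independent per-label sigmoid scores followed by thresholding; the label names, scores and the 0.5 threshold below are assumptions, not values from the patent:

```python
import torch

label_names = ["dog", "grassland", "flower"]  # hypothetical label set, cf. fig. 8
logits = torch.tensor([2.3, 1.1, 0.7])        # output-layer scores for one image
probs = torch.sigmoid(logits)                 # per-label probabilities
predicted = [name for name, p in zip(label_names, probs) if p > 0.5]
print(predicted)  # ['dog', 'grassland', 'flower']
```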
According to the image multi-label classification method provided by the embodiment of the application, the global receptive field network layer is built into the EfficientNet-B0 network architecture through the non-local attention response mechanism, so the receptive field can be expanded without adding extra convolution layers, which reduces the complexity of the model to a certain extent and reduces the calculation amount. In addition, optimizing the model parameter matrix through forced orthogonalization can ensure that there is no linear correlation between output features, which reduces the redundancy of feature channels, improves the efficiency of the feature layer, and thereby improves the accuracy of the prediction results. Moreover, the forced orthogonalization acts as a regularization constraint on the parameters, which can also reduce over-fitting during model training. Therefore, the embodiment of the application can solve the technical problems that existing neural networks for image multi-label classification carry substantial redundant information and a large calculation amount, resulting in poor model efficiency and low accuracy.
For ease of understanding, referring to fig. 2, the present application provides an embodiment of an image multi-label classification apparatus, including:
an image input unit 201, configured to input a target image into a preset attention EfficientNet-B0 network, where the preset attention EfficientNet-B0 network constructs a global receptive field network layer based on a non-local attention response mechanism;
the convolution optimization unit 202 is configured to perform a feature extraction operation on an input target image according to the weight parameter matrix through the point-by-point convolution layer and the full connection layer after performing forced orthogonalization to obtain the weight parameter matrix before back propagation, so as to obtain a target feature vector;
the classification prediction unit 203 is configured to perform prediction analysis on the target feature vector through an output layer of the preset attention EfficientNet-B0 network, so as to obtain a multi-classification result, where the multi-classification result includes a plurality of prediction labels.
Further, the method further comprises the following steps:
a model building unit 204, configured to build a basic EfficientNet-B0 network based on the inverted bottleneck residual block of MobileNetV2;
the global optimization unit 205 is configured to add a non-local attention response mechanism layer to a plurality of deep network layers of the basic EfficientNet-B0 network, and generate a preset attention EfficientNet-B0 network.
Further, the method further comprises the following steps:
the pre-training unit 206 is configured to perform optimization pre-training on the preset attention EfficientNet-B0 network through the preset training data set, so as to obtain optimized model parameters.
Further, the convolution optimization unit 202 is specifically configured to:
before back propagation, performing singular value decomposition on initial parameter matrixes of a point-by-point convolution layer and a full connection layer through a singular value decomposition algorithm to obtain a decomposition matrix;
forced orthogonal calculation is carried out based on the decomposition matrix, so that a weight parameter matrix is obtained;
and performing feature extraction operation on the input target image according to the weight parameter matrix through the point-by-point convolution layer and the full connection layer to obtain a target feature vector.
The application also provides image multi-label classification equipment, which comprises a processor and a memory;
the memory is used for storing the program codes and transmitting the program codes to the processor;
the processor is configured to execute the image multi-label classification method in the method embodiment according to the instructions in the program code.
The application also provides a computer readable storage medium for storing program code for executing the image multi-label classification method in the above method embodiment.
In the several embodiments provided in this application, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative, e.g., the division of the units is merely a logical function division, and there may be additional divisions when actually implemented, e.g., multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be an indirect coupling or communication connection via some interfaces, devices or units, which may be in electrical, mechanical or other form.
The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in each embodiment of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
The integrated units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present application may be embodied in essence or a part contributing to the prior art or all or part of the technical solution in the form of a software product stored in a storage medium, including several instructions to execute all or part of the steps of the methods described in the embodiments of the present application by a computer device (which may be a personal computer, a server, or a network device, etc.). And the aforementioned storage medium includes: u disk, mobile hard disk, read-Only Memory (ROM), random access Memory (Random Access Memory, RAM), magnetic disk or optical disk, etc.
The above embodiments are merely for illustrating the technical solution of the present application, and not for limiting the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the corresponding technical solutions.

Claims (10)

1. A method for classifying multiple labels of an image, comprising:
inputting a target image into a preset attention EfficientNet-B0 network, wherein the preset attention EfficientNet-B0 network constructs a global receptive field network layer based on a non-local attention response mechanism;
after a weight parameter matrix is obtained by forced orthogonalization before back propagation, carrying out feature extraction operation on the input target image through a point-by-point convolution layer and a full connection layer according to the weight parameter matrix to obtain a target feature vector;
and carrying out predictive analysis on the target feature vector through an output layer of the preset attention EfficientNet-B0 network to obtain a multi-classification result, wherein the multi-classification result comprises a plurality of prediction labels.
2. The image multi-label classification method according to claim 1, wherein before the inputting of the target image into the preset attention EfficientNet-B0 network, where the preset attention EfficientNet-B0 network constructs a global receptive field network layer based on a non-local attention response mechanism, the method further comprises:
constructing a basic EfficientNet-B0 network based on an inverted bottleneck residual block of MobileNetV2;
and adding a non-local attention response mechanism layer in a plurality of deep network layers of the basic EfficientNet-B0 network to generate a preset attention EfficientNet-B0 network.
3. The image multi-label classification method according to claim 1, wherein before the inputting of the target image into the preset attention EfficientNet-B0 network, where the preset attention EfficientNet-B0 network constructs a global receptive field network layer based on a non-local attention response mechanism, the method further comprises:
and carrying out optimization pre-training on the preset attention EfficientNet-B0 network through a preset training data set to obtain optimized model parameters.
4. The image multi-label classification method according to claim 1, wherein the step of performing forced orthogonalization before back propagation to obtain a weight parameter matrix and then performing feature extraction operation on the input target image according to the weight parameter matrix through a point-by-point convolution layer and a full connection layer to obtain a target feature vector comprises:
before back propagation, performing singular value decomposition on initial parameter matrixes of a point-by-point convolution layer and a full connection layer through a singular value decomposition algorithm to obtain a decomposition matrix;
performing forced orthogonal calculation based on the decomposition matrix to obtain a weight parameter matrix;
and performing feature extraction operation on the input target image according to the weight parameter matrix through a point-by-point convolution layer and a full connection layer to obtain a target feature vector.
5. An image multi-label classification device, comprising:
the image input unit is used for inputting the target image into a preset attention EfficientNet-B0 network, and the preset attention EfficientNet-B0 network constructs a global receptive field network layer based on a non-local attention response mechanism;
the convolution optimization unit is used for carrying out forced orthogonalization before back propagation to obtain a weight parameter matrix, and then carrying out feature extraction operation on the input target image through a point-by-point convolution layer and a full connection layer according to the weight parameter matrix to obtain a target feature vector;
the classification prediction unit is used for performing prediction analysis on the target feature vector through the output layer of the preset attention EfficientNet-B0 network to obtain a multi-classification result, wherein the multi-classification result comprises a plurality of prediction labels.
6. The image multi-label classification apparatus according to claim 5, further comprising:
the model building unit is used for building a basic EfficientNet-B0 network based on the inverted bottleneck residual block of MobileNetV2;
and the global optimization unit is used for adding a non-local attention response mechanism layer into a plurality of deep network layers of the basic EfficientNet-B0 network to generate a preset attention EfficientNet-B0 network.
7. The image multi-label classification apparatus according to claim 5, further comprising:
the pre-training unit is used for carrying out optimization pre-training on the preset attention EfficientNet-B0 network through the preset training data set to obtain optimized model parameters.
8. The image multi-label classification apparatus according to claim 5, wherein the convolution optimization unit is specifically configured to:
before back propagation, performing singular value decomposition on initial parameter matrixes of a point-by-point convolution layer and a full connection layer through a singular value decomposition algorithm to obtain a decomposition matrix;
performing forced orthogonal calculation based on the decomposition matrix to obtain a weight parameter matrix;
and performing feature extraction operation on the input target image according to the weight parameter matrix through a point-by-point convolution layer and a full connection layer to obtain a target feature vector.
9. An image multi-label classification device, characterized in that the device comprises a processor and a memory;
the memory is used for storing program codes and transmitting the program codes to the processor;
the processor is configured to perform the image multi-label classification method of any one of claims 1-4 according to instructions in the program code.
10. A computer readable storage medium for storing program code for performing the image multi-label classification method of any one of claims 1-4.
CN202311544865.XA 2023-11-17 2023-11-17 Image multi-label classification method, device, equipment and storage medium Pending CN117456268A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311544865.XA CN117456268A (en) 2023-11-17 2023-11-17 Image multi-label classification method, device, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311544865.XA CN117456268A (en) 2023-11-17 2023-11-17 Image multi-label classification method, device, equipment and storage medium

Publications (1)

Publication Number Publication Date
CN117456268A true CN117456268A (en) 2024-01-26

Family

ID=89596696

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311544865.XA Pending CN117456268A (en) 2023-11-17 2023-11-17 Image multi-label classification method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN117456268A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20240313

Address after: Unit 1, Building 1, China Telecom Zhejiang Innovation Park, No. 8 Xiqin Street, Wuchang Street, Yuhang District, Hangzhou City, Zhejiang Province, 311100

Applicant after: Tianyi Shilian Technology Co.,Ltd.

Country or region after: China

Address before: Room 1423, No. 1256 and 1258, Wanrong Road, Jing'an District, Shanghai 200072

Applicant before: Tianyi Digital Life Technology Co.,Ltd.

Country or region before: China
