CN114037857B - Image classification precision improving method - Google Patents
- Publication number
- CN114037857B (application CN202111229240.5A / CN202111229240A)
- Authority
- CN
- China
- Prior art keywords
- network
- instance
- gating
- pruning
- space
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G—PHYSICS; G06—COMPUTING; CALCULATING OR COUNTING; G06F—ELECTRIC DIGITAL DATA PROCESSING; G06F18/00—Pattern recognition; G06F18/20—Analysing; G06F18/24—Classification techniques; G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G—PHYSICS; G06—COMPUTING; CALCULATING OR COUNTING; G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS; G06N3/00—Computing arrangements based on biological models; G06N3/02—Neural networks; G06N3/04—Architecture, e.g. interconnection topology; G06N3/045—Combinations of networks
- G—PHYSICS; G06—COMPUTING; CALCULATING OR COUNTING; G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS; G06N3/00—Computing arrangements based on biological models; G06N3/02—Neural networks; G06N3/08—Learning methods
Abstract
The invention discloses an image classification precision improving method comprising the following steps: acquiring an image dataset; setting a convolutional neural network; performing dynamic network pruning on the convolutional neural network by a feature-gating coupling method to obtain an optimized network; and inputting the images to be classified into the optimized network and classifying them with it. The dynamic network pruning method comprises: step 1, obtaining the feature space and the gating space of the dynamic pruning network; step 2, obtaining the instance neighborhood relationship in the feature space; step 3, aligning the instance neighborhood relationship between the gating space and the feature space; step 4, obtaining the total objective loss function of the dynamic pruning network; and step 5, updating the network parameters. The feature-gating coupling method for dynamic network pruning disclosed by the invention greatly reduces the distortion of the gated features and significantly improves the performance of the dynamic pruning network.
Description
Technical Field
The invention relates to an image classification precision improving method, and belongs to the technical field of image classification.
Background
In order to achieve higher image classification accuracy, convolutional neural networks (CNNs) have been designed ever larger and deeper, which greatly increases their computational cost.
To reduce this cost, researchers have proposed and widely adopted network pruning, which removes network parameters that contribute little to classification accuracy. In this way the computational cost is reduced as much as possible while the pruned network retains the representation capability of the original network, so that the loss in classification accuracy is minimized and a compact network model is obtained.
Existing network pruning methods fall roughly into two categories: static pruning and dynamic pruning. Static channel pruning obtains a static, slimmed model by deleting feature channels that contribute little to overall performance; dynamic pruning obtains a sub-network conditioned on the input picture instance, reducing the computational cost at runtime.
In conventional dynamic pruning methods, an attached gating module generates a channel-level binary mask, i.e., a gating vector, which indicates whether each channel is deleted or retained. The gating module exploits instance-level redundancy according to the feature variations of different inputs: channels that identify particular features can be adaptively opened or closed for different input instances.
However, existing network pruning methods typically ignore the consistency between the feature and gating distributions. Since the gated features are generated by channel-wise multiplication of the feature vector and the gating vector, a pair of instances with similar features but dissimilar gates exposes a disparity between the two distributions. This disparity distorts the gated feature space: it may pull noise instances into a similar pair or push the pair apart, reducing the representation ability of the pruned network.
Therefore, feature-gating coupling requires further study in order to solve the above technical problems.
Disclosure of Invention
In order to overcome the above problems, the present inventors have conducted intensive studies and designed an image classification precision improving method, comprising:
acquiring an image dataset;
setting a convolutional neural network;
performing dynamic network pruning on the convolutional neural network by a feature-gating coupling method to obtain an optimized network;
inputting the images to be classified into the optimized network and classifying the images with the optimized network,
the feature-gating coupling method comprising the following steps:
step 1, obtaining the feature space and the gating space of the dynamic pruning network;
step 2, obtaining the instance neighborhood relationship in the feature space;
step 3, aligning the instance neighborhood relationship between the gating space and the feature space;
step 4, obtaining the total objective loss function of the dynamic pruning network;
and step 5, updating the network parameters.
Preferably, step 2 comprises the following sub-steps:
step 21, pooling the features of each instance into an instance pooling vector with a global average pooling layer;
step 22, obtaining the instance similarity matrix from the instance pooling vectors;
step 23, determining the nearest-neighbor instances of each instance from the instance similarity matrix, and taking the set of nearest-neighbor instance indices as the self-supervision signal.
Preferably, in step 22, the similarity between different instances is obtained by measuring their instance pooling vectors with a dot product.
Preferably, in step 23, the i-th row of the similarity matrix is sorted, the column indices of its k largest elements are obtained, and the instance set corresponding to these indices is taken as the self-supervision signal of instance i.
Preferably, in step 3, a contrastive loss function gathers the positive instances and disperses the negative instances, thereby aligning the instance neighborhood relationship between the gating space and the feature space.
Preferably, step 3 comprises the following sub-steps:
step 31, obtaining the probability that instance j is a positive instance of instance i;
and step 32, obtaining a contrastive loss function from the positive-instance probability and minimizing it, thereby aligning the instance neighborhood relationship of the gating space and the feature space.
Preferably, in step 31, the probability that an input instance j is identified in the gating space as a positive instance of instance i is:

p^l(j|i) = exp((π_j^l)^T π_i^l / τ) / Σ_{n=1}^{N} exp((π_n^l)^T π_i^l / τ)

wherein l denotes the l-th layer, π_i^l is the gating probability output by the gating module for instance i, π_j^l is the gating probability output by the gating module for instance j, N is the total number of instances, and τ is a temperature hyperparameter.
Preferably, in step 4, the total objective loss function loss is:

loss = L_CE + Σ_{l∈Ω} (η L_con^l + ρ L_0^l)

wherein L_CE is the cross-entropy loss function, L_con^l denotes the contrastive loss function of the l-th layer, L_0^l is the L0 norm loss function of the l-th layer, η and ρ are coefficients, and Ω is the set of network layer indices.
The present invention also provides an electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method described above.
The invention also provides a computer-readable storage medium having stored thereon computer instructions for causing a computer to perform the method described above.
The invention has the advantages that:
(1) feature distributions and corresponding gating distributions in the dynamic pruning network can be aligned;
(2) feature-gating coupling is achieved by iteratively performing neighborhood relationship exploration and feature-gating alignment, and distortion of gating features can be greatly reduced;
(3) the performance of the dynamic pruning network can be significantly improved.
Drawings
Fig. 1 shows a schematic flow chart of a feature-gated coupling method for dynamic network pruning according to a preferred embodiment of the present invention.
Detailed Description
The invention is explained in more detail below with reference to the figures and examples. The features and advantages of the present invention will become more apparent from the description.
The word "exemplary" is used herein to mean "serving as an example, embodiment, or illustration." Any embodiment described herein as "exemplary" is not necessarily to be construed as preferred or advantageous over other embodiments. While the various aspects of the embodiments are presented in drawings, the drawings are not necessarily drawn to scale unless specifically indicated.
In order to solve the problem that conventional dynamic network pruning methods may distort the gated features, the present invention minimizes this distortion by aligning the feature and gating distributions.
The image classification precision improving method provided by the invention comprises the following steps:
acquiring an image dataset;
setting a convolutional neural network;
performing dynamic network pruning on the convolutional neural network by a feature-gating coupling method to obtain an optimized network;
inputting the images to be classified into the optimized network and classifying the images with the optimized network,
in the present invention, the image data set is used for training a neural network, and the method for acquiring the image data set is not particularly limited in the present invention, and may be any open image classification training set, or an image data set designed by a person skilled in the art according to actual needs.
In the present invention, the specific structure of the convolutional neural network is not limited, and those skilled in the art can select an appropriate convolutional neural network structure according to actual needs.
The feature-gating coupling method, as shown in Fig. 1, comprises the following steps:
step 1, obtaining the feature space and the gating space of the dynamic pruning network;
step 2, obtaining the instance neighborhood relationship in the feature space;
step 3, aligning the instance neighborhood relationship between the gating space and the feature space;
step 4, obtaining the total objective loss function of the dynamic pruning network;
and step 5, updating the network parameters.
In step 1, the image instances input to the convolutional neural network are mapped to a feature space and a gating space according to the dynamic network pruning method BAS; for a description of BAS, reference may be made to: B. Ehteshami Bejnordi, T. Blankevoort, M. Welling, Batch-shaping for learning conditional channel gated networks, in: Proceedings of the International Conference on Learning Representations (ICLR), 2020.
The feature space contains the instance features of the current convolutional layer. According to the present invention, dynamic network pruning can be set at any convolutional layer of the convolutional neural network; when it is set at the l-th convolutional layer, the instance features are expressed as:

x_i^{l+1} = ReLU(w^l ∗ x_i^l)

where i indexes the instances and l indexes the layers of the convolutional neural network; x_i^l is the input of the l-th convolutional layer, x_i^{l+1} ∈ R^{C_l × W_l × H_l} is the output of the l-th convolutional layer, C_l is the number of output channels of the l-th convolutional layer, w^l is the convolution weight matrix of the l-th convolutional layer, ∗ denotes the convolution operator, and ReLU denotes the activation function.
The gating space contains the gating vectors of the current convolutional layer. Further, when dynamic network pruning is set at the l-th convolutional layer, the gating vector and the instance features are multiplied channel-wise to obtain the gated features:

x̃_i^{l+1} = x_i^{l+1} ⊙ G(x_i^l)

where G denotes the gating module, ⊙ denotes channel-level multiplication, and x̃_i^{l+1} denotes the gated features.
In a preferred embodiment, the gating module G may be expressed as:

G(·) = BinConcrete(Linear(p(·)))

where p denotes a global average pooling layer that generates a spatial descriptor of the input features; Linear(·) denotes two fully connected layers that generate the gating probabilities; and BinConcrete is an activation function.
Further, the gating probability corresponding to the l-th convolutional layer of the convolutional neural network is expressed as:

π_i^l = Linear(p(x_i^l))

and the gating vector corresponding to the l-th convolutional layer is:

g_i^l = BinConcrete(π_i^l)
further, according to the first, third, and sixth equations, the gating characteristic corresponding to the first convolutional layer of the convolutional neural network can also be expressed as:
in step 2, since the example neighborhood relationships are different in feature spaces of different semantic levels, for example, in low-level feature spaces, examples with similar colors or textures may be closer, while in high-level feature spaces, examples in the same class may be clustered together, so manual annotation cannot provide adaptive supervision of the example neighborhood relationships across different network stages, and how to obtain the example neighborhood relationships in the feature spaces is a difficult point of the present invention.
The invention provides a method for adaptively acquiring the instance neighborhood relationship in each feature space, which can be used at any layer of the network to obtain a self-supervision signal for feature-gating distribution alignment.
Specifically, the method comprises the following sub-steps:
Step 21: for the l-th convolutional layer provided with dynamic network pruning, the features x_i^l of the i-th instance in that layer are pooled by a global average pooling layer into a vector z_i^l = p(x_i^l), called the instance pooling vector. Pooling reduces the feature dimension from C_l × W_l × H_l to C_l × 1, improving the efficiency and effectiveness of subsequent processing.
Step 22: the instance similarity matrix S^l is obtained from the instance pooling vectors z_i^l.
The instance similarity matrix S^l characterizes the similarity between different instances in layer l. The similarity is measured on the instance pooling vectors, preferably with a dot product:

s_{ij}^l = (z_i^l)^T z_j^l

where i, j index different instances, s_{ij}^l denotes the similarity between the i-th and j-th instances in the l-th convolutional layer, and T denotes transposition.
Further, the similarities of all instances in the l-th convolutional layer are assembled into the similarity matrix S^l ∈ R^{N×N}, whose elements are s_{ij}^l, where N is the total number of instances used for training.
Step 23: the nearest-neighbor instances are determined from the instance similarity matrix S^l, and the set of nearest-neighbor instance indices is taken as the self-supervision signal.
The i-th row of S^l is sorted to obtain the column indices of its k largest elements; the corresponding instances are the nearest neighbors of instance i, and the set of their indices is the self-supervision signal of instance i, used to regulate the gating module:

I_i^l = topk_j(s_{ij}^l)

where I_i^l denotes the self-supervision signal of instance i, and topk returns the column indices of the k largest elements in the i-th row of S^l.
Preferably, k is 100 to 500, more preferably 200.
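The sub-steps above can be sketched in plain Python. This is an illustrative sketch; excluding an instance from its own neighbor set is an assumption made here for clarity, since the original equation images are not reproduced:

```python
def pool_vectors(features):
    # step 21: global average pooling of each instance's C_l x W_l x H_l feature
    return [[sum(ch) / len(ch) for ch in feat] for feat in features]

def similarity_matrix(zs):
    # step 22: s_ij = z_i . z_j (dot product between instance pooling vectors)
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))
    return [[dot(zi, zj) for zj in zs] for zi in zs]

def topk_neighbors(sim, i, k):
    # step 23: column indices of the k largest elements in row i
    # (the instance itself is excluded here, an assumption of this sketch)
    order = sorted((j for j in range(len(sim)) if j != i),
                   key=lambda j: sim[i][j], reverse=True)
    return set(order[:k])
```

The returned index set plays the role of the self-supervision signal I_i^l that selects which instances are pulled together in the gating space.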
In a preferred embodiment, in step 21, instead of recomputing the instance pooling vectors of all previously input picture instances each time a new picture instance is input, the obtained instance pooling vectors are stored in a feature bank B^l ∈ R^{N×D}, where D is the dimension of the pooled feature, i.e., the number of channels C_l; when the similarity is computed for subsequently input picture instances, the vectors in B^l are read directly.
After a new instance is input, the feature bank is updated in a momentum manner:

B_i^l ← m · B_i^l + (1 − m) · z_i^l

where the momentum coefficient m is set to 0.3-0.7, preferably 0.5, and all vectors in the bank are initialized to unit random vectors.
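A minimal sketch of the feature-bank update, assuming the straightforward reading of the momentum rule (slot i of the bank is blended with the new pooling vector of instance i):

```python
import random

def init_bank(num_instances, dim, seed=0):
    # all vectors start as random unit vectors
    rng = random.Random(seed)
    bank = []
    for _ in range(num_instances):
        v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        norm = sum(x * x for x in v) ** 0.5
        bank.append([x / norm for x in v])
    return bank

def momentum_update(bank, i, z, m=0.5):
    # bank[i] <- m * bank[i] + (1 - m) * z, so each slot tracks a running
    # average of that instance's pooling vector instead of being recomputed
    bank[i] = [m * old + (1.0 - m) * new for old, new in zip(bank[i], z)]
    return bank[i]
```

With m = 0.5 the stored vector moves halfway toward each new observation, smoothing out per-epoch fluctuations of the features.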
In step 3, the self-supervision signal of instance i yields its nearest-neighbor set, denoted N_i^l. The instances in the nearest-neighbor set are the positive instances to be pulled closer in the gating space, while the instances outside the nearest-neighbor set are the negative instances to be pushed away.
That is, a positive instance is a nearest neighbor of instance i; all other instances are negative instances.
In a preferred embodiment, step 3 comprises the following sub-steps:
Step 31: obtaining the probability that instance j is a positive instance of instance i.
Specifically, the probability that an input instance j is identified, in the gating space corresponding to the l-th convolutional layer, as a positive instance of instance i is:

p^l(j|i) = exp((π_j^l)^T π_i^l / τ) / Σ_{n=1}^{N} exp((π_n^l)^T π_i^l / τ)

where l denotes the l-th convolutional layer, π_i^l is the gating probability output by the gating module for instance i, π_j^l is the gating probability output by the gating module for instance j, and τ is a temperature hyperparameter, preferably set to 0.01-0.2, for example 0.07.
Step 32: obtaining the contrastive loss function from the positive-instance probability and minimizing it, thereby aligning the instance neighborhood relationship of the gating space with that of the feature space.
The contrastive loss sums the negative log-probabilities over the nearest-neighbor set N_i^l of instance i:

L_con^l = − Σ_{j∈N_i^l} log p^l(j|i)

Minimizing L_con^l pulls the nearest neighbors of the feature space closer in the gating space, so that the instance neighborhood relationship of the feature space is reproduced in the gating space, achieving the alignment of the two.
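The positive-instance probability and the contrastive loss can be sketched in plain Python. The exact normalization used in the patent is partly lost with the equation images, so the sketch assumes the standard InfoNCE-style form, a temperature-scaled softmax over all instances, averaged over the positive set:

```python
import math

def positive_probability(pis, i, j, tau=0.07):
    # p(j | i) = exp(pi_j . pi_i / tau) / sum_n exp(pi_n . pi_i / tau)
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))
    logits = [dot(pis[n], pis[i]) / tau for n in range(len(pis))]
    mx = max(logits)  # subtract max for numerical stability
    exps = [math.exp(z - mx) for z in logits]
    return exps[j] / sum(exps)

def contrastive_loss(pis, i, positives, tau=0.07):
    # pull the feature-space neighbours of i together in gating space:
    # L = -(1/|P_i|) * sum_{j in P_i} log p(j | i)
    return -sum(math.log(positive_probability(pis, i, j, tau))
                for j in positives) / len(positives)
```

The lower the temperature, the more the softmax concentrates on the closest gating vectors, which sharpens the pull on the designated positives.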
The contrastive loss function L_con^l designed in the present invention is a variant of the InfoNCE loss function, whose minimization maximizes the mutual information between the feature distribution and the gating distribution. Its principle can be explained by the following analysis.
The mutual information between the instance feature distribution and the gating feature distribution is defined as:

I(π^l; z^l) = Σ_{π,z} p(π^l, z^l) log ( p(π^l | z^l) / p(π^l) )

The density ratio p(π_j^l | z_i^l) / p(π_j^l) can be represented, up to a proportionality constant, by exp((π_j^l)^T π_i^l / τ), where π_i^l is the gating probability corresponding to z_i^l and π_j^l is the gating probability of the corresponding positive instance.
With this ratio, following the standard InfoNCE bound, the contrastive loss function satisfies:

L_con^l ≥ log N − I(π^l; z^l)

where the z_j^l with j ∈ N_i^l denote the nearest-neighbor instance features in the feature space.
The above inequality shows that minimizing the contrastive loss function L_con^l maximizes a lower bound on the mutual information between the instance feature distribution and the gating feature distribution; accordingly, increasing this mutual information promotes the alignment of the two distributions.
In a preferred embodiment, in step 31, a gating bank B_g^l is used to store the gating probability π_i^l of each instance i. In step 32, according to the self-supervision signal I_i^l of instance i, the gating probabilities of the corresponding positive instances are extracted from the gating bank B_g^l and substituted into the positive-instance probability above, which avoids repeated computation and reduces the computational load.
Further, after a new instance is input, the gating bank is updated in a momentum manner:

B_{g,i}^l ← m · B_{g,i}^l + (1 − m) · π_i^l

where the momentum coefficient m is set to 0.3-0.7, preferably 0.5, and all vectors in the bank are initialized to unit random vectors.
In step 4, the total objective loss function loss is set as:

loss = L_CE + Σ_{l∈Ω} (η L_con^l + ρ L_0^l)

where L_CE is the cross-entropy loss function, L_con^l denotes the contrastive loss function of the l-th convolutional layer, and L_0^l is the L0 norm loss function of the l-th convolutional layer, the same as the L0 norm loss function in BAS; η and ρ are coefficients, and Ω is the set of indices of the convolutional layers with feature-gating coupling in the convolutional neural network.
Further, L_0^l = ||g^l||_0 for each gated layer, where ||·||_0 is the L0 norm.
According to the invention, the value of the coefficient ρ can be determined by a person skilled in the art through repeated tests as required. The coefficient ρ controls the sparsity of the pruned model: the larger ρ is, the sparser the model and the lower the accuracy; the smaller ρ is, the lower the sparsity and the higher the accuracy.
Preferably, the value of the coefficient η is between 0.002 and 0.004, more preferably 0.003.
According to the present invention, the total objective loss function comprises a cross-entropy loss function, an L0 norm loss function, and a contrastive loss function: the cross-entropy loss is used for the image classification task; the L0 norm loss enforces the sparsity of the gating vectors; and the contrastive loss is used for feature-gating distribution alignment. With this arrangement, dynamic network pruning couples the features with the gating and minimizes the distortion of the gated features.
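A sketch of how the three terms combine, under the assumption that L_0^l simply counts open gates; note that during actual training BAS replaces the non-differentiable L0 norm with a differentiable relaxation, so the raw count here is for illustration only:

```python
def l0_norm(gate):
    # number of open channels in the gating vector
    return sum(1 for g in gate if g != 0.0)

def total_loss(ce_loss, layer_losses, eta=0.003, rho=0.4):
    # loss = L_CE + sum_{l in Omega} (eta * L_con^l + rho * L_0^l)
    # layer_losses: one (contrastive_loss, gate_vector) pair per coupled layer
    return ce_loss + sum(eta * con + rho * l0_norm(gate)
                         for con, gate in layer_losses)
```

Raising rho penalizes open gates more heavily, trading accuracy for sparsity, which matches the role of rho described above.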
In step 5, the parameters of the dynamic pruning network are updated by gradient back-propagation according to the total objective loss function; if the network has converged, training terminates and the final model is returned; otherwise, steps 1-5 are repeated for iterative updating.
The specific method of this step is the same as the traditional neural network parameter updating method, and is not described in detail in the present invention.
Various embodiments of the above-described methods of the present invention may be implemented in digital electronic circuitry, integrated circuits, field programmable gate arrays (FPGAs), application specific integrated circuits (ASICs), application specific standard products (ASSPs), systems on chip (SOCs), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special- or general-purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for implementing the methods of the present invention may be written in any combination of one or more programming languages. These program codes may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the program codes, when executed by the processor or controller, cause the functions/operations specified in the flowchart and/or block diagram to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of the present invention, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the methods described herein may be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The methods and apparatus described herein may be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described herein), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.
The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, also called a cloud computing server or cloud host, which is a host product in a cloud computing service system that remedies the high management difficulty and weak service extensibility of traditional physical hosts and VPS ("Virtual Private Server") services. The server may also be a server of a distributed system, or a server incorporating a blockchain.
It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present disclosure may be executed in parallel, sequentially, or in different orders, and are not limited herein as long as the desired results of the technical solutions disclosed in the present disclosure can be achieved.
Examples
Example 1
Experiments were performed on a standard image classification database: the image dataset was CIFAR-10, which contains 60,000 color image instances, of which 50,000 are used for training and 10,000 for testing. The experimental environment was an NVIDIA RTX 3090 GPU, using PyTorch.
A convolutional neural network was set up and experiments were performed with the ResNet-20 model, which was trained for 400 epochs with 256 image instances per batch, using the Nesterov accelerated gradient descent optimizer with momentum set to 0.9 and weight decay set to 5e-4. The initial learning rate was set to 0.1 with a multi-step decay strategy: the learning rate is decayed by a factor of 0.1 at epochs 200, 275, and 350.
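The multi-step schedule can be reproduced with a few lines of plain Python (a sketch of the schedule itself, not of the optimizer):

```python
def multistep_lr(epoch, base_lr=0.1, milestones=(200, 275, 350), gamma=0.1):
    # the learning rate is multiplied by gamma at each milestone epoch
    passed = sum(1 for m in milestones if epoch >= m)
    return base_lr * (gamma ** passed)
```

With the settings above, the rate is 0.1 until epoch 200, then 0.01, 0.001, and finally 0.0001 after epoch 350.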
A dynamic pruning network arranged in the last two residual modules of the ResNet-20 model is dynamically pruned by the feature-gating coupling method to obtain the optimized network, wherein the dynamic pruning method comprises the following steps:
step 1, obtaining the feature space and the gating space of the dynamic pruning network;
step 2, obtaining the instance neighborhood relationship in the feature space;
step 3, aligning the instance neighborhood relationship between the gating space and the feature space;
step 4, obtaining the total objective loss function of the dynamic pruning network;
and step 5, updating the network parameters.
In step 1, the layers of the last two residual modules of the ResNet-20 model are the l-th and (l+1)-th convolutional layers of the convolutional neural network. The l-th convolutional layer is taken as an example below; the pruning process of the (l+1)-th layer is the same as that of the l-th layer.
The features of the l-th convolutional layer are expressed as:

x_i^{l+1} = ReLU(w^l ∗ x_i^l)

The gated features corresponding to the l-th convolutional layer of the ResNet-20 model are expressed as:

x̃_i^{l+1} = x_i^{l+1} ⊙ BinConcrete(Linear(p(x_i^l)))

where p denotes a global average pooling layer, Linear(·) denotes two fully connected layers, and BinConcrete is an activation function.
Step 2 comprises the following substeps:
Step 21: for the l-th convolutional layer of the convolutional neural network, the features x_i^l of the i-th instance are pooled into a vector z_i^l with a global average pooling layer.
Step 22: the instance similarity matrix is obtained from the instance pooling vectors:

s_{ij}^l = (z_i^l)^T z_j^l

where s_{ij}^l denotes the similarity between the i-th and j-th instances in the l-th layer, and T denotes transposition.
Step 23: the nearest-neighbor instances of each instance are determined from the similarity matrix S^l, and the set of nearest-neighbor instance indices is taken as the self-supervision signal.
The self-supervision signal of instance i is:

I_i^l = topk_j(s_{ij}^l)

where I_i^l denotes the self-supervision signal of instance i, topk returns the column indices of the k largest elements in the i-th row of S^l, and k is set to 200.
Step 3 comprises the following substeps:
step 31, acquiring the probability that the instance j is the positive instance of the instance i;
the probability that an instance j of the input convolutional neural network is identified, in the gating space corresponding to the l-th convolutional layer, as a positive instance of instance i is:
p^l(j|i) = exp((g_i^l)^T g_j^l / τ) / Σ_{m≠i} exp((g_i^l)^T g_m^l / τ)
where l denotes the l-th convolutional layer of the convolutional neural network, g_i^l is the gating probability output after instance i passes through the gating module, g_j^l is the gating probability output after instance j passes through the gating module, and τ is a temperature hyperparameter taken to be 0.07.
Step 32: a contrastive loss function is obtained from the positive-instance probabilities; minimizing the contrastive loss function aligns the instance neighborhood relationship of the gating space with that of the feature space.
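A sketch of the positive-instance probability and the resulting contrastive loss, assuming a standard softmax-over-similarities form with temperature τ = 0.07 and an InfoNCE-style average over the positives (the exact functional form of the patented loss is not reproduced in this text):

```python
import math

def positive_probs(g, i, tau=0.07):
    # softmax over gating-vector dot-product similarities;
    # instance i is excluded from its own candidate set
    sims = {j: sum(a * b for a, b in zip(g[i], g[j]))
            for j in range(len(g)) if j != i}
    z = sum(math.exp(s / tau) for s in sims.values())
    return {j: math.exp(s / tau) / z for j, s in sims.items()}

def contrastive_loss(g, positives, tau=0.07):
    # average negative log-probability of each instance's positive instances
    total, count = 0.0, 0
    for i, pos in positives.items():
        p = positive_probs(g, i, tau)
        for j in pos:
            total += -math.log(p[j])
            count += 1
    return total / count
```

When the gating vectors of an instance and its positive are nearly identical, the positive's probability approaches 1 and the loss approaches 0, which is the alignment behaviour the method seeks.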
In step 4, the total objective loss function loss is set as follows:
loss = L_ce + η Σ_{l∈Ω} L_contrast^l + ρ Σ_{l∈Ω} L_0^l
where L_ce is the cross-entropy loss function, L_contrast^l denotes the contrastive loss function of the l-th layer, and L_0^l is the L0-norm loss function of the l-th layer; η and ρ are coefficients, with η = 0.003 and ρ = 0.4; Ω is the set of gated network layer indices, namely {l, l+1}.
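The combination of the three terms with η = 0.003 and ρ = 0.4 can be sketched directly; the per-layer loss values in the usage below are placeholders, not measured quantities:

```python
def total_loss(ce, contrast_losses, l0_losses, eta=0.003, rho=0.4):
    # loss = cross-entropy + eta * sum of per-layer contrastive losses
    #        + rho * sum of per-layer L0-norm losses over the gated layers
    return ce + eta * sum(contrast_losses) + rho * sum(l0_losses)
```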
Classification experiments are then carried out on the classification database using the optimized network.
Example 2
The same experiment as in example 1 was carried out, except that the experiment was carried out using the ResNet-32 model.
Example 3
The same experiment as in Example 1 was performed, except that the ResNet-56 model was used and a dynamic pruning network was set in the last four residual modules of ResNet-56; the dynamic pruning process was the same as in Example 1, and the set Ω of gated network layers in step 4 contained four layers.
Example 4
The same experiment as in Example 1 was performed, except that the experimental database was CIFAR100, the model was trained for 400 epochs, the number of image instances per batch was 128, and the initial learning rate was set to 0.1; a multi-step decay strategy was likewise employed, the learning rate being multiplied by 0.2 when the training epochs reached 60, 120, and 160 respectively.
Example 5
The same experiment as in example 4 was performed except that the experiment was performed using the ResNet-32 model.
Example 6
The same experiment as in Example 4 was performed, except that the ResNet-56 model was used and a dynamic pruning network was set in the last four residual modules of ResNet-56; the dynamic pruning process was the same as in Example 4, and the set Ω of gated network layers in step 4 contained four layers.
Example 7
The same experiment as in Example 1 was performed, except that the experimental database was ImageNet. Since the training process takes a large amount of time, the experiment used the ResNet-18 model; training ran for 130 epochs in total, the number of image instances per batch was 256, the weight decay was set to 1e-4, and the learning rate was initialized to 0.1, again using a multi-step decay strategy in which the learning rate is multiplied by 0.1 when the training epochs reach 40, 70, and 100.
Example 8
The same experiment as in example 1 was performed, except that in step 23, k was taken as 5, 20, 100, 512, 1024, 2048, 4096, respectively.
The results are shown in Table 1:
Table 1
As can be seen from Table 1, when the number of nearest neighbors k is 200, the obtained pruning effect is best.
Example 9
The same experiment as in example 1 was carried out except that in step 4, the coefficients η were 5e-4, 0.001, 0.002, 0.003, 0.005, 0.01, and 0.02, respectively.
The results are shown in Table 2:
Table 2
As can be seen from Table 2, the pruning effect obtained is best when the coefficient η is 0.003.
Comparative example
Comparative example 1
The same experiment as in Example 1 was conducted, except that pruning was performed using the SFP, FPGM, DSA, Hinge, DHP, FBS, and BAS methods respectively.
The SFP method is described in: Y. He, G. Kang, X. Dong, Y. Fu, Y. Yang, Soft filter pruning for accelerating deep convolutional neural networks, in: Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI), 2018, pp. 2234-2240.
The FPGM method is described in: Y. He, P. Liu, Z. Wang, Z. Hu, Y. Yang, Filter pruning via geometric median for deep convolutional neural network acceleration, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2019, pp. 4340-4349.
The DSA method is described in: X. Ning, T. Zhao, W. Li, P. Lei, Y. Wang, H. Yang, DSA: More efficient budgeted pruning via differentiable sparsity allocation, in: Proceedings of the European Conference on Computer Vision (ECCV), 2020, pp. 592-607.
The Hinge method is described in: Y. Li, S. Gu, C. Mayer, L. Van Gool, R. Timofte, Group sparsity: The hinge between filter pruning and decomposition for network compression, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2020, pp. 8015-8024.
The DHP method is described in: Y. Li, S. Gu, K. Zhang, L. Van Gool, R. Timofte, DHP: Differentiable meta pruning via hypernetworks, in: Proceedings of the European Conference on Computer Vision (ECCV), 2020, pp. 608-624.
The FBS method is described in: X. Gao, Y. Zhao, L. Dudziak, R. D. Mullins, C. Xu, Dynamic channel pruning: Feature boosting and suppression, in: Proceedings of the International Conference on Learning Representations (ICLR), 2019.
The BAS method is described in: B. Ehteshami Bejnordi, T. Blankevoort, M. Welling, Batch-shaping for learning conditional channel gated networks, in: Proceedings of the International Conference on Learning Representations (ICLR), 2020.
Comparative example 2
The same experiment as in Example 2 was performed, except that pruning was performed using the Baseline, SFP, FPGM, FBS, and BAS methods respectively instead of the dynamic pruning method of Example 2.
The Baseline method is described in: K. He, X. Zhang, S. Ren, J. Sun, Deep residual learning for image recognition, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016, pp. 770-778.
Comparative example 3
The same experiment as in Example 3 was performed, except that pruning was performed using the Baseline, SFP, FPGM, HRank, DSA, Hinge, DHP, FBS, and BAS methods respectively instead of the dynamic pruning method of Example 3.
The HRank method is described in: M. Lin, R. Ji, Y. Wang, Y. Zhang, B. Zhang, Y. Tian, L. Shao, HRank: Filter pruning using high-rank feature map, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2020, pp. 1526-1535.
Comparative example 4
The same experiment as in example 4 was performed except that the dynamic pruning method in example 4 was replaced with the Baseline method and the BAS method, respectively, for pruning.
Comparative example 5
The same experiment as in example 5 was performed except that pruning was performed using the Baseline method, CAC method, BAS method instead of the dynamic pruning method in example 5, respectively.
The CAC method is described in: Z. Chen, T. Xu, C. Du, C. Liu, H. He, Dynamic channel pruning by conditional accuracy change for deep neural networks, IEEE Transactions on Neural Networks and Learning Systems (TNNLS) 32(2), 2021, pp. 799-.
Comparative example 6
The same experiment as in example 6 was performed except that pruning was performed using the Baseline method, CAC method, BAS method instead of the dynamic pruning method in example 6, respectively.
Comparative example 7
The same experiment as in Example 7 was performed, except that pruning was performed using the Baseline, SFP, FPGM, DSA, LCCN, CGNet, FBS, and BAS methods respectively instead of the dynamic pruning method of Example 7.
The LCCN method is described in: X. Dong, J. Huang, Y. Yang, S. Yan, More is less: A more complicated network with less inference complexity, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017, pp. 1895-1903.
The CGNet method is described in: W. Hua, Y. Zhou, C. De Sa, Z. Zhang, G. E. Suh, Channel gating neural networks, in: Proceedings of Advances in Neural Information Processing Systems (NeurIPS), 2019, pp. 1884-1894.
Experimental example 1
The pruning results of Examples 1-3 and Comparative Examples 1-3 were collected; the results are shown in Table 3.
The two sets of results for each of Examples 1, 2, and 3 are the performance at the minimum classification error and at the maximum compression ratio, respectively; from these two sets of results it can be seen that Examples 1-3 achieve a good trade-off between classification error and compression ratio.
In this experimental example, the classification error is the proportion of misclassified samples to the total number of samples. The Top-1 classification error is calculated as:
Err_top1 = 1 - (1/N) Σ_{i=1}^{N} (y_i == top1(z_i))
where y_i denotes the class label of the i-th sample, N denotes the total number of samples, top1(z_i) denotes the index of the largest element of the vector z_i, and z_i denotes the network output:
z_i = NN(x_i^0)
where x_i^0 denotes the i-th sample input to the network, i.e. the input to the 0-th convolutional layer; NN denotes the convolutional neural network, formed by connecting a plurality of convolutional layers; z_i is the output of the network and N_c is the number of classes. The sum Σ_i (y_i == top1(z_i)) counts the correctly identified samples: y_i == top1(z_i) takes the value 1 when the network's top-1 prediction top1(z_i) agrees with the class label y_i, and 0 otherwise.
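A minimal sketch of the Top-1 error defined above, operating on raw output vectors and integer class labels:

```python
def top1_error(outputs, labels):
    # fraction of samples whose highest-scoring class differs from the label
    wrong = sum(1 for z, y in zip(outputs, labels)
                if max(range(len(z)), key=z.__getitem__) != y)
    return wrong / len(labels)
```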
The pruning ratio is calculated as follows: r^l denotes the proportion of zero elements in the gating vector g^l of the l-th convolutional layer, which can be regarded as the computation clipping ratio of that layer; Comp_l denotes the computational cost of the l-th convolutional layer, and Comp_NN_original denotes the computational cost of the original (unpruned) convolutional network. The clipped computation of the network for each sample is obtained from the clipping ratios of the gated convolutional layers, and the pruning ratio of the network is the average of these clipped computations divided by the computational cost of the original network.
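A sketch of this pruning-ratio computation for a single sample; the per-layer FLOP counts and gate vectors below are illustrative placeholders:

```python
def pruning_ratio(gates, layer_flops, total_flops):
    # per-layer clipping ratio = fraction of zero entries in the gating vector;
    # pruned computation = sum over gated layers of ratio_l * flops_l,
    # divided by the original network's total computation
    pruned = sum((g.count(0.0) / len(g)) * f for g, f in zip(gates, layer_flops))
    return pruned / total_flops
```

For a sample whose two gated layers each cost 100 FLOPs out of a 400-FLOP network, with half the gates closed in one layer and all closed in the other, the pruning ratio is (0.5*100 + 1.0*100) / 400 = 0.375.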
As can be seen from Table 3, the methods in Examples 1, 2, and 3 obtain lower classification errors at higher pruning ratios, significantly improving pruning performance and outperforming existing state-of-the-art network pruning methods.
Experimental example 2
The pruning results of Examples 4-6 and Comparative Examples 4-6 were collected; the results are shown in Table 4.
As with Examples 1-3, the two sets of results for each of Examples 4-6 are the performance at the minimum classification error and at the maximum compression ratio, respectively; from these two sets of results it can be seen that Examples 4-6 achieve a good trade-off between classification error and compression ratio.
As can be seen from Table 4, the methods in Examples 4, 5, and 6 obtain lower classification errors at higher pruning ratios, significantly improving pruning performance and outperforming existing state-of-the-art network pruning methods.
Experimental example 3
The pruning results of Example 7 and Comparative Example 7 were collected; the results are shown in Table 5.
As with Examples 1-3, the two sets of results in Example 7 are the performance at the minimum classification error and at the maximum compression ratio, respectively; from these two sets of results it can be seen that Example 7 achieves a good trade-off between classification error and compression ratio.
In this experimental example, the Top-5 classification error is calculated as:
Err_top5 = 1 - (1/N) Σ_{i=1}^{N} (y_i ∈ top5(z_i))
where top5(z_i) denotes the set of indices of the 5 largest elements of the vector z_i; if y_i belongs to this set, y_i ∈ top5(z_i) takes the value 1, and 0 otherwise. The remaining parameters have the same meanings as the corresponding parameters in the Top-1 classification error.
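The Top-5 error follows the same pattern as the Top-1 error, checking whether the label falls among the five highest-scoring classes:

```python
def top5_error(outputs, labels):
    # fraction of samples whose label is not among the 5 highest-scoring classes
    wrong = 0
    for z, y in zip(outputs, labels):
        top5 = sorted(range(len(z)), key=z.__getitem__, reverse=True)[:5]
        wrong += y not in top5
    return wrong / len(labels)
```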
As can be seen from Tables 3, 4, and 5, the methods in Examples 1 to 7 obtain lower classification errors on different datasets at higher pruning ratios, outperforming existing state-of-the-art network pruning methods.
In the description of the present invention, it should be noted that the terms "upper", "lower", "inner", "outer", "front", "rear", etc. indicate orientations or positional relationships based on operational states of the present invention, and are only for convenience of description and simplification of description, but do not indicate or imply that the device or element referred to must have a specific orientation, be constructed and operated in a specific orientation, and thus, should not be construed as limiting the present invention. Furthermore, the terms "first," "second," "third," and "fourth" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance.
In the description of the present invention, it should be noted that, unless otherwise specifically stated or limited, the terms "mounted," "connected," and "connected" are to be construed broadly, e.g., as meaning either a fixed connection, a removable connection, or an integral connection; can be mechanically or electrically connected; the connection may be direct or indirect via an intermediate medium, and may be a communication between the two elements. The specific meanings of the above terms in the present invention can be understood in specific cases to those skilled in the art.
The present invention has been described above in connection with preferred embodiments, but these embodiments are merely exemplary and merely illustrative. On the basis of the above, the invention can be subjected to various substitutions and modifications, and the substitutions and the modifications are all within the protection scope of the invention.
Claims (5)
1. An image classification accuracy improving method is characterized by comprising the following steps:
acquiring an image dataset;
setting a convolutional neural network;
performing dynamic network pruning on the convolutional neural network by adopting a characteristic-gating coupling method to obtain an optimized network;
inputting the images to be classified into the optimized network, and classifying the images using the optimized network,
wherein the feature-gating coupling method comprises the following steps:
step 1, obtaining the feature space and the gating space in the dynamic pruning network;
step 2, obtaining the instance neighborhood relationship in the feature space;
step 3, aligning the instance neighborhood relationship between the gating space and the feature space;
step 4, acquiring the total objective loss function of the dynamic pruning network;
step 5, updating the network parameters;
step 2 comprises the following substeps:
Step 23: determining the nearest-neighbor instances of each instance from the instance similarity matrix S^l, and taking the set of nearest-neighbor instance indices as the self-supervision signal;
in step 22, the similarity between different instances is obtained by measuring the pooled vectors of the different instances, the measure being the dot product;
in step 23, the i-th row of S^l is sorted to obtain the column indices of its k largest elements, and the instance set corresponding to these indices is taken as the self-supervision signal of instance i;
step 3 comprises the following substeps:
step 31, acquiring the probability that the instance j is the positive instance of the instance i;
step 32, obtaining a contrast loss function according to the positive example probability, minimizing the contrast loss function, and realizing the alignment of the example neighborhood relationship of the gating space and the feature space;
a positive instance refers to a nearest-neighbor instance of instance i.
2. The image classification accuracy improvement method according to claim 1,
in step 31, the probability that an input instance j is identified in the gating space as a positive instance of instance i is:
p^l(j|i) = exp((g_i^l)^T g_j^l / τ) / Σ_{m≠i} exp((g_i^l)^T g_m^l / τ)
where l denotes the l-th convolutional layer of the convolutional neural network, g_i^l is the gating probability output after instance i passes through the gating module, g_j^l is the gating probability output after instance j passes through the gating module, and τ is a temperature hyperparameter.
3. The image classification accuracy improving method according to claim 1, wherein in step 4, the total objective loss function loss is:
loss = L_ce + η Σ_{l∈Ω} L_contrast^l + ρ Σ_{l∈Ω} L_0^l
where L_ce is the cross-entropy loss function, L_contrast^l and L_0^l are the contrastive loss and L0-norm loss of the l-th layer, η and ρ are coefficients, and Ω is the set of gated network layer indices.
4. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-3.
5. A computer-readable storage medium having computer instructions stored thereon for causing the computer to perform the method of any one of claims 1-3.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111229240.5A CN114037857B (en) | 2021-10-21 | 2021-10-21 | Image classification precision improving method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114037857A CN114037857A (en) | 2022-02-11 |
CN114037857B true CN114037857B (en) | 2022-09-23 |
Family
ID=80135096
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111229240.5A Active CN114037857B (en) | 2021-10-21 | 2021-10-21 | Image classification precision improving method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114037857B (en) |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110210620A (en) * | 2019-06-04 | 2019-09-06 | 北京邮电大学 | A kind of channel pruning method for deep neural network |
CN111242277A (en) * | 2019-12-27 | 2020-06-05 | 中国电子科技集团公司第五十二研究所 | Convolutional neural network accelerator supporting sparse pruning and based on FPGA design |
CN111368699A (en) * | 2020-02-28 | 2020-07-03 | 交叉信息核心技术研究院(西安)有限公司 | Convolutional neural network pruning method based on patterns and pattern perception accelerator |
CN111626330A (en) * | 2020-04-23 | 2020-09-04 | 南京邮电大学 | Target detection method and system based on multi-scale characteristic diagram reconstruction and knowledge distillation |
CN112508955A (en) * | 2021-02-08 | 2021-03-16 | 中国科学院自动化研究所 | Method for detecting living cell morphology based on deep neural network and related product |
CN113239981A (en) * | 2021-04-23 | 2021-08-10 | 中国科学院大学 | Image classification method of local feature coupling global representation |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
RU2711153C2 (en) * | 2018-05-23 | 2020-01-15 | Общество С Ограниченной Ответственностью "Яндекс" | Methods and electronic devices for determination of intent associated with uttered utterance of user |
CN112734025B (en) * | 2019-10-28 | 2023-07-21 | 复旦大学 | Neural network parameter sparsification method based on fixed base regularization |
CN113077044A (en) * | 2021-03-18 | 2021-07-06 | 北京工业大学 | General lossless compression and acceleration method for convolutional neural network |
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110210620A (en) * | 2019-06-04 | 2019-09-06 | 北京邮电大学 | A kind of channel pruning method for deep neural network |
CN111242277A (en) * | 2019-12-27 | 2020-06-05 | 中国电子科技集团公司第五十二研究所 | Convolutional neural network accelerator supporting sparse pruning and based on FPGA design |
CN111368699A (en) * | 2020-02-28 | 2020-07-03 | 交叉信息核心技术研究院(西安)有限公司 | Convolutional neural network pruning method based on patterns and pattern perception accelerator |
CN111626330A (en) * | 2020-04-23 | 2020-09-04 | 南京邮电大学 | Target detection method and system based on multi-scale characteristic diagram reconstruction and knowledge distillation |
CN112508955A (en) * | 2021-02-08 | 2021-03-16 | 中国科学院自动化研究所 | Method for detecting living cell morphology based on deep neural network and related product |
CN113239981A (en) * | 2021-04-23 | 2021-08-10 | 中国科学院大学 | Image classification method of local feature coupling global representation |
Non-Patent Citations (1)
Title |
---|
Research on Model Compression Based on Convolutional Neural Networks; Yao Yang; China Masters' Theses Full-text Database, Information Science and Technology, 2021-04-15, No. 04, pp. I132-677 *
Also Published As
Publication number | Publication date |
---|---|
CN114037857A (en) | 2022-02-11 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||