CN114037857B - Image classification precision improving method - Google Patents
- Publication number
- CN114037857B (application CN202111229240.5A / CN202111229240A)
- Authority
- CN
- China
- Prior art keywords
- network
- instance
- gating
- pruning
- space
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G—PHYSICS; G06—COMPUTING; CALCULATING OR COUNTING; G06F—ELECTRIC DIGITAL DATA PROCESSING; G06F18/00—Pattern recognition; G06F18/20—Analysing; G06F18/24—Classification techniques; G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G—PHYSICS; G06—COMPUTING; CALCULATING OR COUNTING; G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS; G06N3/00—Computing arrangements based on biological models; G06N3/02—Neural networks; G06N3/04—Architecture, e.g. interconnection topology; G06N3/045—Combinations of networks
- G—PHYSICS; G06—COMPUTING; CALCULATING OR COUNTING; G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS; G06N3/00—Computing arrangements based on biological models; G06N3/02—Neural networks; G06N3/08—Learning methods
Abstract
The invention discloses an image classification precision improving method comprising the following steps: acquiring an image dataset; setting a convolutional neural network; performing dynamic network pruning on the convolutional neural network by a feature-gating coupling method to obtain an optimized network; and inputting the images to be classified into the optimized network and classifying them with it. The dynamic network pruning method comprises: step 1, obtaining the feature space and the gating space of the dynamic pruning network; step 2, obtaining the instance neighborhood relationship in the feature space; step 3, aligning the instance neighborhood relationship between the gating space and the feature space; step 4, obtaining the total objective loss function of the dynamic pruning network; and step 5, updating the network parameters. The feature-gating coupling method for dynamic network pruning disclosed by the invention greatly reduces the distortion of the gated features and significantly improves the performance of the dynamic pruning network.
Description
Technical Field
The invention relates to an image classification precision improving method, and belongs to the technical field of image classification.
Background
In order to achieve higher image classification accuracy, convolutional neural networks (CNNs) have been designed ever larger and deeper, which greatly increases their computational cost.
To reduce this cost, researchers have proposed and widely adopted network pruning, which removes network parameters that contribute little to classification accuracy. In this way the computational cost is reduced as much as possible while the pruned network retains the representation capability of the original network, so that the loss in classification accuracy is minimized and a compact network model is obtained.
Existing network pruning methods fall roughly into two categories: static pruning and dynamic pruning. Static channel pruning obtains a static, slimmed model by deleting feature channels that contribute little to overall performance; dynamic pruning obtains a sub-network conditioned on the input picture instance, reducing the computational cost at runtime.
In conventional dynamic pruning methods, an attached gating module generates a channel-level binary mask, i.e., a gating vector, which indicates whether each channel is deleted or retained. The gating module exploits instance-level redundancy according to the feature variations of different inputs: channels that identify particular features can be adaptively opened or closed for different input instances.
However, existing network pruning methods typically ignore the consistency between the feature and gating distributions. Since the gated features are generated by channel-wise multiplication of the feature vector and the gating vector, a pair of instances with similar features but dissimilar gates exposes a disparity between the two distributions. This disparity distorts the gated feature space: it may pull noise instances into a similar pair or push the pair apart, reducing the representation ability of the pruned network.
Therefore, feature-gating coupling requires further study in order to solve the above technical problems.
Disclosure of Invention
In order to overcome the above problems, the present inventors have conducted intensive studies and designed an image classification precision improving method, comprising:
acquiring an image dataset;
setting a convolutional neural network;
performing dynamic network pruning on the convolutional neural network by a feature-gating coupling method to obtain an optimized network;
inputting the images to be classified into the optimized network and classifying the images with the optimized network,
the feature-gating coupling method comprising the following steps:
step 1, obtaining the feature space and the gating space of the dynamic pruning network;
step 2, obtaining the instance neighborhood relationship in the feature space;
step 3, aligning the instance neighborhood relationship between the gating space and the feature space;
step 4, obtaining the total objective loss function of the dynamic pruning network;
and step 5, updating the network parameters.
Preferably, step 2 comprises the following sub-steps:
step 21, pooling the features of each instance into an instance pooling vector with a global average pooling layer;
step 22, obtaining the instance similarity matrix from the instance pooling vectors;
step 23, determining the nearest-neighbor instances of each instance from the instance similarity matrix, and taking the set of nearest-neighbor instance indices as the self-supervision signal.
Preferably, in step 22, the similarity between different instances is obtained by measuring their instance pooling vectors with a dot product.
Preferably, in step 23, the i-th row of the similarity matrix is sorted, the column indices of its k largest elements are obtained, and the instance set corresponding to these indices is taken as the self-supervision signal of instance i.
Preferably, in step 3, a contrastive loss function gathers the positive instances and disperses the negative instances, thereby aligning the instance neighborhood relationship between the gating space and the feature space.
Preferably, step 3 comprises the following sub-steps:
step 31, obtaining the probability that instance j is a positive instance of instance i;
and step 32, obtaining a contrastive loss function from the positive-instance probability and minimizing it, thereby aligning the instance neighborhood relationship of the gating space and the feature space.
Preferably, in step 31, the probability that an input instance j is identified in the gating space as a positive instance of instance i is:

p^l(j|i) = exp((π_j^l)^T π_i^l / τ) / Σ_{n=1}^{N} exp((π_n^l)^T π_i^l / τ)

wherein l denotes the l-th layer, π_i^l is the gating probability output by the gating module for instance i, π_j^l is the gating probability output by the gating module for instance j, N is the total number of instances, and τ is a temperature hyperparameter.
Preferably, in step 4, the total objective loss function loss is:

loss = L_CE + Σ_{l∈Ω} (η L_con^l + ρ L_0^l)

wherein L_CE is the cross-entropy loss function, L_con^l denotes the contrastive loss function of the l-th layer, L_0^l is the L0 norm loss function of the l-th layer, η and ρ are coefficients, and Ω is the set of network layer indices.
The present invention also provides an electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method described above.
The invention also provides a computer-readable storage medium having stored thereon computer instructions for causing a computer to perform the method described above.
The invention has the advantages that:
(1) feature distributions and corresponding gating distributions in the dynamic pruning network can be aligned;
(2) feature-gating coupling is achieved by iteratively performing neighborhood relationship exploration and feature-gating alignment, and distortion of gating features can be greatly reduced;
(3) the performance of the dynamic pruning network can be significantly improved.
Drawings
Fig. 1 shows a schematic flow chart of a feature-gated coupling method for dynamic network pruning according to a preferred embodiment of the present invention.
Detailed Description
The invention is explained in more detail below with reference to the figures and examples. The features and advantages of the present invention will become more apparent from the description.
The word "exemplary" is used herein to mean "serving as an example, embodiment, or illustration." Any embodiment described herein as "exemplary" is not necessarily to be construed as preferred or advantageous over other embodiments. While the various aspects of the embodiments are presented in drawings, the drawings are not necessarily drawn to scale unless specifically indicated.
In order to solve the problem that conventional dynamic network pruning methods may distort the gated features, the present invention minimizes this distortion by aligning the feature and gating distributions.
The image classification precision improving method provided by the invention comprises the following steps:
acquiring an image dataset;
setting a convolutional neural network;
performing dynamic network pruning on the convolutional neural network by a feature-gating coupling method to obtain an optimized network;
inputting the images to be classified into the optimized network and classifying the images with the optimized network,
in the present invention, the image data set is used for training a neural network, and the method for acquiring the image data set is not particularly limited in the present invention, and may be any open image classification training set, or an image data set designed by a person skilled in the art according to actual needs.
In the present invention, the specific structure of the convolutional neural network is not limited, and those skilled in the art can select an appropriate convolutional neural network structure according to actual needs.
The feature-gating coupling method, as shown in Fig. 1, comprises the following steps:
step 1, obtaining the feature space and the gating space of the dynamic pruning network;
step 2, obtaining the instance neighborhood relationship in the feature space;
step 3, aligning the instance neighborhood relationship between the gating space and the feature space;
step 4, obtaining the total objective loss function of the dynamic pruning network;
and step 5, updating the network parameters.
In step 1, the image instances input to the convolutional neural network are mapped to a feature space and a gating space according to the dynamic network pruning method BAS; for a description of BAS, reference may be made to: B. Ehteshami Bejnordi, T. Blankevoort, M. Welling, Batch-shaping for learning conditional channel gated networks, in: Proceedings of the International Conference on Learning Representations (ICLR), 2020.
The feature space contains the instance features of the current convolutional layer. According to the present invention, dynamic network pruning can be set at any convolutional layer of the convolutional neural network; when it is set at the l-th convolutional layer, the instance features are expressed as:

x_i^{l+1} = ReLU(w^l ∗ x_i^l)

where i indexes the instances and l indexes the layers of the convolutional neural network; x_i^l is the input of the l-th convolutional layer, x_i^{l+1} ∈ R^{C_l × W_l × H_l} is the output of the l-th convolutional layer, C_l is the number of output channels of the l-th convolutional layer, w^l is the convolution weight matrix of the l-th convolutional layer, ∗ denotes the convolution operator, and ReLU denotes the activation function.
The gating space contains the gating vectors of the current convolutional layer. Further, when dynamic network pruning is set at the l-th convolutional layer, the gating vector and the instance features are multiplied channel-wise to obtain the gated features:

x̃_i^{l+1} = x_i^{l+1} ⊙ G(x_i^l)

where G denotes the gating module, ⊙ denotes channel-level multiplication, and x̃_i^{l+1} denotes the gated features.
In a preferred embodiment, the gating module G may be expressed as:

G(·) = BinConcrete(Linear(p(·)))

where p denotes a global average pooling layer that generates a spatial descriptor of the input features; Linear(·) denotes two fully connected layers that generate the gating probabilities; and BinConcrete is an activation function.
Further, the gating probability corresponding to the l-th convolutional layer of the convolutional neural network is expressed as:

π_i^l = Linear(p(x_i^l))

and the gating vector corresponding to the l-th convolutional layer is:

g_i^l = BinConcrete(π_i^l)
further, according to the first, third, and sixth equations, the gating characteristic corresponding to the first convolutional layer of the convolutional neural network can also be expressed as:
in step 2, since the example neighborhood relationships are different in feature spaces of different semantic levels, for example, in low-level feature spaces, examples with similar colors or textures may be closer, while in high-level feature spaces, examples in the same class may be clustered together, so manual annotation cannot provide adaptive supervision of the example neighborhood relationships across different network stages, and how to obtain the example neighborhood relationships in the feature spaces is a difficult point of the present invention.
The invention provides a method for adaptively acquiring the instance neighborhood relationship in each feature space, which can be used at any layer of the network to obtain a self-supervision signal for feature-gating distribution alignment.
Specifically, the method comprises the following sub-steps:
Step 21: for the l-th convolutional layer provided with dynamic network pruning, the features x_i^l of the i-th instance in that layer are pooled by a global average pooling layer into a vector z_i^l = p(x_i^l), called the instance pooling vector. Pooling reduces the feature dimension from C_l × W_l × H_l to C_l × 1, improving the efficiency and effectiveness of subsequent processing.
Step 22: the instance similarity matrix S^l is obtained from the instance pooling vectors z_i^l.
The instance similarity matrix S^l characterizes the similarity between different instances in layer l. The similarity is measured on the instance pooling vectors, preferably with a dot product:

s_{ij}^l = (z_i^l)^T z_j^l

where i, j index different instances, s_{ij}^l denotes the similarity between the i-th and j-th instances in the l-th convolutional layer, and T denotes transposition.
Further, the similarities of all instances in the l-th convolutional layer are assembled into the similarity matrix S^l ∈ R^{N×N}, whose elements are s_{ij}^l, where N is the total number of instances used for training.
Step 23: the nearest-neighbor instances are determined from the instance similarity matrix S^l, and the set of nearest-neighbor instance indices is taken as the self-supervision signal.
The i-th row of S^l is sorted to obtain the column indices of its k largest elements; the corresponding instances are the nearest neighbors of instance i, and the set of their indices is the self-supervision signal of instance i, used to regulate the gating module:

I_i^l = topk_j(s_{ij}^l)

where I_i^l denotes the self-supervision signal of instance i, and topk returns the column indices of the k largest elements in the i-th row of S^l.
Preferably, k is 100 to 500, more preferably 200.
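The sub-steps above can be sketched in plain Python. This is an illustrative sketch; excluding an instance from its own neighbor set is an assumption made here for clarity, since the original equation images are not reproduced:

```python
def pool_vectors(features):
    # step 21: global average pooling of each instance's C_l x W_l x H_l feature
    return [[sum(ch) / len(ch) for ch in feat] for feat in features]

def similarity_matrix(zs):
    # step 22: s_ij = z_i . z_j (dot product between instance pooling vectors)
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))
    return [[dot(zi, zj) for zj in zs] for zi in zs]

def topk_neighbors(sim, i, k):
    # step 23: column indices of the k largest elements in row i
    # (the instance itself is excluded here, an assumption of this sketch)
    order = sorted((j for j in range(len(sim)) if j != i),
                   key=lambda j: sim[i][j], reverse=True)
    return set(order[:k])
```

The returned index set plays the role of the self-supervision signal I_i^l that selects which instances are pulled together in the gating space.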
In a preferred embodiment, in step 21, instead of recomputing the instance pooling vectors of all previously input picture instances each time a new picture instance is input, the obtained instance pooling vectors are stored in a feature bank B^l ∈ R^{N×D}, where D is the dimension of the pooled feature, i.e., the number of channels C_l; when the similarity is computed for subsequently input picture instances, the vectors in B^l are read directly.
After a new instance is input, the feature bank is updated in a momentum manner:

B_i^l ← m · B_i^l + (1 − m) · z_i^l

where the momentum coefficient m is set to 0.3-0.7, preferably 0.5, and all vectors in the bank are initialized to unit random vectors.
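A minimal sketch of the feature-bank update, assuming the straightforward reading of the momentum rule (slot i of the bank is blended with the new pooling vector of instance i):

```python
import random

def init_bank(num_instances, dim, seed=0):
    # all vectors start as random unit vectors
    rng = random.Random(seed)
    bank = []
    for _ in range(num_instances):
        v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        norm = sum(x * x for x in v) ** 0.5
        bank.append([x / norm for x in v])
    return bank

def momentum_update(bank, i, z, m=0.5):
    # bank[i] <- m * bank[i] + (1 - m) * z, so each slot tracks a running
    # average of that instance's pooling vector instead of being recomputed
    bank[i] = [m * old + (1.0 - m) * new for old, new in zip(bank[i], z)]
    return bank[i]
```

With m = 0.5 the stored vector moves halfway toward each new observation, smoothing out per-epoch fluctuations of the features.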
In step 3, the self-supervision signal of instance i yields its nearest-neighbor set, denoted N_i^l. The instances in the nearest-neighbor set are the positive instances to be pulled closer in the gating space, while the instances outside the nearest-neighbor set are the negative instances to be pushed away.
That is, a positive instance is a nearest neighbor of instance i; all other instances are negative instances.
In a preferred embodiment, step 3 comprises the following sub-steps:
Step 31: obtaining the probability that instance j is a positive instance of instance i.
Specifically, the probability that an input instance j is identified, in the gating space corresponding to the l-th convolutional layer, as a positive instance of instance i is:

p^l(j|i) = exp((π_j^l)^T π_i^l / τ) / Σ_{n=1}^{N} exp((π_n^l)^T π_i^l / τ)

where l denotes the l-th convolutional layer, π_i^l is the gating probability output by the gating module for instance i, π_j^l is the gating probability output by the gating module for instance j, and τ is a temperature hyperparameter, preferably set to 0.01-0.2, for example 0.07.
Step 32: obtaining the contrastive loss function from the positive-instance probability and minimizing it, thereby aligning the instance neighborhood relationship of the gating space with that of the feature space.
The contrastive loss sums the negative log-probabilities over the nearest-neighbor set N_i^l of instance i:

L_con^l = − Σ_{j∈N_i^l} log p^l(j|i)

Minimizing L_con^l pulls the nearest neighbors of the feature space closer in the gating space, so that the instance neighborhood relationship of the feature space is reproduced in the gating space, achieving the alignment of the two.
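The positive-instance probability and the contrastive loss can be sketched in plain Python. The exact normalization used in the patent is partly lost with the equation images, so the sketch assumes the standard InfoNCE-style form, a temperature-scaled softmax over all instances, averaged over the positive set:

```python
import math

def positive_probability(pis, i, j, tau=0.07):
    # p(j | i) = exp(pi_j . pi_i / tau) / sum_n exp(pi_n . pi_i / tau)
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))
    logits = [dot(pis[n], pis[i]) / tau for n in range(len(pis))]
    mx = max(logits)  # subtract max for numerical stability
    exps = [math.exp(z - mx) for z in logits]
    return exps[j] / sum(exps)

def contrastive_loss(pis, i, positives, tau=0.07):
    # pull the feature-space neighbours of i together in gating space:
    # L = -(1/|P_i|) * sum_{j in P_i} log p(j | i)
    return -sum(math.log(positive_probability(pis, i, j, tau))
                for j in positives) / len(positives)
```

The lower the temperature, the more the softmax concentrates on the closest gating vectors, which sharpens the pull on the designated positives.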
The contrastive loss function L_con^l designed in the present invention is a variant of the InfoNCE loss function, whose minimization maximizes the mutual information between the feature distribution and the gating distribution. Its principle can be explained by the following analysis.
The mutual information between the instance feature distribution and the gating feature distribution is defined as:

I(π^l; z^l) = Σ_{π,z} p(π^l, z^l) log ( p(π^l | z^l) / p(π^l) )

The density ratio p(π_j^l | z_i^l) / p(π_j^l) can be represented, up to a proportionality constant, by exp((π_j^l)^T π_i^l / τ), where π_i^l is the gating probability corresponding to z_i^l and π_j^l is the gating probability of the corresponding positive instance.
With this ratio, following the standard InfoNCE bound, the contrastive loss function satisfies:

L_con^l ≥ log N − I(π^l; z^l)

where the z_j^l with j ∈ N_i^l denote the nearest-neighbor instance features in the feature space.
The above inequality shows that minimizing the contrastive loss function L_con^l maximizes a lower bound on the mutual information between the instance feature distribution and the gating feature distribution; accordingly, increasing this mutual information promotes the alignment of the two distributions.
In a preferred embodiment, in step 31, a gating bank B_g^l is used to store the gating probability π_i^l of each instance i. In step 32, according to the self-supervision signal I_i^l of instance i, the gating probabilities of the corresponding positive instances are extracted from the gating bank B_g^l and substituted into the positive-instance probability above, which avoids repeated computation and reduces the computational load.
Further, after a new instance is input, the gating bank is updated in a momentum manner:

B_{g,i}^l ← m · B_{g,i}^l + (1 − m) · π_i^l

where the momentum coefficient m is set to 0.3-0.7, preferably 0.5, and all vectors in the bank are initialized to unit random vectors.
In step 4, the total objective loss function loss is set as:

loss = L_CE + Σ_{l∈Ω} (η L_con^l + ρ L_0^l)

where L_CE is the cross-entropy loss function, L_con^l denotes the contrastive loss function of the l-th convolutional layer, and L_0^l is the L0 norm loss function of the l-th convolutional layer, the same as the L0 norm loss function in BAS; η and ρ are coefficients, and Ω is the set of indices of the convolutional layers with feature-gating coupling in the convolutional neural network.
Further, L_0^l = ||g^l||_0 for each gated layer, where ||·||_0 is the L0 norm.
According to the invention, the value of the coefficient ρ can be determined by a person skilled in the art through repeated tests as required. The coefficient ρ controls the sparsity of the pruned model: the larger ρ is, the sparser the model and the lower the accuracy; the smaller ρ is, the lower the sparsity and the higher the accuracy.
Preferably, the value of the coefficient η is between 0.002 and 0.004, more preferably 0.003.
According to the present invention, the total objective loss function comprises a cross-entropy loss function, an L0 norm loss function, and a contrastive loss function: the cross-entropy loss is used for the image classification task; the L0 norm loss enforces the sparsity of the gating vectors; and the contrastive loss is used for feature-gating distribution alignment. With this arrangement, dynamic network pruning couples the features with the gating and minimizes the distortion of the gated features.
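A sketch of how the three terms combine, under the assumption that L_0^l simply counts open gates; note that during actual training BAS replaces the non-differentiable L0 norm with a differentiable relaxation, so the raw count here is for illustration only:

```python
def l0_norm(gate):
    # number of open channels in the gating vector
    return sum(1 for g in gate if g != 0.0)

def total_loss(ce_loss, layer_losses, eta=0.003, rho=0.4):
    # loss = L_CE + sum_{l in Omega} (eta * L_con^l + rho * L_0^l)
    # layer_losses: one (contrastive_loss, gate_vector) pair per coupled layer
    return ce_loss + sum(eta * con + rho * l0_norm(gate)
                         for con, gate in layer_losses)
```

Raising rho penalizes open gates more heavily, trading accuracy for sparsity, which matches the role of rho described above.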
In step 5, the parameters of the dynamic pruning network are updated by gradient back-propagation according to the total objective loss function; if the network has converged, training terminates and the final model is returned; otherwise, steps 1-5 are repeated for iterative updating.
The specific method of this step is the same as the traditional neural network parameter updating method, and is not described in detail in the present invention.
Various embodiments of the above-described methods of the present invention may be implemented in digital electronic circuitry, integrated circuits, field programmable gate arrays (FPGAs), application specific integrated circuits (ASICs), application specific standard products (ASSPs), systems on chip (SOCs), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special- or general-purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for implementing the methods of the present invention may be written in any combination of one or more programming languages. These program codes may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the program codes, when executed by the processor or controller, cause the functions/operations specified in the flowchart and/or block diagram to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of the present invention, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the methods described herein may be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The methods and apparatus described herein may be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described herein), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.
The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, also called a cloud computing server or cloud host, which is a host product in a cloud computing service system that remedies the high management difficulty and weak service extensibility of traditional physical hosts and VPS ("Virtual Private Server") services. The server may also be a server of a distributed system, or a server incorporating a blockchain.
It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present disclosure may be executed in parallel, sequentially, or in different orders, and are not limited herein as long as the desired results of the technical solutions disclosed in the present disclosure can be achieved.
Examples
Example 1
Experiments were performed on a standard image classification database: the image dataset was CIFAR-10, which contains 60,000 color image instances, of which 50,000 are used for training and 10,000 for testing. The experimental environment was an NVIDIA RTX 3090 GPU, using PyTorch.
A convolutional neural network was set up and experiments were performed with the ResNet-20 model, which was trained for 400 epochs with 256 image instances per batch, using the Nesterov accelerated gradient descent optimizer with momentum set to 0.9 and weight decay set to 5e-4. The initial learning rate was set to 0.1 with a multi-step decay strategy: the learning rate is decayed by a factor of 0.1 at epochs 200, 275, and 350.
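The multi-step schedule can be reproduced with a few lines of plain Python (a sketch of the schedule itself, not of the optimizer):

```python
def multistep_lr(epoch, base_lr=0.1, milestones=(200, 275, 350), gamma=0.1):
    # the learning rate is multiplied by gamma at each milestone epoch
    passed = sum(1 for m in milestones if epoch >= m)
    return base_lr * (gamma ** passed)
```

With the settings above, the rate is 0.1 until epoch 200, then 0.01, 0.001, and finally 0.0001 after epoch 350.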
A dynamic pruning network arranged in the last two residual modules of the ResNet-20 model is dynamically pruned by the feature-gating coupling method to obtain the optimized network, wherein the dynamic pruning method comprises the following steps:
step 1, obtaining the feature space and the gating space of the dynamic pruning network;
step 2, obtaining the instance neighborhood relationship in the feature space;
step 3, aligning the instance neighborhood relationship between the gating space and the feature space;
step 4, obtaining the total objective loss function of the dynamic pruning network;
and step 5, updating the network parameters.
In step 1, the layers of the last two residual modules of the ResNet-20 model are the l-th and (l+1)-th convolutional layers of the convolutional neural network. The l-th convolutional layer is taken as an example below; the pruning process of the (l+1)-th layer is the same as that of the l-th layer.
The features of the l-th convolutional layer are expressed as:

x_i^{l+1} = ReLU(w^l ∗ x_i^l)

The gated features corresponding to the l-th convolutional layer of the ResNet-20 model are expressed as:

x̃_i^{l+1} = x_i^{l+1} ⊙ BinConcrete(Linear(p(x_i^l)))

where p denotes a global average pooling layer, Linear(·) denotes two fully connected layers, and BinConcrete is an activation function.
Step 2 comprises the following substeps:
Step 21: for the l-th convolutional layer of the convolutional neural network, the features x_i^l of the i-th instance are pooled into a vector z_i^l with a global average pooling layer.
Step 22: the instance similarity matrix is obtained from the instance pooling vectors:

s_{ij}^l = (z_i^l)^T z_j^l

where s_{ij}^l denotes the similarity between the i-th and j-th instances in the l-th layer, and T denotes transposition.
Step 23: the nearest-neighbor instances of each instance are determined from the similarity matrix S^l, and the set of nearest-neighbor instance indices is taken as the self-supervision signal.
The self-supervision signal of instance i is:

I_i^l = topk_j(s_{ij}^l)

where I_i^l denotes the self-supervision signal of instance i, topk returns the column indices of the k largest elements in the i-th row of S^l, and k is set to 200.
Step 3 comprises the following substeps:
step 31, acquiring the probability that the instance j is the positive instance of the instance i;
the probability that an instance j of the input convolutional neural network is identified, in the gating space corresponding to the l-th convolutional layer, as a positive instance of instance i is:
p^l(j|i) = exp((g_i^l)^T g_j^l / τ) / Σ_{m≠i} exp((g_i^l)^T g_m^l / τ)
where l denotes the l-th convolutional layer of the convolutional neural network, g_i^l is the gating probability output after instance i passes through the gating module, g_j^l is the gating probability output after instance j passes through the gating module, and τ is a temperature hyperparameter taken to be 0.07.
Step 32: a contrastive loss function is obtained from the positive-instance probabilities; minimizing the contrastive loss function aligns the instance neighborhood relationship of the gating space with that of the feature space.
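A sketch of the positive-instance probability and the resulting contrastive loss, assuming a standard softmax-over-similarities form with temperature τ = 0.07 and an InfoNCE-style average over the positives (the exact functional form of the patented loss is not reproduced in this text):

```python
import math

def positive_probs(g, i, tau=0.07):
    # softmax over gating-vector dot-product similarities;
    # instance i is excluded from its own candidate set
    sims = {j: sum(a * b for a, b in zip(g[i], g[j]))
            for j in range(len(g)) if j != i}
    z = sum(math.exp(s / tau) for s in sims.values())
    return {j: math.exp(s / tau) / z for j, s in sims.items()}

def contrastive_loss(g, positives, tau=0.07):
    # average negative log-probability of each instance's positive instances
    total, count = 0.0, 0
    for i, pos in positives.items():
        p = positive_probs(g, i, tau)
        for j in pos:
            total += -math.log(p[j])
            count += 1
    return total / count
```

When the gating vectors of an instance and its positive are nearly identical, the positive's probability approaches 1 and the loss approaches 0, which is the alignment behaviour the method seeks.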
In step 4, the total objective loss function loss is set as follows:
loss = L_ce + η Σ_{l∈Ω} L_contrast^l + ρ Σ_{l∈Ω} L_0^l
where L_ce is the cross-entropy loss function, L_contrast^l denotes the contrastive loss function of the l-th layer, and L_0^l is the L0-norm loss function of the l-th layer; η and ρ are coefficients, with η = 0.003 and ρ = 0.4; Ω is the set of gated network layer indices, namely {l, l+1}.
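The combination of the three terms with η = 0.003 and ρ = 0.4 can be sketched directly; the per-layer loss values in the usage below are placeholders, not measured quantities:

```python
def total_loss(ce, contrast_losses, l0_losses, eta=0.003, rho=0.4):
    # loss = cross-entropy + eta * sum of per-layer contrastive losses
    #        + rho * sum of per-layer L0-norm losses over the gated layers
    return ce + eta * sum(contrast_losses) + rho * sum(l0_losses)
```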
Classification experiments are then carried out on the classification database using the optimized network.
Example 2
The same experiment as in example 1 was carried out, except that the experiment was carried out using the ResNet-32 model.
Example 3
The same experiment as in Example 1 was performed, except that the ResNet-56 model was used and a dynamic pruning network was set in the last four residual modules of ResNet-56; the dynamic pruning process was the same as in Example 1, and the set Ω of gated network layers in step 4 contained four layers.
Example 4
The same experiment as in Example 1 was performed, except that the experimental database was CIFAR100, the model was trained for 400 epochs, the number of image instances per batch was 128, and the initial learning rate was set to 0.1; a multi-step decay strategy was likewise employed, the learning rate being multiplied by 0.2 when the training epochs reached 60, 120, and 160 respectively.
Example 5
The same experiment as in example 4 was performed except that the experiment was performed using the ResNet-32 model.
Example 6
The same experiment as in Example 4 was performed, except that the ResNet-56 model was used and a dynamic pruning network was set in the last four residual modules of ResNet-56; the dynamic pruning process was the same as in Example 4, and the set Ω of gated network layers in step 4 contained four layers.
Example 7
The same experiment as in Example 1 was performed, except that the experimental database was ImageNet. Since the training process takes a large amount of time, the experiment used the ResNet-18 model; training ran for 130 epochs in total, the number of image instances per batch was 256, the weight decay was set to 1e-4, and the learning rate was initialized to 0.1, again using a multi-step decay strategy in which the learning rate is multiplied by 0.1 when the training epochs reach 40, 70, and 100.
Example 8
The same experiment as in example 1 was performed, except that in step 23, k was taken as 5, 20, 100, 512, 1024, 2048, 4096, respectively.
The results are shown in Table 1:
Table 1
As can be seen from Table 1, when the number of nearest neighbors k is 200, the obtained pruning effect is best.
Example 9
The same experiment as in example 1 was carried out except that in step 4, the coefficients η were 5e-4, 0.001, 0.002, 0.003, 0.005, 0.01, and 0.02, respectively.
The results are shown in Table 2:
Table 2
As can be seen from Table 2, the pruning effect obtained is best when the coefficient η is 0.003.
Comparative example
Comparative example 1
The same experiment as in Example 1 was conducted, except that pruning was performed using the SFP, FPGM, DSA, Hinge, DHP, FBS, and BAS methods respectively.
The SFP method is described in: Y. He, G. Kang, X. Dong, Y. Fu, Y. Yang, Soft filter pruning for accelerating deep convolutional neural networks, in: Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI), 2018, pp. 2234-2240.
The FPGM method is described in: Y. He, P. Liu, Z. Wang, Z. Hu, Y. Yang, Filter pruning via geometric median for deep convolutional neural network acceleration, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2019, pp. 4340-4349.
The DSA method is described in: X. Ning, T. Zhao, W. Li, P. Lei, Y. Wang, H. Yang, DSA: More efficient budgeted pruning via differentiable sparsity allocation, in: Proceedings of the European Conference on Computer Vision (ECCV), 2020, pp. 592-607.
The Hinge method is described in: Y. Li, S. Gu, C. Mayer, L. Van Gool, R. Timofte, Group sparsity: The hinge between filter pruning and decomposition for network compression, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2020, pp. 8015-8024.
The DHP method is described in: Y. Li, S. Gu, K. Zhang, L. Van Gool, R. Timofte, DHP: Differentiable meta pruning via hypernetworks, in: Proceedings of the European Conference on Computer Vision (ECCV), 2020, pp. 608-624.
The FBS method is described in: X. Gao, Y. Zhao, L. Dudziak, R. D. Mullins, C. Xu, Dynamic channel pruning: Feature boosting and suppression, in: Proceedings of the International Conference on Learning Representations (ICLR), 2019.
The BAS method is described in: B. Ehteshami Bejnordi, T. Blankevoort, M. Welling, Batch-shaping for learning conditional channel gated networks, in: Proceedings of the International Conference on Learning Representations (ICLR), 2020.
Comparative example 2
The same experiment as in Example 2 was performed, except that pruning was performed using the Baseline, SFP, FPGM, FBS, and BAS methods respectively instead of the dynamic pruning method of Example 2.
The Baseline method is described in: K. He, X. Zhang, S. Ren, J. Sun, Deep residual learning for image recognition, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016, pp. 770-778.
Comparative example 3
The same experiment as in Example 3 was performed, except that pruning was performed using the Baseline, SFP, FPGM, HRank, DSA, Hinge, DHP, FBS, and BAS methods respectively instead of the dynamic pruning method of Example 3.
The HRank method is described in: M. Lin, R. Ji, Y. Wang, Y. Zhang, B. Zhang, Y. Tian, L. Shao, HRank: Filter pruning using high-rank feature map, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2020, pp. 1526-1535.
Comparative example 4
The same experiment as in example 4 was performed except that the dynamic pruning method in example 4 was replaced with the Baseline method and the BAS method, respectively, for pruning.
Comparative example 5
The same experiment as in example 5 was performed except that pruning was performed using the Baseline method, CAC method, BAS method instead of the dynamic pruning method in example 5, respectively.
The CAC method is described in: Z. Chen, T. Xu, C. Du, C. Liu, H. He, Dynamic channel pruning by conditional accuracy change for deep neural networks, IEEE Transactions on Neural Networks and Learning Systems (TNNLS) 32(2), 2021, pp. 799-.
Comparative example 6
The same experiment as in example 6 was performed except that pruning was performed using the Baseline method, CAC method, BAS method instead of the dynamic pruning method in example 6, respectively.
Comparative example 7
The same experiment as in Example 7 was performed, except that pruning was performed using the Baseline, SFP, FPGM, DSA, LCCN, CGNet, FBS, and BAS methods respectively instead of the dynamic pruning method of Example 7.
The LCCN method is described in: X. Dong, J. Huang, Y. Yang, S. Yan, More is less: A more complicated network with less inference complexity, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017, pp. 1895-1903.
The CGNet method is described in: W. Hua, Y. Zhou, C. De Sa, Z. Zhang, G. E. Suh, Channel gating neural networks, in: Proceedings of Advances in Neural Information Processing Systems (NeurIPS), 2019, pp. 1884-1894.
Experimental example 1
The pruning results of Examples 1-3 and Comparative Examples 1-3 were collected; the results are shown in Table 3.
The two sets of results for each of Examples 1, 2, and 3 are the performance at the minimum classification error and at the maximum compression ratio, respectively; from these two sets of results it can be seen that Examples 1-3 achieve a good trade-off between classification error and compression ratio.
In this experimental example, the classification error is the proportion of misclassified samples to the total number of samples. The Top-1 classification error is calculated as:
Err_top1 = 1 - (1/N) Σ_{i=1}^{N} (y_i == top1(z_i))
where y_i denotes the class label of the i-th sample, N denotes the total number of samples, top1(z_i) denotes the index of the largest element of the vector z_i, and z_i denotes the network output:
z_i = NN(x_i^0)
where x_i^0 denotes the i-th sample input to the network, i.e. the input to the 0-th convolutional layer; NN denotes the convolutional neural network, formed by connecting a plurality of convolutional layers; z_i is the output of the network and N_c is the number of classes. The sum Σ_i (y_i == top1(z_i)) counts the correctly identified samples: y_i == top1(z_i) takes the value 1 when the network's top-1 prediction top1(z_i) agrees with the class label y_i, and 0 otherwise.
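A minimal sketch of the Top-1 error defined above, operating on raw output vectors and integer class labels:

```python
def top1_error(outputs, labels):
    # fraction of samples whose highest-scoring class differs from the label
    wrong = sum(1 for z, y in zip(outputs, labels)
                if max(range(len(z)), key=z.__getitem__) != y)
    return wrong / len(labels)
```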
The pruning ratio is calculated as follows: r^l denotes the proportion of zero elements in the gating vector g^l of the l-th convolutional layer, which can be regarded as the computation clipping ratio of that layer; Comp_l denotes the computational cost of the l-th convolutional layer, and Comp_NN_original denotes the computational cost of the original (unpruned) convolutional network. The clipped computation of the network for each sample is obtained from the clipping ratios of the gated convolutional layers, and the pruning ratio of the network is the average of these clipped computations divided by the computational cost of the original network.
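A sketch of this pruning-ratio computation for a single sample; the per-layer FLOP counts and gate vectors below are illustrative placeholders:

```python
def pruning_ratio(gates, layer_flops, total_flops):
    # per-layer clipping ratio = fraction of zero entries in the gating vector;
    # pruned computation = sum over gated layers of ratio_l * flops_l,
    # divided by the original network's total computation
    pruned = sum((g.count(0.0) / len(g)) * f for g, f in zip(gates, layer_flops))
    return pruned / total_flops
```

For a sample whose two gated layers each cost 100 FLOPs out of a 400-FLOP network, with half the gates closed in one layer and all closed in the other, the pruning ratio is (0.5*100 + 1.0*100) / 400 = 0.375.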
As can be seen from Table 3, the methods in Examples 1, 2, and 3 obtain lower classification errors at higher pruning ratios, significantly improving pruning performance and outperforming existing state-of-the-art network pruning methods.
Experimental example 2
The pruning results of Examples 4-6 and Comparative Examples 4-6 were collected; the results are shown in Table 4.
As with Examples 1-3, the two sets of results for each of Examples 4-6 are the performance at the minimum classification error and at the maximum compression ratio, respectively; from these two sets of results it can be seen that Examples 4-6 achieve a good trade-off between classification error and compression ratio.
As can be seen from Table 4, the methods in Examples 4, 5, and 6 obtain lower classification errors at higher pruning ratios, significantly improving pruning performance and outperforming existing state-of-the-art network pruning methods.
Experimental example 3
The pruning results of Example 7 and Comparative Example 7 were collected; the results are shown in Table 5.
As with Examples 1-3, the two sets of results in Example 7 are the performance at the minimum classification error and at the maximum compression ratio, respectively; from these two sets of results it can be seen that Example 7 achieves a good trade-off between classification error and compression ratio.
In this experimental example, the Top-5 classification error is calculated as:
Err_top5 = 1 - (1/N) Σ_{i=1}^{N} (y_i ∈ top5(z_i))
where top5(z_i) denotes the set of indices of the 5 largest elements of the vector z_i; if y_i belongs to this set, y_i ∈ top5(z_i) takes the value 1, and 0 otherwise. The remaining parameters have the same meanings as the corresponding parameters in the Top-1 classification error.
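The Top-5 error follows the same pattern as the Top-1 error, checking whether the label falls among the five highest-scoring classes:

```python
def top5_error(outputs, labels):
    # fraction of samples whose label is not among the 5 highest-scoring classes
    wrong = 0
    for z, y in zip(outputs, labels):
        top5 = sorted(range(len(z)), key=z.__getitem__, reverse=True)[:5]
        wrong += y not in top5
    return wrong / len(labels)
```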
As can be seen from Tables 3, 4, and 5, the methods in Examples 1 to 7 obtain lower classification errors on different datasets at higher pruning ratios, outperforming existing state-of-the-art network pruning methods.
In the description of the present invention, it should be noted that the terms "upper", "lower", "inner", "outer", "front", "rear", etc. indicate orientations or positional relationships based on operational states of the present invention, and are only for convenience of description and simplification of description, but do not indicate or imply that the device or element referred to must have a specific orientation, be constructed and operated in a specific orientation, and thus, should not be construed as limiting the present invention. Furthermore, the terms "first," "second," "third," and "fourth" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance.
In the description of the present invention, it should be noted that, unless otherwise specifically stated or limited, the terms "mounted," "connected," and "connected" are to be construed broadly, e.g., as meaning either a fixed connection, a removable connection, or an integral connection; can be mechanically or electrically connected; the connection may be direct or indirect via an intermediate medium, and may be a communication between the two elements. The specific meanings of the above terms in the present invention can be understood in specific cases to those skilled in the art.
The present invention has been described above in connection with preferred embodiments, but these embodiments are merely exemplary and merely illustrative. On the basis of the above, the invention can be subjected to various substitutions and modifications, and the substitutions and the modifications are all within the protection scope of the invention.
Claims (5)
1. An image classification accuracy improving method is characterized by comprising the following steps:
acquiring an image dataset;
setting a convolutional neural network;
performing dynamic network pruning on the convolutional neural network by adopting a characteristic-gating coupling method to obtain an optimized network;
inputting the images to be classified into the optimized network, and classifying the images using the optimized network,
wherein the feature-gating coupling method comprises the following steps:
step 1, obtaining the feature space and the gating space in the dynamic pruning network;
step 2, obtaining the instance neighborhood relationship in the feature space;
step 3, aligning the instance neighborhood relationship between the gating space and the feature space;
step 4, acquiring the total objective loss function of the dynamic pruning network;
step 5, updating the network parameters;
step 2 comprises the following substeps:
Step 23: determining the nearest-neighbor instances of each instance from the instance similarity matrix S^l, and taking the set of nearest-neighbor instance indices as the self-supervision signal;
in step 22, the similarity between different instances is obtained by measuring the pooled vectors of the different instances, the measure being the dot product;
in step 23, the i-th row of S^l is sorted to obtain the column indices of its k largest elements, and the instance set corresponding to these indices is taken as the self-supervision signal of instance i;
step 3 comprises the following substeps:
step 31, acquiring the probability that the instance j is the positive instance of the instance i;
step 32, obtaining a contrast loss function according to the positive example probability, minimizing the contrast loss function, and realizing the alignment of the example neighborhood relationship of the gating space and the feature space;
a positive instance refers to a nearest-neighbor instance of instance i.
2. The image classification accuracy improvement method according to claim 1,
in step 31, the probability that an input instance j is identified in the gating space as a positive instance of instance i is:
p^l(j|i) = exp((g_i^l)^T g_j^l / τ) / Σ_{m≠i} exp((g_i^l)^T g_m^l / τ)
where l denotes the l-th convolutional layer of the convolutional neural network, g_i^l is the gating probability output after instance i passes through the gating module, g_j^l is the gating probability output after instance j passes through the gating module, and τ is a temperature hyperparameter.
3. The image classification accuracy improving method according to claim 1, wherein in step 4, the total objective loss function loss is:
loss = L_ce + η Σ_{l∈Ω} L_contrast^l + ρ Σ_{l∈Ω} L_0^l
where L_ce is the cross-entropy loss function, L_contrast^l and L_0^l are the contrastive loss and L0-norm loss of the l-th layer, η and ρ are coefficients, and Ω is the set of gated network layer indices.
4. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-3.
5. A computer-readable storage medium having computer instructions stored thereon for causing the computer to perform the method of any one of claims 1-3.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111229240.5A CN114037857B (en) | 2021-10-21 | 2021-10-21 | Image classification precision improving method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114037857A CN114037857A (en) | 2022-02-11 |
CN114037857B true CN114037857B (en) | 2022-09-23 |
Family
ID=80135096
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111229240.5A Active CN114037857B (en) | 2021-10-21 | 2021-10-21 | Image classification precision improving method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114037857B (en) |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110210620A (en) * | 2019-06-04 | 2019-09-06 | 北京邮电大学 | A kind of channel pruning method for deep neural network |
CN111242277A (en) * | 2019-12-27 | 2020-06-05 | 中国电子科技集团公司第五十二研究所 | Convolutional neural network accelerator supporting sparse pruning and based on FPGA design |
CN111368699A (en) * | 2020-02-28 | 2020-07-03 | 交叉信息核心技术研究院(西安)有限公司 | Convolutional neural network pruning method based on patterns and pattern perception accelerator |
CN111626330A (en) * | 2020-04-23 | 2020-09-04 | 南京邮电大学 | Target detection method and system based on multi-scale characteristic diagram reconstruction and knowledge distillation |
CN112508955A (en) * | 2021-02-08 | 2021-03-16 | 中国科学院自动化研究所 | Method for detecting living cell morphology based on deep neural network and related product |
CN113239981A (en) * | 2021-04-23 | 2021-08-10 | 中国科学院大学 | Image classification method of local feature coupling global representation |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
RU2711153C2 (en) * | 2018-05-23 | 2020-01-15 | Общество С Ограниченной Ответственностью "Яндекс" | Methods and electronic devices for determination of intent associated with uttered utterance of user |
CN112734025B (en) * | 2019-10-28 | 2023-07-21 | 复旦大学 | Neural network parameter sparsification method based on fixed base regularization |
CN113077044A (en) * | 2021-03-18 | 2021-07-06 | 北京工业大学 | General lossless compression and acceleration method for convolutional neural network |
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110210620A (en) * | 2019-06-04 | 2019-09-06 | 北京邮电大学 | A kind of channel pruning method for deep neural network |
CN111242277A (en) * | 2019-12-27 | 2020-06-05 | 中国电子科技集团公司第五十二研究所 | Convolutional neural network accelerator supporting sparse pruning and based on FPGA design |
CN111368699A (en) * | 2020-02-28 | 2020-07-03 | 交叉信息核心技术研究院(西安)有限公司 | Convolutional neural network pruning method based on patterns and pattern perception accelerator |
CN111626330A (en) * | 2020-04-23 | 2020-09-04 | 南京邮电大学 | Target detection method and system based on multi-scale characteristic diagram reconstruction and knowledge distillation |
CN112508955A (en) * | 2021-02-08 | 2021-03-16 | 中国科学院自动化研究所 | Method for detecting living cell morphology based on deep neural network and related product |
CN113239981A (en) * | 2021-04-23 | 2021-08-10 | 中国科学院大学 | Image classification method of local feature coupling global representation |
Non-Patent Citations (1)
Title |
---|
Research on Model Compression Based on Convolutional Neural Networks; Yao Yang; China Masters' Theses Full-text Database, Information Science and Technology, 2021-04-15, No. 04, pp. I132-677 *
Also Published As
Publication number | Publication date |
---|---|
CN114037857A (en) | 2022-02-11 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||