CN113743582B - Novel channel shuffling method and device based on stack shuffling - Google Patents

Novel channel shuffling method and device based on stack shuffling

Info

Publication number
CN113743582B
CN113743582B
Authority
CN
China
Prior art keywords
channel
stack
shuffling
empty
empty stack
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110902470.7A
Other languages
Chinese (zh)
Other versions
CN113743582A (en)
Inventor
裴颂伟
季语成
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing University of Posts and Telecommunications
Original Assignee
Beijing University of Posts and Telecommunications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing University of Posts and Telecommunications filed Critical Beijing University of Posts and Telecommunications
Priority to CN202110902470.7A priority Critical patent/CN113743582B/en
Publication of CN113743582A publication Critical patent/CN113743582A/en
Application granted granted Critical
Publication of CN113743582B publication Critical patent/CN113743582B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/06Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/063Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The application provides a novel channel shuffling method based on stack shuffling, relating to the technical field of channel shuffling. The method comprises the following steps: serializing the channels into corresponding numbers to obtain a corresponding input channel sequence; providing a first empty stack as the place of channel shuffling and a second empty stack as the storage place for output channels; at each step, either randomly selecting a channel from the input channel sequence and pushing it onto the first stack, or popping a channel from the first stack, and popping all remaining elements out of the first stack once the input channel sequence is empty; and receiving the channels popped from the first stack with the second stack, forming the shuffled channel sequence as output. With this scheme, channel features can be shuffled and fused more uniformly and controllably.

Description

Novel channel shuffling method and device based on stack shuffling
Technical Field
The application relates to the technical field of channel shuffling, in particular to a novel channel shuffling method and device based on stack shuffling.
Background
The CNN library first proposed a convolutional-layer concept based on random sparsification, in which a random channel shuffling operation follows the normal convolution; because its behavior varies across different neural networks, subsequent studies rarely adopted this method for shuffling. The "two-step" convolution strategy is realized with the existing random shuffling method, i.e., channels are combined by random scrambling. Although random channel shuffling brings model compression and acceleration effects, the method has strong limitations.
The ShuffleNet network opened up a completely new channel shuffling technique that enables the flow of information between groups. The technique proceeds as follows. The convolution channels are divided into g groups of n channels each, which can be represented by the element pair (g, n); the channels are reshaped with the group index as the row number and the within-group index as the column number, yielding the arrangement shown in fig. 2: 12 channels divided into 3 groups of 4 channels each, where channels 1-4 form the first group, 5-8 the second group, 9-12 the third group, and 1-12 are the sequence numbers of the original channel inputs. The arrangement matrix is then transposed, giving the permutation shown in fig. 3, which can be denoted (n, g). Reading the transposed matrix column by column, from top to bottom, gives the shuffled result (1, 5, 9)(2, 6, 10)(3, 7, 11)(4, 8, 12); flattening this matrix yields the complete shuffled sequence (1, 5, 9, 2, 6, 10, 3, 7, 11, 4, 8, 12), which is input to the next-layer network, producing a new grouping pattern as shown in fig. 4. If the number of groups of the new input layer needs to change, only the arrangement of the grouped input matrix changes; the transposition strategy remains unchanged.
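The reshape-and-transpose procedure above can be sketched in a few lines of Python. This is our own illustration, not code from the patent; the function name is hypothetical:

```python
def shufflenet_shuffle(channels, g, n):
    """ShuffleNet-style shuffle: lay out g*n channels as a g x n
    matrix (one row per group), transpose it, and flatten."""
    assert len(channels) == g * n
    rows = [channels[i * n:(i + 1) * n] for i in range(g)]
    # Transposing means reading column j of the g x n arrangement
    # from top to bottom, for j = 0 .. n-1.
    return [rows[i][j] for j in range(n) for i in range(g)]

# 12 channels, 3 groups of 4, matching the example in the text:
print(shufflenet_shuffle(list(range(1, 13)), 3, 4))
# -> [1, 5, 9, 2, 6, 10, 3, 7, 11, 4, 8, 12]
```

Note how channels 1 and 12 stay in the first and last positions, which is exactly the fixed-point defect discussed below.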
Zhang proposed a new shuffling method: the channels are first numbered by index, then learning is performed over the space of all possible channel permutations, with the channel dimension as the basis of arrangement, and a suitable channel shuffling pattern is selected via a mathematical criterion. The method achieved some gains with a ResNet network on the CIFAR-10 data set, but the improvement is very limited.
Looking across existing channel shuffling techniques: under the random sparse convolution-layer concept and the "two-step" convolution strategy, the precision loss of a model is larger, usually between 2% and 4%, and at the same time the noise introduced by random shuffling is quite significant, which is not conducive to rapid convergence of the model. The brand-new channel shuffling technique developed in the ShuffleNet network overcomes the defects of random shuffling by shuffling through row-column transposition. This transposition looks uniform but has notable defects: for example, the original order of the head and tail channels is never changed by any shuffle, and when the channels satisfy g = n, all channels on the diagonal keep their original order, as shown in fig. 5. This means that the number of channels keeping their original order lies in [2, min(g, n)] (g ≥ 2, n ≥ 2), so their proportion among all g·n channels lies in [2/(g·n), 1/max(g, n)]. This leaves room for improving the accuracy performance of the model.
Although the method proposed by Zhang is novel, NAS searching over the full permutation space still consumes a lot of time, and because the parameters P and Q are set as the basis for measuring connectivity between channels, the source of the improvement is difficult to explain to some extent. On the other hand, the precision improvement is still very limited: compared with the ShuffleNet baseline under the ResNet-50 network, the improvement is only 0.11%, and the precision does not match that of the original model under ungrouped convolution.
Disclosure of Invention
The present application aims to solve at least one of the technical problems in the related art to some extent.
Therefore, the first object of the present application is to provide a novel channel shuffling method based on stack shuffling, which solves the problem that feature information cannot be fully fused in existing channel shuffling methods, with the resulting loss of neural network model precision. Through the proposed stack-based channel shuffling method, channel features are shuffled and fused uniformly and controllably, which avoids both the uncertainty of feature fusion under random shuffling and the insufficient shuffling present in ShuffleNet, and improves the precision of model compression; at the same time, a desired shuffling pattern can be designed according to the characteristics of the neural network, controlling the inter-group allocation and intra-group fusion of features.
A second object of the application is to propose a new channel shuffling device based on stack shuffling.
A third object of the present application is to propose a non-transitory computer readable storage medium.
To achieve the above object, an embodiment of a first aspect of the present application provides a novel channel shuffling method based on stack shuffling, including: serializing the channels into corresponding numbers to obtain a corresponding input channel sequence; providing a first empty stack as the place of channel shuffling and a second empty stack as the storage place for output channels; at each step, randomly either pushing a channel from the input channel sequence onto the first stack or popping a channel from the first stack, and popping all remaining elements out of the first stack once the input channel sequence is empty; and receiving the channels popped from the first stack with the second stack, forming the shuffled channel sequence as output.
Optionally, in an embodiment of the present application, a reasonable value range for the number of channels stored in the first stack is set according to the number of channels contained in the input channel sequence. If the number of channels stored in the first stack at the current moment exceeds this range, channels in the first stack are randomly popped; if it lies within the range, a random number is generated to decide whether to execute a push or a pop operation.
Alternatively, in one embodiment of the application, after each channel is popped from the first stack, the order of the popped channels is recorded by the second stack.
Optionally, in one embodiment of the present application, the reasonable value range is determined by a threshold and a first ratio, where the first ratio represents the ratio of the number of channels each group of the input channel sequence pushes onto the stack to the current output count; this ratio reflects the allocation rate of channels between different groups after shuffling. A higher allocation rate means more of the group's features are retained in the next round of convolution, and a lower rate means fewer are retained.
To achieve the above object, an embodiment of a second aspect of the present application provides a novel channel shuffling device based on stack shuffling, comprising a conversion module, a first empty stack module, a second empty stack module and an operation module, wherein,
the conversion module is used for serializing the channels into corresponding numbers to obtain corresponding input channel sequences;
the first empty stack module is used for storing the shuffling channel sequence;
the second empty stack module is used for storing the output channel sequence;
and the operation module is configured to randomly select a channel from the input channel sequence and push it onto the first stack, or pop a channel from the first stack; to pop all remaining elements out of the first stack if the input channel sequence is empty; and to receive the channels popped from the first stack with the second stack, forming a shuffled channel sequence as output.
To achieve the above object, an embodiment of a third aspect of the present application proposes a non-transitory computer-readable storage medium capable of performing a novel channel shuffling method based on stack shuffling when instructions in the storage medium are executed by a processor.
The novel channel shuffling method based on stack shuffling, the novel channel shuffling device based on stack shuffling and the non-transitory computer readable storage medium solve the problem that feature information cannot be fully fused in existing channel shuffling methods, with the resulting precision loss of the neural network model. Through the stack-based channel shuffling method, channel features are shuffled and fused uniformly and controllably, which avoids the uncertainty of feature fusion under random shuffling and the insufficient shuffling present in ShuffleNet, and improves the precision of model compression; at the same time, a desired shuffling pattern can be designed according to the characteristics of the neural network, controlling the inter-group allocation and intra-group fusion of features.
Additional aspects and advantages of the application will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the application.
Drawings
The foregoing and/or additional aspects and advantages of the application will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings, in which:
FIG. 1 is a flow chart of a novel stack shuffling-based channel shuffling method according to a first embodiment of the present application;
FIG. 2 is a schematic diagram of ShuffleNet channels before shuffling in accordance with an embodiment of the present application;
FIG. 3 is a schematic diagram illustrating the ShuffleNet channel transpose operation in accordance with an embodiment of the present application;
FIG. 4 is a schematic diagram of ShuffleNet channels after shuffling in accordance with an embodiment of the present application;
fig. 5 is a schematic diagram of channel shuffling when g=n according to an embodiment of the present application;
FIG. 6 is a schematic diagram of a novel channel shuffling method based on stack shuffling in accordance with an embodiment of the present application;
FIG. 7 is a structural position diagram of a novel channel shuffling method based on stack shuffling in an embodiment of the present application in whole model compression;
FIG. 8 is an algorithm flow diagram of a novel channel shuffling method based on stack shuffling in accordance with an embodiment of the present application;
fig. 9 is a schematic structural diagram of a novel channel shuffling device based on stack shuffling according to a second embodiment of the present application.
Detailed Description
Embodiments of the present application are described in detail below, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to like or similar elements or elements having like or similar functions throughout. The embodiments described below by referring to the drawings are illustrative and intended to explain the present application and should not be construed as limiting the application.
The novel channel shuffling method and apparatus based on stack shuffling according to embodiments of the present application are described below with reference to the accompanying drawings.
Fig. 1 is a flowchart of a novel channel shuffling method based on stack shuffling according to a first embodiment of the present application.
As shown in fig. 1, the novel channel shuffling method based on stack shuffling comprises the following steps:
step 101, serializing the channels into corresponding numbers to obtain corresponding input channel sequences;
step 102, giving a first empty stack as the place of channel shuffling, and a second empty stack as the storage place for output channels;
step 103, at each step, randomly selecting a channel from the input channel sequence and pushing it onto the first stack, or popping a channel from the first stack; if the input channel sequence is empty, popping all remaining elements out of the first stack;
and step 104, receiving the channels popped from the first stack with the second stack to form a shuffled channel sequence as output.
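Steps 101-104 can be sketched minimally in Python. This is our own illustration under stated assumptions, not the patented implementation; in particular, the 0.5 push probability is an assumed choice:

```python
import random

def stack_shuffle(channels, seed=None):
    """Steps 101-104: stack 1 is the shuffling place; the list
    stack2 stores channels in the order they are popped."""
    rng = random.Random(seed)
    remaining = list(channels)   # serialized input channel sequence
    stack1, stack2 = [], []
    while remaining or stack1:
        # Randomly push from the input or pop from stack 1; once the
        # input is empty, only pops remain, flushing stack 1.
        if remaining and (not stack1 or rng.random() < 0.5):
            stack1.append(remaining.pop(0))
        else:
            stack2.append(stack1.pop())
    return stack2

shuffled = stack_shuffle(range(1, 13), seed=0)  # a permutation of 1..12
```

Because every channel enters stack 1 exactly once and leaves exactly once, the output is always a permutation of the input, whatever the random choices.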
According to the novel channel shuffling method based on stack shuffling, the channels are serialized into corresponding numbers to obtain a corresponding input channel sequence; a first empty stack is provided as the place of channel shuffling and a second empty stack as the storage place for output channels; at each step, a channel is randomly either pushed from the input channel sequence onto the first stack or popped from the first stack, and once the input channel sequence is empty, all remaining elements of the first stack are popped out; the second stack receives the channels popped from the first stack, forming a shuffled channel sequence as output. This solves the problem that feature information cannot be fully fused in existing channel shuffling methods, with the resulting precision loss of the neural network model. Through the proposed stack-based channel shuffling method, channel features are shuffled and fused uniformly and controllably, avoiding the uncertainty of feature fusion under random shuffling and the insufficient shuffling present in ShuffleNet, and improving the precision of model compression; meanwhile, a desired shuffling pattern can be designed according to the characteristics of the neural network, controlling the inter-group allocation and intra-group fusion of features.
To ensure that stack shuffling is sufficiently uniform, a threshold t must be introduced, where t denotes the number of channels currently stored in stack 1: pushing a channel into stack 1 makes it t+1, and popping makes it t-1. Clearly t needs to be constrained to a reasonable range, and values above or below that range are not allowed, so as to guarantee the shuffling uniformity requirement. Suppose a channel is pushed into stack 1 at every step and no pop operation occurs; then as t grows, more channels with the same group number accumulate in the stack. Since each group has n channels, the uniformity requirement forbids two full groups of same-numbered channels from being in the stack at the same time; by the pigeonhole principle, the reasonable range of t is therefore [0, 2n-2]. If t exceeds the upper limit, a pop operation must be executed to pop channels out of stack 1; otherwise a system-generated random number decides whether to execute a push or a pop operation, and after every pop, stack 2 records the order of the popped channels. When the input channel sequence is empty, i.e., there is no new input, all remaining elements of stack 1 are popped out.
On top of t, to make shuffling controllable, a parameter r is introduced, representing the degree to which intra-group channels/features are retained during shuffling: the greater the shuffling rate, the better the uniformity and the fewer intra-group features retained, so the uniformity bound t needs to be flexibly adjusted through the allocation degree r. To explore the grouped-convolution performance of the neural network under different degrees of shuffling uniformity, and because purely pursuing ever more uniform shuffling yields a single fixed result, a Gaussian-distributed variation Δμ is introduced via the adjustment degree r to act on t, so that the adjustment range of t is no longer limited to [0, 2n-2]; the new constraint becomes [0, 2n-2+Δμ]. r records the ratio of the number of channels each group pushes onto the stack to the current output count; this value reflects the allocation rate of channels between different groups after shuffling. A smaller r means the shuffling is more uniform, i.e., fewer original channels of the group are retained; a larger r, i.e., a higher allocation rate, means the shuffling is less uniform: more original channels of the group are retained and fewer channels are disturbed. The parameters t and r can be combined to control the stack shuffling process and achieve the desired shuffling result for the target.
Further, in the embodiment of the present application, a reasonable value range for the number of channels stored in the first stack is set according to the number of channels contained in the input channel sequence. If the number of channels stored in the first stack at the current moment exceeds this range, channels in the first stack are randomly popped; if it lies within the range, a random number is generated to decide whether to execute a push or a pop operation.
If the number of groups is g and the number of channels per group is n, the reasonable range of t is [0, 2n-2]. If t exceeds the upper limit, a pop operation must be executed to pop a channel out of stack 1; otherwise the generated random number decides whether to execute a push or a pop operation, and after every pop, stack 2 records the order of the popped channels. If the input channel sequence is empty, all remaining elements of stack 1 are popped out. The r value can be adjusted in combination with t: for a given group g1, a smaller r means fewer features of the first groups of channels are retained after shuffling and correspondingly more features of the later groups, yielding a shuffling effect in which the feature allocation increases gradually.
Further, in the embodiment of the present application, after each channel is popped from the first stack, the order of the popped channels is recorded by the second stack.
Further, in the embodiment of the application, the reasonable value range is determined by a threshold and a first ratio, where the first ratio represents the ratio of the number of channels each group of the input channel sequence pushes onto the stack to the current output count; this ratio reflects the allocation rate of channels between different groups after shuffling. A higher allocation rate means more of the group's features are retained in the next round of convolution, and a lower rate means fewer are retained.
From the standpoint of code implementation, the operation is as follows: from a four-dimensional tensor (W/H/K/C), component information is extracted along each dimension to obtain the four components W, H, K and C; taking the channel dimension C as the independent variable, with t and r as control parameters, the channel order is disturbed by stack shuffling. To ensure that different channels are dispersed into each group as evenly as possible, random push/pop operations are performed group by group: when the difference between the number of push operations and the number of pop operations is greater than t, the output follows the algorithm yielding the lexicographically largest output channel sequence; otherwise, the output proceeds in random form.
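Once a shuffled order has been produced, applying it to the channel axis C of a W x H x K x C tensor is a single indexing step. A pure-Python sketch of our own follows; the nested-list layout and the function name are assumptions made for illustration:

```python
def apply_channel_permutation(tensor_whkc, perm):
    """Reorder the channel axis C (the innermost dimension) of a
    nested-list tensor laid out as W x H x K x C, following a
    precomputed shuffle order `perm`."""
    return [[[[vec[p] for p in perm]      # permute the C entries
              for vec in k_slice]         # over each K slice
             for k_slice in h_slice]      # over each H slice
            for h_slice in tensor_whkc]   # over each W slice

# 1 x 1 x 1 x 4 toy tensor, permutation [2, 0, 3, 1]:
x = [[[[10, 11, 12, 13]]]]
y = apply_channel_permutation(x, [2, 0, 3, 1])
# y == [[[[12, 10, 13, 11]]]]
```

In an array framework the same operation would be a single fancy-indexing call along the last axis; the spatial dimensions W, H and K are untouched, only the channel order changes.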
Channel shuffling can be applied to neural network model compression algorithms such as LeNet, AlexNet, ResNet, ResNet50 and ResNet101. Such network compression algorithms usually use a grouped convolution strategy, and grouped convolution brings the problem of obstructed information flow between groups, which channel shuffling resolves. Compared with the traditional shuffling strategy, stack shuffling can improve the accuracy of the compressed model: LeNet improves the recognition rate of handwritten digits, AlexNet improves image classification and image recognition, and ResNet improves the precision of image classification, image segmentation and so on. In particular, when compressing network models such as AlexNet/ResNet with the stack shuffling strategy, test results on the CIFAR-10 data set show an image classification accuracy higher than that of the original model before compression, improved by about 0.13-0.32%.
FIG. 6 is a schematic diagram of a novel channel shuffling method based on stack shuffling according to an embodiment of the present application.
As shown in fig. 6, the novel channel shuffling method based on stack shuffling is given two empty stacks. Stack 1 serves as the place of channel shuffling: at each step a channel can be randomly selected from the input channel sequence and pushed, or a channel can be popped from stack 1 (ensuring the stack is not empty). Stack 2 serves as the storage place for output channels, each time receiving a channel popped from stack 1.
FIG. 7 is a structural diagram of a novel channel shuffling method based on stack shuffling in an embodiment of the present application in a whole model compression.
As shown in fig. 7, the model compression flow structure comprises the original model, channel shuffling and grouped convolution; the novel channel shuffling method based on stack shuffling applies to the channel shuffling step of this flow.
FIG. 8 is an algorithm flow chart of a novel channel shuffling method based on stack shuffling according to an embodiment of the present application.
As shown in fig. 8, the novel channel shuffling method based on stack shuffling proceeds as follows: input the model to obtain the input channel sequence; initialize stack 1 and stack 2 to empty, and input the group count g and the number of channels n in each group; calculate the value of t and set the parameter r; judge whether the number of channels in stack 1 satisfies the range [0, 2n-2] for t; if the range is exceeded, a pop operation must be executed to pop a channel out of stack 1, with stack 2 recording the output channel order; if the range is satisfied, decide whether to execute a push or pop operation according to the generated random number, with stack 2 recording the output channel order after each pop; judge whether stack 1 is empty; if stack 1 is not empty, pop all remaining elements out of stack 1; once stack 1 is empty, proceed to the grouped convolution operation.
Further, the results of different neural networks under the traditional ShuffleNet channel shuffling method were tested as a baseline and then compared against the improved novel channel shuffling method based on stack shuffling. As can be seen from Table 1, the stack-based method achieves a good effect on precision, with an improvement of about 0.12-0.32%.
Table 1
Fig. 9 is a schematic structural diagram of a novel channel shuffling device based on stack shuffling according to a second embodiment of the present application.
As shown in fig. 9, the novel channel shuffling device based on stack shuffling comprises a conversion module, a first empty stack module, a second empty stack module and an operation module, wherein,
the conversion module 10 is configured to serialize the channels into corresponding numbers to obtain corresponding input channel sequences;
a first empty stack module 20 for storing a shuffle channel sequence;
a second empty stack module 30 for storing the output channel sequence;
and the operation module 40 is configured to randomly select a channel from the input channel sequence and push it onto the first stack, or pop a channel from the first stack; to pop all remaining elements out of the first stack if the input channel sequence is empty; and to receive the channels popped from the first stack with the second stack, forming a shuffled channel sequence as output.
The novel channel shuffling device based on stack shuffling comprises a conversion module, a first empty stack module, a second empty stack module and an operation module. The conversion module serializes the channels into corresponding numbers to obtain a corresponding input channel sequence; the first empty stack module stores the shuffling channel sequence; the second empty stack module stores the output channel sequence; and the operation module randomly either pushes a channel from the input channel sequence onto the first stack or pops a channel from the first stack, pops all remaining elements out of the first stack once the input channel sequence is empty, and receives the channels popped from the first stack with the second stack, forming a shuffled channel sequence as output. This solves the problem that feature information cannot be fully fused in existing channel shuffling methods, with the resulting precision loss of the neural network model. Through the proposed stack-based channel shuffling method, channel features are shuffled and fused uniformly and controllably, avoiding the uncertainty of feature fusion under random shuffling and the insufficient shuffling present in ShuffleNet, and improving the precision of model compression; meanwhile, a desired shuffling pattern can be designed according to the characteristics of the neural network, controlling the inter-group allocation and intra-group fusion of features.
In order to implement the above embodiment, the present application also proposes a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the novel stack shuffling based channel shuffling method of the above embodiment.
In the description of the present specification, a description referring to terms "one embodiment," "some embodiments," "examples," "specific examples," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present application. In this specification, schematic representations of the above terms are not necessarily directed to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. Furthermore, the different embodiments or examples described in this specification and the features of the different embodiments or examples may be combined and combined by those skilled in the art without contradiction.
Furthermore, the terms "first," "second," and the like, are used for descriptive purposes only and are not to be construed as indicating or implying a relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defining "a first" or "a second" may explicitly or implicitly include at least one such feature. In the description of the present application, the meaning of "plurality" means at least two, for example, two, three, etc., unless specifically defined otherwise.
Any process or method descriptions in flow charts or otherwise described herein may be understood as representing modules, segments, or portions of code which include one or more executable instructions for implementing specific logical functions or steps of the process, and additional implementations are included within the scope of the preferred embodiment of the present application in which functions may be executed out of order from that shown or discussed, including substantially concurrently or in reverse order from that shown or discussed, depending on the functionality involved, as would be understood by those reasonably skilled in the art of the embodiments of the present application.
Logic and/or steps represented in the flowcharts or otherwise described herein, e.g., an ordered listing of executable instructions for implementing logical functions, can be embodied in any computer-readable medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, processor-containing system, or other system that can fetch the instructions from the instruction execution system, apparatus, or device and execute the instructions. For the purposes of this description, a "computer-readable medium" can be any means that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device. More specific examples (a non-exhaustive list) of the computer-readable medium include the following: an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CD-ROM). In addition, the computer-readable medium may even be paper or another suitable medium on which the program is printed, as the program can be electronically captured, for instance via optical scanning of the paper or other medium, then compiled, interpreted or otherwise processed in a suitable manner if necessary, and then stored in a computer memory.
It is to be understood that portions of the present application may be implemented in hardware, software, firmware, or a combination thereof. In the above-described embodiments, the various steps or methods may be implemented in software or firmware stored in a memory and executed by a suitable instruction execution system. If implemented in hardware, as in another embodiment, they may be implemented using any one of, or a combination of, the following techniques well known in the art: discrete logic circuits having logic gates for implementing logic functions on data signals, application-specific integrated circuits having suitable combinational logic gates, programmable gate arrays (PGAs), field-programmable gate arrays (FPGAs), and the like.
Those of ordinary skill in the art will appreciate that all or a portion of the steps carried out in the method of the above-described embodiments may be implemented by a program to instruct related hardware, where the program may be stored in a computer readable storage medium, and where the program, when executed, includes one or a combination of the steps of the method embodiments.
In addition, each functional unit in the embodiments of the present application may be integrated in one processing module, or each unit may exist alone physically, or two or more units may be integrated in one module. The integrated modules may be implemented in hardware or in software functional modules. The integrated modules may also be stored in a computer readable storage medium if implemented in the form of software functional modules and sold or used as a stand-alone product.
The above-mentioned storage medium may be a read-only memory, a magnetic disk, an optical disk, or the like. While embodiments of the present application have been shown and described above, it will be understood that the above embodiments are illustrative and are not to be construed as limiting the application; changes, modifications, substitutions and variations may be made to the above embodiments by those of ordinary skill in the art within the scope of the application.

Claims (5)

1. A novel channel shuffling method based on stack shuffling, characterized in that the method is applied to a neural network model compression algorithm for performing image classification and comprises the following steps:
serializing the channels into corresponding numbers to obtain corresponding input channel sequences;
providing a first empty stack as the place for channel shuffling and a second empty stack as the place for storing the output channels;
each time, randomly selecting either to push a channel from the input channel sequence onto the first empty stack or to pop a channel from the first empty stack, and popping all remaining elements out of the first empty stack if the input channel sequence is empty;
receiving, with the second empty stack, the channels popped from the first empty stack, and forming a shuffled channel sequence as the output;
wherein, after each channel is popped from the first empty stack, the order of the popped channels is recorded by the second empty stack.
2. The method of claim 1, wherein a reasonable value range for the number of channels stored in the first empty stack is set according to the number of channels contained in the input channel sequence; if the number of channels currently stored in the first empty stack exceeds the reasonable value range, a channel in the first empty stack is randomly popped, and if the number of channels currently stored in the first empty stack is within the reasonable value range, a random number is generated to decide whether to execute a push or a pop operation.
3. The method of claim 2, wherein the reasonable value range is determined by a threshold and a first ratio, the first ratio representing the ratio, for each group of the input channel sequence, of the number of channels pushed onto the stack to the current number of outputs; this ratio represents the allocation rate of channels between different groups after shuffling, and a higher allocation rate indicates that more of that group's features are preserved for the next round of convolution operations, and vice versa.
4. A novel channel shuffling device based on stack shuffling, characterized in that the device is applied to a neural network model compression algorithm for performing image classification and comprises a conversion module, a first empty stack module, a second empty stack module and an operation module, wherein
the conversion module is used for serializing the channels into corresponding numbers to obtain corresponding input channel sequences;
the first empty stack module is used for storing the shuffling channel sequence;
the second empty stack module is used for storing the output channel sequence;
the operation module is configured to randomly select a channel from the input channel sequence to push onto the first empty stack, or to pop a channel from the first empty stack, to pop all remaining elements out of the first empty stack if the input channel sequence is empty, and to receive, with the second empty stack, the channels popped from the first empty stack, so as to form a shuffled channel sequence as the output;
wherein, after each channel is popped from the first empty stack, the order of the popped channels is recorded by the second empty stack.
5. A non-transitory computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when executed by a processor, implements the method according to any of claims 1-3.
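As an illustration of the depth-bounded variant of claims 2 and 3, the procedure can be sketched as follows. The name `bounded_stack_shuffle`, the `max_depth` parameter standing in for the "reasonable value range", and the 0.5 push probability standing in for the generated random number are assumptions of this sketch; the claims do not fix these values.

```python
import random

def bounded_stack_shuffle(num_channels, max_depth, seed=None):
    """Stack shuffle with a cap on the work-stack depth: a pop is
    forced whenever the stored channel count exceeds max_depth;
    otherwise a random draw decides between push and pop."""
    rng = random.Random(seed)
    work, output = [], []
    i = 0
    while i < num_channels or work:
        if i == num_channels:
            output.append(work.pop())       # input empty: drain the stack
        elif len(work) > max_depth:
            output.append(work.pop())       # over the cap: force a pop
        elif not work or rng.random() < 0.5:
            work.append(i)                  # within range: random push
            i += 1
        else:
            output.append(work.pop())       # within range: random pop
    return output
```

Capping the stack depth limits how far a channel can drift from its original position, which is one way to make the inter-group allocation of channels controllable rather than fully random.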
CN202110902470.7A 2021-08-06 2021-08-06 Novel channel shuffling method and device based on stack shuffling Active CN113743582B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110902470.7A CN113743582B (en) 2021-08-06 2021-08-06 Novel channel shuffling method and device based on stack shuffling


Publications (2)

Publication Number Publication Date
CN113743582A CN113743582A (en) 2021-12-03
CN113743582B true CN113743582B (en) 2023-11-17

Family

ID=78730371

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110902470.7A Active CN113743582B (en) 2021-08-06 2021-08-06 Novel channel shuffling method and device based on stack shuffling

Country Status (1)

Country Link
CN (1) CN113743582B (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2007127171A2 (en) * 2006-04-24 2007-11-08 Jones David D Content shuffling system and method
CN106843806A (en) * 2016-12-20 2017-06-13 中国科学院信息工程研究所 A lock-free stack operation method based on an array structure in a multi-core environment
CN108846822A (en) * 2018-06-01 2018-11-20 桂林电子科技大学 The fusion method of visible images and infrared light image based on hybrid neural networks
CN111222706A (en) * 2020-01-13 2020-06-02 大连理工大学 Chaos time sequence prediction method based on particle swarm optimization and self-encoder
CN111835703A (en) * 2019-04-17 2020-10-27 三星电子株式会社 Multichannel data packer and multichannel data unpacker
CN111914987A (en) * 2019-05-10 2020-11-10 北京京东尚科信息技术有限公司 Data processing method and device based on neural network, equipment and readable medium
WO2021061183A1 (en) * 2020-01-15 2021-04-01 Futurewei Technologies, Inc. Shuffle reduce tasks to reduce i/o overhead
WO2021110147A1 (en) * 2019-12-06 2021-06-10 阿里巴巴集团控股有限公司 Methods and apparatuses for image processing, image training and channel shuffling

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2006309508A (en) * 2005-04-28 2006-11-09 Oki Electric Ind Co Ltd Stack control device and method
US11211944B2 (en) * 2019-04-17 2021-12-28 Samsung Electronics Co., Ltd. Mixed-precision compression with random access
TWI719512B (en) * 2019-06-24 2021-02-21 瑞昱半導體股份有限公司 Method and system for algorithm using pixel-channel shuffle convolution neural network


Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
On improving fault tolerance of memristor crossbar based neural network designs by target sparsifying; Song Jin et al.; ITAIC; full text *
ShuffleNet V2: practical guidelines for efficient CNN architecture design; Ningning Ma et al.; arXiv; full text *
ShuffleNet: an extremely efficient convolutional neural network for mobile devices; Xiangyu Zhang et al.; arXiv; full text *
Research progress in big data security technologies; Chen Xingyuan; Gao Yuanzhao; Tang Huilin; Du Xuehui; Scientia Sinica Informationis (Issue 01); full text *


Similar Documents

Publication Publication Date Title
US20170344881A1 (en) Information processing apparatus using multi-layer neural network and method therefor
US11556778B2 (en) Automated generation of machine learning models
US20180181867A1 (en) Artificial neural network class-based pruning
JP6890653B2 (en) Multi-layer neural network model learning and application methods, devices, and storage media
KR102165273B1 (en) Method and system for channel pruning of compact neural networks
JPS60183645A (en) Adaptive self-repair processor array and signal processing method using the same
CN111582007A (en) Object identification method, device and network
CN111709516A (en) Compression method and compression device of neural network model, storage medium and equipment
CN109171727A (en) A kind of MR imaging method and device
WO2022036921A1 (en) Acquisition of target model
CN113743582B (en) Novel channel shuffling method and device based on stack shuffling
CN114399668A (en) Natural image generation method and device based on hand-drawn sketch and image sample constraint
Kim et al. Dummy prototypical networks for few-shot open-set keyword spotting
CN110942034A (en) Method, system and device for detecting multi-type depth network generated image
FR2660085A1 (en) DATA PROCESSING DEVICE AND METHOD FOR SELECTING DATA WORDS CONTAINED IN A DICTIONARY.
US6496877B1 (en) Method and apparatus for scheduling data accesses for random access storage devices with shortest access chain scheduling algorithm
Krzyston et al. High-capacity complex convolutional neural networks for I/Q modulation classification
Gusak et al. MUSCO: Multi-stage compression of neural networks
CN114548191B (en) Photoacoustic imaging annular sparse array signal prediction method and device
US11361052B2 (en) Method of formatting a weight matrix, an accelerator using the formatted weight matrix, and a system including the accelerator
CN111179238B (en) Subset confidence ratio dynamic selection method for underwater image set-oriented guidance consistency enhancement evaluation
CN111476368B (en) Impulse neural network weight imaging comparison prediction and network anti-interference method
CN114239666A (en) Method, apparatus, computer readable medium for classification model training
Browne et al. Unsupervised pulsenet: Automated pruning of convolutional neural networks by k-means clustering
CN111290932A (en) Performance estimation method and device of storage equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant