CN113743582A - Novel channel shuffling method and device based on stack shuffling - Google Patents


Info

Publication number
CN113743582A
CN113743582A
Authority
CN
China
Prior art keywords
channel
stack
shuffling
empty
empty stack
Prior art date
Legal status
Granted
Application number
CN202110902470.7A
Other languages
Chinese (zh)
Other versions
CN113743582B (en)
Inventor
裴颂伟
季语成
Current Assignee
Beijing University of Posts and Telecommunications
Original Assignee
Beijing University of Posts and Telecommunications
Priority date
Filing date
Publication date
Application filed by Beijing University of Posts and Telecommunications
Priority to CN202110902470.7A
Publication of CN113743582A
Application granted
Publication of CN113743582B
Legal status: Active
Anticipated expiration


Classifications

    • G06N3/045 Combinations of networks
    • G06N3/063 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
    • G06N3/08 Learning methods
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The application provides a novel channel shuffling method based on stack shuffling, relating to the technical field of channel shuffling. The method comprises the following steps: serializing the channels into corresponding numbers to obtain a corresponding input channel sequence; providing a first empty stack as the place where channels are shuffled and a second empty stack as the storage place for the output channels; at each step, randomly choosing either to push a channel from the input channel sequence onto the first stack or to pop a channel from the first stack, and, once the input channel sequence is empty, popping all remaining elements from the first stack; and using the second stack to receive the channels popped from the first stack, forming the shuffled channel sequence as output. The method and device enable uniform and controllable shuffling and fusion of channel features.

Description

Novel channel shuffling method and device based on stack shuffling
Technical Field
The application relates to the technical field of channel shuffling, in particular to a novel channel shuffling method and device based on stack shuffling.
Background
The CNN library first proposed the concept of a convolutional layer based on random sparsification, in which a layer performs a random channel shuffling operation after an ordinary convolution. Because the shuffling method differs from one neural network to another, subsequent studies rarely adopted it. The "two-step" convolution strategy was implemented with this existing random shuffling method, i.e., channels are combined by randomly scrambling their order. Although random channel shuffling brings model compression and acceleration benefits, the approach is strongly limited.
ShuffleNet opened up a completely new channel shuffling technique that enables information flow between groups. The technique proceeds as follows: the convolution channels are divided into g groups of n channels each, which can be denoted by the pair (g, n). The channels are reshaped into a matrix whose row index is the group number and whose column index is the position within the group, yielding the arrangement shown in fig. 2; in that arrangement, 12 channels are divided into 3 groups of 4 channels each, with channels 1-4 forming the first group, 5-8 the second, and 9-12 the third, 1-12 being the original input channel numbers. On this basis, the matrix is transposed, giving the arrangement shown in fig. 3, which can be denoted (n, g). Reading the transposed matrix column by column, top to bottom, gives the shuffled groups (1,5,9) (2,6,10) (3,7,11) (4,8,12); flattening the matrix yields the complete shuffled sequence (1,5,9,2,6,10,3,7,11,4,8,12), and feeding this sequence into the next network layer produces the new grouping shown in fig. 4. If the number of groups of the next input layer needs to change, only the arrangement of the grouped input matrix needs to change; the transposition strategy stays the same.
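The reshape-transpose-flatten procedure above can be sketched in a few lines of Python (a minimal illustration, not part of the patent; the function name is ours):

```python
def shufflenet_shuffle(channels, g):
    """ShuffleNet-style shuffle: reshape to (g, n), transpose, flatten.

    channels: list of channel numbers; g: number of groups.
    """
    n = len(channels) // g
    assert g * n == len(channels), "channel count must be divisible by g"
    # Row r of the (g, n) matrix holds group r; reading the transposed
    # matrix column by column interleaves one channel from each group.
    return [channels[r * n + c] for c in range(n) for r in range(g)]

# The 12-channel, 3-group example from the description:
print(shufflenet_shuffle(list(range(1, 13)), g=3))
# [1, 5, 9, 2, 6, 10, 3, 7, 11, 4, 8, 12]
```

Changing g here corresponds to changing the arrangement of the grouped input matrix while keeping the transposition strategy fixed.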
Zhang proposed a new shuffling method: channels are first numbered by index, then the space of all possible channel permutations is searched by learning, with the channel dimension as the basis of the permutation, and a suitable channel shuffling pattern is selected via a mathematical criterion. This achieves a certain effect with a ResNet network on the CIFAR-10 data set, but the improvement rate is very limited.
Surveying the existing channel shuffling techniques: under the random sparse convolutional layer concept and the "two-step" convolution strategy, the precision loss of the model is large, typically between 2% and 4%, and the noise caused by random shuffling is very pronounced and hinders fast model convergence. The brand-new channel shuffling technique developed for ShuffleNet overcomes the drawbacks of random shuffling and is realized by row-column transposition. This shuffling appears uniform but actually has a significant defect: for example, the head and tail channels never change their original positions in any shuffle. When the channels satisfy g = n, none of the channels on the diagonal change their original order, as shown in fig. 5. This means the number of channels that keep their original order lies in [2, min(g, n)] (g ≥ 2, n ≥ 2), so their fraction of the total channel count lies in [2/(g·n), min(g, n)/(g·n)] (the exact expression appears only as an image in the original publication). This leaves room to improve the accuracy of the model.
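The fixed-point defect described above can be checked numerically with a short sketch (our own illustration; `fixed_points` is an assumed helper name, and channels are numbered from 0 so a fixed point is a channel whose number equals its position):

```python
def fixed_points(g, n):
    """Count channels left in place by the ShuffleNet reshape-transpose shuffle."""
    channels = list(range(g * n))
    # Same column-by-column read of the transposed (g, n) matrix as in the text.
    shuffled = [channels[r * n + c] for c in range(n) for r in range(g)]
    return sum(1 for pos, ch in enumerate(shuffled) if ch == pos)

# Head and tail channels are always fixed; with g == n the whole diagonal is.
print(fixed_points(3, 4))  # 2 (only the first and last channel)
print(fixed_points(4, 4))  # 4 (the diagonal, as in fig. 5)
```

This matches the claim that the count of unmoved channels lies in [2, min(g, n)].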
Although the method proposed by Zhang is novel, NAS search over the full permutation space still consumes a great deal of time, and because the parameters P and Q are set as the basis for examining inter-channel connectivity, the improvement is to some extent hard to explain. Moreover, the precision gain remains very limited: on a ResNet-50 network it is only 0.11% over ShuffleNet, which still falls short of the original model's precision without grouped convolution.
Disclosure of Invention
The present application is directed to solving, at least to some extent, one of the technical problems in the related art.
Therefore, a first object of the present application is to provide a novel channel shuffling method based on stack shuffling. It solves the problem that feature information cannot be sufficiently fused by existing channel shuffling methods, which causes precision loss in the neural network model. The proposed stack-based channel shuffling achieves uniform and controllable shuffling and fusion of channel features, avoiding both the uncertainty of feature fusion under random shuffling and the insufficient shuffling that remains in ShuffleNet, thereby improving the precision of the compressed model; at the same time, a desired shuffling pattern can be designed according to the characteristics of the neural network, controlling the inter-group distribution and intra-group fusion of features.
A second object of the present application is to propose a novel channel shuffling apparatus based on stack shuffling.
A third object of the present application is to propose a non-transitory computer-readable storage medium.
In order to achieve the above object, an embodiment of the first aspect of the present application provides a novel channel shuffling method based on stack shuffling, including: serializing the channels into corresponding numbers to obtain a corresponding input channel sequence; providing a first empty stack as the place where channels are shuffled and a second empty stack as the storage place for the output channels; at each step, randomly choosing either to push a channel from the input channel sequence onto the first stack or to pop a channel from the first stack, and, once the input channel sequence is empty, popping all remaining elements from the first stack; and using the second stack to receive the channels popped from the first stack, forming the shuffled channel sequence as output.
Optionally, in an embodiment of the present application, a reasonable value range of the number of channels stored in the first empty stack is set according to the number of channels included in the input channel sequence, if the number of channels stored in the first empty stack at the current time exceeds the reasonable value range, the channels in the first empty stack are randomly popped, and if the number of channels stored in the first empty stack at the current time is within the reasonable value range, a random number is generated to determine to execute a push or pop operation.
Optionally, in an embodiment of the present application, after each pop channel from the first empty stack, the second empty stack records the order of the pop channels.
Optionally, in an embodiment of the present application, the reasonable value range is determined by a threshold and a first ratio, where the first ratio represents a ratio between the number of input channel sequences pushed into the stack per group and the current output number, and the ratio represents a distribution ratio of channels between different groups after mixing, and a higher distribution ratio indicates that a next round of convolution operation retains more features of the group, and vice versa.
In order to achieve the above object, a second aspect of the present application provides a novel channel shuffling apparatus based on stack shuffling, including: a conversion module, a first empty stack module, a second empty stack module and an operation module, wherein,
the conversion module is used for serializing the channel into corresponding numbers to obtain corresponding input channel sequences;
the first empty stack module is used for storing the shuffle channel sequence;
the second empty stack module is used for storing the output channel sequence;
and the operation module is used for randomly choosing, at each step, either to push a channel from the input channel sequence onto the first stack or to pop a channel from the first stack, popping all remaining elements from the first stack once the input channel sequence is empty, receiving the channels popped from the first stack with the second stack, and forming the shuffled channel sequence as output.
To achieve the above object, a non-transitory computer-readable storage medium is provided in an embodiment of a third aspect of the present application, and when instructions in the storage medium are executed by a processor, a novel stack shuffle-based channel shuffle method can be performed.
The novel channel shuffling method based on stack shuffling, the novel channel shuffling device based on stack shuffling and the non-transitory computer readable storage medium solve the problem that feature information cannot be fully fused in the existing channel shuffling method, so that the precision loss of a neural network model is caused.
Additional aspects and advantages of the present application will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the present application.
Drawings
The foregoing and/or additional aspects and advantages of the present application will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:
fig. 1 is a flowchart of a novel channel shuffling method based on stack shuffling according to an embodiment of the present disclosure;
fig. 2 is a schematic diagram of ShuffleNet channels before shuffling according to an embodiment of the present application;
fig. 3 is a schematic diagram of the ShuffleNet channel transpose operation according to an embodiment of the present application;
fig. 4 is a schematic diagram of ShuffleNet channels after shuffling according to an embodiment of the present application;
fig. 5 is a schematic diagram of channel shuffling when g is equal to n in the embodiment of the present application;
FIG. 6 is a diagram illustrating a novel stack shuffle-based channel shuffling method according to an embodiment of the present application;
FIG. 7 is a diagram illustrating the structure location of the stack shuffle-based channel shuffle method in the entire model compression according to an embodiment of the present application;
FIG. 8 is a flowchart of an algorithm of a novel stack shuffle-based channel shuffle method according to an embodiment of the present application;
fig. 9 is a schematic structural diagram of a novel channel shuffling device based on stack shuffling according to a second embodiment of the present application.
Detailed Description
Reference will now be made in detail to embodiments of the present application, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the drawings are exemplary and intended to be used for explaining the present application and should not be construed as limiting the present application.
The novel channel shuffling method and apparatus based on stack shuffling according to the embodiments of the present application will be described below with reference to the accompanying drawings.
Fig. 1 is a flowchart of a novel channel shuffling method based on stack shuffling according to an embodiment of the present application.
As shown in fig. 1, the novel channel shuffling method based on stack shuffling comprises the following steps:
step 101, serializing a channel into corresponding numbers to obtain a corresponding input channel sequence;
step 102, a first empty stack is given as a place for channel shuffling, and a second empty stack is given as a storage place for an output channel;
step 103, randomly choosing, at each step, either to push a channel from the input channel sequence onto the first stack or to pop a channel from the first stack, and popping all remaining elements from the first stack if the input channel sequence is empty;
and step 104, receiving the channels popped from the first stack with the second stack, and forming the shuffled channel sequence as output.
According to the novel channel shuffling method based on stack shuffling, the channels are serialized into corresponding numbers to obtain a corresponding input channel sequence; a first empty stack is provided as the place where channels are shuffled, and a second empty stack as the storage place for the output channels; at each step, either a channel is randomly pushed from the input channel sequence onto the first stack or a channel is popped from the first stack, and once the input channel sequence is empty, all remaining elements are popped from the first stack; the second stack receives the channels popped from the first stack, forming the shuffled channel sequence as output. This solves the problem that feature information cannot be fully fused by existing channel shuffling methods, which causes precision loss in the neural network model; the proposed stack-based channel shuffling achieves uniform and controllable shuffling and fusion of channel features, avoids the uncertainty of feature fusion under random shuffling as well as the insufficient shuffling that remains in ShuffleNet, improves the precision of the compressed model, and at the same time allows a desired shuffling pattern to be designed according to the characteristics of the neural network, controlling inter-group distribution and intra-group fusion of features.
To ensure that the stack shuffling is sufficiently thorough and uniform, a threshold t must be introduced. t counts the channels currently held in stack 1: each push onto stack 1 performs t+1, each pop performs t-1. The value of t clearly needs to be constrained: it defines the reasonable range for the number of channels held in stack 1, and values above or below that range are not allowed, so as to guarantee the uniformity of the shuffle. Because of this uniformity requirement, with n channels per group, two complete groups may not sit inside the stack at the same time; by the pigeonhole principle the reasonable range of t is therefore [0, 2n-2]. If t would exceed the upper limit, a pop operation must be executed, popping a channel from stack 1; while the upper limit has not been reached, a system-generated random number decides whether to push or pop, and after each pop, stack 2 records the order of the popped channels. When the input channel sequence is empty, i.e., there is no new input, all remaining elements in stack 1 are popped.
On the basis of t, a parameter r is introduced to make the shuffle controllable. r reflects the degree to which channels/features within a group are retained when the channels are shuffled: the higher the shuffle rate, the better the uniformity and the fewer intra-group features are retained, so uniformity must be tuned by flexibly adjusting the degree r against t. To study how grouped convolution of a neural network performs under different degrees of shuffle uniformity: if one single-mindedly pursues ever more uniform shuffling, the result is always uniform; to make the shuffle effect more flexible, a Gaussian perturbation Δμ is introduced, acting through the degree r on t, so that the adjustment range of t is no longer restricted to [0, 2n-2]; the new constraint widens this interval by Δμ (the exact formula appears only as an image in the original publication and is not reproduced here).
r records the ratio of the number of channels of each group pushed onto the stack to the current output count; this value reflects the distribution rate of channels across groups after shuffling. A smaller r means a more uniform shuffle, i.e., fewer of a group's original channels are retained; a larger r, i.e., a higher distribution rate, means a less uniform shuffle, i.e., more of a group's original channels are retained and fewer channels are displaced. The parameters t and r can be combined to control the stack shuffling process and achieve the desired shuffle result.
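The push/pop procedure with the threshold t can be sketched as follows (a simplified illustration under our own assumptions: the r-based Gaussian adjustment is omitted, the random push/pop choice is an unweighted coin flip, and all names are ours, not the patent's):

```python
import random

def stack_shuffle(channels, g, seed=None):
    """Stack-based shuffle sketch: stack 1 shuffles, stack 2 collects output.

    The number of channels held in stack 1 is kept within [0, 2n-2]
    (n = channels per group), matching the pigeonhole bound in the text.
    """
    rng = random.Random(seed)
    n = len(channels) // g
    upper = 2 * n - 2            # threshold t may not exceed this
    pending = list(channels)     # input channel sequence, front first
    stack1, stack2 = [], []      # shuffle area and output area
    while pending:
        # Must pop when the bound is hit; otherwise flip a coin.
        if stack1 and (len(stack1) >= upper or rng.random() < 0.5):
            stack2.append(stack1.pop())    # pop: emit one channel
        else:
            stack1.append(pending.pop(0))  # push: take next input channel
    while stack1:                # input empty: flush stack 1 top-first
        stack2.append(stack1.pop())
    return stack2

shuffled = stack_shuffle(list(range(1, 13)), g=3, seed=0)
print(sorted(shuffled) == list(range(1, 13)))  # True: a permutation
```

Because pushes always come from the front of the input and pops always emit the top of stack 1, every output is a stack-sortable permutation of the input, and the bound on stack 1's occupancy keeps the interleaving of groups within the uniformity constraint.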
Further, in the embodiment of the present application, a reasonable value range of the number of channels stored in the first empty stack is set according to the number of channels included in the input channel sequence, if the number of channels stored in the first empty stack at the current time exceeds the reasonable value range, the channels in the first empty stack are randomly popped, and if the number of channels stored in the first empty stack at the current time is within the reasonable value range, a random number is generated to determine to execute push or pop operation.
If the number of groups is g and the number of channels in each group is n, then the reasonable range of t should be [0, 2n-2]. If t would exceed the upper limit, a pop operation must be executed, popping a channel from stack 1; if the upper limit has not been reached, push or pop is executed according to the generated random number, and after each pop, stack 2 records the order of the popped channels. If the input channel sequence is empty, all remaining elements in stack 1 are popped. The value of r can be adjusted together with t for a certain group g1: if r is smaller, fewer features of the first groups of channels are retained after shuffling and correspondingly more features of the later groups are retained, giving a shuffle effect in which the feature distribution increases gradually.
Further, in the embodiment of the present application, after a channel is popped from the first empty stack each time, the order of the pop channels is recorded by the second empty stack.
Further, in this embodiment of the present application, the reasonable value range is determined by a threshold and a first ratio, where the first ratio represents a ratio between the number of input channel sequences pushed into the stack per group and the current output number, and the ratio represents a distribution ratio of channels between different groups after mixing, and a higher distribution ratio indicates that a next round of convolution operation retains more features of the group, and vice versa.
From the code implementation point of view, the specific operation is as follows: the four-dimensional tensor (W/H/K/C) is decomposed along each dimension to obtain its four components (W/H/K/C), and with the channel dimension (C) as the independent variable and t and r as control parameters, the channel order is scrambled by stack shuffling. To ensure that different channels are distributed as evenly as possible within each group, random push/pop operations are performed group by group; when the difference between the number of push operations and the number of pop operations exceeds t, output follows the algorithm that emits the output channel numbers in largest lexicographic order; otherwise output is random.
This channel shuffling can be applied to neural network model compression for LeNet, AlexNet, ResNet18, ResNet50, ResNet101 and similar architectures. Network compression commonly uses a grouped convolution strategy, and grouped convolution suffers from poor information flow between groups, a problem that channel shuffling directly remedies. Compared with the traditional shuffling strategy, stack shuffling improves the precision (accuracy) of the compressed model: for example, LeNet's handwritten digit recognition rate, AlexNet's image classification and image recognition capability, and ResNet's accuracy in image classification, image segmentation and the like all improve. In particular, when the stack shuffling strategy is adopted while compressing AlexNet/ResNet-type models, test results on the CIFAR-10 data set show image classification precision higher than that of the original model before compression, an improvement of about 0.13%-0.32%.
Fig. 6 is a schematic diagram of a novel channel shuffling method based on stack shuffling according to an embodiment of the present application.
As shown in fig. 6, the novel channel shuffling method based on stack shuffling provides two empty stacks. Stack 1 serves as the place where channels are shuffled: at each step either a channel is randomly pushed from the input channel sequence or a channel is popped from stack 1 (ensuring the stack is not empty). Stack 2 serves as the storage place for the output channels and receives each channel popped from stack 1.
FIG. 7 is a diagram illustrating the structure location of the whole model compression according to the novel stack shuffle-based channel shuffle method in the embodiment of the present application.
As shown in fig. 7, the model compression flow comprises the original model, channel shuffling, and grouped convolution; the novel channel shuffling method based on stack shuffling fits into the channel shuffling step of this flow.
Fig. 8 is an algorithm flowchart of a novel stack shuffle-based channel shuffle method according to an embodiment of the present application.
As shown in fig. 8, the novel channel shuffling method based on stack shuffling proceeds as follows: input the model to obtain the input channel sequence; initialize stacks 1 and 2 as empty, and pass in the group count g and the per-group channel count n; compute the value of t and set the parameter r; judge whether the number of channels in stack 1 satisfies the range [0, 2n-2] for t; if the range is exceeded, a pop operation must be executed, popping a channel from stack 1, with stack 2 recording the output channel order; if the range is satisfied, push or pop is executed according to the generated random number, and after each pop, stack 2 records the output channel order; judge whether stack 1 is empty; if stack 1 is not empty, pop all its remaining elements; once stack 1 is empty, the grouped convolution operation is entered.
Further, the results of testing different neural networks under the traditional ShuffleNet channel shuffling method are used as the baseline, and the improved novel channel shuffling method based on stack shuffling is then tested for comparison. Table 1 shows that the stack-based method gives a solid precision improvement of about 0.12%-0.32%.
Table 1 (accuracy comparison across networks; rendered as images in the original publication and not reproduced here)
Fig. 9 is a schematic structural diagram of a novel channel shuffling device based on stack shuffling according to a second embodiment of the present application.
As shown in fig. 9, the novel channel shuffling apparatus based on stack shuffling includes: a conversion module, a first empty stack module, a second empty stack module and an operation module, wherein,
a conversion module 10, configured to serialize the channels into corresponding numbers to obtain corresponding input channel sequences;
a first empty stack module 20 for storing the shuffle channel sequence;
the second empty stack module 30 is used for storing the output channel sequence;
and the operation module 40, configured to randomly choose, at each step, either to push a channel from the input channel sequence onto the first stack or to pop a channel from the first stack, pop all remaining elements from the first stack once the input channel sequence is empty, receive the channels popped from the first stack with the second stack, and form the shuffled channel sequence as output.
The novel channel shuffling device based on stack shuffling in the embodiment of the application comprises a conversion module, a first empty stack module, a second empty stack module and an operation module. The conversion module serializes the channels into corresponding numbers to obtain a corresponding input channel sequence; the first empty stack module stores the shuffle channel sequence; the second empty stack module stores the output channel sequence; and the operation module randomly chooses, at each step, either to push a channel from the input channel sequence onto the first stack or to pop a channel from the first stack, pops all remaining elements from the first stack once the input channel sequence is empty, receives the channels popped from the first stack with the second stack, and forms the shuffled channel sequence as output. This solves the problem that feature information cannot be fully fused by existing channel shuffling methods, which causes precision loss in the neural network model; the proposed stack-based channel shuffling achieves uniform and controllable shuffling and fusion of channel features, avoids the uncertainty of feature fusion under random shuffling as well as the insufficient shuffling that remains in ShuffleNet, improves the precision of the compressed model, and at the same time allows a desired shuffling pattern to be designed according to the characteristics of the neural network, controlling inter-group distribution and intra-group fusion of features.
In order to implement the above embodiments, the present application also proposes a non-transitory computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the novel stack shuffle-based channel shuffling method of the above embodiments.
In the description herein, reference to the description of the term "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the application. In this specification, the schematic representations of the terms used above are not necessarily intended to refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. Furthermore, various embodiments or examples and features of different embodiments or examples described in this specification can be combined and combined by one skilled in the art without contradiction.
Furthermore, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present application, "plurality" means at least two, e.g., two, three, etc., unless specifically limited otherwise.
Any process or method descriptions in flow charts or otherwise described herein may be understood as representing modules, segments, or portions of code which include one or more executable instructions for implementing steps of a custom logic function or process, and alternate implementations are included within the scope of the preferred embodiment of the present application in which functions may be executed out of order from that shown or discussed, including substantially concurrently or in reverse order, depending on the functionality involved, as would be understood by those reasonably skilled in the art of the present application.
The logic and/or steps represented in the flowcharts or otherwise described herein, e.g., an ordered listing of executable instructions that can be considered to implement logical functions, can be embodied in any computer-readable medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, processor-containing system, or other system that can fetch the instructions from the instruction execution system, apparatus, or device and execute the instructions. For the purposes of this description, a "computer-readable medium" can be any means that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device. More specific examples (a non-exhaustive list) of the computer-readable medium would include the following: an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CD-ROM). Additionally, the computer-readable medium could even be paper or another suitable medium upon which the program is printed, as the program can be electronically captured, via for instance optical scanning of the paper or other medium, then compiled, interpreted or otherwise processed in a suitable manner if necessary, and then stored in a computer memory.
It should be understood that portions of the present application may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, the various steps or methods may be implemented in software or firmware stored in memory and executed by a suitable instruction execution system. If implemented in hardware, as in another embodiment, any one or combination of the following techniques, which are known in the art, may be used: a discrete logic circuit having a logic gate circuit for implementing a logic function on a data signal, an application specific integrated circuit having an appropriate combinational logic gate circuit, a Programmable Gate Array (PGA), a Field Programmable Gate Array (FPGA), or the like.
It will be understood by those skilled in the art that all or part of the steps carried by the method for implementing the above embodiments may be implemented by hardware related to instructions of a program, which may be stored in a computer readable storage medium, and when the program is executed, the program includes one or a combination of the steps of the method embodiments.
In addition, functional units in the embodiments of the present application may be integrated into one processing module, or each unit may exist alone physically, or two or more units are integrated into one module. The integrated module can be realized in a hardware mode, and can also be realized in a software functional module mode. The integrated module, if implemented in the form of a software functional module and sold or used as a stand-alone product, may also be stored in a computer readable storage medium.
The storage medium mentioned above may be a read-only memory, a magnetic disk, an optical disk, etc.

Although embodiments of the present application have been shown and described above, it is understood that the above embodiments are exemplary and should not be construed as limiting the present application, and that variations, modifications, substitutions and alterations may be made to the above embodiments by those of ordinary skill in the art within the scope of the present application.

Claims (6)

1. A novel channel shuffling method based on stack shuffling is characterized by comprising the following steps:
serializing the channels into corresponding numbers to obtain a corresponding input channel sequence;
providing a first empty stack as the place for channel shuffling, and a second empty stack as the storage place for the output channels;
randomly selecting, each time, either to push a channel from the input channel sequence onto the first empty stack or to pop a channel from the first empty stack, and popping all remaining elements of the first empty stack if the input channel sequence is empty;
and receiving, with the second empty stack, the channels popped from the first empty stack to form a mixed channel sequence as output.
2. The method according to claim 1, wherein a reasonable value range for the number of channels stored in the first empty stack is set according to the number of channels contained in the input channel sequence; if the number of channels currently stored in the first empty stack exceeds the reasonable value range, a channel in the first empty stack is randomly popped, and if the number is within the reasonable value range, a random number is generated to decide whether to execute a push or a pop operation.
3. The method according to claim 1, wherein the second empty stack records the order of the channels after each channel is popped from the first empty stack.
4. The method according to claim 3, wherein the reasonable value range is determined by a threshold and a first ratio, the first ratio representing the ratio of the number of channels of the input channel sequence pushed onto the stack per group to the current number of output channels; this ratio represents the allocation rate of channels between different groups after mixing, and a higher allocation rate indicates that the next convolution operation retains more features of that group, and vice versa.
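Claims 2-4 above control how channels of one group are distributed across output groups. As an illustrative sketch outside the claims (the function name and the matrix form are hypothetical, not language of the disclosure), the resulting allocation rate of a shuffled channel sequence can be inspected as a mixing matrix:

```python
def group_allocation(perm, group_size):
    """Mixing matrix: entry [i][j] is the fraction of output group i's
    channels that came from input group j of the original ordering."""
    n = len(perm)
    groups = n // group_size
    matrix = [[0.0] * groups for _ in range(groups)]
    for out_pos, ch in enumerate(perm):
        matrix[out_pos // group_size][ch // group_size] += 1.0 / group_size
    return matrix

# The identity permutation keeps every group to itself,
# while a perfect interleave splits each output group evenly:
identity = group_allocation(list(range(8)), 4)              # [[1.0, 0.0], [0.0, 1.0]]
interleave = group_allocation([0, 4, 1, 5, 2, 6, 3, 7], 4)  # [[0.5, 0.5], [0.5, 0.5]]
```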
5. The novel channel shuffling device based on stack shuffling is characterized by comprising a conversion module, a first empty stack module, a second empty stack module and an operation module, wherein,
the conversion module is used for serializing the channel into corresponding numbers to obtain corresponding input channel sequences;
the first empty stack module is used for storing the shuffled channel sequence;
the second empty stack module is used for storing an output channel sequence;
the operation module is configured to randomly select, each time, either to push a channel from the input channel sequence onto the first empty stack or to pop a channel from the first empty stack, to pop all remaining elements of the first empty stack if the input channel sequence is empty, and to receive, with the second empty stack, the channels popped from the first empty stack, so as to form a mixed channel sequence as an output.
6. A non-transitory computer-readable storage medium having stored thereon a computer program, wherein the computer program, when executed by a processor, implements the method of any one of claims 1-4.
CN202110902470.7A 2021-08-06 2021-08-06 Novel channel shuffling method and device based on stack shuffling Active CN113743582B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110902470.7A CN113743582B (en) 2021-08-06 2021-08-06 Novel channel shuffling method and device based on stack shuffling

Publications (2)

Publication Number Publication Date
CN113743582A true CN113743582A (en) 2021-12-03
CN113743582B CN113743582B (en) 2023-11-17

Family

ID=78730371

Country Status (1)

Country Link
CN (1) CN113743582B (en)

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060248315A1 (en) * 2005-04-28 2006-11-02 Oki Electric Industry Co., Ltd. Stack controller efficiently using the storage capacity of a hardware stack and a method therefor
WO2007127171A2 (en) * 2006-04-24 2007-11-08 Jones David D Content shuffling system and method
CN106843806A (en) * 2016-12-20 2017-06-13 中国科学院信息工程研究所 A kind of N-free diet method stack operation method based on structure of arrays under multi-core environment
CN108846822A (en) * 2018-06-01 2018-11-20 桂林电子科技大学 The fusion method of visible images and infrared light image based on hybrid neural networks
CN111222706A (en) * 2020-01-13 2020-06-02 大连理工大学 Chaos time sequence prediction method based on particle swarm optimization and self-encoder
US20200336155A1 (en) * 2019-04-17 2020-10-22 Samsung Electronics Co., Ltd. Mixed-precision compression with random access
CN111835703A (en) * 2019-04-17 2020-10-27 三星电子株式会社 Multichannel data packer and multichannel data unpacker
CN111914987A (en) * 2019-05-10 2020-11-10 北京京东尚科信息技术有限公司 Data processing method and device based on neural network, equipment and readable medium
US20200401847A1 (en) * 2019-06-24 2020-12-24 Realtek Semiconductor Corp. Calculation method using pixel-channel shuffle convolutional neural network and operating system using the same
WO2021061183A1 (en) * 2020-01-15 2021-04-01 Futurewei Technologies, Inc. Shuffle reduce tasks to reduce i/o overhead
WO2021110147A1 (en) * 2019-12-06 2021-06-10 阿里巴巴集团控股有限公司 Methods and apparatuses for image processing, image training and channel shuffling

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
Ningning Ma et al.: "ShuffleNet V2: Practical Guidelines for Efficient CNN Architecture Design", arXiv *
Song Jin et al.: "On Improving Fault Tolerance of Memristor Crossbar Based Neural Network Designs by Target Sparsifying", ITAIC *
Xiangyu Zhang et al.: "ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices", arXiv *
Chen Xingyuan; Gao Yuanzhao; Tang Huilin; Du Xuehui: "Research Progress in Big Data Security Technologies", SCIENCE CHINA Information Sciences, no. 01 *

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant