CN112183711B - Calculation method and system of convolutional neural network using pixel channel scrambling - Google Patents

Calculation method and system of convolutional neural network using pixel channel scrambling

Info

Publication number
CN112183711B
CN112183711B
Authority
CN
China
Prior art keywords
values
value
convolution
scrambling
pixel
Prior art date
Legal status
Active
Application number
CN201910586166.9A
Other languages
Chinese (zh)
Other versions
CN112183711A (en
Inventor
吴俊樟
陈世泽
Current Assignee
Realtek Semiconductor Corp
Original Assignee
Realtek Semiconductor Corp
Priority date
Filing date
Publication date
Application filed by Realtek Semiconductor Corp filed Critical Realtek Semiconductor Corp
Priority to CN201910586166.9A priority Critical patent/CN112183711B/en
Publication of CN112183711A publication Critical patent/CN112183711A/en
Application granted granted Critical
Publication of CN112183711B publication Critical patent/CN112183711B/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Image Analysis (AREA)

Abstract

In the calculation method, a computing system receives original input values. Before the convolution operation is carried out, pixel scrambling is performed on the original input values to separate them into a plurality of groups of values, reducing the dimensionality of each group. Channel scrambling is then performed on the groups of values to select the values that will participate in the convolution operation, forming a plurality of groups of new input values; discarding the unselected values effectively reduces the dimensionality of the input values. Convolution kernels are then set, and a multiply-accumulator performs the convolution operation on the convolution kernels and the groups of new input values to form a plurality of groups of output values.

Description

Calculation method and system of convolutional neural network using pixel channel scrambling
Technical Field
The present application relates to a data processing technology using a convolutional neural network, and more particularly, to a calculation method and system for a convolutional neural network that reduce the amount of computation and the storage space through the pre-operations of pixel scrambling and channel scrambling while maintaining recognition accuracy.
Background
In the field of artificial intelligence (AI), machine learning techniques are widely applied. Within machine learning, the convolutional neural network (CNN) is a feed-forward neural network that is particularly suited to image processing, including image recognition, object detection, and image segmentation.
Convolutional neural network models and algorithms have advanced considerably in recent years. However, although convolutional neural networks achieve high accuracy in image feature extraction and recognition, their large amount of computation and layer-by-layer operation make them difficult to implement in hardware.
In recent years, various studies have proposed neural networks suited to hardware computation, such as the depth-wise separable convolution of MobileNet and the shift convolution, with the common goal of reducing the amount of computation and the storage space of a model while maintaining the original accuracy.
Because the amount of computation of a model based on a convolutional neural network is very large, the prior art typically performs the operation on a cloud server or a host computer; for example, when applied to an artificial-intelligence Internet-of-Things (AIoT) product, the image data can be transmitted to a cloud server for computation so as to cope with the large amount of computation.
To maintain accuracy while reducing the number of model parameters and the amount of computation, prior art such as SqueezeNet (2016) keeps the convolution operation unchanged but decomposes the original larger convolution kernel into a plurality of modules to reduce parameter storage. Prior art such as MobileNet v1 (2017) and MobileNet v2 (2018) approximates the original k×k convolution with a depth-wise separable convolution module, which is a depth-wise convolution followed by a point-wise convolution. Further, prior art such as ShiftNet (2018) replaces the depth-wise convolution with a shift convolution, reducing the parameter storage and the amount of convolution computation even further.
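To make the scale of these savings concrete, the following is a minimal sketch, not taken from the cited works, that counts multiply-accumulate operations for a standard k×k convolution versus a depth-wise separable convolution (a depth-wise convolution followed by a point-wise convolution); all sizes are hypothetical and chosen only for illustration.

```python
# Rough multiply-accumulate (MAC) comparison: standard k x k convolution vs.
# depth-wise separable convolution.  All sizes are hypothetical.
H, W = 56, 56                 # feature-map height and width (assumed)
C_in, C_out, k = 64, 128, 3   # channel counts and kernel size (assumed)

standard_macs  = H * W * C_out * C_in * k * k
depthwise_macs = H * W * C_in * k * k        # one k x k filter per input channel
pointwise_macs = H * W * C_out * C_in        # 1 x 1 convolution across channels
separable_macs = depthwise_macs + pointwise_macs

print(standard_macs / separable_macs)        # roughly a k*k-fold reduction
```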
Disclosure of Invention
The present application discloses a calculation method and system of a convolutional neural network using pixel channel scrambling. Before the convolution operation is executed, pre-operations of pixel scrambling and channel scrambling are performed on the input values, so that the height, width, and depth dimensions of the input values can be reduced, thereby lowering the amount of computation and the memory usage of the system while the parameter amounts remain the same.
According to an embodiment, in the method for calculating a convolutional neural network using pixel channel scrambling, a computing system receives an original input value, which may be image data having a height, a width, and a first number of depth values. A processor of the computing system performs pixel scrambling on the original input value, separating it into a plurality of sets of values to reduce the dimensions of each set of values. Channel scrambling is then performed on the sets of values: the values participating in the convolution operation are selected from each set to form a plurality of sets of new input values, which are temporarily stored in a memory.
Next, convolution kernels corresponding to the sets of new input values are set; according to an embodiment, a second number of convolution kernels may be included, each convolution kernel implementing a filter. Then, a convolution operation is performed on the second number of convolution kernels and the sets of new input values by a multiply-accumulator in the processor to form a plurality of sets of output values having the second number.
When the original input value has values of a first number of depths, the pixel scrambling and channel scrambling form a plurality of sets of new input values whose depth is smaller than the first number.
Preferably, the original input values are image data, and after the computing system performs the convolution operation to extract image features, a plurality of sets of feature maps with the second number of depths are formed. An image feature map with the second number of depths can then be formed by performing an inverse pixel scrambling operation on the plurality of sets of output values having the second number.
Preferably, the generated image feature map is used to identify the original input value.
Preferably, the height, width, and depth of each convolution kernel performing the convolution operation may each be any positive integer.
According to an embodiment, a system for performing the calculation method of the convolutional neural network using pixel channel scrambling comprises a processor, and a communication circuit and a memory electrically connected to the processor, by which the calculation method is executed.
Further, the computing system may form a cloud system that provides a service for performing image recognition using the calculation method of the convolutional neural network with pixel channel scrambling.
Furthermore, the calculation method can also be implemented as a stand-alone circuit system, suitable for a specific system to execute image recognition using the convolutional neural network with pixel channel scrambling.
For a further understanding of the nature and the technical aspects of the present application, reference should be made to the following detailed description of the application and the accompanying drawings, which are provided for purposes of reference only and are not intended to limit the application.
Drawings
FIGS. 1(A)-1(C) are schematic diagrams of a point-wise convolution operation;
FIG. 2 is a schematic diagram of a convolution operation between a filter and one position in the input values;
FIG. 3 is a schematic diagram of an exemplary calculation of a convolutional neural network using pixel channel scrambling;
FIG. 4 is a flowchart of an embodiment of the calculation method of a convolutional neural network using pixel channel scrambling;
FIG. 5 is a schematic diagram of an embodiment of a computing system implementing the calculation method of a convolutional neural network using pixel channel scrambling;
FIGS. 6(A)-6(C) are schematic diagrams of an embodiment of performing pixel scrambling in the calculation method of a convolutional neural network using pixel channel scrambling;
FIGS. 7(A)-7(C) are schematic diagrams of an embodiment of performing channel scrambling in the calculation method of a convolutional neural network using pixel channel scrambling;
FIGS. 8(A)-8(C) are schematic diagrams of an embodiment of performing the convolution operation in the calculation method of a convolutional neural network using pixel channel scrambling;
FIGS. 9(A) and 9(B) are schematic diagrams of an embodiment of performing inverse pixel scrambling in the calculation method of a convolutional neural network using pixel channel scrambling.
Symbol description
First number C1
Second number C2
Third number C1'
Filter 20
Input value 22
Output value 24
Filter 30
Filter numbers 301-316
Input values a, b, c, d
Values a1 to a4, b1 to b4, c1 to c4, d1 to d4
Computing system 50
Processor 501
Communication circuit 505
Memory 503
Network 52
Terminal 511,512,513
First group of input values I_A
Second group of input values I_B
Third group of input values I_C
Fourth group of input values I_D
First group of new input values I_A'
Second group of new input values I_B'
Third group of new input values I_C'
Fourth group of new input values I_D'
First group of output values O_A
Second set of output values O_B
Third set of output values O_C
Fourth set of output values O_D
Calculation flow of convolutional neural network using pixel channel scrambling in steps S401-S413
S401 obtaining an original input value (H × W × C1)
S403 performing pixel scrambling (H/2 × W/2 × C1)
S405 performing channel scrambling (I_A, I_B, I_C, I_D)
S407 discarding unselected values to form a plurality of sets of new input values (I_A', I_B', I_C', I_D')
S409 setting convolution kernels (C2 filters)
S411 performing the convolution operation (H/2 × W/2 × C2)
S413 performing inverse pixel scrambling (H × W × C2)
Detailed Description
The following embodiments of the present application are described in terms of specific examples, and those skilled in the art will appreciate the advantages and effects of the present application from the disclosure herein. The application is capable of other and different embodiments and its several details are capable of modifications and various other uses and applications, all of which are obvious from the description, without departing from the spirit of the application. It is to be noted that the drawings of the present application are merely schematic illustrations, and are not drawn to actual dimensions. The following embodiments will further illustrate the related art content of the present application in detail, but the disclosure is not intended to limit the scope of the present application.
It will be understood that, although the terms "first," "second," "third," etc. may be used herein to describe various elements or signals, these elements or signals should not be limited by these terms. These terms are used primarily to distinguish one element from another element or signal from another signal. In addition, the term "or" as used herein shall include any one or combination of more of the associated listed items as the case may be.
The convolutional neural network (CNN) has achieved great success in image recognition applications, and image processing methods based on convolutional neural networks have been developed one after another. However, in a fully-connected neural network, every neuron in one layer is connected to every neuron in the adjacent layer; when the feature dimension of the input layer becomes very high, the number of parameters the network must train becomes very large and so does the amount of computation. The development of convolutional neural networks has therefore proceeded along two directions: further improving accuracy, and compressing and accelerating the operation of the network model.
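As a rough, purely illustrative comparison (the sizes below are assumptions, not values from this application), the parameter count of a fully-connected layer grows with the whole input resolution, while a bank of convolution kernels does not:

```python
# Parameter counts: fully-connected layer vs. a small bank of 3 x 3 kernels.
# Sizes are hypothetical and used only to show the order-of-magnitude gap.
H, W, C = 224, 224, 3
hidden = 1000
fc_params   = H * W * C * hidden    # every input value connected to every neuron
conv_params = 3 * 3 * C * 64        # 64 kernels of size 3 x 3 x C
print(fc_params, conv_params)       # ~150 million vs. ~1.7 thousand parameters
```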
Because the amount of computation of a model based on a convolutional neural network is very large, the disclosed calculation method and system using a pixel-channel shuffle convolutional neural network aim to maintain accuracy while reducing the number of model parameters and the amount of computation. The method keeps the depth-wise convolution of the depth-wise separable convolution module and replaces the existing point-wise convolution with the disclosed convolution with pixel channel scrambling, so that the amount of computation can be reduced. For example, in experiments under a specific environment, the amount of computation and the memory usage can be reduced to one quarter of those of the conventional point-wise convolution.
Taking image recognition and detection as an example, the embodiment of the calculation method using the convolutional neural network with pixel channel scrambling can reduce the channel computation of the feature maps in the convolutional neural network (CNN), combining pixel scrambling and channel scrambling to reduce hardware computation, including the size of the memory.
A point-wise convolution operation is described with reference to FIGS. 1(A)-1(C).
FIG. 1(A) shows the input layer of the point-wise convolution operation, drawn as a cube of height (H), width (W), and depth (a first number C1). The first-layer input values at the positions labeled a, b, c, and d are indicated, and the depth (C1) represents the number of channels (the first number C1) of the input layer.
FIG. 1(B) schematically shows 1×1 filters implemented by convolution kernels; in this example a second number C2 of filters is shown. In the convolution operation, each filter scans over the input layer of FIG. 1(A) according to a stride setting, multiplying and accumulating at each position to obtain the output values shown in FIG. 1(C).
The output layer shown in FIG. 1(C) is a cube of height (H), width (W), and depth (C2); the depth (C2) equals the number of filters (the second number C2), i.e. the number of feature maps generated, so H×W×C2 represents the size of the output values.
The convolution kernel realizes a filtering mechanism. As shown in FIG. 1(B), each parameter in the convolution kernel is equivalent to a weight in the neural network and is connected to a corresponding local pixel. The moving-window scan multiplies each parameter of the convolution kernel by the corresponding local pixel value one by one and sums the products to obtain a result on the convolution layer. Convolution kernels can thus extract features from the image and perform feature mapping.
For example, when the input value (input data) is convolved with one filter as shown in FIG. 1(B), the filter size is 1×1 and its depth is 16 (the first number C1); after the input value is multiplied by the filter (1×1×16), the output feature map is an output value of size H×W×1. Similarly, when C2 filters are provided (FIG. 1(B)), C2 feature maps are generated, and the feature maps are combined into the cube shown in FIG. 1(C). That is, the input value is convolved with the filters to form the output layer of FIG. 1(C), whose combined size is H×W×C2, i.e. the size of the output value (output data).
In this convolution operation there is a second number C2 of filters (convolution kernels, FIG. 1(B)), each having a first number C1 of values (e.g. 16). At every position in the input value, each filter is multiplied with the same first number C1 of values (FIG. 1(A)) and the products are summed; the second number C2 of filters thus forms a second number C2 of feature maps through the convolution operation, which are combined into the output value of size H×W×C2 shown in FIG. 1(C).
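The point-wise convolution of FIGS. 1(A)-1(C) can be sketched as follows; the sizes are illustrative assumptions and the weights are random placeholders, the point being only the H×W×C1 to H×W×C2 mapping:

```python
import numpy as np

# Point-wise (1 x 1) convolution as in FIGS. 1(A)-1(C): every spatial position
# of the H x W x C1 input is multiplied by each of the C2 filters (each of
# depth C1) and summed along the depth axis.
H, W, C1, C2 = 4, 4, 16, 8            # illustrative sizes only

x = np.random.randn(H, W, C1)         # input value (FIG. 1(A))
filters = np.random.randn(C2, C1)     # C2 filters of size 1 x 1 x C1 (FIG. 1(B))

# output[h, w, n] = sum over c of x[h, w, c] * filters[n, c]
output = np.einsum('hwc,nc->hwn', x, filters)
assert output.shape == (H, W, C2)     # H x W x C2 output value (FIG. 1(C))
```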
FIG. 2 shows a schematic diagram of the convolution operation between a filter and one position of the input values. The filter 20 has a first number C1 of values, which may be any number as required and is 16 in this example, and is convolved with the input value 22 (the position a in FIG. 1(A)), which also has a first number C1 (16) of values. According to an embodiment of the calculation method using the convolutional neural network with pixel channel scrambling, the concept is that each position in the input value 22 is not multiplied by all of the first number C1 of values in the filter 20, but only according to a specific rule, so that the output value 24 is generated by multiplying different geometric positions in the input value 22 with different values of the filter 20. As a result, the amount of computation can be reduced.
With continued reference to FIG. 3, an example of the calculation using the convolutional neural network with pixel channel scrambling is shown for 2×2 input values, each having a first number C1 of values (16 in this example). Each block is numbered a, b, c, and d in order and may be regarded as a pixel of the input image to be processed by the system. A filter 30 is provided for the convolution operation, with filter numbers 301 to 316 denoting its components; to reduce the amount of computation, the filter 30 sets a multiplication-and-summation rule according to that requirement.
In this example the input values form a 2×2 block of pixels, and 4 consecutive pixels are treated as a group, so the filter 30 can be divided into 4 groups at intervals: filter numbers 301, 305, 309, and 313 form a first group of filters; numbers 302, 306, 310, and 314 a second group; numbers 303, 307, 311, and 315 a third group; and numbers 304, 308, 312, and 316 a fourth group. The groups are convolved in sequence with the input values a, b, c, and d respectively, instead of with all of the input values, so the amount of computation can be reduced. The grouping rule of the filters is stored in a memory of the system.
For example, the system extracts, from the first number C1 (16 in this example) of values of the input value a, the values a1, a2, a3, and a4 according to a rule (e.g. every 4 values) to form a first set of input values (I_A); the selected values are registered in the memory of the system and convolved with the first group of filters (filter numbers 301, 305, 309, and 313). The unselected values of the input value a are discarded, which effectively reduces the amount of computation, here to one quarter of the original. For the first set of input values (I_A), the convolution operation multiplies and accumulates with the filters at the corresponding positions (the first group of filters): the values a1 to a4 are multiplied by the corresponding weights of the number 301 filter and summed to give a first output value; by the number 305 filter to give a second output value; by the number 309 filter to give a third output value; and by the number 313 filter to give a fourth output value. The first, second, third, and fourth output values obtained by the convolution of the input value a form a first group of output values (O_A).
Similarly, the system takes, from the first number C1 (16 in this example) of values of the input value b, the values b1, b2, b3, and b4 according to the rule (e.g. every 4 values) to form a second set of input values (I_B), temporarily stores the selected values in the memory of the system, and convolves them with the second group of filters (filter numbers 302, 306, 310, and 314); the unselected values of the input value b are likewise discarded. The values b1 to b4 multiplied by the corresponding weights of the number 302, 306, 310, and 314 filters and summed give the first, second, third, and fourth output values respectively, which form a second group of output values (O_B).
Likewise, the values c1, c2, c3, and c4 are taken from the input value c to form a third set of input values (I_C), the unselected values are discarded, and the convolution with the third group of filters (filter numbers 303, 307, 311, and 315) yields first to fourth output values forming a third group of output values (O_C).
Finally, the values d1, d2, d3, and d4 are taken from the input value d to form a fourth set of input values (I_D), the unselected values are discarded, and the convolution with the fourth group of filters (filter numbers 304, 308, 312, and 316) yields first to fourth output values forming a fourth group of output values (O_D).
The example of FIG. 3 thus shows that the input values (a, b, c, d) selected according to the specific rule form the first set of input values (I_A), the second set of input values (I_B), the third set of input values (I_C), and the fourth set of input values (I_D); the selected input values are then convolved to form the first group of output values (O_A), the second group of output values (O_B), the third group of output values (O_C), and the fourth group of output values (O_D).
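The selection-and-grouping rule of FIG. 3 can be sketched numerically as follows. This is only one possible reading of the figure, in which filter numbers 301, 305, 309, and 313 are treated as the first group of filters applied to pixel a, numbers 302, 306, 310, and 314 as the second group applied to pixel b, and so on; all weights are random placeholders and the offsets are assumptions.

```python
import numpy as np

# Sketch of the FIG. 3 rule: each pixel a, b, c, d keeps every fourth of its
# 16 channel values (with a different offset per pixel) and is convolved only
# with its own group of four filters.
C1 = 16
pixels = {name: np.random.randn(C1) for name in 'abcd'}   # 16 channel values per pixel
filters = np.random.randn(C1, 4)     # 16 filters (nos. 301-316), 4 weights each

outputs = {}
for g, name in enumerate('abcd'):                 # group g = 0, 1, 2, 3
    selected = pixels[name][g::4]                 # e.g. a1, a2, a3, a4 for pixel a
    group_filters = filters[g::4]                 # e.g. filters 301, 305, 309, 313
    outputs['O_' + name.upper()] = group_filters @ selected   # four output values

print({k: v.shape for k, v in outputs.items()})   # each group yields 4 values
```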
According to the above example, when the convolution operation is performed, not all of the input values are operated on; the values other than those selected by the specific rule to participate in the convolution operation are discarded, so the amount of computation can be effectively reduced.
Following the concept of the above example, the calculation method of a convolutional neural network using pixel channel scrambling of the present application differs from the conventional point-wise convolution operation: it decomposes the point-wise convolution into a sequence of operations including pixel scrambling, channel scrambling, a point-wise convolution operation, and inverse pixel scrambling (inverse pixel shuffle).
As described in the steps of FIG. 4 and the computing system of FIG. 5, the system for executing the calculation method using the convolutional neural network with pixel channel scrambling may be a computing system 50 for image processing. The computing system 50 includes a processor 501, a communication circuit 505, and a memory 503, which are electrically connected. The pixel scrambling, channel scrambling, convolution, and subsequent inverse pixel scrambling steps of the method are performed by the processor 501; in particular, the convolution operation is performed by a multiply-accumulator in the processor 501 that carries out the multiply-add operations. Because the calculation method effectively reduces the amount of computation, the corresponding hardware requirements, such as the multiply-accumulator and the memory, are also effectively reduced.
It should be noted that the computing system 50 may be a general computer system or a cloud system configured to receive image data transmitted by terminals 511, 512, and 513 through a network 52 and to provide a service for performing image recognition using the calculation method of the convolutional neural network with pixel channel scrambling. In another embodiment, the computing system 50 may also be implemented as a stand-alone circuit system, such as an integrated circuit (IC), suitable for a specific system to perform image recognition using the convolutional neural network with pixel channel scrambling.
According to one embodiment, the computing system 50 processes an input image for image recognition, and the convolutional neural network obtains features of the image from its pixels, where the features cover not only each pixel but also the correlation between pixels. The flow of the method is described with reference to the examples shown in FIGS. 6(A)-9(B) and the flowchart of FIG. 4; in particular, these examples explain why the calculation using the convolutional neural network with pixel channel scrambling can reduce the amount of computation while approaching the result of the originally much larger convolution computation.
FIGS. 6(A)-6(C) show an embodiment of performing pixel scrambling in the calculation method using the convolutional neural network with pixel channel scrambling.
FIG. 6(A) shows the original input value represented by a cube of height (H), width (W), and depth (a first number C1, e.g. 16); 4 sets of input values, each having the first number C1 of values, are represented by the input values a, b, c, and d. In step S401 of FIG. 4, the computing system receives an original input value of size H×W×C1, which may be image data having a height, a width, and a depth.
The computing system performs a scrambling operation, pixel scrambling, on the original input values. In step S403 of FIG. 4, the processor of the computing system separates the original input values into a plurality of sets of values as required, so that each set of values has a reduced dimension relative to the original input values: the height and width are reduced, while the depth may remain unchanged. As shown in FIG. 6(B), the original input values are separated into 4 groups, forming 4 cubes of height (H/2), width (W/2), and depth (the first number C1), each with half the original height and width. They can then be represented as in FIG. 6(C), which shows a first set of input values (I_A), a second set of input values (I_B), a third set of input values (I_C), and a fourth set of input values (I_D), each with the first number C1 of feature maps. The data generated in this procedure can be temporarily stored in the memory until the next step fetches it.
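A minimal sketch of the pixel scrambling of step S403 follows, assuming the four groups a, b, c, d correspond to the four positions of each 2×2 pixel block (the exact assignment and the sizes are assumptions for illustration):

```python
import numpy as np

# Pixel scrambling (step S403): split the H x W x C1 original input into four
# H/2 x W/2 x C1 groups, one per position of each 2 x 2 pixel block.
H, W, C1 = 8, 8, 16                  # illustrative sizes only
x = np.random.randn(H, W, C1)        # original input value

I_A = x[0::2, 0::2, :]               # pixels "a": even rows, even columns
I_B = x[0::2, 1::2, :]               # pixels "b": even rows, odd columns
I_C = x[1::2, 0::2, :]               # pixels "c": odd rows, even columns
I_D = x[1::2, 1::2, :]               # pixels "d": odd rows, odd columns

for group in (I_A, I_B, I_C, I_D):
    assert group.shape == (H // 2, W // 2, C1)   # height and width halved
```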
Next, as shown in step S405 of FIG. 4, the computing system performs channel scrambling on the sets of values formed by the pixel scrambling; this part of the calculation using the convolutional neural network with pixel channel scrambling is shown in FIGS. 7(A)-7(C).
FIG. 7(A) shows the input values whose height and width have been halved by the pixel scrambling procedure. The values participating in the convolution operation are then selected from each of the sets of values according to a rule to form a plurality of new input values, which can be temporarily stored in the memory of the system. The figure shows that, in the first set of input values (I_A), one value is selected out of every 4 according to the design of the filters, i.e. the feature maps numbered 4k+1 are taken, where k is 0, 1, 2, 3. Similarly, one value out of every 4 is selected from the second set of input values (I_B), i.e. the feature maps numbered 4k+2; one out of every 4 from the third set of input values (I_C), i.e. feature maps 4k+3; and one out of every 4 from the fourth set of input values (I_D), i.e. feature maps 4k+4, where k is 0, 1, 2, 3.
FIG. 7(B) shows that the values selected from the first set of input values (I_A), the second set of input values (I_B), the third set of input values (I_C), and the fourth set of input values (I_D) are rearranged into the front layers of each set.
FIG. 7(C) shows that the unselected values in the first, second, third, and fourth sets of input values are discarded; each set of input values originally having 16 values (feature maps) is reduced to 4 values, i.e. the number of feature maps is reduced to one quarter after the channel scrambling procedure, forming new first, second, third, and fourth sets of input values (I_A', I_B', I_C', I_D'). In step S407 of FIG. 4, the computing system performs the channel scrambling and discards the unselected values, reducing the depth dimension to a third number C1'. After the pixel scrambling and channel scrambling, sets of new input values whose depth (the third number C1') is smaller than the first number C1 are formed, shown here as one quarter (e.g. 4) of the original depth, to produce the sets of input values (I_A', I_B', I_C', I_D') participating in the convolution operation.
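Continuing the sketch above (the arrays I_A to I_D and the sizes are carried over), the channel scrambling of steps S405 and S407 keeps every fourth feature map per group, with a different starting offset per group, and discards the rest; the 4k+1 to 4k+4 offsets follow FIG. 7:

```python
# Channel scrambling (steps S405/S407): keep every fourth channel per group,
# with a different offset per group, and discard the unselected channels.
I_A_new = I_A[:, :, 0::4]    # feature maps 4k+1 (1-based): 1, 5, 9, 13
I_B_new = I_B[:, :, 1::4]    # feature maps 4k+2:           2, 6, 10, 14
I_C_new = I_C[:, :, 2::4]    # feature maps 4k+3:           3, 7, 11, 15
I_D_new = I_D[:, :, 3::4]    # feature maps 4k+4:           4, 8, 12, 16

C1_prime = C1 // 4           # third number C1' = 4 when C1 = 16
for group in (I_A_new, I_B_new, I_C_new, I_D_new):
    assert group.shape == (H // 2, W // 2, C1_prime)
```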
FIGS. 8(A)-8(C) next show an embodiment of performing the convolution operation in the calculation using the convolutional neural network with pixel channel scrambling.
FIG. 8(A) shows the first set of input values (I_A'), the second set of input values (I_B'), the third set of input values (I_C'), and the fourth set of input values (I_D'), whose depth has been reduced to the third number C1' by the pixel scrambling and channel scrambling. Each set is convolved with the filters (of depth corresponding to the third number C1') implemented by the corresponding set of convolution kernels shown in FIG. 8(B): a convolution kernel (filter) is set for each group of new input values, and a second number (C2) of filters (1×1×4 in this example) are set as required. Each group of filters is derived from the original filters according to the rule, so the convolution kernel depth is only one quarter of the original (the third number C1'). As shown in step S409 of FIG. 4, the corresponding convolution kernels, whose depth is likewise reduced, are set according to the sets of input values (I_A', I_B', I_C', I_D') participating in the convolution operation, realizing the second number C2 of filters.
In step S411 of FIG. 4, the convolution operation is performed by the multiply-accumulator of the processor in the computing system with the convolution kernels set in step S409 and the sets of new input values. As shown in FIG. 8(C), four sets of output values are generated after the convolution operation: the first set of output values (O_A), the second set of output values (O_B), the third set of output values (O_C), and the fourth set of output values (O_D). The height and width of each set of output values are H/2 and W/2 respectively, and the depth is the second number C2 of convolution kernels. The sets of output values are features extracted from the original input values (e.g. image data), i.e. a plurality of feature maps having the second number C2 of depths. These feature maps may likewise be temporarily stored in the memory.
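Continuing the same sketch, steps S409 and S411 can be illustrated as follows: each group of new input values is convolved with its own set of C2 point-wise filters of depth C1'. The filter weights are random placeholders and C2 is an assumed size.

```python
# Convolution (steps S409/S411): each group I_X' is convolved with its own set
# of C2 point-wise filters of depth C1', giving (H/2) x (W/2) x C2 outputs.
C2 = 8                                                   # assumed filter count
new_inputs = {'A': I_A_new, 'B': I_B_new, 'C': I_C_new, 'D': I_D_new}
kernels = {name: np.random.randn(C2, C1_prime) for name in new_inputs}

outputs = {}
for name, group in new_inputs.items():
    # outputs[name][h, w, n] = sum over c of group[h, w, c] * kernels[name][n, c]
    outputs[name] = np.einsum('hwc,nc->hwn', group, kernels[name])
    assert outputs[name].shape == (H // 2, W // 2, C2)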
After the system completes the convolution operation and generates the sets of output values, the next step, shown in FIGS. 9(A) and 9(B), performs inverse pixel scrambling (inverse pixel shuffle), as in step S413 of FIG. 4. FIG. 9(A) shows the first set of output values (O_A), the second set of output values (O_B), the third set of output values (O_C), and the fourth set of output values (O_D) obtained by the system from the memory; each set of output values covers the components of the corresponding position (a, b, c, d) of the original input values, and the sets of output values can be recombined, in the originally designed order a, b, c, d of the input values, into a final output value of height H, width W, and depth equal to the second number C2. That is, an image feature map of the original input image data is obtained by the computing system through the calculation method of the convolutional neural network with pixel channel scrambling, as shown in FIG. 9(B). It should be noted that this image feature map is the image feature extracted from the original input value (image data), i.e. the finally generated image feature map, and can be provided to a specific system for identifying the original image.
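Continuing the sketch, the inverse pixel scrambling of step S413 interleaves the four output groups back into a single H×W×C2 feature map, mirroring the 2×2 positions assumed during pixel scrambling:

```python
# Inverse pixel scrambling (step S413): interleave the four (H/2) x (W/2) x C2
# output groups back into one H x W x C2 image feature map.
feature_map = np.empty((H, W, C2))
feature_map[0::2, 0::2, :] = outputs['A']
feature_map[0::2, 1::2, :] = outputs['B']
feature_map[1::2, 0::2, :] = outputs['C']
feature_map[1::2, 1::2, :] = outputs['D']
assert feature_map.shape == (H, W, C2)
```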
It should be noted that, according to an embodiment of the calculation method using the convolutional neural network with pixel channel scrambling, the input values to which the pixel scrambling is applied can be adjusted as required, and the convolution kernels performing the convolution operation can also be changed as required; their height, width, and depth may each be any positive integer. The sizes of the final output value and the initial input value are the same and the parameter amounts are the same, but the number of multiply-accumulate operations required by the system to perform the multiply-add operations is lower.
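The reduction in multiply-accumulate operations can be checked with a short count, assuming a 4-way split as in the examples above (so that C1' = C1/4 and the height and width are halved):

```python
# MAC count: conventional point-wise convolution vs. the pixel/channel-
# scrambled variant with a 4-way split.  Sizes are the illustrative ones above.
H, W, C1, C2 = 8, 8, 16, 8
pointwise_macs = H * W * C1 * C2                        # conventional 1 x 1 conv
scrambled_macs = 4 * (H // 2) * (W // 2) * (C1 // 4) * C2
print(pointwise_macs // scrambled_macs)                 # 4 -> one quarter of the MACs
```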
In summary, according to the embodiments of the calculation method and system of a convolutional neural network using pixel channel scrambling, in data processing technology using a convolutional neural network, the amount of computation and the storage space are reduced by the pre-operations of pixel scrambling and channel scrambling, while the accuracy of the convolution calculation is still maintained.
The above disclosure is only a preferred embodiment of the present application and is not intended to limit the claims of the present application, so that all equivalent technical changes made by the application of the specification and the drawings of the present application are included in the claims of the present application.

Claims (10)

1. A method of computing a convolutional neural network using pixel channel scrambling, comprising:
receiving an original input value by a computing system, wherein the original input value is a value having a height, a width, and a depth;
performing, by a processor of the computing system, a pixel scrambling on the original input value, and separating the original input value into a plurality of groups of values to reduce dimensions of the groups of values;
executing a channel scrambling on the multiple groups of values by the processor, and respectively selecting the values participating in a convolution operation from the multiple groups of values to form multiple groups of new input values;
setting convolution kernels corresponding to the multiple groups of new input values, wherein the convolution kernels comprise a second number of convolution kernels, and each convolution kernel realizes a filter; and
the convolution operation is performed with the second number of convolution kernels and the plurality of sets of new input values by a multiply-accumulator in the processor to form a plurality of sets of output values having the second number.
2. The method of claim 1, wherein the original input value is image data, and the computing system performs the convolution operation to extract image features so as to form a plurality of sets of feature maps having a second number of depths.
3. The method of claim 2, wherein the plurality of sets of output values having the second number are processed by an inverse pixel scrambling operation to form an image feature map having the second number of depths.
4. The method of claim 1, wherein the new input values are formed by the channel scrambling, and values of the sets of values not selected to participate in the convolution operation are further discarded.
5. The method of claim 1, wherein the original input values have values of a first number of depths, and the pixel scrambling and the channel scrambling form the plurality of new input values of less than the first number of depths.
6. The method of claim 5, wherein the original input value is image data, and the computing system performs the convolution operation to extract image features so as to form a plurality of sets of feature maps having a second number of depths.
7. The method of claim 6, wherein the plurality of sets of output values having the second number are processed by an inverse pixel scrambling operation to form an image feature map having the second number of depths.
8. The method of claim 7, wherein the image feature map is used to identify the original input values.
9. The method of computing a convolutional neural network using pixel channel scrambling of any of claims 1-8, wherein the height, width, and depth of each convolution kernel performing the convolution operation are each any positive integer.
10. A computing system using a convolutional neural network with pixel channel scrambling, comprising:
a processor, and a communication circuit and a memory electrically connected with the processor;
wherein the calculation method of the convolutional neural network using pixel channel scrambling is executed by the processor, and comprises:
receiving an original input value, wherein the original input value is a value with a height, a width and a depth;
performing a pixel scrambling on the original input value, separating the original input value into a plurality of sets of values to reduce the dimension of each set of values;
performing a channel scrambling on the plurality of sets of values, and respectively selecting values participating in a convolution operation from the plurality of sets of values to form a plurality of sets of new input values;
setting convolution kernels corresponding to the multiple groups of new input values, wherein the convolution kernels comprise a second number of convolution kernels, and each convolution kernel realizes a filter; and
the convolution operation is performed with the second number of convolution kernels and the plurality of sets of new input values by a multiply-accumulator in the processor to form a plurality of sets of output values having the second number.
CN201910586166.9A 2019-07-01 2019-07-01 Calculation method and system of convolutional neural network using pixel channel scrambling Active CN112183711B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910586166.9A CN112183711B (en) 2019-07-01 2019-07-01 Calculation method and system of convolutional neural network using pixel channel scrambling

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910586166.9A CN112183711B (en) 2019-07-01 2019-07-01 Calculation method and system of convolutional neural network using pixel channel scrambling

Publications (2)

Publication Number Publication Date
CN112183711A CN112183711A (en) 2021-01-05
CN112183711B true CN112183711B (en) 2023-09-12

Family

ID=73915660

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910586166.9A Active CN112183711B (en) 2019-07-01 2019-07-01 Calculation method and system of convolutional neural network using pixel channel scrambling

Country Status (1)

Country Link
CN (1) CN112183711B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104809426A (en) * 2014-01-27 2015-07-29 日本电气株式会社 Convolutional neural network training method and target identification method and device
WO2018052586A1 (en) * 2016-09-14 2018-03-22 Konica Minolta Laboratory U.S.A., Inc. Method and system for multi-scale cell image segmentation using multiple parallel convolutional neural networks
CN108460742A (en) * 2018-03-14 2018-08-28 日照职业技术学院 A kind of image recovery method based on BP neural network
CN109344883A (en) * 2018-09-13 2019-02-15 西京学院 Fruit tree diseases and pests recognition methods under a kind of complex background based on empty convolution
CN109360192A (en) * 2018-09-25 2019-02-19 郑州大学西亚斯国际学院 A kind of Internet of Things field crop leaf diseases detection method based on full convolutional network
EP3499428A1 (en) * 2017-12-18 2019-06-19 Nanjing Horizon Robotics Technology Co., Ltd. Method and electronic device for convolution calculation in neutral network

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
AU2017338783B2 (en) * 2016-10-04 2022-02-10 Magic Leap, Inc. Efficient data layouts for convolutional neural networks

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104809426A (en) * 2014-01-27 2015-07-29 日本电气株式会社 Convolutional neural network training method and target identification method and device
WO2018052586A1 (en) * 2016-09-14 2018-03-22 Konica Minolta Laboratory U.S.A., Inc. Method and system for multi-scale cell image segmentation using multiple parallel convolutional neural networks
EP3499428A1 (en) * 2017-12-18 2019-06-19 Nanjing Horizon Robotics Technology Co., Ltd. Method and electronic device for convolution calculation in neutral network
CN108460742A (en) * 2018-03-14 2018-08-28 日照职业技术学院 A kind of image recovery method based on BP neural network
CN109344883A (en) * 2018-09-13 2019-02-15 西京学院 Fruit tree diseases and pests recognition methods under a kind of complex background based on empty convolution
CN109360192A (en) * 2018-09-25 2019-02-19 郑州大学西亚斯国际学院 A kind of Internet of Things field crop leaf diseases detection method based on full convolutional network

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Research on convolutional neural networks based on sparse convolution kernels and their applications; 叶会娟; 刘向阳; Information Technology (10); full text *

Also Published As

Publication number Publication date
CN112183711A (en) 2021-01-05

Similar Documents

Publication Publication Date Title
Graham et al. Levit: a vision transformer in convnet's clothing for faster inference
CN109522874B (en) Human body action recognition method and device, terminal equipment and storage medium
CN110188239B (en) Double-current video classification method and device based on cross-mode attention mechanism
CN110050267B (en) System and method for data management
US11907826B2 (en) Electronic apparatus for operating machine learning and method for operating machine learning
TWI719512B (en) Method and system for algorithm using pixel-channel shuffle convolution neural network
WO2020186703A1 (en) Convolutional neural network-based image processing method and image processing apparatus
WO2018052586A1 (en) Method and system for multi-scale cell image segmentation using multiple parallel convolutional neural networks
CN110033003A (en) Image partition method and image processing apparatus
CN109784372B (en) Target classification method based on convolutional neural network
WO2023146523A1 (en) Event-based extraction of features in a convolutional spiking neural network
CN109964250A (en) For analyzing the method and system of the image in convolutional neural networks
Chang et al. An efficient implementation of 2D convolution in CNN
CN112561027A (en) Neural network architecture searching method, image processing method, device and storage medium
CN112613581A (en) Image recognition method, system, computer equipment and storage medium
CN110059815B (en) Artificial intelligence reasoning computing equipment
CN109996023A (en) Image processing method and device
CN111062854B (en) Method, device, terminal and storage medium for detecting watermark
CN113159232A (en) Three-dimensional target classification and segmentation method
CN114821058A (en) Image semantic segmentation method and device, electronic equipment and storage medium
Jasitha et al. Venation based plant leaves classification using GoogLeNet and VGG
CN112749576B (en) Image recognition method and device, computing equipment and computer storage medium
CN111145196A (en) Image segmentation method and device and server
CN114049491A (en) Fingerprint segmentation model training method, fingerprint segmentation device, fingerprint segmentation equipment and fingerprint segmentation medium
CN111414823B (en) Human body characteristic point detection method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant