WO2017216976A1 - Information processing method and device for neural network - Google Patents

Information processing method and device for neural network

Info

Publication number
WO2017216976A1
Authority
WO
WIPO (PCT)
Prior art keywords
neural network
layer
activation
processing
pooling
Prior art date
Application number
PCT/JP2016/068741
Other languages
French (fr)
Inventor
Vijay DAULTANI
Original Assignee
Nec Corporation
Priority date
Filing date
Publication date
Application filed by Nec Corporation
Priority to PCT/JP2016/068741
Publication of WO2017216976A1


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/048 Activation functions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

An information processing device for a neural network, the information processing device including: a neural network reconfiguration unit configured to swap an order of activation processing and pooling processing in a target portion of a neural network in which convolution processing, the activation processing, and the pooling processing occur in order, when the activation processing is a non-decreasing function and the pooling processing is a max function; and a processing unit configured to process input data by using the neural network as reconfigured by the neural network reconfiguration unit.

Description

DESCRIPTION
INFORMATION PROCESSING METHOD AND DEVICE FOR NEURAL NETWORK
TECHNICAL FIELD
The present disclosure relates to the field of convolution neural networks used in, for example, image processing. More specifically, the present disclosure relates to a method and device to arrange an order of layers in convolution neural networks.
BACKGROUND ART
Recently, deep learning has been widely applied to the field of machine learning, particularly through the use of artificial neural networks which have shown promising results in various fields. A convolution neural network (CNN), which is one class of artificial neural networks, has seen significant research contributions in the past few years. CNNs have exhibited exceptional properties which have inspired their use for a multitude of challenging tasks. Image processing, text processing, speech processing, trade markets, etc. are some examples of the many fields where CNNs are being applied.
Machine learning has a long history and such techniques have been applied in many fields for various tasks. Before CNNs were used for these tasks, designers of machine learning systems had to determine which input features should be used to train computers in order to achieve good results. Specific features were chosen based on the designer's experience and intuition. Neural networks used these manually decided features for learning on training data. Careful selection of features required a large amount of time and effort, and had a huge impact on the results of tasks that machine learning was used to solve. Such decisions with regard to choosing features were limited by a designer's capability of wisely choosing the correct set of features. However, the use of CNNs changed this by automatically learning the features and replaced the need for a designer to choose the features.
In general, a CNN can be viewed as a computation graph, which is a thin wrapper around nodes (i.e. layers) connected together in some order. This interconnection of layers, which forms a computation graph or a network, is also known as a model. Different types of inputs (e.g. image, voice, etc.) have different characteristics, and hence a single CNN model which suits every type of input is unlikely. Therefore, new CNN models are often designed either to solve a new problem or to optimize an existing model.
A CNN model includes a number of layers and their interconnections. A typical CNN model includes some common elements such as a convolution layer, an activation layer, a pooling layer, a fully connected layer, a softmax layer and an SVM layer.
Although the above mentioned elements may be common to CNN models, the configuration of the connections of these layers differentiates one CNN model from another.
Artificial neural networks can be thought of as a simplified emulation of the visual cortex system in a human brain. However, current artificial neural networks are designed with specific engineering goals and not to emulate all the functionalities of a brain. Hence, researchers have developed models inspired by very complex human visual cortex systems. This has the advantage of reducing the amount of computation to within the limits of current state-of-the-art hardware. In these abstracted mathematical models, specific tasks from the visual cortex system may be assigned to specific layers in artificial neural networks. Layers in CNN models are arranged in specific patterns. For example, a convolutional layer is usually followed by an activation layer, which is sometimes followed by a pooling layer. Together, the convolution and activation layers model the capability of a single cell in the brain, i.e. where a cell fires (activates) if an excitatory signal (encouraging the cell to transmit the information forward to other neurons) on its dendrites is strong enough, i.e. higher than some threshold. Similarly, in a CNN model, a neuron activates if the output of a convolution operation is stronger than a predetermined threshold.
Since CNNs can have millions of neurons, the computing capability required to perform the computation for convolution neural networks is proportional to the number of neurons in the network. Hence, there is high demand for methods to shrink the output of intermediate layers in order to reduce the amount of computation. In order to perform this shrinking, activation layers are usually followed by a pooling layer which shrinks the output of the activation layers.
Different CNN models can vary from each other in many ways. These differences include the depth of the network (i.e. the number of layers in the network), the size (height, width, and depth) of each layer, the type of activation functions, the usage of pooling layers, and others. Although different from each other, commonalities exist in the structure of CNNs as discussed above. Among all of the patterns that may exist in convolution neural networks, the present invention is concerned with a pattern in which a convolution layer is followed by an activation layer, which is followed by a pooling layer. When such a pattern of a convolution layer, an activation layer, and a pooling layer exists, the respective operations of each layer are also executed in the same order.
As described in NPL1, a general and simple CNN model may have, for example, a configuration where an input of data to be processed is followed by a convolution layer, an activation layer, a pooling layer, and a fully connected layer, which may be the output of the CNN model.
As described in NPL2, different CNN models have different numbers of layers and different configurations for these layers. One example of a well-known CNN model is Alexnet, which is used for image recognition. Each CNN model differs based on design specifications for an intended application; however, the present disclosure is particularly concerned with the presence of three layers, i.e. a convolution layer, an activation layer, and a pooling layer in that order which is included in the example of Alexnet.
When such a pattern of the convolution layer, the activation layer, and the pooling layer exists in a convolution neural network, it can be replaced by a pattern of the convolution layer, the pooling layer, and the activation layer (in this order), as disclosed in PTL 1. In PTL 1, it is shown that such an order of the layers can reduce the number of computations in the network, and it is suggested that such an idea can be applied to any activation layer and pooling layer without regard to the function executed by each layer. However, PTL 1 does not recognize that, for certain functions used in the activation layer and the pooling layer, swapping the activation and pooling layers can produce unintended output or, in some cases, what are known as dead neurons, thus changing the output of parts of, and ultimately the entire, CNN.
Citation List
Patent Literature
PTL1: U.S. Application Publication No. 2015/0309961 A1
Non Patent Literature
NPL1: CS231n Convolutional Neural Networks for Visual Recognition; http://cs231n.github.io/convolutional-networks/
NPL2: ImageNet Classification with Deep Convolutional Neural Networks; https://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf
DISCLOSURE OF INVENTION
Technical Problem
Different CNN models may vary from each other in a variety of factors. However, in order to emulate the human visual cortex system, they share a common pattern of structure for stacking layers together. Each cell in the human visual cortex system works in two steps: first, it combines all the signals received at its dendrites, and second, it fires (activates) if the result of the first step is more than some threshold.
The convolution operation is analogous to the first step of the process of a cell, and the activation operation is analogous to the second step of the process of the cell. Since the output size of intermediate layers of a CNN model may be very large, the pooling operation is usually performed after the activation layer. The pooling operation emulates sampling of the output of nearby cells. Performing the pooling operation also introduces non-linearity, which addresses a known problem of overfitting.
Because of the resemblance of a neural network to the human visual cortex system, a convolution operation followed by an activation operation, and then by a pooling operation, is very natural. Since the two steps of a cell, i.e. summing the strengths of the incoming signals on the dendrites and firing, are logically mapped onto the convolution operation and the activation operation of a convolution neural network, this order (convolution followed by activation) tends to be a common configuration among neural network designs.
Fig. 3 shows a simplified form of a state of the art CNN model used for image recognition. It is evident from Fig. 3 that a convolution layer, followed by an activation layer, further followed by a pooling layer exists in actual models used for solving problems in real life.
Inherent in the existing order of these 3 layers (i.e. convolution, activation, pooling), which we find in most state-of-the-art CNN networks, is an opportunity for optimization which is not easily recognizable. In such a case where these 3 layers exist in order, it is possible to swap the activation layer and the pooling layer, thereby reducing the number of operations required for processing and decreasing the computing costs (i.e., increasing the speed of processing) as mentioned above. However, the prior art (for example PTL1) does not recognize a serious problem in that, for certain functions used in the activation layer and the pooling layer, swapping the activation and pooling layers can cause degradation in the integrity of the output data of the CNN.
Also, in some scenarios, swapping of the activation and pooling layers can change the output to 0, which can introduce what are known as dead neurons in the network. Dead neurons can affect the CNN model in both the training phase and the testing phase and change the results from the expected or intended results.
Solution to Problem
One object of the present disclosure is to provide a class of functions for both the activation layer and the pooling layer, which when used for the respective layers will ensure that the output of the overall network will never change, except between the swapped activation layer and pooling layer. Further, the present invention has an object of providing a method and a device for optimizing a CNN to reduce computing costs while at the same time maintaining the integrity of the output thereof.
In order to achieve the aforementioned objects, the present invention provides a device and a method which can optimize the processing operations of a CNN while maintaining the integrity of the output thereof.
Therefore, a first aspect of the present invention provides an information processing device for a neural network, the information processing device including a neural network reconfiguration unit configured to swap an order of activation processing and pooling processing in a target portion of a target neural network in which convolution processing, the activation processing, and the pooling processing occur in order, when the activation processing is a non-decreasing function and the pooling processing is a max function; and a processing unit configured to process input data by using the target neural network as reconfigured by the neural network reconfiguration unit.
A second aspect of the present invention, in accordance with the first aspect, further includes a neural network analyzation unit configured to analyze the target neural network by identifying a target portion to be reconfigured by the neural network reconfiguration unit.
A third aspect of the present invention provides a computer-implemented information processing method for a neural network, the method including identifying a target portion in which the neural network is configured to perform, in order, convolution processing, activation processing, and pooling processing; when the activation processing of the target portion is a non-decreasing function and the pooling processing of the target portion is a max function, swapping the order of the activation processing and the pooling processing in the target portion of the neural network so as to reconfigure the neural network; and processing input data using the reconfigured neural network.
A fourth aspect of the present invention provides a non-transitory computer readable medium containing program instructions for causing a computer to perform the method of the third aspect.
Advantageous Effects of Invention
The present invention improves a neural network by reducing the number of operations performed by, and the computational costs (in terms of speed and power consumption) of, an information processing device or computer implementing the processes of the neural network, while maintaining the integrity of the output of the neural network.
BRIEF DESCRIPTION OF DRAWINGS
FIG. 1 is a block diagram of a configuration of a computer system by which an information processing device according to exemplary embodiments of the present disclosure may be achieved.
FIG. 2 is a block diagram which represents a general configuration of a CNN model.
FIG. 3 is a block diagram of a schematic configuration for a simplified representation of a state of the art CNN model such as Alexnet.
FIG. 4 is a block diagram of a schematic configuration in which the swapping technique transforms each occurrence of convolution-activation-pooling in an input CNN model to convolution-pooling-activation, after determining that it is safe to do so.
FIG. 5 shows an example of how the swapping technique in accordance with the present invention may reduce the number of operations compared to the conventional technique.
FIG. 6 shows a flow of control used in the present invention to determine whether or not to perform the swapping of the activation and pooling layers.
FIGS. 7A, 7B, and 7C are examples showing how the swapping technique in accordance with the present invention can reduce the number of operations when a max, tanh, or sigmoid function, respectively, is used for the activation operation and a max function is used for the pooling layer, while maintaining the integrity of the output results.
FIGS. 8A, 8B, and 8C are examples showing how the swapping technique in accordance with the present invention can reduce the number of operations when there exists overlapping between consecutive pooling operations and a max, tanh, or sigmoid function, respectively, is used for the activation operation and a max function is used for the pooling layer, while maintaining the integrity of the output results.
FIG. 9 shows an example in which swapping the activation layer and the pooling layer results in a different output than the case in which the layers are not swapped due to an improper selection for the pooling layer function.
FIG. 10 is a block diagram showing an example of a reconfiguration of a CNN model.
EXEMPLARY EMBODIMENTS FOR CARRYING OUT THE INVENTION
It is to be understood that the figures and descriptions of the present invention have been simplified to illustrate elements that are relevant for a clear understanding of the present invention. Hereinafter, embodiments of the present invention will be described with reference to the figures. Fig. 1 is a block diagram of a configuration of a computer 100 (also referred to as a "computer system") by which information processing apparatuses according to below-described exemplary embodiments of the present disclosure may be achieved. The computer system 100 includes a processor 110, a cache subsystem 120, a GPU subsystem 130, a graphics output device 140, a memory bridge 150, an I/O (Input/Output) subsystem 160, input devices 170 (e.g. a mouse 171 and a keyboard 172), a memory subsystem 180, and secondary storage 190. The computer system 100 may include a plurality of graphics output devices 140. The processor 110 includes registers 111. The registers 111 are used to stage data used by execution units included in the processor 110 from the cache subsystem 120. The registers 111 and other parts of the processor 110 are present on the same chip to reduce latency. The cache subsystem 120 may have two or more levels of cache. The processor 110 and at least one level of the cache subsystem may be implemented on the same chip. The number (e.g. level 1, level 2, level 3, etc.) and locations (on chip or off chip of the processor 110) of the levels may vary among systems having different architectures. Therefore, to abstract away the variation in configuration among systems having different architectures, the cache subsystem 120 is shown as a module separate from the processor 110. The input devices 170, such as the mouse 171 and the keyboard 172, are connected to the memory bridge 150 via the I/O subsystem 160.
In order to optimize an existing neural network 200 in accordance with the present invention, sections of the neural network 200 which may be capable of being optimized are identified by searching for occurrences of, in order, a convolution layer 220, an activation layer 230, and a pooling layer 240. Such sections are present in the example neural networks 200 shown in Figs. 2 and 3.
Here, it can be seen that there is one occurrence of a convolution layer 220 followed by an activation layer 230 followed by a pooling layer 240 in Fig. 2, and three occurrences in Fig. 3. It should be noted that while the layers in Figs. 2 and 3 are shown having different shapes and sizes, the same reference numbers are used in accordance with the type of layer. Layers with the same reference number do not necessarily perform the same processing function, as will be described in more detail below.
In the example of Fig. 2, input data 210 is input to the neural network 200, first to a convolution layer 220 where convolution is performed on the input data 210. The convolution processing function is the same function for all convolution layers. Next, the output of the convolution layer 220 is input to the activation layer 230 where activation processing is performed. The activation processing may be (depending on the neural network configuration) one of any number of processing functions commonly used for activation processing, such as a max function, a tanh function, a sigmoid function, or the like. It is possible that multiple activation layers 230 occurring within the same neural network 200 (as in Fig. 3) may use different processing functions as the activation processing of each layer. Next, the output of the activation layer 230 is input to the pooling layer 240 where pooling processing is performed. The pooling processing may be (depending on the neural network configuration) one of any number of processing functions commonly used for pooling processing, such as a max function, an average function, or the like. Again, it is possible that multiple pooling layers 240 occurring within the same neural network may use different processing functions as the pooling processing.
Next, the swapping technique used in the present invention will be described. As previously mentioned, when there is an occurrence of, in order, a convolution layer 220, an activation layer 230, and a pooling layer 240 in a neural network 200, it may be possible to optimize this portion of the neural network 200 if swapping the order of the activation layer 230 and the pooling layer 240 would not cause a degradation of the output of this portion of the neural network. Therefore, the activation layer 230 is examined (analyzed) to confirm that the activation processing performed thereby is a non-decreasing function, and the pooling layer 240 is examined (analyzed) to confirm that the pooling processing performed thereby is a max function. This is because, if the activation layer has a decreasing function or the pooling layer is an averaging function (as in Fig. 9), swapping the order of the activation layer and the pooling layer may change the output of this portion of the neural network or may produce dead neurons, which have the potential to adversely affect the output of the entire neural network.
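The reason this check suffices can be verified directly: for any non-decreasing function f and any pooling window of values x1, ..., xn, it holds that f(max(x1, ..., xn)) = max(f(x1), ..., f(xn)), so max pooling and the activation commute. The following minimal Python sketch (added here for illustration; it is not part of the patent text, and the window values are taken from the FIG. 7A example) checks this identity for the activation functions mentioned above:

```python
import math

# Non-decreasing activation functions named in the description.
relu = lambda x: max(x, 0.0)                    # the "max" activation, i.e. max(value, 0)
sigmoid = lambda x: 1.0 / (1.0 + math.exp(-x))

window = [-9.0, 6.0, 9.0, 4.0]                  # one pooling window of convolution outputs

for f in (relu, math.tanh, sigmoid):
    pool_then_activate = f(max(window))             # reconfigured order
    activate_then_pool = max(f(v) for v in window)  # original order
    assert math.isclose(pool_then_activate, activate_then_pool)
```

Replacing max pooling with an average in this sketch breaks the identity, which is exactly the FIG. 9 situation discussed below.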
In a case that there is an occurrence of, in order, a convolution layer 220, an activation layer 230, and a pooling layer 240; the activation layer 230 is a non-decreasing function; and the pooling layer 240 is a max function; then swapping of the activation layer 230 and the pooling layer 240 is performed in order to reconfigure this portion of the neural network to become the convolution layer 220 followed by the pooling layer 240 followed by the activation layer 230. This swapping may be performed on as many applicable occurrences of a convolution layer 220 followed by an activation layer 230 followed by a pooling layer 240 in a neural network as necessary to reconfigure the entire neural network. For example, in the case of Fig. 3, reconfiguration may occur up to three times if all the occurrences of the convolution, activation, and pooling layers (in order) meet the above requirements for the swapping technique.
A first embodiment of the present invention is an information processing device for a neural network that determines whether or not it is safe to swap the order of the activation layer and the pooling layer without changing the output.
Fig. 2 is a block diagram of a general configuration of a CNN model. It shows that a simplified CNN model, which may be used, for example, for image recognition, can be comprised of an input 210, followed by a neural network 200, followed by a fully connected layer 250, which may also generate the output of the CNN model. The neural network 200 in this example includes a convolution layer 220, followed by an activation layer 230, followed by a pooling layer 240.
Fig. 3 is a block diagram of a schematic configuration for a simplified representation of an actual CNN model used for image recognition in the real world, according to one embodiment of the invention. The input 210 is an image input to the CNN network. The input 210 is connected to a neural network 200, which is connected to a softmax 290, which is the output of the system. This layer (softmax 290) finds the class of the object in the image. The neural network 200 includes any number of processing units, which are potential target portions for reconfiguration. One processing unit is a combination of a convolution layer 220 and an activation layer 230, and optionally a pooling layer 240. The neural network 200 may further include one or more fully connected layers 250, each of which may be combined with an activation layer 230.
As shown in Fig. 3, patterns of the order of layers, i.e. convolution 220 followed by activation 230 followed by pooling 240, are very common in the network. The input 210 of the network is forwarded to the first processing unit (a target network or a target portion) and taken as an input by the convolution layer 220 in the unit, which performs the task of pattern matching, also known as the operation of convolution. The output of the convolution layer 220 is given as an input to the activation layer 230, which performs the task of activation. The activation layer 230 can perform the operation of activation using any one of various functions, such as max, tanh, or sigmoid.
Although sigmoid was a very common function used for activation in some older CNN architectures, it has fallen out of fashion in most CNN networks nowadays. The output of the activation layer 230 may be given as an input to the pooling layer 240, which performs the task of pooling, or resizing. The pooling layer 240 can use one of various functions, such as max or average, in order to perform the task of resizing. For performing the pooling, the max operation (max function) is most preferable, while the average function can also be used in practice.
The neural network 200 in this embodiment includes three repeated combinations of one fully connected layer 250 followed by one activation layer 230. These layers basically perform the task of a linear classifier, which is used to generate the scores for several classes of the input. The softmax layer calculates the loss or error incurred during training, or the accuracy during the testing phase.
Fig. 4 shows a block diagram of the swapping technique. If a pattern of convolution-activation-pooling is found in the input CNN model, the swapping technique first determines whether it is safe to swap the activation and pooling layers, using the steps explained in Fig. 6. If it is found to be safe, then each such occurrence of convolution-activation-pooling is replaced with convolution-pooling-activation.
FIG. 5 shows an example of how the embodiment may reduce the number of operations compared to the related art. In the related art (NPL 1), the total number of operations performed by the activation layer and the pooling layer together is 4 + 3 = 7, whereas in the embodiment the number of operations is reduced to 3 + 1 = 4.
FIG. 6 shows the flow of control used by the proposed technique to determine whether it is safe to perform the swapping of the activation and pooling layers. In step S610, the swapping technique first analyzes the input CNN model configuration, searching for a pattern in which a convolution layer is followed by an activation layer, which is further followed by a pooling layer. In step S620, if such a pattern is found, the function used for the activation layer is checked in step S630; if no such pattern is found in the input CNN model, the flow reaches the end of the reconfiguration processing. In step S630, if it is found that a non-decreasing (or monotonically increasing) function is used, control reaches step S640, where the function used for the pooling layer is checked; if a non-decreasing function is not used for the activation function, control returns to step S610. In step S640, if it is found that a max function is used for the pooling layer, control reaches step S650; otherwise, control returns to step S610. In step S650, the reconfiguration of the network is found to be safe, and the occurrence of convolution-activation-pooling is replaced with convolution-pooling-activation.
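As a sketch only (the patent defines the control flow of Fig. 6, not any code), steps S610 to S650 might look like the following in Python; the dictionary-based layer encoding and the set of non-decreasing activation names are assumptions introduced purely for illustration:

```python
# Assumption: each layer is a dict such as {"type": "activation", "function": "tanh"};
# this encoding is illustrative and not defined by the patent.
NON_DECREASING = {"max", "relu", "tanh", "sigmoid"}

def reconfigure(layers):
    """Apply the FIG. 6 flow: find convolution-activation-pooling (S610/S620),
    check the activation (S630) and pooling (S640) functions, and swap (S650)."""
    layers = list(layers)
    for i in range(len(layers) - 2):
        conv, act, pool = layers[i], layers[i + 1], layers[i + 2]
        if (conv["type"] == "convolution"
                and act["type"] == "activation"
                and pool["type"] == "pooling"
                and act["function"] in NON_DECREASING  # S630: non-decreasing activation?
                and pool["function"] == "max"):        # S640: max pooling?
            layers[i + 1], layers[i + 2] = pool, act   # S650: safe, swap the two layers
    return layers

model = [{"type": "convolution", "function": "conv"},
         {"type": "activation", "function": "relu"},
         {"type": "pooling", "function": "max"}]
print([layer["type"] for layer in reconfigure(model)])
# ['convolution', 'pooling', 'activation']
```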
FIG. 7A is an example showing how the proposed idea can reduce the number of operations in the case when a max function is used for the activation operation and a max function is used for the pooling layer. FIG. 7A considers a case in which there is no overlap between pooling operations. In the related art (NPL 1), the convolution layer is followed by the activation layer, and the activation layer is followed by the pooling layer, whereas in the embodiment, the convolution layer is followed by the pooling layer, and the pooling layer is followed by the activation layer. The dashed line in FIG. 7A represents the activation operation, i.e. the maximum of a value and 0 is calculated. The solid line represents the pooling operation, i.e. the quaternary maximum of the four values inside the solid-line window is calculated. Therefore, the dashed line over -9 in NPL 1 represents that max(-9,0) is calculated, and the result, i.e. 0, is saved at the appropriate location. Such an activation operation is performed for each value from the convolution layer. The solid line over 0, 0, 0, 0 in NPL 1 represents that max(0,0,0,0) is calculated, and the result, i.e. 0, is saved at the appropriate location. In NPL 1 there is one binary max operation for each element in the activation layer; since there are sixteen elements in the activation layer, there are sixteen binary max operations. Also, in NPL 1 there is one quaternary max operation for each non-overlapped window, and there are four such windows, i.e. (0,6,9,4), (1,0,0,10), (3,0,0,0), (0,0,0,0); one quaternary max operation for a window consists of three binary max operations. For example, max(0,0,0,0) is calculated as m1=max(0,0), m2=max(m1,0), m3=max(m2,0), where m3=0 is the final output. Therefore, a total of 16x1+4x3=28 binary max operations are performed in NPL 1.
In the embodiment, the convolution layer is followed by a pooling layer, and the pooling layer is followed by the activation layer. The solid line over -9, 6, 9, 4 in the embodiment represents that max(-9,6,9,4) is calculated, and the result, i.e. 9, is saved at the appropriate location. The dashed line over -1 represents that max(-1,0) is calculated, and the result, i.e. 0, is saved at the appropriate location. In the embodiment there is one quaternary max operation for each non-overlapped window, and there are four such windows, i.e. (-9,6,9,4), (1,-1,-8,10), (3,-8,-7,-2), (-1,-5,-7,-9); one quaternary max operation for a window consists of three binary max operations. For example, max(-9,6,9,4) is calculated as m1=max(-9,6), m2=max(m1,9), m3=max(m2,4), where m3=9 is the final output. Also, in the embodiment, one binary max operation is performed for each element of the output of the pooling layer, i.e. four binary max operations are performed for the four elements in the pooling layer. Therefore, a total of 4x3+4x1=16 binary max operations are performed in the embodiment, whereas 28 binary max operations were performed in NPL 1.
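These counts can be reproduced mechanically. The Python sketch below (an illustration added here, using the FIG. 7A window values) confirms that both orders produce the same pooled outputs while the reconfigured order performs 16 rather than 28 binary max operations:

```python
relu = lambda x: max(x, 0.0)

# The four non-overlapping 2x2 windows of convolution output from FIG. 7A.
windows = [(-9, 6, 9, 4), (1, -1, -8, 10), (3, -8, -7, -2), (-1, -5, -7, -9)]

# NPL 1 order: activation over all 16 values, then one max pooling per window.
npl1_ops = 16 * 1 + len(windows) * 3            # 28 binary max operations
npl1_out = [max(relu(v) for v in w) for w in windows]

# Reconfigured order: max pooling per window, then activation over 4 values.
swap_ops = len(windows) * 3 + len(windows) * 1  # 16 binary max operations
swap_out = [relu(max(w)) for w in windows]

assert npl1_out == swap_out == [9, 10, 3, 0]
print(npl1_ops, swap_ops)  # 28 16
```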
FIG. 7B is an example showing how the proposed idea can reduce the number of operations in the case when a tanh function is used for the activation operation, a max function is used for the pooling layer, and there is no overlap between successive pooling operations. FIG. 7B is almost identical to FIG. 7A; the only difference is the function used for the activation layer, i.e. FIG. 7A uses the max function for the activation layer while FIG. 7B uses the tanh function, but both figures use the max function in the pooling layer. Therefore, a total of 16x1=16 tanh operations and 4x3=12 binary max operations are performed in NPL 1. On the contrary, a total of 4x3=12 binary max operations and 4x1=4 tanh operations are performed in the embodiment.
FIG. 7C is an example showing how the proposed idea can reduce the number of operations in the case when a sigmoid function is used for the activation operation, a max function is used for the pooling layer, and there is no overlap between successive pooling operations. FIG. 7C is almost identical to FIG. 7A; the only difference is the function used for the activation layer, i.e. FIG. 7A uses the max function for the activation layer while FIG. 7C uses the sigmoid function, but both figures use the max function in the pooling layer. Therefore, a total of 16x1=16 sigmoid operations and 4x3=12 binary max operations are performed in NPL 1. On the contrary, a total of 4x3=12 binary max operations and 4x1=4 sigmoid operations are performed in the embodiment.
It can easily be seen from the examples of FIG. 7A, FIG. 7B, and FIG. 7C that the proposed technique reduces the number of operations performed in the activation layer, without changing the overall output, compared to NPL 1.
FIG. 8A is an example showing how the proposed idea can reduce the number of operations in the case when a max function is used for the activation operation, a max function is used for the pooling layer, and there exists overlapping between successive pooling operations. In NPL 1, the convolution layer is followed by the activation layer, and the activation layer is followed by the pooling layer, whereas in the embodiment, the convolution layer is followed by the pooling layer, and the pooling layer is followed by the activation layer. The dashed line in FIG. 8A represents the activation operation, i.e. the maximum of a value and 0 is calculated. The solid line represents the pooling operation, i.e. the maximum of the four values inside the solid-line window is calculated. Therefore, the dashed line over -9 in NPL 1 represents that max(-9,0) is calculated, and the result, i.e. 0, is saved at the appropriate location. Such an activation operation is performed for each value from the convolution layer. The solid lines over 3,0,0,0 and 0,0,0,0 in NPL 1 represent that max(3,0,0,0) and max(0,0,0,0) are calculated. This figure differs from FIG. 7A in that in FIG. 8A there is overlap between two solid lines, i.e. between pooling operations. In NPL 1 there is one binary max operation for each element in the activation layer; since there are sixteen elements in the activation layer, there are 16 binary max operations. Also, in NPL 1 there is one quaternary max operation for each overlapped window, and there are nine such windows, i.e. (0,6,9,4), (6,1,4,0), (1,0,0,10), (9,4,3,0), (4,0,0,0), (0,10,0,0), (3,0,0,0), (0,0,0,0), (0,0,0,0); one quaternary max operation for a window consists of three binary max operations. For example, max(3,0,0,0) is calculated as m1=max(3,0), m2=max(m1,0), m3=max(m2,0), where m3=3 is the final output. Therefore, a total of 16x1+9x3=43 binary max operations are performed in NPL 1.
In the embodiment, the convolution layer is followed by a pooling layer, and the pooling layer is followed by the activation layer. The solid line over -9, 6, 9, 4 in the embodiment represents that max(-9,6,9,4) is calculated, and the result, i.e. 9, is saved at the appropriate location. The dashed line over -1 represents that max(-1,0) is calculated, and the result, i.e. 0, is saved at the appropriate location. In the embodiment there is one quaternary max operation for each window, and there are nine such windows, i.e. (-9,6,9,4), (6,1,4,-8), (1,-1,-8,10), (9,4,3,-8), (4,-8,-8,-1), (-8,10,-1,-5), (3,-8,-7,-2), (-8,-1,-2,-7), (-1,-5,-7,-9); one quaternary max operation for a window consists of three binary max operations. For example, max(-9,6,9,4) is calculated as m1=max(-9,6), m2=max(m1,9), m3=max(m2,4), where m3=9 is the final output. Also, in the embodiment, one binary max operation is performed for each element of the output of the pooling layer, i.e. nine binary max operations are performed for the nine elements in the pooling layer. Therefore, a total of 9x3+9x1=36 binary max operations are performed in the embodiment, whereas 43 binary max operations were performed in NPL 1.
FIG. 8B is an example showing how the proposed idea can reduce the number of operations in the case when a tanh function is used for the activation operation, a max function is used for the pooling layer, and there is overlap between successive pooling operations. FIG. 8B is almost identical to FIG. 8A; the only difference is the function used for the activation layer, i.e. FIG. 8A uses the max function for the activation layer while FIG. 8B uses the tanh function, but both figures use the max function in the pooling layer. Therefore, a total of 16x1=16 tanh operations and 9x3=27 binary max operations are performed in NPL 1. On the contrary, a total of 9x3=27 binary max operations and 9x1=9 tanh operations are performed in the embodiment.
FIG. 8C is an example showing how the proposed idea can reduce the number of operations in the case when a sigmoid function is used for the activation operation, a max function is used for the pooling layer, and there is overlap between successive pooling operations. FIG. 8C is almost identical to FIG. 8A; the only difference is the function used for the activation layer, i.e. FIG. 8A uses the max function for the activation layer while FIG. 8C uses the sigmoid function, but both figures use the max function in the pooling layer. Therefore, a total of 16x1=16 sigmoid operations and 9x3=27 binary max operations are performed in NPL 1. On the contrary, a total of 9x3=27 binary max operations and 9x1=9 sigmoid operations are performed in the embodiment.
It can easily be seen from the examples of FIG. 8A, FIG. 8B, and FIG. 8C that the proposed technique reduces the number of operations performed in the activation layer without changing the overall output of the three operations, even when there exists overlapping between the pooling operations.
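The same equality can be checked for the overlapping case with a stride-1 window generator. In the sketch below (added for illustration; the 4x4 matrix of convolution outputs is reconstructed from the window values listed above for FIG. 8A), both orders of a sigmoid activation and overlapping max pooling agree on all nine outputs:

```python
import math

sigmoid = lambda x: 1.0 / (1.0 + math.exp(-x))

# 4x4 convolution output reconstructed from the FIG. 8A window values.
conv_out = [
    [-9,  6,  1, -1],
    [ 9,  4, -8, 10],
    [ 3, -8, -1, -5],
    [-7, -2, -7, -9],
]

def pool_windows(m, size=2, stride=1):
    """Yield the 2x2 pooling windows; stride=1 gives the nine overlapping windows."""
    for i in range(0, len(m) - size + 1, stride):
        for j in range(0, len(m[0]) - size + 1, stride):
            yield [m[i + di][j + dj] for di in range(size) for dj in range(size)]

# NPL 1 order (activation, then pooling) versus the embodiment (pooling, then activation).
npl1 = [max(sigmoid(v) for v in w) for w in pool_windows(conv_out)]
embodiment = [sigmoid(max(w)) for w in pool_windows(conv_out)]
assert all(math.isclose(a, b) for a, b in zip(npl1, embodiment))
```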
FIG. 9 shows an example of how PTL 1 can change the output after swapping the activation and pooling layers when an average function is used for pooling. In NPL 1, first, in the activation layer, the max of each element and 0 is determined, for example max(-5,0). There are four such elements, i.e. -5, 5, -5, 5. In NPL 1, in the pooling layer, the average function is then used to calculate the average of the four elements, i.e. avg(0,0,5,5) = 2.5. In contrast, PTL 1, which suggests performing pooling before activation, changes the output: first the average of the four elements is taken, i.e. avg(-5,5,-5,5) = 0, and then the max of that element and 0, i.e. max(0,0) = 0. Hence it can be seen that the output of NPL 1, i.e. 2.5, is different from that of PTL 1, i.e. 0. In the embodiment, such a case will never exist, because it will be found in step S640 that the function used for the pooling layer is not a max function, and hence such a swapping or reconfiguration is not safe in the proposed idea.
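Expressed numerically (a sketch added for illustration, using the values of FIG. 9):

```python
relu = lambda x: max(x, 0.0)
avg = lambda vals: sum(vals) / len(vals)

window = [-5, 5, -5, 5]  # the four convolution outputs of FIG. 9

activation_then_pooling = avg([relu(v) for v in window])  # avg(0, 5, 0, 5) = 2.5
pooling_then_activation = relu(avg(window))               # relu(0) = 0.0

print(activation_then_pooling, pooling_then_activation)   # 2.5 0.0 -- the orders differ
```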
It should be noted that a program capable of implementing functionalities of the information processing method according to the present invention may be recorded in a non-transitory computer readable medium, and the operations of identifying target portions of a neural network to be optimized (i.e., swapping the activation layer and pooling layer of the neural network), and the like, may be performed by causing a computer system to read and execute the program recorded in the computer readable medium. The term "computer system" used herein includes software such as an operating system (OS) and hardware devices such as peripherals. In addition, the "computer system" may also include a world wide web (WWW) system capable of providing a website environment (or a display environment). Further, the term "computer readable media" refers to portable media such as a flexible disk, a magneto-optical (MO) disc, a read-only memory (ROM), and a compact disc (CD) ROM, and a storage device built into the computer system such as a hard disk. Moreover, the "computer readable media" includes media capable of maintaining the program during a certain period of time, such as a volatile memory (random-access memory (RAM)) inside a computer system serving as a server or a client when the program is transmitted via a network such as the Internet or a communication line such as a telephone line. The program may be transmitted from the computer system in which the program is stored in, for example, the storage device, to another computer system through transmission media or by transmission waves in the transmission media. Here, the term "transmission media" for transmitting the program refers to media capable of transmitting information, like a network (communication network) such as the Internet or a communication circuit (communication line) such as a telephone line. Furthermore, the program may also be a program for implementing a part of the aforementioned functionalities, or a discrete file (discrete program) in which the aforementioned functionalities are implemented in combination with a program that has already been recorded in the computer system.
While preferred embodiments of the invention have been described and illustrated above, it should be understood that these are exemplary of the invention and are not to be considered as limiting. Additions, omissions, substitutions, and other modifications can be made without departing from the spirit or scope of the present invention. Accordingly, the invention is not to be considered as being limited by the foregoing description, and is only limited by the scope of the appended claims.
INDUSTRIAL APPLICABILITY
The present invention can be applied to the field of data processing, particularly image processing, text processing, speech processing, and machine learning.
Reference Signs List
110 processor
111 registers
120 cache subsystem
130 GPU subsystem
140 graphics output device(s)
150 memory bridge
160 I/O subsystem
170 mouse/keyboard
180 memory subsystem
181 OS
182 driver
183 application
190 secondary storage
200 neural network
201 reconfigured neural network
210 input
220 convolution layer
230 activation layer
240 pooling layer
250 output / fully connected layer
290 softmax
300 swapping technique

Claims

1. An information processing device for a neural network, the information processing device comprising:
a neural network reconfiguration unit configured to swap an order of activation processing and pooling processing in a target portion of a target neural network in which convolution processing, the activation processing, and the pooling processing occur in order, when the activation processing is a non-decreasing function and the pooling processing is a max function; and
a processing unit configured to process input data by using the neural network as reconfigured by the neural network reconfiguration unit.
2. The information processing device of claim 1, further comprising:
a neural network analyzation unit configured to analyze the target neural network by identifying a target portion to be reconfigured by the neural network reconfiguration unit.
3. A computer-implemented information processing method for a neural network, the method comprising:
identifying a target portion in which the neural network is configured to perform, in order, convolution processing, activation processing, and pooling processing;
when the activation processing of the target portion is a non-decreasing function and the pooling processing of the target portion is a max function, swapping the order of the activation processing and the pooling processing in the target portion of the neural network so as to reconfigure the neural network; and
processing input data using the reconfigured neural network.
4. A non-transitory computer readable medium containing program instructions for causing a computer to perform the method of claim 3.
PCT/JP2016/068741 2016-06-17 2016-06-17 Information processing method and device for neural network WO2017216976A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/JP2016/068741 WO2017216976A1 (en) 2016-06-17 2016-06-17 Information processing method and device for neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/JP2016/068741 WO2017216976A1 (en) 2016-06-17 2016-06-17 Information processing method and device for neural network

Publications (1)

Publication Number Publication Date
WO2017216976A1 (en) 2017-12-21

Family

ID=56507773

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2016/068741 WO2017216976A1 (en) 2016-06-17 2016-06-17 Information processing method and device for neural network

Country Status (1)

Country Link
WO (1) WO2017216976A1 (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020038462A1 (en) * 2018-08-24 2020-02-27 深圳市前海安测信息技术有限公司 Tongue segmentation device and method employing deep learning, and storage medium
CN111868754A (en) * 2018-03-23 2020-10-30 索尼公司 Information processing apparatus, information processing method, and computer program
CN112801266A (en) * 2020-12-24 2021-05-14 武汉旷视金智科技有限公司 Neural network construction method, device, equipment and medium
CN112995333A (en) * 2021-04-02 2021-06-18 深圳市大富网络技术有限公司 Remote file activation method, system and related device
US11892925B2 (en) 2018-10-19 2024-02-06 Samsung Electronics Co., Ltd. Electronic device for reconstructing an artificial intelligence model and a control method thereof

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150309961A1 (en) 2014-04-28 2015-10-29 Denso Corporation Arithmetic processing apparatus
US20150324685A1 (en) * 2014-05-07 2015-11-12 Seagate Technology Llc Adaptive configuration of a neural network device

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150309961A1 (en) 2014-04-28 2015-10-29 Denso Corporation Arithmetic processing apparatus
US20150324685A1 (en) * 2014-05-07 2015-11-12 Seagate Technology Llc Adaptive configuration of a neural network device

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111868754A (en) * 2018-03-23 2020-10-30 索尼公司 Information processing apparatus, information processing method, and computer program
WO2020038462A1 (en) * 2018-08-24 2020-02-27 深圳市前海安测信息技术有限公司 Tongue segmentation device and method employing deep learning, and storage medium
US11892925B2 (en) 2018-10-19 2024-02-06 Samsung Electronics Co., Ltd. Electronic device for reconstructing an artificial intelligence model and a control method thereof
CN112801266A (en) * 2020-12-24 2021-05-14 武汉旷视金智科技有限公司 Neural network construction method, device, equipment and medium
CN112801266B (en) * 2020-12-24 2023-10-31 武汉旷视金智科技有限公司 Neural network construction method, device, equipment and medium
CN112995333A (en) * 2021-04-02 2021-06-18 深圳市大富网络技术有限公司 Remote file activation method, system and related device
CN112995333B (en) * 2021-04-02 2023-05-23 深圳市大富网络技术有限公司 Remote file activation method, system and related device

Similar Documents

Publication Publication Date Title
Loni et al. DeepMaker: A multi-objective optimization framework for deep neural networks in embedded systems
WO2017216976A1 (en) Information processing method and device for neural network
US20210256355A1 (en) Evolving graph convolutional networks for dynamic graphs
CN108345827B (en) Method, system and neural network for identifying document direction
CN113408743A (en) Federal model generation method and device, electronic equipment and storage medium
CN112598080A (en) Attention-based width map convolutional neural network model and training method thereof
CN104598579A (en) Automatic question and answer method and system
CN112862092B (en) Training method, device, equipment and medium for heterogeneous graph convolution network
US20180330229A1 (en) Information processing apparatus, method and non-transitory computer-readable storage medium
US20240028898A1 (en) Interpreting convolutional sequence model by learning local and resolution-controllable prototypes
McCarthy et al. Addressing posterior collapse with mutual information for improved variational neural machine translation
Chandak et al. A comparison of word2vec, hmm2vec, and pca2vec for malware classification
JP7063274B2 (en) Information processing equipment, neural network design method and program
US20210133552A1 (en) Neural network learning device, neural network learning method, and recording medium on which neural network learning program is stored
KR20210083624A (en) Method and apparatus for controlling data input and output of neural network
Yoon et al. Learning polymorphic neural ODEs with time-evolving mixture
WO2021177031A1 (en) Quantum computer, quantum computation method, and program
US20220076103A1 (en) Data Processing Processor, Corresponding Method and Computer Program.
CN112651492A (en) Self-connection width graph convolution neural network model and training method thereof
KR20220099749A (en) Malware detection device and method based on hybrid artificial intelligence
WO2022153711A1 (en) Training apparatus, classification apparatus, training method, classification method, and program
Manzan et al. A mathematical discussion concerning the performance of multilayer perceptron-type artificial neural networks through use of orthogonal bipolar vectors
CN114730331A (en) Data processing apparatus and data processing method
US20220108156A1 (en) Hardware architecture for processing data in sparse neural network
WO2022153710A1 (en) Training apparatus, classification apparatus, training method, classification method, and program

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 16741710

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 16741710

Country of ref document: EP

Kind code of ref document: A1