WO2018153128A1 - Convolutional neural network and processing method, apparatus, system and medium therefor - Google Patents
Convolutional neural network and processing method, apparatus, system and medium therefor
- Publication number
- WO2018153128A1 (PCT/CN2017/111617)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- image
- neural network
- convolutional neural
- output
- pixel
- Prior art date
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/048—Activation functions
Definitions
- Embodiments of the present invention relate to the field of image processing, and more particularly to a convolutional neural network and a processing method, apparatus, system, and medium therefor.
- At present, deep learning technology based on artificial neural networks has made great progress in fields such as image classification, image capture and search, face recognition, and age and speech recognition. The Convolutional Neural Network (CNN) is an artificial neural network that has been developed in recent years and has attracted wide attention; it is a special image recognition method and a very effective network with forward feedback. Today the application range of CNNs is no longer limited to image recognition; they can also be applied to face recognition, character recognition, image processing, and other applications.
- At least one embodiment of the present invention provides a processing method for a convolutional neural network, comprising the following steps: using an activation recorder layer as an activation function layer in the convolutional neural network, wherein, in response to a probe image having content being input to the convolutional neural network, the activation recorder layer performs the same activation operation as the activation function layer and records an activation result of the activation operation; modifying the convolutional neural network, the modifying step including replacing the activation recorder layer with a masking layer, wherein the masking layer uses the recorded activation result; and inputting an analysis image as an input image to the modified convolutional neural network so as to output an output image of the modified convolutional neural network, in order to analyze a positive influence or a reverse influence between the input image and the output image, wherein the analysis image is a pixel-level binary image.
- For example, the step of inputting an analysis image as an input image to the modified convolutional neural network to obtain an analysis result, in order to analyze the positive influence between the input image and the output image of the convolutional neural network before the modification, includes: inputting an all-zero image to the modified convolutional neural network to output a first output value as a bias coefficient; inputting one or more analysis images to the modified convolutional neural network to output, based on the bias coefficient, an output image of the modified convolutional neural network, wherein the analysis image is a binary image that is 1 at a certain pixel and 0 at the other pixels; and analyzing the correspondence between the analysis image and the output image at the pixel level to obtain, as the positive influence, the influence of each pixel of the input image of the modified convolutional neural network on each pixel of the output image.
- the convolutional neural network includes an upsampling layer.
- For example, the step of inputting an analysis image as an input image to the modified convolutional neural network to obtain an analysis result, in order to analyze the reverse influence between the input image and the output image of the convolutional neural network before the modification, includes: inputting an all-zero image to the modified convolutional neural network to output a first output value as a bias coefficient; inputting all possible different analysis images to the modified convolutional neural network to output, based on the bias coefficient, output images of the modified convolutional neural network, wherein each analysis image is a binary image that is 1 at a certain pixel and 0 at the other pixels, and the positions of the 1-valued pixels in different analysis images are different; and analyzing the correspondence between the analysis images and the output images at the pixel level to obtain, as the reverse influence, the influence of each pixel of the output image of the convolutional neural network on each pixel of the input image.
- For example, the step of modifying the convolutional neural network by replacing the activation recorder layer with a masking layer that uses the recorded activation result includes: configuring a deconvolution network as the modified convolutional neural network, wherein the deconvolution network is the reverse network of the convolutional neural network. In this case, the step of inputting an analysis image as an input image to the modified convolutional neural network to obtain an analysis result, in order to analyze the reverse influence between the input image and the output image of the convolutional neural network before the modification, includes: inputting an all-zero image to the modified convolutional neural network to output a first output value as a bias coefficient; inputting one or more analysis images to the modified convolutional neural network to output, based on the bias coefficient, an output image of the modified convolutional neural network, wherein the analysis image is a binary image that is 1 at a certain pixel and 0 at the other pixels; and analyzing the correspondence between the analysis image and the output image at the pixel level to obtain, as the reverse influence, the influence of each pixel of the input image of the modified convolutional neural network on each pixel of the output image.
- For example, the convolutional neural network includes an upsampling layer, and configuring the deconvolution network as the modified convolutional neural network includes replacing the upsampling layer with its corresponding downsampling layer.
- At least one embodiment of the present invention provides a processing apparatus for a convolutional neural network, comprising: a recorder that uses an activation recorder layer as an activation function layer in the convolutional neural network, wherein, in response to a probe image having content being input to the convolutional neural network, the recorder causes the activation recorder layer to perform the same activation operation as the activation function layer and to record an activation result of the activation operation; a modifier configured to modify the convolutional neural network, the modification including replacing the activation recorder layer with a masking layer, wherein the masking layer uses the recorded activation result; and an analyzer configured to input an analysis image as an input image to the modified convolutional neural network so as to output an output image of the modified convolutional neural network, in order to analyze a positive influence or a reverse influence between the input image and the output image, wherein the analysis image is a pixel-level binary image.
- For example, the analyzer is configured to: input an all-zero image to the modified convolutional neural network to output a first output value as a bias coefficient; input one or more analysis images to the modified convolutional neural network to output, based on the bias coefficient, an output image of the modified convolutional neural network, wherein the analysis image is a binary image that is 1 at a certain pixel and 0 at the other pixels; and analyze the correspondence between the analysis image and the output image at the pixel level to obtain, as the positive influence, the influence of each pixel of the input image of the modified convolutional neural network on each pixel of the output image.
- the convolutional neural network includes an upsampling layer.
- For example, the analyzer is configured to: input an all-zero image to the modified convolutional neural network to output a first output value as a bias coefficient; input all possible different analysis images to the modified convolutional neural network to output, based on the bias coefficient, output images of the modified convolutional neural network, wherein each analysis image is a binary image that is 1 at a certain pixel and 0 at the other pixels, and the positions of the 1-valued pixels in different analysis images are different; and analyze the correspondence between the analysis images and the output images at the pixel level to obtain, as the reverse influence, the influence of each pixel of the output image of the convolutional neural network on each pixel of the input image.
- For example, the modifier is configured to configure a deconvolution network as the modified convolutional neural network, wherein the deconvolution network is the reverse network of the convolutional neural network, and the analyzer is configured to: input an all-zero image to the modified convolutional neural network to output a first output value as a bias coefficient; input one or more analysis images to the modified convolutional neural network to output, based on the bias coefficient, an output image of the modified convolutional neural network, wherein the analysis image is a binary image that is 1 at a certain pixel and 0 at the other pixels; and analyze the correspondence between the analysis image and the output image at the pixel level to obtain, as the reverse influence, the influence of each pixel of the input image of the modified convolutional neural network on each pixel of the output image.
- For example, the convolutional neural network includes an upsampling layer, and the analyzer is configured to configure the deconvolution network as the modified convolutional neural network by replacing the upsampling layer with its corresponding downsampling layer.
- At least one embodiment of the present invention provides a processing system for a convolutional neural network, comprising: one or more processors; and one or more memories storing computer-readable code which, when executed by the one or more processors, performs the processing method according to any of the embodiments of the present invention.
- At least one embodiment of the present invention provides a convolutional neural network, comprising: one or more convolutional layers; one or more masking layers, corresponding to the one or more convolutional layers, that replace corresponding one or more activation recorder layers, the one or more activation recorder layers serving as activation function layers in the convolutional neural network, wherein, in response to a probe image having content being input to the convolutional neural network, the one or more activation recorder layers perform the same activation operation as the activation function layers and record the activation results of the activation operation, and wherein the one or more masking layers use the recorded activation results; an input terminal that receives one or more analysis images; and an output terminal that outputs an output image of the modified convolutional neural network, in order to analyze a positive influence or a reverse influence between the input image and the output image, wherein the analysis image is a pixel-level binary image.
- For example, before receiving the analysis image, the input terminal receives an all-zero image so that a first output value is output from the output terminal as a bias coefficient; the input terminal is configured to receive the one or more analysis images so that, based on the bias coefficient, an output image of the modified convolutional neural network is output from the output terminal, wherein the analysis image is a binary image that is 1 at a certain pixel and 0 at the other pixels; and the correspondence between the analysis image and the output image at the pixel level is analyzed to obtain, as the positive influence, the influence of each pixel of the input image of the modified convolutional neural network on each pixel of the output image.
- For example, the convolutional neural network further includes an upsampling layer.
- For example, before receiving the analysis images, the input terminal receives an all-zero image so that a first output value is output from the output terminal as a bias coefficient; the input terminal receives all possible different analysis images so that, based on the bias coefficient, output images of the modified convolutional neural network are output from the output terminal, wherein each analysis image is a binary image that is 1 at a certain pixel and 0 at the other pixels, and the positions of the 1-valued pixels in different analysis images are different; and the correspondence between the analysis images and the output images at the pixel level is analyzed to obtain, as the reverse influence, the influence of each pixel of the output image of the convolutional neural network on each pixel of the input image.
- For example, the convolutional neural network may be replaced by a deconvolution network as the modified convolutional neural network, wherein the deconvolution network is the reverse network of the convolutional neural network, the input terminal is replaced by the output terminal of the modified convolutional neural network, and the output terminal is replaced by the input terminal of the modified convolutional neural network; the input terminal of the modified convolutional neural network receives an all-zero image so that a first output value is output from the output terminal of the modified convolutional neural network as a bias coefficient; the input terminal of the modified convolutional neural network receives one or more analysis images so that, based on the bias coefficient, an output image of the modified convolutional neural network is output, wherein the analysis image is a binary image that is 1 at a certain pixel and 0 at the other pixels; and the correspondence between the analysis image and the output image at the pixel level is analyzed to obtain, as the reverse influence, the influence of each pixel of the input image of the modified convolutional neural network on each pixel of the output image.
- the convolutional neural network further includes an upsampling layer, wherein the upsampling layer is replaced with a corresponding downsampling layer of the upsampling layer in the deconvolution network.
- At least one embodiment of the present invention provides a computer storage medium for storing computer-readable code which, when executed by one or more processors, performs the processing method according to any of the embodiments of the present invention.
- Figure 1 shows a simplified schematic of a convolutional neural network.
- Fig. 2 shows an example of the small set of equivalent filters that results from the activation state of the activation functions in a convolutional neural network.
- FIG. 3 shows a flow chart of a processing method for a convolutional neural network in accordance with one embodiment of the present invention.
- Figure 4 shows a convolutional layer of a convolutional neural network and a simplified diagram thereof.
- Figure 5 shows a simplified diagram of the activation function layer of a convolutional neural network.
- Fig. 6 shows a schematic diagram of step S301 in the processing method shown in Fig. 3.
- Fig. 7 shows a schematic diagram of step S302 in the processing method shown in Fig. 3.
- FIG. 8 shows an embodiment of step S303 in the processing method shown in FIG. 3.
- Fig. 9 shows another embodiment of step S303 in the processing method shown in Fig. 3.
- Fig. 10A shows still another embodiment of step S302 and step S303 in the processing method shown in Fig. 3.
- FIG. 10B shows an example diagram of a modified convolutional neural network.
- Figure 11A shows a schematic diagram of an upsampled layer in a convolutional neural network.
- Figure 11B shows an example of where the upsampling (MUXOUT) layer is added in a convolutional neural network.
- Figure 11C shows a schematic diagram of the downsampling layer in a deconvolution neural network.
- Figure 12 shows a block diagram of a processing device for a convolutional neural network in accordance with one embodiment of the present invention.
- Figure 13 shows a schematic diagram of a convolutional neural network in accordance with one embodiment of the present invention.
- FIG. 14 illustrates an exemplary processing system that can be used to implement the processing methods of the present disclosure.
- The main component of a deep learning system is the convolutional network. A convolutional network is a neural network structure that uses images as input/output and replaces scalar weights with filters (convolutions); it is used, for example, for image processing. As an example, a convolutional neural network with a simple 3-layer structure is shown in FIG. 1. As shown in FIG. 1, the structure takes 4 input images at the four input terminals on the left, has 3 units (output images) in the hidden layer at the centre, and has 2 units in the output layer, producing 2 output images.
- Each weighted box corresponds to a filter (e.g., a 3x3 or 5x5 kernel), where k is a label indicating the input layer number, and i and j are labels indicating the input and output units, respectively. A bias is a scalar added to the output of a convolution. The result of adding several convolutions and biases is then passed through an activation box, which typically corresponds to a rectified linear unit (ReLU), a sigmoid function, a hyperbolic tangent, or the like.
- The filters and biases are fixed during operation of the system; they are obtained through a training process on a set of input/output example images and are adjusted to fit some optimization criterion that depends on the application. A typical configuration involves tens or hundreds of filters in each layer. A network with 3 layers is considered shallow, whereas a number of layers greater than 5 or 10 is generally considered deep.
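- As a point of reference, the three-layer structure of FIG. 1 can be sketched in a few lines of code. The sketch below is illustrative only; it assumes PyTorch, 3x3 kernels, and ReLU activations, none of which are mandated by the text.

```python
import torch
import torch.nn as nn

class ThreeLayerConvNet(nn.Module):
    """Sketch of the FIG. 1 structure: 4 input images -> 3 hidden units -> 2 output images."""
    def __init__(self):
        super().__init__()
        # Each weighted box is a small filter bank plus a scalar bias per output unit.
        self.hidden = nn.Conv2d(in_channels=4, out_channels=3, kernel_size=3, padding=1)
        self.output = nn.Conv2d(in_channels=3, out_channels=2, kernel_size=3, padding=1)
        self.act = nn.ReLU()  # the "activation box"; a sigmoid or tanh would also fit the text

    def forward(self, x):                  # x: (batch, 4, H, W)
        h = self.act(self.hidden(x))       # hidden layer: 3 feature maps
        return self.output(h)              # output layer: 2 output images

print(ThreeLayerConvNet()(torch.randn(1, 4, 32, 32)).shape)  # torch.Size([1, 2, 32, 32])
```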
- A convolutional network is a nonlinear system. The nonlinearity is due to the activation function, which prevents the whole system from being reduced to a small set of filters acting on each input. In the present disclosure it is convenient to interpret a convolutional network as an adaptive filter. Assume, for instance, a rectified linear unit (ReLU) as the activation function. For a fixed input, some inputs to the activation boxes are positive and are passed unchanged to the next layer because of the linear shape of the activation function, while the other inputs are negative and thus eliminate any effect on the output. The resulting system is linear and can be reduced to a small set of different filters, plus biases, acting on each input. FIG. 2 shows an example of the small set of equivalent filters obtained in this way for a particular activation pattern. For a different input, the activation state of the individual ReLUs changes, and with it the resulting equivalent filters, which leads to an adaptive filter effect.
- A typical use of a deep learning system begins with the choice of a network architecture; the model is then trained to obtain a set of parameters (filter coefficients and biases). If the training process is successful, then for a given input the network output will match the desired target with high precision, in many cases better than any other method. But many questions remain difficult to answer. For example: Is the network architecture the best choice for this problem? Is the number of parameters sufficient, or is it too large? And, at a more basic level, how do these parameters work inside the network to produce the output? How does using many layers (deep networks) help improve the results compared with using a few layers (shallow networks)?
- The filters in deep network architectures are usually small (3x3 or 5x5), and visualizing a large number of filters one by one does not provide much insight into the system; the biases are scalar numbers that give no clue to the complex mechanisms at work inside the network. Understanding the parameters of a working deep learning system therefore remains, to a large extent, an open question.
- Embodiments of the present invention introduce a system that allows the classical methods for linear systems to be extended to the analysis of convolutional networks.
- A linear system can be completely described by its so-called impulse response.
- In this description, an impulse response is the output produced by an input that is zero everywhere except at a single position, where it is one.
- A convolutional network is not a linear system, because of its activation functions (ReLUs). However, according to embodiments of the invention, the activation states of the ReLUs can be recorded and fixed, so that the system becomes linear and an impulse-response analysis can be carried out.
- the impulse response can show the effect of the input pixels on the output pixels.
- Using standard methods from linear systems, the opposite relationship can also be obtained, namely which input pixels are used to obtain an output pixel and how important each input pixel is, i.e., the effect of the output pixels on the input pixels. This can be visualized as an image representing the overall filter effect of the network.
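- For a purely linear operator the idea is easy to demonstrate. The sketch below (NumPy/SciPy, with an arbitrary 3x3 kernel standing in for the linearized network; the kernel is an illustrative assumption) computes the response to a single-pixel input, i.e., which output pixels an input pixel influences; reading the same responses per output pixel gives the reverse relationship.

```python
import numpy as np
from scipy.ndimage import convolve

kernel = np.array([[0., 1., 0.],
                   [1., 4., 1.],
                   [0., 1., 0.]]) / 8.0             # stand-in for the fixed, linearized network
S = lambda x: convolve(x, kernel, mode="constant")

delta = np.zeros((8, 8))
delta[3, 4] = 1.0                                   # impulse: 1 at input pixel (3, 4), 0 elsewhere
impulse_response = S(delta)                         # effect of input pixel (3, 4) on every output pixel
print(np.argwhere(impulse_response != 0))           # output pixels influenced by this input pixel
```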
- At least one embodiment of the present invention provides a convolutional neural network and a processing method, apparatus, and system therefor, which can be used to determine the influence that a single pixel or a few pixels of an input have on the generated output (referred to as the positive influence for short), or how a single pixel or a few pixels of the output are affected by the input (referred to as the reverse influence for short).
- FIG. 3 shows a flow diagram of a processing method 300 for a convolutional neural network in accordance with one embodiment of the present invention.
- As shown in FIG. 3, the processing method 300 for a convolutional neural network includes: step S301, using an activation recorder layer as an activation function layer in the convolutional neural network, wherein, in response to a probe image having content being input to the convolutional neural network, the activation recorder layer performs the same activation operation as the activation function layer and records an activation result of the activation operation; step S302, modifying the convolutional neural network, the modification including replacing the activation recorder layer with a masking layer, wherein the masking layer uses the recorded activation result; and step S303, inputting an analysis image as an input image to the modified convolutional neural network so as to output an output image of the modified convolutional neural network, in order to analyze the positive influence or reverse influence between the input image and the output image, wherein the analysis image is a pixel-level binary image.
- In this way, the activation recorder layer records the activation results of the activation operation of the activation function layer, which is what originally made the convolutional neural network nonlinear, and the masking layer fixes the recorded activation results, so that the convolutional neural network is changed from nonlinear to linear and the subsequent analysis can be performed more stably. Moreover, by inputting analysis images that are pixel-level binary images into the modified convolutional neural network, the relationship between the input analysis image and the output image can be analyzed at the pixel level, so that the positive or reverse influence between the input image and the output image of the convolutional neural network is obtained, which provides guidance on how to improve the number of filters, the parameters, and so on of the convolutional neural network.
- The above examples assume that there is one activation function layer, one activation recorder layer, and one masking layer. However, the present application is not limited to this; in fact there may be several activation function layers, activation recorder layers, and masking layers, in one-to-one correspondence. For example, if there are three activation function layers, then three activation recorder layers replace the original three activation function layers one-to-one, and three masking layers replace the three activation recorder layers one-to-one.
- FIG. 4 shows a convolutional layer 401 of convolutional neural network 400 and a simplified diagram thereof.
- As shown in the left part of Fig. 4, one convolutional layer 401 of the convolutional neural network 400 is shown; it is simplified into the diagram shown in the right part of Fig. 4. Of course, a convolutional neural network may contain, and usually does contain, more than one convolutional layer; they are not all shown here.
- FIG. 5 shows a simplified diagram of the activation function layer 402 of the convolutional neural network 400.
- an activation function layer 402 is typically added at the end of the convolutional layer 401 to form a convolutional neural network 400 having an activation function layer 402.
- Here, the input and output of the activation function layer 402 in FIG. 5 use the same symbols as the convolutional layer in FIG. 4, but this does not mean that they must be identical to those of the convolutional layer. The L in the symbols denotes the layer number, and the layer numbers of the convolutional layer and of the activation function layer can differ; it is therefore sufficient to give L different values in the diagrams of the convolutional layer and of the activation function layer to distinguish them.
- Fig. 6 shows a schematic diagram of step S301 in the processing method shown in Fig. 3.
- In step S301, the activation recorder layer 403 is used as the activation function layer 402 in the convolutional neural network 400. In response to a probe image having content being input to the convolutional neural network 400, the activation recorder layer 403 performs the same activation operation as the activation function layer 402 and records the activation result of the activation operation, for example as a record indexed by i, j, L and n, where i and j denote the row and column of a pixel of the input image, L denotes the layer number, and n denotes the input terminal.
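- A minimal sketch (assumed PyTorch; the class name and storage format are illustrative) of the activation recorder layer of step S301: it applies exactly the same ReLU operation as the activation function layer it stands in for, and additionally stores the binary activation pattern produced by the probe image.

```python
import torch
import torch.nn as nn

class ActivationRecorder(nn.Module):
    """Behaves like ReLU, but remembers which units fired for the probe image."""
    def __init__(self):
        super().__init__()
        self.recorded_mask = None          # one 0/1 value per position (i, j) of each feature n in layer L

    def forward(self, x):
        self.recorded_mask = (x > 0).to(x.dtype)   # record the activation result
        return torch.relu(x)                       # same activation operation as the original layer
```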
- Fig. 7 shows a schematic diagram of step S302 in the processing method shown in Fig. 3. In step S302, the convolutional neural network 400 is modified by replacing the activation recorder layer 403 with the masking layer 404, where the masking layer 404 uses the recorded activation result. This yields the modified convolutional neural network 700. Denoting the modified convolutional neural network 700 by S, the output y obtained from an input x to the modified convolutional neural network 700 can be written as y = S(x).
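- Correspondingly, a minimal sketch of the masking layer of step S302: instead of re-deciding which units fire, it multiplies its input by the activation pattern recorded for the probe image, so the modified network S becomes linear in its input. The swap shown in the comment is hypothetical helper code, not an API defined by the text.

```python
import torch
import torch.nn as nn

class MaskLayer(nn.Module):
    """Replaces an ActivationRecorder; reuses its recorded, input-independent activation pattern."""
    def __init__(self, recorded_mask: torch.Tensor):
        super().__init__()
        self.register_buffer("mask", recorded_mask)

    def forward(self, x):
        return x * self.mask   # fixed 0/1 pattern -> the whole network acts as a linear system

# Hypothetical swap: replace every recorder by a masking layer that reuses its recorded pattern.
# for name, layer in recorders.items():
#     setattr(network, name, MaskLayer(layer.recorded_mask))
```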
- Fig. 8 shows an embodiment (step S303-1) of step S303 in the processing method shown in Fig. 3.
- This embodiment is an embodiment for analyzing the influence of each pixel of the input image on each pixel of the output image as a positive influence.
- Specifically, step S303-1 includes: step S3031-1, inputting an all-zero image to the modified convolutional neural network 700 to output a first output value as the bias coefficient; step S3031-2, inputting one or more analysis images to the modified convolutional neural network 700 to output, based on the bias coefficient, output images of the modified convolutional neural network 700, wherein each analysis image is a binary image that is 1 at a certain pixel and 0 at the other pixels; and step S3031-3, analyzing the correspondence between the analysis images and the output images at the pixel level to obtain, as the positive influence, the influence of each pixel of the input image of the convolutional neural network 400 on each pixel of the output image.
- Specifically, in step S3031-1, the bias coefficient b_eff[p,q] is the output for the zero input 0 (an all-zero image in which every input value equals zero) and represents the overall contribution of the biases to the output value: b_eff[p,q] = S(0), where p and q denote the row and column of a pixel of the output image.
- In step S3031-2, an analysis image δ_n,m[i,j], which is 1 at pixel (n, m) and 0 at all other pixels, is then input. Based on the bias coefficient b_eff obtained in step S3031-1, the following output image is obtained: h_n,m[p,q] = S(δ_n,m) - b_eff.
- h_n,m[p,q] represents the contribution of the input pixel (n, m) to the output pixel (p, q), i.e., the positive influence. This is similar to the concept of an "impulse response" in linear systems.
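- The two formulas above translate directly into code. In the sketch below (NumPy-style), `S` is a placeholder for whatever callable wraps the modified (linearized) network and maps a single-channel input image to an output image; the function name and signature are illustrative assumptions.

```python
import numpy as np

def positive_influence(S, height, width, n, m):
    """h_{n,m}[p, q]: contribution of input pixel (n, m) to every output pixel (p, q)."""
    b_eff = S(np.zeros((height, width)))   # step S3031-1: bias coefficient b_eff = S(0)
    delta = np.zeros((height, width))
    delta[n, m] = 1.0                      # analysis image: 1 at (n, m), 0 at all other pixels
    return S(delta) - b_eff                # step S3031-2: h_{n,m} = S(delta_{n,m}) - b_eff
```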
- The example above describes the case in which one wants to analyze the positive influence of a particular pixel (n, m) of the input image on the output, but the embodiment is not limited to this. The positive influence of several (rather than one) particular pixels of the input image on the output can also be analyzed in a similar way; the analysis image merely has to be changed to a binary image that is 1 at those particular pixels and 0 at the other pixels. Since only the analysis images of interest need to be input, it is not necessary to store all possible analysis images (for example, the large number of analysis images with the 1-valued pixel at every possible position), which saves storage space and system resources.
- In summary, by inputting analysis images that are pixel-level binary images into the modified convolutional neural network, the relationship between the input analysis image and the output image can be analyzed at the pixel level, so that the positive influence between the input image and the output image of the convolutional neural network is obtained, which provides guidance on how to improve the number of filters, the parameters, and so on of the convolutional neural network.
- Fig. 9 shows another embodiment (step S303-2) of step S303 in the processing method shown in Fig. 3.
- This embodiment is one embodiment for analyzing the influence of each pixel of the output image on each pixel of the input image as a reverse effect.
- Specifically, step S303-2 includes: step S3031-2, inputting an all-zero image to the modified convolutional neural network 700 to output a first output value as the bias coefficient; step S3032-2, inputting all possible different analysis images to the modified convolutional neural network 700 to output, based on the bias coefficient, the output images of the modified convolutional neural network 700, wherein each analysis image is a binary image that is 1 at a certain pixel and 0 at the other pixels, and the positions of the 1-valued pixels in different analysis images are different; and step S3033-2, analyzing the correspondence between the analysis images and the output images at the pixel level to obtain, as the reverse influence, the influence of each pixel of the output image of the convolutional neural network 400 on each pixel of the input image.
- Unlike step S303-1, when analyzing the reverse influence one starts from a given output image and wants to know how that output image is affected by the input image. Since it is not known in advance which input image or images produce that output image, all possible different analysis images (with the 1-valued pixel at every possible position (n, m)) can be input so as to obtain all output images H_p,q[n,m]. H_p,q[n,m] represents the contribution associated with an output pixel (p, q) with respect to the input pixel (n, m); it is essentially the same quantity as h_n,m[p,q] described in connection with FIG. 8, since both are derived from the same convolutional neural network, only the angle of analysis and the application being different. In this way, the input images corresponding to a given output image to be analyzed are known, and thus the influence of one or several pixels of the output image on each pixel of the input image is obtained as the reverse influence.
- This embodiment may therefore have to store all possible different analysis images and all output images, and may have to run the modified convolutional neural network 700 many times in advance. On the other hand, the configuration and modification of the convolutional neural network in this embodiment are similar to those of step S303-1 shown in Fig. 8, so the processing of the convolutional neural network itself is relatively simple.
- Such an embodiment is generally suitable for applications such as machine recognition and machine classification, because in such applications more attention is paid to how the result of recognition or classification is affected by the input images, i.e., which input image, or which pixels of an input image, lead to the output result. For example, if the result of machine recognition or classification is "a flower", one usually wants to know which input image, or which pixels of the input image, produced the result "a flower".
- By inputting the various analysis images, which are pixel-level binary images, into the modified convolutional neural network to obtain the various output images, the relationship between an output image and its corresponding analysis image can be analyzed at the pixel level. The reverse influence between the output image and the input image of the convolutional neural network is thereby obtained, which provides guidance on how to improve the number of filters, the parameters, and so on of the convolutional neural network.
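- A minimal sketch of this exhaustive variant: every possible one-pixel analysis image is fed through the linearized network once, and the responses are rearranged so that each output pixel (p, q) ends up with a map over the input pixels (n, m). `S` is the same placeholder callable as before, and the nested loop is purely illustrative (it is expensive for large images).

```python
import numpy as np

def reverse_influence(S, height, width):
    """H[p, q, n, m]: influence associated with output pixel (p, q) and input pixel (n, m)."""
    b_eff = S(np.zeros((height, width)))
    H = np.zeros((height, width, height, width))
    for n in range(height):
        for m in range(width):
            delta = np.zeros((height, width))
            delta[n, m] = 1.0                   # analysis image with the 1-valued pixel at (n, m)
            H[:, :, n, m] = S(delta) - b_eff
    return H    # H[p, q] visualizes how output pixel (p, q) depends on the input image
```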
- Fig. 10A shows still another embodiment of step S302 and step S303 in the processing method shown in Fig. 3.
- This embodiment is another embodiment for analyzing the influence of each pixel of the output image on each pixel of the input image as a reverse effect.
- Specifically, a deconvolution network is configured as the modified convolutional neural network 700-1, where the deconvolution network is the reverse network of the convolutional neural network 400: each convolutional layer becomes a deconvolutional layer, each pooling layer becomes an unpooling layer, the activation functions in the masking layers remain unchanged, and so on. The specific construction of a deconvolution network is well known in the field and is not described here.
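- A minimal sketch (assumed PyTorch) of one way such a reverse network could be assembled: each convolution becomes a transposed convolution that reuses the trained filters, while the masking layers are kept unchanged. The helper is illustrative only; real bookkeeping for strides, paddings, pooling indices, and layer order is omitted.

```python
import torch.nn as nn

def deconv_of(conv: nn.Conv2d) -> nn.ConvTranspose2d:
    """Build the transposed-convolution counterpart of a trained convolutional layer."""
    deconv = nn.ConvTranspose2d(conv.out_channels, conv.in_channels,
                                kernel_size=conv.kernel_size,
                                stride=conv.stride, padding=conv.padding, bias=False)
    deconv.weight.data = conv.weight.data.clone()   # same filters, applied in the reverse direction
    return deconv
```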
- Step S303' then includes: step S3031-1, inputting an all-zero image to the modified convolutional neural network 700-1 to output a first output value as the bias coefficient; step S3032-1, inputting one or more analysis images to the modified convolutional neural network 700-1; and a final analysis step. Step S303' is substantially similar in its steps and principles to step S303-1 shown in Fig. 8, except that what is obtained in step S3033-1' is the reverse influence; it is therefore not described in detail here.
- the modified convolutional neural network 700 is replaced with a deconvolution network 700-1.
- Deconvolution networks can be understood as the inverse of convolutional neural networks.
- That is, the input of the deconvolution network 700-1 corresponds to the output of the original, pre-replacement convolutional neural network 700, and the output of the deconvolution network 700-1 corresponds to the input of the original, pre-replacement convolutional neural network 700.
- FIG. 10B shows an example diagram of the modified convolutional neural network. Therefore, in order to analyze the influence associated with a certain output image of the original, pre-replacement convolutional neural network 700 on the input image, it is only necessary to use that output image as the input to the deconvolution network 700-1.
- Compared with the embodiment shown in FIG. 9, this embodiment avoids storing all possible different analysis images and all output images and avoids running the modified convolutional neural network 700 many times in advance; however, it may be necessary to deconvolve the original convolutional neural network 700 in order to obtain the corresponding deconvolution network.
- In this way, the relationship between an output image and its corresponding analysis image can be analyzed at the pixel level to obtain the reverse influence between the output image and the input image of the original convolutional neural network, which provides guidance on how to improve the number of filters, the parameters, and so on of the convolutional neural network.
- Figure 11A shows a schematic diagram of an upsampled layer in a convolutional neural network.
- an upsampling layer (referred to as a MUXOUT layer) may be included in the original convolutional neural network.
- the up-sampling layer has the structure shown in FIG. 11A and is capable of up-sampling the input pixels to obtain higher resolution.
- The general definition of the MUXOUT layer is as follows: U_1, ..., U_M are upsampling operators that copy the pixels of a feature to different locations of the enlarged output, with zeros elsewhere. The number of output features is constant and equal to the number c of input features, where c indexes the input terminal (feature) and (p, q) denotes an input pixel.
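- The exact MUXOUT formula is not reproduced above; as a rough illustration only, a single upsampling operator U_k of the kind described (copy each input pixel to one position of a 2x2 output block, zeros elsewhere, with the number of features unchanged) could be sketched as follows. PyTorch, the factor of 2, and the offset parameter are all assumptions.

```python
import torch

def upsample_with_zeros(x: torch.Tensor, offset=(0, 0), factor=2) -> torch.Tensor:
    """U_k: copy every pixel of every feature to one position of each factor x factor block, zeros elsewhere."""
    n, c, h, w = x.shape
    out = x.new_zeros(n, c, h * factor, w * factor)      # number of output features stays equal to c
    out[:, :, offset[0]::factor, offset[1]::factor] = x  # which position is filled depends on the operator index k
    return out

print(upsample_with_zeros(torch.randn(1, 3, 4, 4)).shape)  # torch.Size([1, 3, 8, 8])
```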
- Figure 11B shows an example of the location of adding the upsampled layer MUXOUT layer in a convolutional neural network.
- Of course, the upsampling (MUXOUT) layer can also be placed at other locations. Adding an upsampling (MUXOUT) layer increases the resolution.
- Figure 11C shows a schematic diagram of the downsampling layer in a deconvolution neural network.
- If an upsampling layer is added to the convolutional neural network, then in the corresponding deconvolution network the upsampling layer is replaced by its corresponding downsampling layer. That is, just as a convolutional layer becomes a deconvolutional layer and a pooling layer becomes an unpooling layer, the upsampling layer likewise becomes a downsampling layer.
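- A minimal sketch of the corresponding downsampling layer, understood here simply as the inverse of the zero-inserting upsampler sketched above (again an illustrative assumption rather than the patent's exact definition):

```python
import torch

def downsample_inverse(y: torch.Tensor, offset=(0, 0), factor=2) -> torch.Tensor:
    """Inverse of upsample_with_zeros: keep the one filled pixel of each factor x factor block."""
    return y[:, :, offset[0]::factor, offset[1]::factor]
```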
- Figure 12 shows a block diagram of a processing device 1200 for a convolutional neural network, in accordance with one embodiment of the present invention.
- The processing device 1200 for a convolutional neural network shown in Figure 12 includes: a recorder 1201 that uses an activation recorder layer as the activation function layer in a convolutional neural network, wherein, in response to a probe image having content being input to the convolutional neural network, the recorder causes the activation recorder layer to perform the same activation operation as the activation function layer and to record the activation result of the activation operation; a modifier 1202 configured to modify the convolutional neural network, the modification including replacing the activation recorder layer with a masking layer, wherein the masking layer uses the recorded activation result; and an analyzer 1203 configured to input an analysis image as an input image to the modified convolutional neural network so as to output an output image of the modified convolutional neural network, in order to analyze the positive influence or reverse influence between the input image and the output image, wherein the analysis image is a pixel-level binary image.
- For example, the analyzer 1203 can be configured to: input an all-zero image to the modified convolutional neural network to output a first output value as a bias coefficient; input one or more analysis images to the modified convolutional neural network to output, based on the bias coefficient, an output image of the modified convolutional neural network, wherein the analysis image is a binary image that is 1 at a certain pixel and 0 at the other pixels; and analyze the correspondence between the analysis image and the output image at the pixel level to obtain, as the positive influence, the influence of each pixel of the input image of the modified convolutional neural network on each pixel of the output image.
- the convolutional neural network can include an upsampling layer.
- For example, the analyzer 1203 can be configured to: input an all-zero image to the modified convolutional neural network to output a first output value as a bias coefficient; input all possible different analysis images to the modified convolutional neural network to output, based on the bias coefficient, the output images of the modified convolutional neural network, wherein each analysis image is a binary image that is 1 at a certain pixel and 0 at the other pixels, and the positions of the 1-valued pixels in different analysis images are different; and analyze the correspondence between the analysis images and the output images at the pixel level to obtain, as the reverse influence, the influence of each pixel of the output image of the convolutional neural network on each pixel of the input image.
- For example, the modifier 1202 can be configured to configure a deconvolution network as the modified convolutional neural network, wherein the deconvolution network is the reverse network of the convolutional neural network, and the analyzer 1203 is configured to: input an all-zero image to the modified convolutional neural network to output a first output value as a bias coefficient; input one or more analysis images to the modified convolutional neural network to output, based on the bias coefficient, an output image of the modified convolutional neural network, wherein the analysis image is a binary image that is 1 at a certain pixel and 0 at the other pixels; and analyze the correspondence between the analysis image and the output image at the pixel level to obtain, as the reverse influence, the influence of each pixel of the input image of the modified convolutional neural network on each pixel of the output image.
- For example, the convolutional neural network may include an upsampling layer, and the analyzer 1203 is then configured to configure the deconvolution network as the modified convolutional neural network by replacing the upsampling layer with its corresponding downsampling layer.
- In this way, the activation recorder layer records the activation results of the activation operation of the activation function layer, which is what originally made the convolutional neural network nonlinear, and the masking layer fixes the recorded activation results, so that the convolutional neural network is changed from nonlinear to linear and the subsequent analysis can be performed more stably. Moreover, by inputting analysis images that are pixel-level binary images into the modified convolutional neural network, the relationship between the input analysis image and the output image can be analyzed at the pixel level, so that the positive or reverse influence between the input image and the output image of the convolutional neural network is obtained, which provides guidance on how to improve the number of filters, the parameters, and so on of the convolutional neural network.
- Figure 13 shows a schematic diagram of a convolutional neural network in accordance with one embodiment of the present invention.
- The convolutional neural network 1300 shown in FIG. 13 includes: one or more convolutional layers 1301, 1301', ...; one or more masking layers 1302, 1302', ... corresponding to the one or more convolutional layers, which replace the corresponding one or more activation recorder layers, the one or more activation recorder layers serving as activation function layers in the convolutional neural network, wherein, in response to a probe image having content being input to the convolutional neural network, the one or more activation recorder layers perform the same activation operation as the activation function layers and record the activation results of the activation operation, and wherein the one or more masking layers use the recorded activation results; an input terminal 1303 that receives one or more analysis images; and an output terminal 1304 that outputs an output image of the modified convolutional neural network, in order to analyze the positive influence or reverse influence between the input image and the output image, wherein the analysis image is a pixel-level binary image.
- As above, the activation function layers, activation recorder layers, and masking layers may each number one or several, in one-to-one correspondence. For example, if there are three activation function layers, then three activation recorder layers correspond one-to-one to the original three activation function layers, and three masking layers replace the three activation recorder layers one-to-one.
- For example, before receiving the analysis image, the input terminal 1303 can receive an all-zero image so that a first output value is output from the output terminal as a bias coefficient; the input terminal 1303 can be configured to receive the one or more analysis images so that, based on the bias coefficient, an output image of the modified convolutional neural network is output from the output terminal 1304, wherein the analysis image is a binary image that is 1 at a certain pixel and 0 at the other pixels; and the correspondence between the analysis image and the output image at the pixel level is analyzed to obtain, as the positive influence, the influence of each pixel of the input image of the modified convolutional neural network on each pixel of the output image.
- For example, the convolutional neural network 1300 further includes one or more upsampling layers 1305. The locations of these upsampling layers are only an example and are not limiting; they may in fact be at other locations.
- For example, before receiving the analysis images, the input terminal 1303 can receive an all-zero image so that a first output value is output from the output terminal as a bias coefficient; the input terminal 1303 can receive all possible different analysis images so that, based on the bias coefficient, the output images of the modified convolutional neural network are output from the output terminal 1304, wherein each analysis image is a binary image that is 1 at a certain pixel and 0 at the other pixels, and the positions of the 1-valued pixels in different analysis images are different; and the correspondence between the analysis images and the output images at the pixel level is analyzed to obtain, as the reverse influence, the influence of each pixel of the output image of the convolutional neural network on each pixel of the input image.
- For example, the convolutional neural network 1300 can be replaced by a deconvolution network as the modified convolutional neural network, wherein the deconvolution network is the reverse network of the convolutional neural network, the input terminal 1303 is replaced by the output terminal of the modified convolutional neural network, and the output terminal 1304 is replaced by the input terminal of the modified convolutional neural network. The input terminal of the modified convolutional neural network receives an all-zero image so that a first output value is output from the output terminal of the modified convolutional neural network as a bias coefficient; the input terminal of the modified convolutional neural network receives one or more analysis images so that, based on the bias coefficient, an output image of the modified convolutional neural network is output, wherein the analysis image is a binary image that is 1 at a certain pixel and 0 at the other pixels; and the correspondence between the analysis image and the output image at the pixel level is analyzed to obtain, as the reverse influence, the influence of each pixel of the input image of the modified convolutional neural network on each pixel of the output image.
- For example, the convolutional neural network 1300 can also include an upsampling layer 1305, wherein, in the deconvolution network, the upsampling layer 1305 is replaced with its corresponding downsampling layer.
- In this way, the activation recorder layer records the activation results of the activation operation of the activation function layer, which is what originally made the convolutional neural network nonlinear, and the masking layer fixes the recorded activation results, so that the convolutional neural network is changed from nonlinear to linear and the subsequent analysis can be performed more stably. Moreover, by inputting analysis images that are pixel-level binary images into the modified convolutional neural network, the relationship between the input analysis image and the output image can be analyzed at the pixel level, so that the positive or reverse influence between the input image and the output image of the convolutional neural network is obtained, which provides guidance on how to improve the number of filters, the parameters, and so on of the convolutional neural network.
- FIG. 14 illustrates an exemplary processing system that can be used to implement the processing methods of the present disclosure.
- the processing system 1000 includes at least one processor 1002 that executes instructions stored in the memory 1004. These instructions may be, for example, instructions for implementing functions described as being performed by one or more of the above-described modules or instructions for implementing one or more of the above methods.
- the processor 1002 can access the memory 1004 through the system bus 1006. In addition to storing executable instructions, the memory 1004 can also store training data and the like.
- The processor 1002 can be any of a variety of devices with computing capability, such as a central processing unit (CPU) or a graphics processing unit (GPU). The CPU can be an X86 or ARM processor; the GPU can be integrated directly on the motherboard, built into the Northbridge chip of the motherboard, or built into the central processing unit (CPU).
- Processing system 1000 also includes a data store 1008 that is accessible by processor 1002 via system bus 1006.
- Data store 1008 can include executable instructions, multiple image training data, and the like.
- Processing system 1000 also includes an input interface 1010 that allows an external device to communicate with processing system 1000.
- input interface 1010 can be used to receive instructions from an external computer device, from a user, and the like.
- Processing system 1000 can also include an output interface 1012 that interfaces processing system 1000 with one or more external devices.
- processing system 1000 can display images and the like through output interface 1012. It is contemplated that external devices that communicate with processing system 1000 through input interface 1010 and output interface 1012 can be included in an environment that provides a user interface with which virtually any type of user can interact.
- Examples of user interface types include graphical user interfaces, natural user interfaces, and the like.
- the graphical user interface can accept input from a user's input device(s), such as a keyboard, mouse, remote control, etc., and provide output on an output device such as a display.
- A natural user interface, in contrast, allows users to interact with processing system 1000 in a manner free of the constraints imposed by input devices such as keyboards, mice, and remote controls. Instead, a natural user interface can rely on speech recognition, touch and stylus recognition, gesture recognition on and near the screen, air gestures, head and eye tracking, voice and speech, vision, touch, gestures, machine intelligence, and so on.
- Additionally, although processing system 1000 is illustrated as a single system, it will be appreciated that it can also be a distributed system and can also be arranged as a cloud facility (including a public cloud or a private cloud). Thus, for example, several devices can communicate over a network connection and can collectively perform the tasks described as being performed by processing system 1000.
- Computer readable media includes computer readable storage media.
- the computer readable storage medium can be any available storage medium that can be accessed by a computer.
- Such computer-readable media may comprise RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer. In addition, propagated signals are not included within the scope of computer-readable storage media.
- Computer readable media also includes communication media including any medium that facilitates transfer of a computer program from one place to another.
- A connection can, for example, be a communication medium. For example, if software is transmitted from a web site, server, or other remote source using coaxial cable, fiber-optic cable, twisted pair, digital subscriber line (DSL), or wireless technologies such as infrared, radio, and microwave, then the coaxial cable, fiber-optic cable, twisted pair, DSL, or wireless technologies such as infrared, radio, and microwave are included in the definition of communication medium.
- The functions described herein may be performed at least in part by one or more hardware logic components, for example Field-Programmable Gate Arrays (FPGAs), Application-Specific Integrated Circuits (ASICs), Application-Specific Standard Products (ASSPs), Systems-on-Chip (SOCs), and Complex Programmable Logic Devices (CPLDs).
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Health & Medical Sciences (AREA)
- Computing Systems (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Computational Linguistics (AREA)
- Data Mining & Analysis (AREA)
- Evolutionary Computation (AREA)
- Life Sciences & Earth Sciences (AREA)
- Molecular Biology (AREA)
- Artificial Intelligence (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Health & Medical Sciences (AREA)
- Image Analysis (AREA)
- Image Processing (AREA)
Abstract
Provided are a convolutional neural network and a processing method, apparatus, system, and medium therefor. The method includes: using an activation recorder layer as an activation function layer in a convolutional neural network, wherein, in response to a probe image having content being input to the convolutional neural network, the activation recorder layer performs the same activation operation as the activation function layer and records an activation result of the activation operation; modifying the convolutional neural network, the modifying step including replacing the activation recorder layer with a masking layer, wherein the masking layer uses the recorded activation result; and inputting an analysis image as an input image to the modified convolutional neural network so as to output an output image of the modified convolutional neural network, in order to analyze a positive influence or a reverse influence between the input image and the output image, wherein the analysis image is a pixel-level binary image.
Description
This application claims priority to Chinese Patent Application No. 201710094069.9, filed on February 21, 2017, the entire disclosure of which is incorporated herein by reference as part of the present application.
Embodiments of the present invention relate to the field of image processing, and more particularly to a convolutional neural network and a processing method, apparatus, system, and medium therefor.
At present, deep learning technology based on artificial neural networks has made great progress in fields such as image classification, image capture and search, face recognition, and age and speech recognition. The Convolutional Neural Network (CNN) is an artificial neural network that has been developed in recent years and has attracted wide attention. It is a special image recognition method and a very effective network with forward feedback. Today the application range of CNNs is no longer limited to image recognition; they can also be applied to face recognition, character recognition, image processing, and other applications.
Summary of the Invention
At least one embodiment of the present invention provides a processing method for a convolutional neural network, comprising the following steps: using an activation recorder layer as an activation function layer in the convolutional neural network, wherein, in response to a probe image having content being input to the convolutional neural network, the activation recorder layer performs the same activation operation as the activation function layer and records an activation result of the activation operation; modifying the convolutional neural network, the modifying step including replacing the activation recorder layer with a masking layer, wherein the masking layer uses the recorded activation result; and inputting an analysis image as an input image to the modified convolutional neural network so as to output an output image of the modified convolutional neural network, in order to analyze a positive influence or a reverse influence between the input image and the output image, wherein the analysis image is a pixel-level binary image.
For example, the step of inputting an analysis image as an input image to the modified convolutional neural network to obtain an analysis result, in order to analyze the positive influence between the input image and the output image of the convolutional neural network before the modification, includes: inputting an all-zero image to the modified convolutional neural network to output a first output value as a bias coefficient; inputting one or more analysis images to the modified convolutional neural network to output, based on the bias coefficient, an output image of the modified convolutional neural network, wherein the analysis image is a binary image that is 1 at a certain pixel and 0 at the other pixels; and analyzing the correspondence between the analysis image and the output image at the pixel level to obtain, as the positive influence, the influence of each pixel of the input image of the modified convolutional neural network on each pixel of the output image.
For example, the convolutional neural network includes an upsampling layer.
For example, the step of inputting an analysis image as an input image to the modified convolutional neural network to obtain an analysis result, in order to analyze the reverse influence between the input image and the output image of the convolutional neural network before the modification, includes: inputting an all-zero image to the modified convolutional neural network to output a first output value as a bias coefficient; inputting all possible different analysis images to the modified convolutional neural network to output, based on the bias coefficient, output images of the modified convolutional neural network, wherein each analysis image is a binary image that is 1 at a certain pixel and 0 at the other pixels, and the positions of the 1-valued pixels in different analysis images are different; and analyzing the correspondence between the analysis images and the output images at the pixel level to obtain, as the reverse influence, the influence of each pixel of the output image of the convolutional neural network on each pixel of the input image.
For example, the step of modifying the convolutional neural network by replacing the activation recorder layer with a masking layer that uses the recorded activation result includes: configuring a deconvolution network as the modified convolutional neural network, wherein the deconvolution network is the reverse network of the convolutional neural network. In this case, the step of inputting an analysis image as an input image to the modified convolutional neural network to obtain an analysis result, in order to analyze the reverse influence between the input image and the output image of the convolutional neural network before the modification, includes: inputting an all-zero image to the modified convolutional neural network to output a first output value as a bias coefficient; inputting one or more analysis images to the modified convolutional neural network to output, based on the bias coefficient, an output image of the modified convolutional neural network, wherein the analysis image is a binary image that is 1 at a certain pixel and 0 at the other pixels; and analyzing the correspondence between the analysis image and the output image at the pixel level to obtain, as the reverse influence, the influence of each pixel of the input image of the modified convolutional neural network on each pixel of the output image.
For example, the convolutional neural network includes an upsampling layer, and configuring the deconvolution network as the modified convolutional neural network includes replacing the upsampling layer with its corresponding downsampling layer.
At least one embodiment of the present invention provides a processing apparatus for a convolutional neural network, comprising: a recorder that uses an activation recorder layer as an activation function layer in the convolutional neural network, wherein, in response to a probe image having content being input to the convolutional neural network, the recorder causes the activation recorder layer to perform the same activation operation as the activation function layer and to record an activation result of the activation operation; a modifier configured to modify the convolutional neural network, the modification including replacing the activation recorder layer with a masking layer, wherein the masking layer uses the recorded activation result; and an analyzer configured to input an analysis image as an input image to the modified convolutional neural network so as to output an output image of the modified convolutional neural network, in order to analyze a positive influence or a reverse influence between the input image and the output image, wherein the analysis image is a pixel-level binary image.
For example, the analyzer is configured to: input an all-zero image to the modified convolutional neural network to output a first output value as a bias coefficient; input one or more analysis images to the modified convolutional neural network to output, based on the bias coefficient, an output image of the modified convolutional neural network, wherein the analysis image is a binary image that is 1 at a certain pixel and 0 at the other pixels; and analyze the correspondence between the analysis image and the output image at the pixel level to obtain, as the positive influence, the influence of each pixel of the input image of the modified convolutional neural network on each pixel of the output image.
For example, the convolutional neural network includes an upsampling layer.
For example, the analyzer is configured to: input an all-zero image to the modified convolutional neural network to output a first output value as a bias coefficient; input all possible different analysis images to the modified convolutional neural network to output, based on the bias coefficient, output images of the modified convolutional neural network, wherein each analysis image is a binary image that is 1 at a certain pixel and 0 at the other pixels, and the positions of the 1-valued pixels in different analysis images are different; and analyze the correspondence between the analysis images and the output images at the pixel level to obtain, as the reverse influence, the influence of each pixel of the output image of the convolutional neural network on each pixel of the input image.
For example, the modifier is configured to configure a deconvolution network as the modified convolutional neural network, wherein the deconvolution network is the reverse network of the convolutional neural network, and the analyzer is configured to: input an all-zero image to the modified convolutional neural network to output a first output value as a bias coefficient; input one or more analysis images to the modified convolutional neural network to output, based on the bias coefficient, an output image of the modified convolutional neural network, wherein the analysis image is a binary image that is 1 at a certain pixel and 0 at the other pixels; and analyze the correspondence between the analysis image and the output image at the pixel level to obtain, as the reverse influence, the influence of each pixel of the input image of the modified convolutional neural network on each pixel of the output image.
For example, the convolutional neural network includes an upsampling layer, and the analyzer is then configured to configure the deconvolution network as the modified convolutional neural network by replacing the upsampling layer with its corresponding downsampling layer.
At least one embodiment of the present invention provides a processing system for a convolutional neural network, comprising: one or more processors; and one or more memories storing computer-readable code which, when executed by the one or more processors, performs the processing method according to any of the embodiments of the present invention.
At least one embodiment of the present invention provides a convolutional neural network, comprising: one or more convolutional layers; one or more masking layers, corresponding to the one or more convolutional layers, that replace corresponding one or more activation recorder layers, the one or more activation recorder layers serving as activation function layers in the convolutional neural network, wherein, in response to a probe image having content being input to the convolutional neural network, the one or more activation recorder layers perform the same activation operation as the activation function layers and record the activation results of the activation operation, and wherein the one or more masking layers use the recorded activation results; an input terminal that receives one or more analysis images; and an output terminal that outputs an output image of the modified convolutional neural network, in order to analyze a positive influence or a reverse influence between the input image and the output image, wherein the analysis image is a pixel-level binary image.
For example, before receiving the analysis image, the input terminal receives an all-zero image so that a first output value is output from the output terminal as a bias coefficient; the input terminal is configured to receive the one or more analysis images so that, based on the bias coefficient, an output image of the modified convolutional neural network is output from the output terminal, wherein the analysis image is a binary image that is 1 at a certain pixel and 0 at the other pixels; and the correspondence between the analysis image and the output image at the pixel level is analyzed to obtain, as the positive influence, the influence of each pixel of the input image of the modified convolutional neural network on each pixel of the output image.
For example, the convolutional neural network further includes an upsampling layer.
For example, before receiving the analysis images, the input terminal receives an all-zero image so that a first output value is output from the output terminal as a bias coefficient; the input terminal receives all possible different analysis images so that, based on the bias coefficient, output images of the modified convolutional neural network are output from the output terminal, wherein each analysis image is a binary image that is 1 at a certain pixel and 0 at the other pixels, and the positions of the 1-valued pixels in different analysis images are different; and the correspondence between the analysis images and the output images at the pixel level is analyzed to obtain, as the reverse influence, the influence of each pixel of the output image of the convolutional neural network on each pixel of the input image.
For example, the convolutional neural network may be replaced by a deconvolution network as the modified convolutional neural network, wherein the deconvolution network is the reverse network of the convolutional neural network, the input terminal is replaced by the output terminal of the modified convolutional neural network, and the output terminal is replaced by the input terminal of the modified convolutional neural network; the input terminal of the modified convolutional neural network receives an all-zero image so that a first output value is output from the output terminal of the modified convolutional neural network as a bias coefficient; the input terminal of the modified convolutional neural network receives one or more analysis images so that, based on the bias coefficient, an output image of the modified convolutional neural network is output, wherein the analysis image is a binary image that is 1 at a certain pixel and 0 at the other pixels; and the correspondence between the analysis image and the output image at the pixel level is analyzed to obtain, as the reverse influence, the influence of each pixel of the input image of the modified convolutional neural network on each pixel of the output image.
For example, the convolutional neural network further includes an upsampling layer, wherein, in the deconvolution network, the upsampling layer is replaced with its corresponding downsampling layer.
At least one embodiment of the present invention provides a computer storage medium for storing computer-readable code which, when executed by one or more processors, performs the processing method according to any of the embodiments of the present invention.
In order to explain the technical solutions of the embodiments of the present invention more clearly, the accompanying drawings of the embodiments are briefly introduced below. It is obvious that the drawings described below relate only to some embodiments of the present invention and are not limiting.
Figure 1 shows a simple schematic diagram of a convolutional neural network.
Figure 2 shows an example of the small set of equivalent filters that results from the activation state of the activation functions in a convolutional neural network.
Figure 3 shows a flow chart of a processing method for a convolutional neural network according to an embodiment of the present invention.
Figure 4 shows a convolutional layer of a convolutional neural network and its simplified diagram.
Figure 5 shows a simplified diagram of the activation function layer of a convolutional neural network.
Figure 6 shows a schematic diagram of step S301 in the processing method shown in Figure 3.
Figure 7 shows a schematic diagram of step S302 in the processing method shown in Figure 3.
Figure 8 shows an embodiment of step S303 in the processing method shown in Figure 3.
Figure 9 shows another embodiment of step S303 in the processing method shown in Figure 3.
Figure 10A shows still another embodiment of step S302 and step S303 in the processing method shown in Figure 3.
Figure 10B shows an example diagram of a modified convolutional neural network.
Figure 11A shows a schematic diagram of the upsampling layer in a convolutional neural network.
Figure 11B shows an example of where the upsampling (MUXOUT) layer is added in a convolutional neural network.
Figure 11C shows a schematic diagram of the downsampling layer in a deconvolution neural network.
Figure 12 shows a block diagram of a processing device for a convolutional neural network according to an embodiment of the present invention.
Figure 13 shows a schematic diagram of a convolutional neural network according to an embodiment of the present invention.
Figure 14 shows an exemplary processing system that can be used to implement the processing methods of the present disclosure.
In order to make the objectives, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions of the embodiments of the present invention are described below clearly and completely with reference to the accompanying drawings of the embodiments. Obviously, the described embodiments are only some of the embodiments of the present invention, not all of them. Based on the described embodiments, all other embodiments obtained by a person of ordinary skill in the art without creative work fall within the protection scope of the present invention.
Unless otherwise defined, the technical or scientific terms used in the present disclosure shall have the ordinary meaning understood by a person with ordinary skill in the field to which the present invention belongs. The terms "first", "second", and similar words used in the present disclosure do not indicate any order, quantity, or importance, but are only used to distinguish different components. Likewise, words such as "include" or "comprise" mean that the element or item preceding the word covers the elements or items listed after the word and their equivalents, without excluding other elements or items. Words such as "connected" or "coupled" are not limited to physical or mechanical connections, but may include electrical connections or signal connections, whether direct or indirect.
The information technology market has invested enormously in deep learning over the past five years. Today, the main uses of this technology are for solving artificial intelligence (AI) problems such as recommendation engines, image classification, image captioning and search, face recognition, age recognition, and speech recognition. In general, deep learning technology has successfully solved problems of understanding data of this kind, for example describing the content of an image, recognizing objects in an image under difficult conditions, or recognizing speech in a noisy environment. Another advantage of deep learning is its general-purpose structure, which allows relatively similar systems to solve very different problems. Compared with previous-generation methods, neural networks and deep learning structures are much larger in the number of filters and layers.
The main component of a deep learning system is the convolutional network. A convolutional network is a neural network structure that uses images as input/output and replaces scalar weights with filters (convolutions). As an example, a simple structure with 3 layers is shown in Fig. 1. This convolutional neural network is used, for example, for image processing, taking images as inputs and outputs and replacing scalar weights with filters (i.e., convolutions). As shown in Fig. 1, the structure takes 4 input images at the four input terminals on the left, has 3 units (output images) in the hidden layer at the centre, and has 2 units in the output layer, producing 2 output images. Each box with a weight corresponds to a filter (e.g., a 3x3 or 5x5 kernel), where k is a label indicating the input layer number, and i and j are labels indicating the input and output units, respectively. A bias is a scalar added to the output of a convolution. The result of adding several convolutions and biases is then passed through an activation box, which typically corresponds to a rectified linear unit (ReLU), a sigmoid function, a hyperbolic tangent, or the like. The filters and biases are fixed during operation of the system; they are obtained through a training process on a set of input/output example images and are adjusted to fit some optimization criterion that depends on the application. A typical configuration involves tens or hundreds of filters in each layer. A network with 3 layers is considered shallow, whereas a number of layers greater than 5 or 10 is generally considered deep.
A convolutional network is a non-linear system. The non-linearity is due to the activation function, which prevents the entire system from being reduced to a small set of filters acting on each input. In the present invention it is convenient to interpret a convolutional network as an adaptive filter. First, assume a rectifying linear unit (ReLU) as the activation function. For a fixed input, some inputs to the activation boxes will be positive and are passed unchanged to the next layer because of the linear shape of the activation function, while the other inputs to the activation boxes will be negative, eliminating any effect on the output. An example is shown in FIG. 2, which illustrates the small set of filters to which the convolutional neural network becomes equivalent as a result of the activation results of its activation functions. There it is assumed that a particular input activates the second ReLU in the first layer and the first ReLU in the second layer. For that particular input, the inputs to the other ReLUs are negative and can therefore be omitted in FIG. 2 because they do not affect the output. The resulting system is linear and can be reduced to 4 different filters, plus a bias, acting on each input. The same holds for other inputs, but as the input changes, the activation state of each ReLU changes and so does the resulting single filter. Thus, for any given input, the net effect of the system is always equivalent to a small set of filters plus a bias (for example, the set of filters plus bias shown in FIG. 2), but the filters change with the input, leading to an adaptive filter effect.
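As a purely illustrative check (the two-layer network, 3×3 kernels and sizes below are toy values, not taken from any embodiment), the adaptive-filter observation can be reproduced numerically: once the on/off pattern of the ReLU for a given input is recorded as a 0/1 mask, applying that fixed mask instead of the ReLU leaves the output for that input unchanged, so the network acts as a plain linear filter plus bias for that input.

```python
# Minimal numerical sketch of the "adaptive filter" observation (toy values).
import numpy as np
from scipy.signal import correlate2d

rng = np.random.default_rng(0)
w1, b1 = rng.normal(size=(3, 3)), 0.1    # first conv kernel and bias (arbitrary)
w2, b2 = rng.normal(size=(3, 3)), -0.2   # second conv kernel and bias (arbitrary)

def net(x):
    """Two convolution layers with a ReLU in between."""
    h = correlate2d(x, w1, mode="same") + b1
    h = np.maximum(h, 0.0)                       # ReLU activation
    return correlate2d(h, w2, mode="same") + b2

def net_with_fixed_mask(x, mask):
    """Same network, but the ReLU is replaced by a recorded 0/1 mask."""
    h = correlate2d(x, w1, mode="same") + b1
    h = h * mask                                 # fixed gating: linear for this mask
    return correlate2d(h, w2, mode="same") + b2

x = rng.normal(size=(8, 8))                              # an input with content
mask = (correlate2d(x, w1, mode="same") + b1 > 0).astype(float)  # recorded ReLU states

# For this particular input the masked (linear) network matches the original one.
assert np.allclose(net(x), net_with_fixed_mask(x, mask))
```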
A major drawback of deep learning today is the difficulty of interpreting the parameters of a working system. Typical use of a deep learning system starts with choosing a network architecture, after which the model is trained and a set of parameters (filter coefficients and biases) is obtained. If the training process is successful, the output of the network matches the desired target with high accuracy for a given input, in many cases better than any other method. But many questions remain hard to answer, such as: Is the network architecture the best choice for this problem? Is the number of parameters sufficient, or is it too large? And, at a more fundamental level, how do these parameters work inside the network to produce the output? How does using many layers (a deep network) help improve the result compared with using only a few layers (a shallow network)?
Filters in deep network architectures are usually small (3×3 or 5×5), and visualizing a large number of filters one by one does not provide much insight into the system; the biases are scalar numbers that give no clue about the complex mechanisms at work inside the network. Understanding the parameters of a working deep learning system remains largely an open problem.
The embodiments of the present invention introduce a system that allows the classical methods used for linear systems to be extended to analyze convolutional networks. A linear system can be completely described by its so-called impulse response. In this specification, the impulse response refers to the output for an input that is zero everywhere except at a single position, where it is 1. A convolutional network is not a linear system because of its activation functions (ReLU). However, according to the embodiments of the present invention, the activation states of the ReLUs can be recorded and fixed, so that the system becomes linear and an impulse-response analysis can be performed.
The impulse response shows the influence of input pixels on output pixels. Using standard methods from linear systems, the opposite relation can also be obtained, that is, which input pixels are used to obtain an output pixel and how important each of those input pixels is — the influence relating the output pixel back to the input pixels. This can be visualized as an image representing the overall filter effect of the network.
At least one embodiment of the present invention provides a convolutional neural network and a processing method, apparatus and system for it, which can be used to determine the influence and effect that a single pixel or a few pixels of an input have on the produced output (the forward influence for short), or how a single pixel or a few pixels of the output are influenced by the input (the reverse influence for short).
FIG. 3 shows a flowchart of a processing method 300 for a convolutional neural network according to an embodiment of the present invention.
As shown in FIG. 3, the processing method 300 for a convolutional neural network includes: step S301, using an activation recorder layer as an activation function layer in the convolutional neural network, wherein, in response to a probe image having content being input to the convolutional neural network, the activation recorder layer performs the same activation operation as the activation function layer and records the activation result of the activation operation; step S302, modifying the convolutional neural network, the modifying step including replacing the activation recorder layer with a masking layer, wherein the masking layer uses the recorded activation result; and step S303, inputting an analysis image as an input image to the modified convolutional neural network to output an output image of the modified convolutional neural network, so as to analyze the forward influence or the reverse influence between the input image and the output image, wherein the analysis image is a pixel-level binary image.
In this way, the activation recorder layer records the activation results of the activation operation of the activation function layer, which is what makes the convolutional neural network non-linear, and the masking layer fixes the recorded activation results, so that the convolutional neural network is changed from non-linear to linear and the subsequent analysis becomes more stable. Moreover, by feeding the modified convolutional neural network an analysis image that is a pixel-level binary image, the relationship between the input analysis image and the output image can be analyzed at the pixel level, thereby obtaining the forward or reverse influence between the input image and the output image for the convolutional neural network, which in turn provides guidance on how to improve the number of filters, the parameters and so on of the convolutional neural network.
The examples above all describe the case of a single activation function layer, a single activation recorder layer and a single masking layer. However, the present application is not limited to this; in fact there may be multiple activation function layers, activation recorder layers and masking layers, in one-to-one correspondence with each other. For example, if there are three activation function layers, then three activation recorder layers replace the original three activation function layers one-to-one, and three masking layers replace the three activation recorder layers one-to-one.
FIG. 4 shows a convolutional layer 401 of a convolutional neural network 400 and its simplified representation.
The left part of FIG. 4 shows one convolutional layer 401 of the convolutional neural network 400, and the right part of FIG. 4 shows the corresponding simplified diagram. Of course, a convolutional neural network may contain, and usually does contain, more than one convolutional layer; they are not all shown here.
FIG. 5 shows a simplified diagram of an activation function layer 402 of the convolutional neural network 400.
As shown in FIG. 5, an activation function layer 402 is usually added at the end of a convolutional layer 401, forming the convolutional neural network 400 with the activation function layer 402.
Here, the input and output of the activation function layer 402 in FIG. 5 use the same symbols as those of the convolutional layer in FIG. 4, but this does not mean that the input and output must be identical to those of the convolutional layer in FIG. 4. The symbol L denotes the layer index, and the layer indices of the convolutional layers and the activation function layers may differ; in the figures of the convolutional layer and the activation function layer it suffices to give L different values to distinguish them.
FIG. 6 shows a schematic diagram of step S301 of the processing method shown in FIG. 3.
In step S301, an activation recorder layer 403 is used as the activation function layer 402 in the convolutional neural network 400. In response to a probe image having content being input to the convolutional neural network 400, the activation recorder layer 403 performs the same activation operation as the activation function layer 402 and records the activation result of the activation operation — for example, for a ReLU, a record that is 1 at every position [i, j] where the value entering the activation is positive and 0 elsewhere — where i and j denote the row and column of a pixel of the input image, L denotes the layer index, and n denotes the input terminal index.
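A minimal sketch of what such an activation recorder layer could look like, assuming the activation function is a ReLU; the class and attribute names below are illustrative and not part of the embodiment:

```python
# Sketch of an activation recorder layer: it applies the same ReLU activation
# as the layer it stands in for, and additionally stores the 0/1 activation
# state of every position.
import numpy as np

class ActivationRecorderLayer:
    def __init__(self):
        self.recorded_mask = None   # A[i, j] = 1 where the ReLU input was > 0

    def forward(self, x):
        self.recorded_mask = (x > 0).astype(x.dtype)  # record the activation result
        return np.maximum(x, 0.0)                     # identical ReLU behaviour

# Feeding the probe image through the network with recorder layers in place of
# the ReLU layers leaves the output unchanged but captures the masks.
```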
FIG. 7 shows a schematic diagram of step S302 of the processing method shown in FIG. 3.
In step S302, the convolutional neural network 400 is modified. The modification includes replacing the activation recorder layer 403 with a masking layer 404, where the masking layer 404 uses the recorded activation result — for example, instead of applying the ReLU, it multiplies each value element-wise by the corresponding recorded 1 or 0. In this way, a modified convolutional neural network 700 is obtained. Denoting the modified convolutional neural network 700 by S, the output y obtained by inputting x to the modified convolutional neural network 700 can be written as y = S(x).
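A matching sketch of the masking layer, again with illustrative names and arbitrary toy kernels; it also verifies numerically that, with the recorded mask held fixed, the modified network S is linear once its response to the all-zero image (the bias coefficient introduced below) is subtracted:

```python
# Sketch of the masking layer and a linearity check for the modified network S.
import numpy as np
from scipy.signal import correlate2d

class MaskLayer:
    def __init__(self, recorded_mask):
        self.mask = recorded_mask          # 0/1 activation result recorded earlier

    def forward(self, x):
        return x * self.mask               # element-wise gating, a linear operation

rng = np.random.default_rng(1)
w1, w2 = rng.normal(size=(3, 3)), rng.normal(size=(3, 3))
mask = (rng.normal(size=(8, 8)) > 0).astype(float)   # stands in for a recorded mask
gate = MaskLayer(mask)

def S(x):                                  # the modified (linearised) network
    h = gate.forward(correlate2d(x, w1, mode="same") + 0.1)
    return correlate2d(h, w2, mode="same") - 0.2

b_eff = S(np.zeros((8, 8)))                # response to the all-zero image
x1, x2 = rng.normal(size=(8, 8)), rng.normal(size=(8, 8))
lhs = S(2.0 * x1 + 3.0 * x2) - b_eff
rhs = 2.0 * (S(x1) - b_eff) + 3.0 * (S(x2) - b_eff)
assert np.allclose(lhs, rhs)               # S minus its bias behaves linearly
```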
FIG. 8 shows one embodiment (step S303-1) of step S303 of the processing method shown in FIG. 3.
This embodiment analyzes the influence of each pixel of the input image on each pixel of the output image as the forward influence.
Specifically, step S303-1 includes: step S3031-1, inputting an all-zero image to the modified convolutional neural network 700 to output a first output value as a bias coefficient; step S3032-1, inputting one or more analysis images to the modified convolutional neural network 700 to output, based on the bias coefficient, output images of the modified convolutional neural network 700, where each analysis image is a binary image that is 1 at one pixel and 0 at the other pixels; and step S3033-1, analyzing the pixel-level correspondence between the analysis images and the output images to obtain, as the forward influence, the influence of each pixel of the input image of the convolutional neural network 400 on each pixel of the output image.
Specifically, in step S3031-1, the bias coefficient beff[p,q] is the output for the zero input 0 (an all-zero image in which all input values are equal to zero) and represents the overall contribution of the biases to the output values:
beff[p,q] = S(0)
where p and q denote the row and column of a pixel of the output image.
In step S3032-1, an analysis image δn,m[i,j] is then input, defined as the binary image that is 1 at pixel (n, m) and 0 at every other pixel:
δn,m[i,j] = 1 if (i, j) = (n, m), and δn,m[i,j] = 0 otherwise.
Then, based on the bias coefficient beff obtained in step S3031-1, the following output image hn,m[p,q] is obtained:
hn,m[p,q] = S(δn,m) − beff
hn,m[p,q] represents the contribution of input pixel (n, m) to output pixel (p, q), that is, the forward influence. This is analogous to the concept of the "impulse response" of a linear system.
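The forward-influence computation can be sketched as follows for one chosen input pixel (n, m); the network S below is a toy stand-in for any modified (linearised) network of the kind described above:

```python
# Sketch of the forward-influence ("impulse response") analysis for one input pixel.
import numpy as np
from scipy.signal import correlate2d

rng = np.random.default_rng(2)
w1, w2 = rng.normal(size=(3, 3)), rng.normal(size=(3, 3))
mask = (rng.normal(size=(8, 8)) > 0).astype(float)   # recorded activation mask (toy)

def S(x):                                   # modified network with the mask fixed
    h = (correlate2d(x, w1, mode="same") + 0.1) * mask
    return correlate2d(h, w2, mode="same") - 0.2

H, W = 8, 8
b_eff = S(np.zeros((H, W)))                 # step S3031-1: bias coefficient

n, m = 4, 3                                 # input pixel under analysis
delta = np.zeros((H, W))
delta[n, m] = 1.0                           # analysis image: 1 at (n, m), 0 elsewhere

h_nm = S(delta) - b_eff                     # steps S3032-1 / S3033-1
# h_nm[p, q] is the forward influence of input pixel (n, m) on output pixel (p, q).
```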
The above example describes the case where the forward influence of one particular pixel (n, m) of the input image on the output is to be analyzed, but this embodiment is not limited thereto. The forward influence of several (rather than one) particular pixels of the input image on the output can also be analyzed in a similar way, simply by changing the analysis image to a binary image that is 1 at those particular pixels and 0 at the other pixels.
Here, since only the output images of the analysis images of interest need to be examined, it suffices to input only those analysis images; it is not necessary to store all possible analysis images, such as the large number of analysis images in which the 1-valued pixel lies at every possible position, so storage space and system resources can be saved.
In this way, by further analyzing the contribution of the input to the output, guidance is provided on how to improve the number of filters, the parameters and so on of the convolutional neural network.
In short, by feeding the modified convolutional neural network analysis images that are pixel-level binary images, the relationship between the input analysis images and the output images can be analyzed at the pixel level, thereby obtaining the forward influence between the input image and the output image for the convolutional neural network and providing guidance on how to improve the number of filters, the parameters and so on of the convolutional neural network.
FIG. 9 shows another embodiment, step S303-2, of step S303 of the processing method shown in FIG. 3.
This embodiment analyzes the influence of each pixel of the output image on each pixel of the input image as the reverse influence.
Specifically, step S303-2 includes: step S3031-2, inputting an all-zero image to the modified convolutional neural network 700 to output a first output value as a bias coefficient; step S3032-2, inputting all possible different analysis images to the modified convolutional neural network 700 to output, based on the bias coefficient, the output images of the modified convolutional neural network 700, where each analysis image is a binary image that is 1 at one pixel and 0 at the other pixels, and the position of the 1-valued pixel differs between the different analysis images; and step S3033-2, analyzing the pixel-level correspondence between the analysis images and the output images to obtain, as the reverse influence, the influence of each pixel of the output image of the convolutional neural network 400 on each pixel of the input image.
Here, unlike step S303-1, when the reverse influence is analyzed one wants to know, for a certain output image, how that output image is influenced by the input image. Since it is not known in advance which input image or images produce that output image, all possible different analysis images (with the 1-valued pixel at every position (n, m) in turn) can be input to obtain all output images Hp,q[n,m], where Hp,q[n,m] relates output pixel (p, q) to input pixel (n, m); it is essentially the same quantity as hn,m[p,q] described in connection with FIG. 8, because both are obtained from the same convolutional neural network and differ only in the angle of analysis and the application. In this way, the input images corresponding to the output image to be analyzed are known, and the influence of one or several pixels of the output image on each pixel of the input image is obtained as the reverse influence.
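A sketch of this exhaustive probing, again with a toy stand-in for the modified network S; every single-pixel analysis image is pushed through S once, and the responses are then re-indexed per output pixel:

```python
# Sketch of the reverse-influence analysis by exhaustive probing.
import numpy as np
from scipy.signal import correlate2d

rng = np.random.default_rng(3)
w1, w2 = rng.normal(size=(3, 3)), rng.normal(size=(3, 3))
mask = (rng.normal(size=(8, 8)) > 0).astype(float)

def S(x):
    h = (correlate2d(x, w1, mode="same") + 0.1) * mask
    return correlate2d(h, w2, mode="same") - 0.2

H, W = 8, 8
b_eff = S(np.zeros((H, W)))

# responses[n, m, p, q] = influence of input pixel (n, m) on output pixel (p, q)
responses = np.zeros((H, W, H, W))
for n in range(H):
    for m in range(W):
        delta = np.zeros((H, W))
        delta[n, m] = 1.0
        responses[n, m] = S(delta) - b_eff

# Reverse influence: fix an output pixel (p, q) and read off all input pixels.
p, q = 2, 5
H_pq = responses[:, :, p, q]   # H_pq[n, m] = influence linking output (p, q) to input (n, m)
```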
Unlike step S303-1 shown in FIG. 8, this embodiment may need to store all possible different analysis images and all output images, and may need to run the modified convolutional neural network 700 many times in advance. On the other hand, the configuration and modification of the convolutional neural network in this embodiment are similar to step S303-1 shown in FIG. 8, so the processing of the convolutional neural network itself is relatively simple.
This embodiment can typically be applied in applications such as machine recognition and machine classification, because in such applications one is more interested in how the result of recognition or classification is influenced by the input image, or in what kind of input image, or which pixels of the input image, can produce that result. For example, if the result of machine recognition or classification is a flower, one usually wants to know what kind of input image, or which pixels of the input image, lead to the result "flower".
In short, by feeding the modified convolutional neural network various analysis images that are pixel-level binary images and obtaining the corresponding output images, the relationship between a given output image and its corresponding analysis images can be analyzed at the pixel level, thereby obtaining the reverse influence between that output image and the input image for the convolutional neural network and providing guidance on how to improve the number of filters, the parameters and so on of the convolutional neural network.
FIG. 10A shows a further embodiment, steps S302' and S303', of steps S302 and S303 of the processing method shown in FIG. 3.
This embodiment is another embodiment for analyzing the influence of each pixel of the output image on each pixel of the input image as the reverse influence.
Specifically, in step S302', a deconvolution network is configured as the modified convolutional neural network 700-1, where the deconvolution network is an inverse network of the convolutional neural network 400. For example, convolutional layers become deconvolution layers, pooling layers become unpooling layers, the activation functions in the masking layers are unchanged, and so on; the specific construction of a deconvolution network is well known in the art and is not described here. Step S303' then includes: step S3031-1, inputting an all-zero image 0 to the modified convolutional neural network 700-1 to output a first output value as a bias coefficient Beff[n,m] = S(0); step S3032-1, inputting one or more analysis images to the modified convolutional neural network 700-1, for example a binary image δp,q that is 1 at a certain pixel (p, q) and 0 at the other pixels, to output, based on the bias coefficient Beff[n,m], the output image Hp,q[n,m] = S(δp,q) − Beff of the modified convolutional neural network 700-1; and step S3033-1', analyzing the pixel-level correspondence between the analysis image δp,q and the output image Hp,q[n,m] to obtain, as the reverse influence, the influence of each pixel of the input image of the modified convolutional neural network 700-1 on each pixel of the output image.
Here, step S303' is essentially similar in steps and principle to step S303-1 shown in FIG. 8, except that what is obtained in step S3033-1' is the reverse influence, so it is not described again. In step S302', the modified convolutional neural network 700 is replaced by the deconvolution network 700-1. A deconvolution network can be understood as the inverse of a convolutional neural network. The input of the deconvolution network 700-1 therefore corresponds to the output of the original, pre-replacement convolutional neural network 700, and the output of the deconvolution network 700-1 corresponds to the input of the original convolutional neural network 700; see FIG. 10B, which shows an example of the modified convolutional neural network. Therefore, to analyze the influence of a certain output image of the original convolutional neural network 700 on the input image, it suffices to use that output image as the input of the deconvolution network 700-1.
This embodiment avoids storing all possible different analysis images and all output images, as in the embodiment shown in FIG. 9, and does not require running the modified convolutional neural network 700 many times in advance; however, it may require deconvolution processing of the original convolutional neural network 700 to obtain the corresponding deconvolution network.
In short, the relationship between a given output image and its corresponding analysis image can be analyzed at the pixel level to obtain the reverse influence between that output image and the input image for the original convolutional neural network, thereby providing guidance on how to improve the number of filters, the parameters and so on of the convolutional neural network.
FIG. 11A shows a schematic diagram of an upsampling layer in a convolutional neural network.
In this embodiment, the original convolutional neural network may include an upsampling layer (referred to as a MUXOUT layer). The upsampling layer has the structure shown in FIG. 11A and can upsample the input pixels to obtain a higher resolution.
Specifically, as shown in FIG. 11A, the MUXOUT layer increases the resolution of the input features, indexed by i = 0…H−1, j = 0…W−1, to output features indexed by p = 0…My·H−1, q = 0…Mx·W−1, by a factor M = Mx×My. The general definition of the MUXOUT layer is as follows.
First, U1, …, UM are upsampling operators that copy the pixels of a feature to different positions in an enlarged feature map that is zero elsewhere:
Un(x)[p,q] = x[[p/My], [q/Mx]] if p % My = a and q % Mx = b, and Un(x)[p,q] = 0 otherwise,
where % is the modulo operator, [x] is the largest integer less than x, and n = My·a + b + 1. The MUXOUT layer requires the number of input features to be a multiple of M, that is, C = G·M for an integer G. The number of output features is unchanged, i.e. equal to C, where c denotes the c-th input terminal and (p, q) denotes a pixel. The features are processed in groups of M features, so the inputs and outputs of a group are divided as x = [x1 … xG] and y = [y1 … yG]. The output of the MUXOUT layer can then be written as:
y1 = U1 x1 + … + UM xM
y2 = U2 x1 + … + U1 xM
…
yG = UM x1 + … + UM xM
In the example of FIG. 11A, My = Mx = 2 (M = 4).
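For the 2×2 case of FIG. 11A, the upsampling operators Un can be sketched as follows (toy feature maps and our own helper names; the grouping and the shifted reuse of the operators for y2 … yG follow the equations above):

```python
# Sketch of the MUXOUT-style upsampling operators U_n for My = Mx = 2 (M = 4).
import numpy as np

My, Mx = 2, 2
M = My * Mx

def U(n, x):
    """Operator U_n: place x at sub-pixel offset (a, b), with n = My*a + b + 1."""
    a, b = divmod(n - 1, Mx)          # sub-pixel row/column offsets
    H, W = x.shape
    out = np.zeros((My * H, Mx * W), dtype=x.dtype)
    out[a::My, b::Mx] = x             # rows p with p % My == a, columns q with q % Mx == b
    return out

group = [np.full((3, 3), fill_value=float(n)) for n in range(1, M + 1)]  # x1..x4 (toy maps)
y1 = sum(U(n, x) for n, x in zip(range(1, M + 1), group))                # y1 = U1 x1 + ... + U4 x4

# Each 2x2 block of y1 now interleaves one pixel from each of the four input maps.
```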
FIG. 11B shows an example of the position at which the upsampling MUXOUT layer is added in the convolutional neural network. However, this is only an example and not a limitation; the upsampling MUXOUT layer may in fact be placed at other positions.
In this way, the resolution is increased by adding the upsampling MUXOUT layer.
FIG. 11C shows a schematic diagram of a downsampling layer in the deconvolution network.
If an upsampling layer has been added to the convolutional neural network, then in the corresponding deconvolution network the upsampling layer is additionally replaced by a downsampling layer corresponding to the upsampling layer.
That is, in the deconvolution network, in addition to convolutional layers becoming deconvolution layers and pooling layers becoming unpooling layers, the upsampling layer also becomes a downsampling layer.
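The corresponding downsampling can be sketched as the inverse of the Un placement above, splitting each upsampled map back into one feature map per sub-pixel offset (again an illustrative sketch, not the exact layer of the embodiment):

```python
# Sketch of the downsampling used in the reverse network: the inverse of U_n.
import numpy as np

My, Mx = 2, 2

def downsample(y):
    """Split an upsampled map back into M = My*Mx feature maps, one per offset."""
    return [y[a::My, b::Mx] for a in range(My) for b in range(Mx)]

y = np.arange(36, dtype=float).reshape(6, 6)     # a toy 2x-upsampled map
x_parts = downsample(y)                          # four 3x3 maps, one per sub-pixel position
```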
FIG. 12 shows a block diagram of a processing apparatus 1200 for a convolutional neural network according to an embodiment of the present invention.
The processing apparatus 1200 for a convolutional neural network shown in FIG. 12 includes: a recorder 1201 that uses an activation recorder layer as an activation function layer in the convolutional neural network, wherein, in response to a probe image having content being input to the convolutional neural network, the recorder causes the activation recorder layer to perform the same activation operation as the activation function layer and to record the activation result of the activation operation; a modifier 1202 configured to modify the convolutional neural network, the modification including replacing the activation recorder layer with a masking layer, where the masking layer uses the recorded activation result; and an analyzer 1203 configured to input analysis images as input images to the modified convolutional neural network so as to output output images of the modified convolutional neural network, in order to analyze the forward influence or the reverse influence between the input images and the output images, wherein each analysis image is a pixel-level binary image.
In one embodiment, the analyzer 1203 may be configured to: input an all-zero image to the modified convolutional neural network to output a first output value as a bias coefficient; input one or more analysis images to the modified convolutional neural network to output, based on the bias coefficient, output images of the modified convolutional neural network, where each analysis image is a binary image that is 1 at one pixel and 0 at the other pixels; and analyze the pixel-level correspondence between the analysis images and the output images to obtain, as the forward influence, the influence of each pixel of the input image of the modified convolutional neural network on each pixel of the output image.
In one embodiment, the convolutional neural network may include an upsampling layer.
In one embodiment, the analyzer 1203 may be configured to: input an all-zero image to the modified convolutional neural network to output a first output value as a bias coefficient; input all possible different analysis images to the modified convolutional neural network to output, based on the bias coefficient, the output images of the modified convolutional neural network, where each analysis image is a binary image that is 1 at one pixel and 0 at the other pixels and the position of the 1-valued pixel differs between the different analysis images; and analyze the pixel-level correspondence between the analysis images and the output images to obtain, as the reverse influence, the influence of each pixel of the output image of the convolutional neural network on each pixel of the input image.
In one embodiment, the modifier 1202 may be configured to configure a deconvolution network as the modified convolutional neural network, where the deconvolution network is an inverse network of the convolutional neural network, and the analyzer 1203 is configured to: input an all-zero image to the modified convolutional neural network to output a first output value as a bias coefficient; input one or more analysis images to the modified convolutional neural network to output, based on the bias coefficient, output images of the modified convolutional neural network, where each analysis image is a binary image that is 1 at one pixel and 0 at the other pixels; and analyze the pixel-level correspondence between the analysis images and the output images to obtain, as the reverse influence, the influence of each pixel of the input image of the modified convolutional neural network on each pixel of the output image.
In one embodiment, the convolutional neural network may include an upsampling layer, and the analyzer 1203 is configured to configure the deconvolution network as the modified convolutional neural network by replacing the upsampling layer with a downsampling layer corresponding to the upsampling layer.
In this way, the activation recorder layer records the activation results of the activation operation of the activation function layer, which is what makes the convolutional neural network non-linear, and the masking layer fixes the recorded activation results, so that the convolutional neural network is changed from non-linear to linear and the subsequent analysis becomes more stable. Moreover, by feeding the modified convolutional neural network analysis images that are pixel-level binary images, the relationship between the input analysis images and the output images can be analyzed at the pixel level, thereby obtaining the forward or reverse influence between the input image and the output image for the convolutional neural network and providing guidance on how to improve the number of filters, the parameters and so on of the convolutional neural network.
FIG. 13 shows a schematic diagram of a convolutional neural network according to an embodiment of the present invention.
The convolutional neural network 1300 shown in FIG. 13 includes: one or more convolutional layers 1301, 1301', …; one or more masking layers 1302, 1302', … corresponding to the one or more convolutional layers, which replace the corresponding one or more activation recorder layers, the one or more activation recorder layers serving as activation function layers in the convolutional neural network, wherein, in response to a probe image having content being input to the convolutional neural network, the one or more activation recorder layers perform the same activation operation as the activation function layers and record the activation results of the activation operation, and the one or more masking layers use the recorded activation results; an input terminal 1303 that receives one or more analysis images; and an output terminal 1304 that outputs the output images of the modified convolutional neural network, so that the forward influence or the reverse influence between the input images and the output images can be analyzed, wherein each analysis image is a pixel-level binary image.
Note that, as exemplified above, there may be one or more activation function layers, activation recorder layers and masking layers; when there are multiple activation function layers, activation recorder layers and masking layers, they are in one-to-one correspondence. For example, if there are three activation function layers, then three activation recorder layers serve one-to-one as the original three activation function layers, and three masking layers replace the three activation recorder layers one-to-one.
In one embodiment, before receiving the analysis images, the input terminal 1303 may receive an all-zero image so that the output terminal outputs a first output value as a bias coefficient; the input terminal 1303 may be configured to receive the one or more analysis images so that, based on the bias coefficient, the output terminal 1304 outputs the output images of the modified convolutional neural network, where each analysis image is a binary image that is 1 at one pixel and 0 at the other pixels; and the pixel-level correspondence between the analysis images and the output images is analyzed to obtain, as the forward influence, the influence of each pixel of the input image of the modified convolutional neural network on each pixel of the output image.
In one embodiment, the convolutional neural network 1300 further includes an upsampling layer 1305. Of course, the position of the upsampling layer shown is only an example and not a limitation; it may in fact be located elsewhere.
In one embodiment, before receiving the analysis images, the input terminal 1303 may receive an all-zero image so that the output terminal outputs a first output value as a bias coefficient; the input terminal 1303 may receive all possible different analysis images so that, based on the bias coefficient, the output terminal 1304 outputs the output images of the modified convolutional neural network, where each analysis image is a binary image that is 1 at one pixel and 0 at the other pixels and the position of the 1-valued pixel differs between the different analysis images; and the pixel-level correspondence between the analysis images and the output images is analyzed to obtain, as the reverse influence, the influence of each pixel of the output image of the convolutional neural network on each pixel of the input image.
In one embodiment, the convolutional neural network 1300 may be replaced by a deconvolution network as the modified convolutional neural network, where the deconvolution network is an inverse network of the convolutional neural network, the input terminal 1303 is replaced by the output terminal of the modified convolutional neural network, and the output terminal 1304 is replaced by the input terminal of the modified convolutional neural network, wherein the input terminal of the modified convolutional neural network receives an all-zero image so that the output terminal of the modified convolutional neural network outputs a first output value as a bias coefficient; the input terminal of the modified convolutional neural network receives one or more analysis images to output, based on the bias coefficient, output images of the modified convolutional neural network, where each analysis image is a binary image that is 1 at one pixel and 0 at the other pixels; and the pixel-level correspondence between the analysis images and the output images is analyzed to obtain, as the reverse influence, the influence of each pixel of the input image of the modified convolutional neural network on each pixel of the output image.
In one embodiment, the convolutional neural network 1300 may further include an upsampling layer 1305, and in the deconvolution network the upsampling layer 1305 is replaced by a downsampling layer corresponding to the upsampling layer 1305.
In this way, the activation recorder layer records the activation results of the activation operation of the activation function layer, which is what makes the convolutional neural network non-linear, and the masking layer fixes the recorded activation results, so that the convolutional neural network is changed from non-linear to linear and the subsequent analysis becomes more stable. Moreover, by feeding the modified convolutional neural network analysis images that are pixel-level binary images, the relationship between the input analysis images and the output images can be analyzed at the pixel level, thereby obtaining the forward or reverse influence between the input image and the output image for the convolutional neural network and providing guidance on how to improve the number of filters, the parameters and so on of the convolutional neural network.
FIG. 14 shows an exemplary processing system that can be used to implement the processing method of the present disclosure.
The processing system 1000 includes at least one processor 1002 that executes instructions stored in a memory 1004. These instructions may be, for example, instructions for implementing the functions described as being performed by one or more of the modules described above, or instructions for implementing one or more steps of the methods described above. The processor 1002 can access the memory 1004 through a system bus 1006. In addition to storing executable instructions, the memory 1004 may also store training data and the like. The processor 1002 may be any of various devices with computing capability, such as a central processing unit (CPU) or a graphics processing unit (GPU). The CPU may be an X86 or ARM processor; the GPU may be separately integrated directly on the motherboard, built into the north bridge chip of the motherboard, or built into the central processing unit (CPU).
The processing system 1000 also includes a data store 1008 accessible by the processor 1002 through the system bus 1006. The data store 1008 may include executable instructions, multi-image training data and the like. The processing system 1000 also includes an input interface 1010 that allows external devices to communicate with the processing system 1000. For example, the input interface 1010 may be used to receive instructions from an external computer device, from a user, and so on. The processing system 1000 may also include an output interface 1012 that interfaces the processing system 1000 with one or more external devices. For example, the processing system 1000 may display images and the like through the output interface 1012. External devices that communicate with the processing system 1000 through the input interface 1010 and the output interface 1012 can be included in an environment that provides substantially any type of user interface with which a user can interact. Examples of user interface types include graphical user interfaces, natural user interfaces and so on. For example, a graphical user interface may accept input from a user using input device(s) such as a keyboard, mouse or remote control, and provide output on an output device such as a display. Furthermore, a natural user interface may enable a user to interact with the processing system 1000 in a manner free from the constraints imposed by input devices such as keyboards, mice and remote controls. Instead, a natural user interface may rely on speech recognition, touch and stylus recognition, gesture recognition on and near the screen, mid-air gestures, head and eye tracking, voice and speech, vision, touch, gestures, machine intelligence and so on.
In addition, although the processing system 1000 is shown in the figure as a single system, it will be understood that the processing system 1000 may also be a distributed system and may also be arranged as a cloud facility (including a public or private cloud). Thus, for example, several devices may communicate over a network connection and may jointly perform the tasks described as being performed by the processing system 1000.
The functions described herein (including but not limited to the convolutional neural network module, the selection module, etc.) can be implemented in hardware, software, firmware or any combination thereof. If implemented in software, the functions can be stored on, or transmitted over, a computer-readable medium as one or more instructions or code. Computer-readable media include computer-readable storage media. A computer-readable storage medium may be any available storage medium that can be accessed by a computer. By way of example and not limitation, such computer-readable media may include RAM, ROM, EEPROM, CD-ROM or other optical disc storage, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer. In addition, a propagated signal is not included within the scope of computer-readable storage media. Computer-readable media also include communication media, which include any medium that facilitates the transfer of a computer program from one place to another. A connection can be, for example, a communication medium. For example, if software is transmitted from a website, server or other remote source using coaxial cable, fiber-optic cable, twisted pair, digital subscriber line (DSL) or wireless technologies such as infrared, radio and microwave, then the coaxial cable, fiber-optic cable, twisted pair, DSL or wireless technologies such as infrared, radio and microwave are included in the definition of communication media. Combinations of the above should also be included within the scope of computer-readable media. Alternatively or additionally, the functions described herein can be performed at least in part by one or more hardware logic components. For example, illustrative types of hardware logic components that can be used include field-programmable gate arrays (FPGAs), application-specific integrated circuits (ASICs), application-specific standard products (ASSPs), systems-on-chip (SOCs), complex programmable logic devices (CPLDs) and so on.
The above are only exemplary embodiments of the present invention and are not intended to limit the scope of protection of the present invention, which is determined by the appended claims.
Claims (20)
- A processing method for a convolutional neural network, comprising the steps of: using an activation recorder layer as an activation function layer in the convolutional neural network, wherein, in response to a probe image having content being input to the convolutional neural network, the activation recorder layer performs the same activation operation as the activation function layer and records an activation result of the activation operation; modifying the convolutional neural network, the modifying step including replacing the activation recorder layer with a masking layer, wherein the masking layer uses the recorded activation result; and inputting an analysis image as an input image to the modified convolutional neural network to output an output image of the modified convolutional neural network, so as to analyze a forward influence or a reverse influence between the input image and the output image, wherein the analysis image is a pixel-level binary image.
- The processing method according to claim 1, wherein the step of inputting an analysis image as an input image to the modified convolutional neural network to obtain an analysis result, so as to analyze the forward influence between the input image and the output image of the convolutional neural network before the modification, comprises: inputting an all-zero image to the modified convolutional neural network to output a first output value as a bias coefficient; inputting one or more analysis images to the modified convolutional neural network to output, based on the bias coefficient, an output image of the modified convolutional neural network, wherein the analysis image is a binary image that is 1 at one pixel and 0 at the other pixels; and analyzing the pixel-level correspondence between the analysis image and the output image to obtain, as the forward influence, the influence of each pixel of the input image of the modified convolutional neural network on each pixel of the output image.
- The processing method according to claim 1, wherein the convolutional neural network comprises an upsampling layer.
- The processing method according to claim 1, wherein the step of inputting an analysis image as an input image to the modified convolutional neural network to obtain an analysis result, so as to analyze the reverse influence between the input image and the output image of the convolutional neural network before the modification, comprises: inputting an all-zero image to the modified convolutional neural network to output a first output value as a bias coefficient; inputting all possible different analysis images to the modified convolutional neural network to output, based on the bias coefficient, output images of the modified convolutional neural network, wherein each analysis image is a binary image that is 1 at one pixel and 0 at the other pixels, and the position of the 1-valued pixel differs between different analysis images; and analyzing the pixel-level correspondence between the analysis images and the output images to obtain, as the reverse influence, the influence of each pixel of the output image of the convolutional neural network on each pixel of the input image.
- The processing method according to claim 1, wherein the step of modifying the convolutional neural network by replacing the activation recorder layer with a masking layer, the masking layer using the recorded activation result, comprises: configuring a deconvolution network as the modified convolutional neural network, wherein the deconvolution network is an inverse network of the convolutional neural network; and wherein the step of inputting an analysis image as an input image to the modified convolutional neural network to obtain an analysis result, so as to analyze the reverse influence between the input image and the output image of the convolutional neural network before the modification, comprises: inputting an all-zero image to the modified convolutional neural network to output a first output value as a bias coefficient; inputting one or more analysis images to the modified convolutional neural network to output, based on the bias coefficient, an output image of the modified convolutional neural network, wherein the analysis image is a binary image that is 1 at one pixel and 0 at the other pixels; and analyzing the pixel-level correspondence between the analysis image and the output image to obtain, as the reverse influence, the influence of each pixel of the input image of the modified convolutional neural network on each pixel of the output image.
- The processing method according to claim 5, wherein the convolutional neural network comprises an upsampling layer, and configuring the deconvolution network as the modified convolutional neural network comprises: replacing the upsampling layer with a downsampling layer corresponding to the upsampling layer.
- A processing apparatus for a convolutional neural network, comprising: a recorder that uses an activation recorder layer as an activation function layer in the convolutional neural network, wherein, in response to a probe image having content being input to the convolutional neural network, the recorder causes the activation recorder layer to perform the same activation operation as the activation function layer and to record an activation result of the activation operation; a modifier configured to modify the convolutional neural network, the modification including replacing the activation recorder layer with a masking layer, wherein the masking layer uses the recorded activation result; and an analyzer configured to input an analysis image as an input image to the modified convolutional neural network so as to output an output image of the modified convolutional neural network, in order to analyze a forward influence or a reverse influence between the input image and the output image, wherein the analysis image is a pixel-level binary image.
- The processing apparatus according to claim 7, wherein the analyzer is configured to: input an all-zero image to the modified convolutional neural network to output a first output value as a bias coefficient; input one or more analysis images to the modified convolutional neural network to output, based on the bias coefficient, an output image of the modified convolutional neural network, wherein the analysis image is a binary image that is 1 at one pixel and 0 at the other pixels; and analyze the pixel-level correspondence between the analysis image and the output image to obtain, as the forward influence, the influence of each pixel of the input image of the modified convolutional neural network on each pixel of the output image.
- The processing apparatus according to claim 7, wherein the convolutional neural network comprises an upsampling layer.
- The processing apparatus according to claim 7, wherein the analyzer is configured to: input an all-zero image to the modified convolutional neural network to output a first output value as a bias coefficient; input all possible different analysis images to the modified convolutional neural network to output, based on the bias coefficient, output images of the modified convolutional neural network, wherein each analysis image is a binary image that is 1 at one pixel and 0 at the other pixels, and the position of the 1-valued pixel differs between different analysis images; and analyze the pixel-level correspondence between the analysis images and the output images to obtain, as the reverse influence, the influence of each pixel of the output image of the convolutional neural network on each pixel of the input image.
- The processing apparatus according to claim 7, wherein the modifier is configured to configure a deconvolution network as the modified convolutional neural network, wherein the deconvolution network is an inverse network of the convolutional neural network, and wherein the analyzer is configured to: input an all-zero image to the modified convolutional neural network to output a first output value as a bias coefficient; input one or more analysis images to the modified convolutional neural network to output, based on the bias coefficient, an output image of the modified convolutional neural network, wherein the analysis image is a binary image that is 1 at one pixel and 0 at the other pixels; and analyze the pixel-level correspondence between the analysis image and the output image to obtain, as the reverse influence, the influence of each pixel of the input image of the modified convolutional neural network on each pixel of the output image.
- The processing apparatus according to claim 11, wherein the convolutional neural network comprises an upsampling layer, and the analyzer is configured to configure the deconvolution network as the modified convolutional neural network by: replacing the upsampling layer with a downsampling layer corresponding to the upsampling layer.
- A processing system for a convolutional neural network, comprising: one or more processors; and one or more memories storing computer-readable code which, when executed by the one or more processors, performs the processing method according to any one of claims 1-6.
- A convolutional neural network, comprising: one or more convolutional layers; one or more masking layers corresponding to the one or more convolutional layers, which replace corresponding one or more activation recorder layers, the one or more activation recorder layers serving as activation function layers in the convolutional neural network, wherein, in response to a probe image having content being input to the convolutional neural network, the one or more activation recorder layers perform the same activation operation as the activation function layers and record activation results of the activation operation, and wherein the one or more masking layers use the recorded activation results; an input terminal that receives one or more analysis images; and an output terminal that outputs an output image of the modified convolutional neural network, so as to analyze a forward influence or a reverse influence between the input image and the output image, wherein each analysis image is a pixel-level binary image.
- The convolutional neural network according to claim 14, wherein, before receiving the analysis images, the input terminal receives an all-zero image so that the output terminal outputs a first output value as a bias coefficient; the input terminal is configured to receive the one or more analysis images so that, based on the bias coefficient, the output terminal outputs the output image of the modified convolutional neural network, wherein the analysis image is a binary image that is 1 at one pixel and 0 at the other pixels; and the pixel-level correspondence between the analysis image and the output image is analyzed to obtain, as the forward influence, the influence of each pixel of the input image of the modified convolutional neural network on each pixel of the output image.
- The convolutional neural network according to claim 14, further comprising an upsampling layer.
- The convolutional neural network according to claim 14, wherein, before receiving the analysis images, the input terminal receives an all-zero image so that the output terminal outputs a first output value as a bias coefficient; the input terminal receives all possible different analysis images so that, based on the bias coefficient, the output terminal outputs the output images of the modified convolutional neural network, wherein each analysis image is a binary image that is 1 at one pixel and 0 at the other pixels, and the position of the 1-valued pixel differs between different analysis images; and the pixel-level correspondence between the analysis images and the output images is analyzed to obtain, as the reverse influence, the influence of each pixel of the output image of the convolutional neural network on each pixel of the input image.
- The convolutional neural network according to claim 14, wherein the convolutional neural network can be replaced by a deconvolution network as the modified convolutional neural network, wherein the deconvolution network is an inverse network of the convolutional neural network, the input terminal is replaced by the output terminal of the modified convolutional neural network, and the output terminal is replaced by the input terminal of the modified convolutional neural network, wherein the input terminal of the modified convolutional neural network receives an all-zero image so that the output terminal of the modified convolutional neural network outputs a first output value as a bias coefficient; the input terminal of the modified convolutional neural network receives one or more analysis images to output, based on the bias coefficient, an output image of the modified convolutional neural network, wherein the analysis image is a binary image that is 1 at one pixel and 0 at the other pixels; and the pixel-level correspondence between the analysis image and the output image is analyzed to obtain, as the reverse influence, the influence of each pixel of the input image of the modified convolutional neural network on each pixel of the output image.
- The convolutional neural network according to claim 18, further comprising an upsampling layer, wherein, in the deconvolution network, the upsampling layer is replaced by a downsampling layer corresponding to the upsampling layer.
- A computer storage medium storing computer-readable code which, when executed by one or more processors, performs the processing method according to any one of claims 1-6.
Priority Applications (2)

| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| EP17897328.5A EP3588385A4 (en) | 2017-02-21 | 2017-11-17 | NEURAL FOLDING NETWORK AND PROCESSING METHODS, DEVICE AND SYSTEM FOR IT, AND MEDIUM |
| US16/069,376 US11620496B2 (en) | 2017-02-21 | 2017-11-17 | Convolutional neural network, and processing method, processing device, processing system and medium for the same |

Applications Claiming Priority (2)

| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201710094069.9 | 2017-02-21 | | |
| CN201710094069.9A CN108460454B (zh) | 2017-02-21 | 2017-02-21 | Convolutional neural network and processing method, apparatus and system therefor |

Publications (1)

| Publication Number | Publication Date |
|---|---|
| WO2018153128A1 WO2018153128A1 (zh) | 2018-08-30 |
Family
ID=63229224

Family Applications (1)

| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/CN2017/111617 WO2018153128A1 (zh) | Convolutional neural network and processing method, apparatus, system and medium therefor | | 2017-11-17 |
Country Status (4)
Country | Link |
---|---|
US (1) | US11620496B2 (zh) |
EP (1) | EP3588385A4 (zh) |
CN (1) | CN108460454B (zh) |
WO (1) | WO2018153128A1 (zh) |
Families Citing this family (4)

| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| WO2018091486A1 (en) | 2016-11-16 | 2018-05-24 | Ventana Medical Systems, Inc. | Convolutional neural networks for locating objects of interest in images of biological samples |
| CN108958801B (zh) | 2017-10-30 | 2021-06-25 | 上海寒武纪信息科技有限公司 | Neural network processor and method of executing a vector maximum instruction using the processor |
| US12094456B2 (en) | 2018-09-13 | 2024-09-17 | Shanghai Cambricon Information Technology Co., Ltd. | Information processing method and system |
| CN109816098B (zh) * | 2019-01-25 | 2021-09-07 | 京东方科技集团股份有限公司 | Processing method and evaluation method for a neural network, and data analysis method and device |
Family Cites Families (5)

| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| RU2579958C1 (ru) * | 2014-12-25 | 2016-04-10 | Федеральное государственное бюджетное образовательное учреждение высшего образования "Донской государственный технический университет" | Artificial neuron |
| US10034005B2 (en) * | 2015-06-05 | 2018-07-24 | Sony Corporation | Banding prediction for video encoding |
| US9723144B1 (en) * | 2016-09-20 | 2017-08-01 | Noble Systems Corporation | Utilizing predictive models to improve predictive dialer pacing capabilities |
| US10170110B2 (en) * | 2016-11-17 | 2019-01-01 | Robert Bosch Gmbh | System and method for ranking of hybrid speech recognition results with neural networks |
| CN107124609A (zh) | 2017-04-27 | 2017-09-01 | 京东方科技集团股份有限公司 | Video image processing system, processing method therefor and display device |
Patent Citations (4)

| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN104809426A (zh) * | 2014-01-27 | 2015-07-29 | 日本电气株式会社 | Convolutional neural network training method, object recognition method and device |
| US20150309961A1 (en) * | 2014-04-28 | 2015-10-29 | Denso Corporation | Arithmetic processing apparatus |
| CN106355244A (zh) * | 2016-08-30 | 2017-01-25 | 深圳市诺比邻科技有限公司 | Method and system for constructing a convolutional neural network |
| CN106326939A (zh) * | 2016-08-31 | 2017-01-11 | 深圳市诺比邻科技有限公司 | Method and system for optimizing parameters of a convolutional neural network |

Non-Patent Citations (1)

| Title |
|---|
| See also references of EP3588385A4 * |
Cited By (4)

| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN110969217A (zh) * | 2018-09-28 | 2020-04-07 | 杭州海康威视数字技术股份有限公司 | Method and device for image processing based on a convolutional neural network |
| CN110969217B (zh) * | 2018-09-28 | 2023-11-17 | 杭州海康威视数字技术股份有限公司 | Method and device for image processing based on a convolutional neural network |
| CN111858989A (zh) * | 2020-06-09 | 2020-10-30 | 西安工程大学 | Image classification method using a spiking convolutional neural network based on an attention mechanism |
| CN111858989B (zh) * | 2020-06-09 | 2023-11-10 | 西安工程大学 | Image classification method using a spiking convolutional neural network based on an attention mechanism |
Also Published As
Publication number | Publication date |
---|---|
CN108460454B (zh) | 2022-07-26 |
EP3588385A4 (en) | 2020-12-30 |
US11620496B2 (en) | 2023-04-04 |
US20210209448A1 (en) | 2021-07-08 |
EP3588385A1 (en) | 2020-01-01 |
CN108460454A (zh) | 2018-08-28 |
Legal Events

| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 17897328; Country of ref document: EP; Kind code of ref document: A1 |
| | NENP | Non-entry into the national phase | Ref country code: DE |
| | ENP | Entry into the national phase | Ref document number: 2017897328; Country of ref document: EP; Effective date: 20190923 |