WO2020151438A1 - Neural network processing method and evaluation method, and data analysis method and device - Google Patents

Neural network processing method and evaluation method, and data analysis method and device

Info

Publication number
WO2020151438A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
neural network
input
layer
output
Application number
PCT/CN2019/127431
Other languages
English (en)
French (fr)
Inventor
那彦波
刘瀚文
卢运华
Original Assignee
京东方科技集团股份有限公司
Application filed by 京东方科技集团股份有限公司
Priority to US17/042,265 (published as US20210049447A1)
Publication of WO2020151438A1 (Chinese-language text)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/44 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G06V10/443 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components by matching or filtering
    • G06V10/449 Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters
    • G06V10/451 Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters with interaction between the filter responses, e.g. cortical complex cells
    • G06V10/454 Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/048 Activation functions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/06 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/063 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/172 Classification, e.g. identification

Definitions

  • the embodiments of the present disclosure relate to a neural network processing method and processing device, a neural network evaluation method, a data analysis method and data analysis device, and a computer-readable storage medium.
  • CNN: convolutional neural network
  • a convolutional neural network is a nonlinear system that includes multiple layer structures and nonlinear units connecting these layer structures. These nonlinear units allow the convolutional neural network to adapt to various inputs.
  • in the expression $f_{LN} = A_N \cdot x + B_N$ of the Nth linear function, $x$ represents the input of the Nth nonlinear layer, and $A_N$ and $B_N$ are determined according to the input matrix and the output matrix, where N is a positive integer.
  • the processing method provided by some embodiments of the present disclosure further includes: performing the linearization processing on all the nonlinear layers in the neural network to determine the expressions of linear functions corresponding to all the nonlinear layers.
  • the at least one nonlinear layer includes an activation layer, an instance normalization layer, a maximum pooling layer, or a softmax layer, and the activation function of the activation layer is a ReLU function, a tanh function, or a sigmoid function.
  • Some embodiments of the present disclosure also provide a neural network-based data analysis method, including: acquiring input data; processing the input data with the neural network to obtain first output data; executing, according to the input data and the first output data, the processing method according to any one of the above embodiments to perform the linearization processing on all the nonlinear layers in the neural network, so as to determine the linearized neural network corresponding to the neural network; and analyzing, based on the linearized neural network, the correspondence between the input data and the first output data.
  • analyzing the correspondence between the input data and the first output data includes: determining a detection data group based on the input data, wherein the detection data group is a group of binary matrices; processing the detection data group with the linearized neural network to obtain a second output data group; and analyzing, based on the detection data group and the second output data group, the forward or reverse influence between the input data and the first output data.
  • the detection data group includes at least one detection data
  • the second output data group includes at least one second output data
  • the at least one detection data is in one-to-one correspondence with the at least one second output data.
  • analyzing the forward influence between the input data and the first output data includes: processing each of the at least one detection data with the linearized neural network to obtain the at least one second output data; and determining, by analyzing the element-level correspondence between the at least one detection data and the at least one second output data, the forward influence of each input element of the input data on each output element of the first output data, wherein each detection data in the detection data group includes a target detection element whose value is 1, and all detection elements in each detection data other than the target detection element have the value 0.
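The forward-influence probe above can be sketched in plain Python. This is a hedged illustration, not the patent's implementation: `detection_data`, `forward_influence` and `toy_linearized_net` are hypothetical names, matrices are nested lists, and subtracting the response to an all-zero input (to cancel bias terms that a linearized network may still contain) is an assumption of this sketch that the source does not state.

```python
from typing import Callable, List

Matrix = List[List[float]]


def detection_data(rows: int, cols: int, r: int, c: int) -> Matrix:
    """Binary matrix whose single target detection element, at (r, c), is 1."""
    return [[1.0 if (i, j) == (r, c) else 0.0 for j in range(cols)] for i in range(rows)]


def forward_influence(linear_net: Callable[[Matrix], Matrix],
                      rows: int, cols: int, r: int, c: int) -> Matrix:
    """Influence of input element (r, c) on every output element.

    The linearized network may still contain bias terms (it is affine), so this
    sketch subtracts its response to an all-zero input; that step is an
    assumption, not something the source specifies.
    """
    probe_out = linear_net(detection_data(rows, cols, r, c))
    zero_out = linear_net([[0.0] * cols for _ in range(rows)])
    return [[p - z for p, z in zip(p_row, z_row)]
            for p_row, z_row in zip(probe_out, zero_out)]


def toy_linearized_net(x: Matrix) -> Matrix:
    """Hypothetical 2x2 -> 2x2 affine stand-in for a linearized network."""
    return [[x[i][j] + 0.5 * x[i][(j + 1) % 2] + 1.0 for j in range(2)] for i in range(2)]


print(forward_influence(toy_linearized_net, 2, 2, 0, 0))  # [[1.0, 0.5], [0.0, 0.0]]
```

Each probe isolates one input element, which is the impulse-response idea applied to the linearized network.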
  • the detection data group includes a plurality of detection data
  • the target detection elements of at least some of the plurality of detection data are located at different positions.
  • the plurality of detection data all have the same size, which is the same as the size of the input data.
  • the detection data group includes a plurality of detection data
  • the second output data group includes a plurality of second output data
  • the plurality of detection data are in one-to-one correspondence with the plurality of second output data, and analyzing, based on the detection data group and the second output data group, the reverse influence between the input data and the first output data includes: processing each of the plurality of detection data with the linearized neural network to obtain the plurality of second output data; and determining, by analyzing the element-level correspondence between the plurality of detection data and the plurality of second output data, the reverse influence of each output element of the first output data on each input element of the input data, wherein each detection data in the detection data group includes a target detection element whose value is 1.
  • all detection elements in each detection data other than the target detection element have the value 0, the number of the plurality of detection data is equal to the number of detection elements in each detection data, and the target detection elements of any two of the plurality of detection data are located at different positions.
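The reverse-influence analysis can be sketched the same way: every input position is probed once (as many detection data as detection elements, each with its target element at a different position), and the responses are then regrouped per output element. As in the forward case, the names are hypothetical and subtracting the all-zero response to cancel bias terms is an assumption of this sketch.

```python
from typing import Callable, Dict, List, Tuple

Matrix = List[List[float]]
Position = Tuple[int, int]


def reverse_influence(linear_net: Callable[[Matrix], Matrix],
                      rows: int, cols: int) -> Dict[Position, Matrix]:
    """For each output element (p, q), an input-sized matrix whose (r, c) entry
    is the influence of input element (r, c) on output element (p, q)."""
    zero_out = linear_net([[0.0] * cols for _ in range(rows)])
    responses: Dict[Position, Matrix] = {}
    for r in range(rows):
        for c in range(cols):
            # One detection datum per input position (rows * cols probes in total).
            probe = [[1.0 if (i, j) == (r, c) else 0.0 for j in range(cols)]
                     for i in range(rows)]
            out = linear_net(probe)
            responses[(r, c)] = [[o - z for o, z in zip(o_row, z_row)]
                                 for o_row, z_row in zip(out, zero_out)]
    out_rows, out_cols = len(zero_out), len(zero_out[0])
    # Regroup: for output (p, q), collect the (p, q) entry of every probe response.
    return {
        (p, q): [[responses[(r, c)][p][q] for c in range(cols)] for r in range(rows)]
        for p in range(out_rows) for q in range(out_cols)
    }


def toy_linearized_net(x: Matrix) -> Matrix:
    """Hypothetical 2x2 -> 2x2 affine stand-in for a linearized network."""
    return [[x[i][j] + 0.5 * x[i][(j + 1) % 2] + 1.0 for j in range(2)] for i in range(2)]


influence = reverse_influence(toy_linearized_net, 2, 2)
print(influence[(0, 1)])  # [[0.5, 1.0], [0.0, 0.0]]
```

Probing every position is what makes the reverse view possible: the forward case needs only the probes for the input elements of interest, while the reverse case needs the full set to see all inputs feeding one output.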
  • At least some embodiments of the present disclosure further provide a neural network evaluation method, including: executing the processing method according to any one of the above embodiments to determine at least one linear interpreter unit corresponding to the at least one nonlinear layer; and evaluating the neural network based on the at least one linear interpreter unit.
  • evaluating the neural network includes: evaluating the at least one linear interpreter unit to determine an evaluation result of the at least one nonlinear layer; and training the neural network based on the evaluation result.
  • training the neural network based on the evaluation result includes: determining a training weight of the at least one nonlinear layer based on the evaluation result; obtaining training input data and training target data; processing the training input data with the neural network to obtain training output data; and calculating a loss value of the loss function of the neural network based on the training output data and the training target data
  • the parameters of the neural network are modified based on the training weight of the at least one nonlinear layer and the loss value; the trained neural network is obtained when the loss function of the neural network satisfies a predetermined condition, and when the loss function of the neural network does not satisfy the predetermined condition, the training input data and the training target data continue to be input to repeat the above training process.
  • At least some embodiments of the present disclosure also provide a neural network processing device, including: a memory for storing computer-readable instructions; and a processor for running the computer-readable instructions, wherein, when the computer-readable instructions are run by the processor, the processing method according to any of the above embodiments is executed.
  • At least some embodiments of the present disclosure also provide a computer-readable storage medium for storing computer-readable instructions, and when the computer-readable instructions are executed by a computer, the processing method according to any of the above embodiments is executed.
  • FIG. 1 is a schematic diagram of a convolutional neural network.
  • FIG. 2 is a schematic diagram showing that the activation results of the activation functions in the convolutional neural network are equivalent to a small number of filters.
  • FIG. 3 is a flowchart of a neural network processing method provided by some embodiments of the present disclosure.
  • FIG. 4 is a schematic diagram of a nonlinear layer and a linear function corresponding to the nonlinear layer provided by some embodiments of the present disclosure
  • FIG. 5A is a schematic diagram of a partial structure of a neural network provided by some embodiments of the present disclosure.
  • FIG. 5B is a schematic partial structural diagram of a modified neural network provided by some embodiments of the present disclosure.
  • FIG. 6A is a schematic structural diagram of a neural network provided by some embodiments of the present disclosure.
  • FIG. 6B is a schematic structural diagram of a linearized neural network provided by some embodiments of the present disclosure.
  • FIG. 7 is a flowchart of a neural network-based data analysis method provided by some embodiments of the present disclosure.
  • FIG. 8 is a flowchart of a neural network evaluation method provided by some embodiments of the present disclosure.
  • FIG. 9 is a flowchart of a neural network training method provided by some embodiments of the present disclosure.
  • FIG. 10 is a schematic diagram of a neural network processing device according to some embodiments of the disclosure.
  • FIG. 11 is a schematic diagram of a data analysis device according to some embodiments of the disclosure.
  • A convolutional neural network is a neural network structure that takes, for example, images as input and output, and replaces scalar weights with filters (convolution kernels).
  • the convolutional neural network includes multiple nonlinear layers. For example, the activation layer, the instance normalization layer, the maximum pooling layer, and the softmax layer are all nonlinear layers.
  • The convolutional neural network is one of the representative algorithms of deep learning systems.
  • the main disadvantage of deep learning systems is that it is difficult to explain the working process of neural networks.
  • a network architecture is first selected, and then the network architecture is trained to obtain a set of parameters (filter coefficients and biases). If the network is well trained, then for a given input, its output will match the desired target with high accuracy.
  • the filters in a deep neural network architecture are usually small (for example, 3×3 or 5×5 convolution kernels), and visualizing a large number of filters one by one does not provide an in-depth understanding of the deep neural network architecture.
  • a bias is a scalar and cannot provide clues to the complex mechanisms working in the deep neural network architecture. Understanding the parameters of deep learning systems therefore remains a difficult problem to a large extent.
  • At least some embodiments of the present disclosure provide a neural network processing method and processing device, neural network evaluation method, data analysis method and data analysis device, and computer-readable storage medium.
  • the neural network processing method can linearize the neural network.
  • after linearization, classical methods for analyzing linear systems (for example, the impulse response) can be used to analyze the neural network.
  • the configuration of the neural network can also be optimized.
  • FIG. 1 is a schematic diagram of a convolutional neural network.
  • the convolutional neural network can be used to process images, speech, text, and so on.
  • FIG. 1 only shows a convolutional neural network with a 3-layer structure, but the embodiments of the present disclosure are not limited thereto.
  • the convolutional neural network includes an input layer 101, a hidden layer 102, and an output layer 103.
  • the input layer 101 has 4 inputs 121
  • the hidden layer 102 has 3 outputs 122
  • the output layer 103 has 2 outputs 123.
  • the convolutional neural network finally outputs 2 images.
  • the 4 inputs 121 of the input layer 101 may be 4 images, or 4 features of 1 image.
  • the 3 outputs 122 of the hidden layer 102 may be feature maps of the images or features (for example, the 4 inputs 121) input via the input layer 101.
  • each convolutional layer has weights $w_{ij}^k$ and biases $b_i^k$. The weight $w_{ij}^k$ represents a convolution kernel, and the bias $b_i^k$ is a scalar superimposed on the output of the convolutional layer, where k is the label of the input layer 101, and i and j are the labels of a unit of the input layer 101 and a unit of the hidden layer 102, respectively.
  • the hidden layer 102 may include a first convolutional layer 201 and a second convolutional layer 202.
  • the first convolutional layer 201 includes a first set of convolution kernels ($w_{ij}^1$ in FIG. 1) and a first set of biases ($b_i^1$ in FIG. 1).
  • the second convolutional layer 202 includes a second set of convolution kernels ($w_{ij}^2$ in FIG. 1) and a second set of biases ($b_i^2$ in FIG. 1).
  • each convolutional layer typically includes tens or hundreds of convolution kernels. If the convolutional neural network is a deep convolutional neural network, it may include at least five convolutional layers.
  • the hidden layer 102 further includes a first activation layer 203 and a second activation layer 204.
  • the first activation layer 203 is located after the first convolutional layer 201
  • the second activation layer 204 is located after the second convolutional layer 202.
  • the activation layer (for example, the first activation layer 203 and the second activation layer 204) includes activation functions, which are used to introduce nonlinear factors into the convolutional neural network, so that the convolutional neural network can better solve more complex problems .
  • the activation function may include a rectified linear unit (ReLU) function, a sigmoid function, or a hyperbolic tangent function (tanh function).
  • the activation layer can be used alone as a layer of the convolutional neural network, or it can be included in a convolutional layer (for example, the first convolutional layer 201 can include the first activation layer 203, and the second convolutional layer 202 can include the second activation layer 204).
  • for example, in the first convolutional layer 201, several convolution kernels $w_{ij}^1$ of the first set of convolution kernels and several biases $b_i^1$ of the first set of biases are first applied to each input 121 to obtain the output of the first convolutional layer 201; the output of the first convolutional layer 201 can then be processed by the first activation layer 203 to obtain the output of the first activation layer 203.
  • the output of the second convolutional layer 202 can be processed by the second activation layer 204 to obtain the output of the second activation layer 204.
  • the output of the first convolutional layer 201 can be the result of applying the convolution kernels $w_{ij}^1$ to its input and adding the biases $b_i^1$
  • the output of the second convolutional layer 202 can be the result of applying the convolution kernels $w_{ij}^2$ to the output of the first activation layer 203 and adding the biases $b_i^2$.
  • the output of the first activation layer 203 is the output 122 of the hidden layer 102
  • the output of the second activation layer 204 is transmitted to the output layer 103 as the output 123 of the output layer 103.
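The per-layer computation described above (convolve with $w_{ij}^k$, add $b_i^k$, then apply a ReLU activation) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the patent's implementation: it uses "valid" cross-correlation (the operation most CNN frameworks call convolution), single-sample 2-D maps, and a hypothetical `kernels[out][in]` indexing; all function names are illustrative.

```python
from typing import List

Matrix = List[List[float]]


def conv2d_valid(x: Matrix, kernel: Matrix) -> Matrix:
    """2-D 'valid' cross-correlation of one input map with one kernel."""
    kh, kw = len(kernel), len(kernel[0])
    oh, ow = len(x) - kh + 1, len(x[0]) - kw + 1
    return [[sum(kernel[a][b] * x[i + a][j + b] for a in range(kh) for b in range(kw))
             for j in range(ow)] for i in range(oh)]


def conv_layer(inputs: List[Matrix], kernels: List[List[Matrix]],
               biases: List[float]) -> List[Matrix]:
    """Output map o = sum over input maps i of conv(input_i, kernels[o][i]) + biases[o]."""
    outputs = []
    for kernel_row, bias in zip(kernels, biases):
        acc = conv2d_valid(inputs[0], kernel_row[0])
        for x, w in zip(inputs[1:], kernel_row[1:]):
            y = conv2d_valid(x, w)
            acc = [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(acc, y)]
        outputs.append([[v + bias for v in row] for row in acc])  # scalar bias per map
    return outputs


def relu_layer(maps: List[Matrix]) -> List[Matrix]:
    """Activation layer: elementwise max(v, 0)."""
    return [[[max(v, 0.0) for v in row] for row in m] for m in maps]


x = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
k = [[1.0, 0.0], [0.0, -1.0]]
hidden = relu_layer(conv_layer([x], [[k], [k]], [5.0, 3.0]))
print(hidden)  # [[[1.0, 1.0], [1.0, 1.0]], [[0.0, 0.0], [0.0, 0.0]]]
```

With the same kernel, the bias alone decides whether the ReLU passes the map (bias 5.0) or blocks it entirely (bias 3.0), which is the effect discussed next.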
  • FIG. 2 is a schematic diagram showing that the activation results of the activation functions in the convolutional neural network are equivalent to a small number of filters.
  • the activation function can prevent the entire convolutional neural network architecture from being reduced to a small set of filters that act on each input.
  • A convolutional neural network can be interpreted as an adaptive filter. Suppose the activation layer includes the ReLU function and its input includes a first part and a second part: if the first part of the input is positive, the activation layer passes it unchanged to the next layer; if the second part of the input is negative, it has no effect on the output of the activation layer.
  • suppose the activation layers include ReLU functions
  • for a specific input, only the second ReLU 2032 in the first activation layer 203 and the first ReLU 2041 in the second activation layer 204 are activated.
  • for this specific input, the inputs to the remaining ReLUs are negative, that is, the remaining ReLUs do not affect the output. Therefore, as shown in FIG. 2, the ReLUs other than the second ReLU 2032 in the first activation layer 203 and the first ReLU 2041 in the second activation layer 204 are omitted, and a linear convolutional neural network is obtained.
  • the linear convolutional neural network only includes four different filters and some biases, which act on each input. For different inputs, the activation state of each ReLU will be different, thereby changing the output result of the convolutional neural network.
  • the net effect of a convolutional neural network is always equivalent to a small number of filters and biases (for example, the set of filters and biases shown in FIG. 2), but this set of filters changes with the input, resulting in an adaptive filter effect.
  • FIG. 3 is a flowchart of a neural network processing method provided by some embodiments of the present disclosure.
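The adaptive-filter view can be checked numerically on a tiny fully connected stand-in for the convolutional network (an assumption made to keep the sketch short; the same reasoning applies to convolutions). For a given input, dropping the inactive ReLUs leaves an affine map; a different input activates a different set of ReLUs and therefore yields a different map. `relu_net` and `equivalent_filter` are hypothetical names.

```python
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(w: Matrix, x: Vector) -> Vector:
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def relu_net(x: Vector, w1: Matrix, b1: Vector, w2: Matrix, b2: Vector) -> Vector:
    """Hypothetical two-layer network: w2 @ relu(w1 @ x + b1) + b2."""
    hidden = [max(v + b, 0.0) for v, b in zip(matvec(w1, x), b1)]
    return [v + b for v, b in zip(matvec(w2, hidden), b2)]


def equivalent_filter(x: Vector, w1: Matrix, b1: Vector,
                      w2: Matrix, b2: Vector) -> Tuple[Matrix, Vector]:
    """Affine map (W, b) that relu_net reduces to for inputs with x's activation pattern."""
    pre = [v + b for v, b in zip(matvec(w1, x), b1)]
    mask = [1.0 if v > 0 else 0.0 for v in pre]          # active ReLUs for this input
    masked_w1 = [[m * v for v in row] for m, row in zip(mask, w1)]
    masked_b1 = [m * v for m, v in zip(mask, b1)]
    w = [[sum(w2[o][h] * masked_w1[h][i] for h in range(len(mask)))
          for i in range(len(x))] for o in range(len(w2))]
    b = [v + c for v, c in zip(matvec(w2, masked_b1), b2)]
    return w, b


w1 = [[1.0, -1.0], [1.0, 1.0]]
b1 = [0.0, 0.0]
w2 = [[1.0, 1.0]]
b2 = [0.0]
for x in ([2.0, 1.0], [1.0, 2.0]):
    w, b = equivalent_filter(x, w1, b1, w2, b2)
    print(x, "->", w, b, relu_net(x, w1, b1, w2, b2), [v + c for v, c in zip(matvec(w, x), b)])
```

Both inputs are reproduced exactly by their own affine map, but the two maps differ because a different ReLU is switched off for the second input.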
  • FIG. 4 is a schematic diagram of a nonlinear layer and a linear function corresponding to the nonlinear layer provided by some embodiments of the present disclosure.
  • the neural network is a nonlinear system, and the neural network includes at least one nonlinear layer.
  • the neural network processing method provided by some embodiments of the present disclosure includes the following steps:
  • S11: using the Nth nonlinear layer of the at least one nonlinear layer to process the input matrix input to the Nth nonlinear layer, so as to obtain the output matrix output by the Nth nonlinear layer;
  • the neural network may be a convolutional neural network.
  • the at least one non-linear layer includes an activation layer, an instance normalization layer, a maximum pooling layer, or a softmax layer.
  • the activation function of the activation layer can be a ReLU function, a tanh function, or a sigmoid function. The following describes the embodiments of the present disclosure in detail by taking the Nth nonlinear layer as the activation layer and the activation function of the activation layer being the ReLU function as an example, but the embodiments of the present disclosure are not limited to the case of the activation layer.
  • N is a positive integer and is less than or equal to the number of all nonlinear layers in the neural network.
  • for example, if the neural network includes M nonlinear layers, where M is a positive integer, then 1 ≤ N ≤ M. It should be noted that although the Nth nonlinear layer is taken as an example to describe the embodiments of the present disclosure in detail, the processing method provided by the embodiments of the present disclosure is applicable to each nonlinear layer in the neural network.
  • the input matrices input to different non-linear layers may be the same or different.
  • a neural network includes a first nonlinear layer and a second nonlinear layer, and the first nonlinear layer and the second nonlinear layer can be different.
  • the first input matrix is the input of the first non-linear layer
  • the second input matrix is the input of the second non-linear layer.
  • the first input matrix and the second input matrix can be the same.
  • for example, the elements, size, etc. of the first input matrix are respectively the same as those of the second input matrix; alternatively, the first input matrix and the second input matrix may be different.
  • for example, the size of the first input matrix and the size of the second input matrix may be different, or at least some elements of the first input matrix may be different from the corresponding elements of the second input matrix.
  • different non-linear layers output different output matrices.
  • Different nonlinear layers have different function expressions. For example, when the Nth nonlinear layer 420 is the activation layer and the activation function in the activation layer is the ReLU function, the expression of the nonlinear function corresponding to the Nth nonlinear layer 420 can be expressed (element by element) as: $f_{NN}(x) = \max(x, 0)$
  • the input matrix 401 is an input matrix input to the Nth nonlinear layer, that is, the input matrix 401 is input to the Nth nonlinear layer 420.
  • the Nth nonlinear layer 420 performs corresponding processing (for example, activation processing) on the input matrix 401 to obtain the output matrix 402, so that the output matrix 402 is the output matrix output by the Nth nonlinear layer 420.
  • x1 represents the input matrix 401
  • both the input matrix 401 and the output matrix 402 may be two-dimensional matrices, and the size of the input matrix 401 and the size of the output matrix 402 are the same.
  • the value of the point in the i-th row and the j-th column of the input matrix 401 is expressed as $x1_{i,j}$
  • the value of the point in the i-th row and the j-th column of the output matrix 402 is expressed as $y1_{i,j}$, where i and j are positive integers with $0 < i \le Q1$ and $0 < j \le Q2$, and Q1 and Q2 are positive integers
  • Q1 represents the total number of rows of the input matrix 401
  • Q2 represents the total number of columns of the input matrix 401.
  • the Nth non-linear layer 420 is the activation layer
  • the activation function in the activation layer is the ReLU function
  • $y1_{i,j} = \max(x1_{i,j}, 0)$, that is, $y1_{i,j} = x1_{i,j}$ if $x1_{i,j} > 0$ and $y1_{i,j} = 0$ otherwise.
  • for a nonlinear activation such as the ReLU function, the linearization corresponds to a Taylor expansion of the nonlinear layer with respect to its input. For example, based on the input matrix 401, a Taylor expansion is performed on the nonlinear function corresponding to the Nth nonlinear layer 420 to determine the Taylor expansion of the Nth nonlinear layer 420, which is:
  • $f_{NN}(x) = f_{NN}(x1) + (Df_{NN})(x1) \cdot (x - x1) + \cdots$
  • the Nth linear function corresponding to the Nth nonlinear layer 420 includes a first parameter and a second parameter.
  • the expression of the Nth linear function is expressed as: $f_{LN} = A_N \cdot x + B_N$
  • $f_{LN}$ represents the Nth linear function
  • $A_N$ represents the first parameter of the Nth linear function
  • $B_N$ represents the second parameter of the Nth linear function
  • $x$ represents the input of the Nth nonlinear layer, as shown in FIG. 4
  • $A_N$ and $B_N$ are determined according to the input matrix 401 and the output matrix 402.
  • the Nth linear function is determined based on the Nth nonlinear layer and the input matrix input to the Nth nonlinear layer.
  • for example, when different input matrices are input to the same Nth nonlinear layer, the first parameter and the second parameter in the resulting Nth linear functions are not the same; when the same input matrix is input to different types of nonlinear layers, the first parameter and the second parameter in the resulting Nth linear functions are also different.
  • different input matrices can be input to the same non-linear layer to obtain different linear functions; the same input matrix can be input to different non-linear layers to obtain different linear functions.
  • the operation performed by the linear system based on the Nth linear function is similar to the operation performed by the Nth nonlinear layer 420, that is, within an acceptably small error, the operation performed based on the Nth linear function can be considered the same as the operation performed by the Nth nonlinear layer 420.
  • the first parameter $A_N$ and the second parameter $B_N$ may be constants.
  • for example, in some embodiments, $A_N$ and $B_N$ are both matrices whose values are all constants; in other embodiments, $A_N$ and $B_N$ are both constants; in still other embodiments, one of $A_N$ and $B_N$ is a matrix and the other is a constant.
  • the expression of the first parameter $A_N$ of the Nth linear function can be expressed as:
  • $A_N = (Df_{NN})(x1)$
  • $Df_{NN}$ represents the first derivative of the nonlinear function corresponding to the Nth nonlinear layer 420, evaluated at x1, and x1 represents the input matrix 401.
  • $f_{NN}$ represents the nonlinear function corresponding to the Nth nonlinear layer 420
  • $f_{NN}(x1)$ represents the output matrix 402; matching $f_{LN}$ to the first-order terms of the Taylor expansion gives the second parameter $B_N = f_{NN}(x1) - A_N \cdot x1$.
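For elementwise nonlinear layers such as sigmoid or tanh, the Taylor-based parameters can be sketched as follows. Assumptions of this sketch: the layer acts element by element, so $Df_{NN}$ is diagonal and $A_N$ can be stored as a matrix the size of x1 and applied elementwise; reading the resulting $A_N$ for sigmoid as the "continuous mask" mentioned for sigmoid layers is this sketch's interpretation. The function names are hypothetical.

```python
import math
from typing import Callable, List, Tuple

Matrix = List[List[float]]
Scalar = Callable[[float], float]


def linearize_elementwise(f: Scalar, df: Scalar, x1: Matrix) -> Tuple[Matrix, Matrix]:
    """First-order Taylor linearization of an elementwise nonlinear layer around x1.

    A_N = (Df_NN)(x1) and B_N = f_NN(x1) - A_N * x1, with products taken element
    by element because the Jacobian of an elementwise layer is diagonal.
    """
    a_n = [[df(v) for v in row] for row in x1]
    b_n = [[f(v) - a * v for v, a in zip(row, a_row)] for row, a_row in zip(x1, a_n)]
    return a_n, b_n


def apply_linear(a_n: Matrix, b_n: Matrix, x: Matrix) -> Matrix:
    """f_LN(x) = A_N * x + B_N, element by element."""
    return [[a * v + b for a, v, b in zip(a_row, x_row, b_row)]
            for a_row, x_row, b_row in zip(a_n, x, b_n)]


def sigmoid(v: float) -> float:
    return 1.0 / (1.0 + math.exp(-v))


def dsigmoid(v: float) -> float:
    s = sigmoid(v)
    return s * (1.0 - s)


x1 = [[-1.0, 0.0], [0.5, 2.0]]
a_n, b_n = linearize_elementwise(sigmoid, dsigmoid, x1)
print(a_n)  # continuous mask values in (0, 0.25]
```

By construction $f_{LN}(x1) = f_{NN}(x1)$; near x1 the error is second order, which matches the "acceptably small error" wording above.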
  • the Nth nonlinear layer 420 is the activation layer
  • the activation function in the activation layer is the ReLU function
  • the first parameter $A_N$ of the Nth linear function is 1 or 0 (1 at positive input elements, 0 otherwise), and the second parameter $B_N$ of the Nth linear function is 0.
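For the ReLU case, a minimal sketch of the resulting linear interpreter unit: $A_N$ becomes a binary mask of x1 and $B_N$ is zero, so $f_{LN}$ reproduces the ReLU output exactly at x1 but not necessarily at other inputs. Treating the undefined derivative at exactly 0 as 0 is an assumption of this sketch, and the function names are hypothetical.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def relu(x: Matrix) -> Matrix:
    return [[max(v, 0.0) for v in row] for row in x]


def relu_linear_interpreter(x1: Matrix) -> Tuple[Matrix, Matrix]:
    """A_N: binary mask of x1 (1 where x1 > 0, else 0); B_N: all zeros.

    The ReLU derivative is undefined at exactly 0; this sketch uses 0 there.
    """
    a_n = [[1.0 if v > 0 else 0.0 for v in row] for row in x1]
    b_n = [[0.0] * len(row) for row in x1]
    return a_n, b_n


def apply_linear(a_n: Matrix, b_n: Matrix, x: Matrix) -> Matrix:
    """f_LN(x) = A_N * x + B_N, element by element."""
    return [[a * v + b for a, v, b in zip(ar, xr, br)] for ar, xr, br in zip(a_n, x, b_n)]


x1 = [[-2.0, 3.0], [0.5, -0.1]]
a_n, b_n = relu_linear_interpreter(x1)
print(a_n)                                     # [[0.0, 1.0], [1.0, 0.0]]
print(apply_linear(a_n, b_n, x1) == relu(x1))  # True
```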
  • FIG. 5A is a partial structural diagram of a neural network provided by some embodiments of the present disclosure
  • FIG. 5B is a partial structural diagram of a modified neural network provided by some embodiments of the present disclosure.
  • for example, the neural network further includes convolutional layers.
  • the neural network includes a first convolutional layer 41, an Nth nonlinear layer 420, and a second convolutional layer 43, and the first convolutional layer 41, the Nth nonlinear layer The layer 420 and the second convolutional layer 43 are connected in sequence.
  • the Nth nonlinear layer 420 may be an activation layer.
  • the input matrix 401 may be the output of the first convolutional layer 41, that is, the first convolutional layer 41 outputs the input matrix 401, and the Nth nonlinear layer 420 processes the input matrix 401 to obtain the output matrix 402.
  • the output matrix 402 is the input of the second convolutional layer 43.
  • both the input matrix 401 and the output matrix 402 may be feature maps.
  • the neural network can also include an average pooling layer, a fully connected layer, and so on.
  • the Nth linear interpreter unit corresponding to the Nth nonlinear layer can be determined, and the Nth linear interpreter unit can be used to replace the Nth nonlinear layer, thereby obtaining the modified neural network.
  • the modified neural network includes the first convolutional layer 41, an Nth linear interpreter unit 421, and the second convolutional layer 43.
  • the Nth linear interpreter unit 421 corresponds to the Nth nonlinear layer 420 in FIG. 5A, and replacing the Nth nonlinear layer 420 in the neural network shown in FIG. 5A with the Nth linear interpreter unit 421 yields the modified neural network shown in FIG. 5B.
  • the structures, parameters, etc. of the first convolution layer 41 and the second convolution layer 43 remain unchanged.
  • the Nth linear interpreter unit 421 and the Nth nonlinear layer 420 can perform similar operations, that is, for the same input matrix x1, when the input matrix x1 is After input to the Nth nonlinear layer 420, the output matrix y1 is obtained; when the input matrix x1 is input to the Nth linear interpreter unit 421, the output matrix y1 can also be obtained.
  • a linear interpreter unit with a binary mask function can be used instead; for an activation layer with a sigmoid activation function, a continuous mask (continuous mask) function can be used for an activation layer with a sigmoid activation function; for an activation layer with a sigmoid activation function, a continuous mask (continuous mask) function can be used for the maximum pooling layer, a linear interpreter unit with non-uniform downsampling function can be used instead; for the instance normalization layer, linear normalization (Linear Normalization) function of linear interpreter unit replacement.
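Two of the replacements listed above can be sketched as follows, under assumptions of this sketch rather than the source: non-uniform downsampling is read as copying the input values at the argmax positions recorded from x1 (non-overlapping k×k windows), and linear normalization is read as instance normalization with the mean and standard deviation frozen at their values for x1 (single channel, no learned scale or shift). Both resulting maps are affine in the input. All names are hypothetical.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]
Position = Tuple[int, int]


def maxpool_interpreter(x1: Matrix, k: int = 2) -> List[List[Position]]:
    """Record, for each non-overlapping k x k window of x1, where its maximum sits."""
    picks: List[List[Position]] = []
    for i in range(0, len(x1) - k + 1, k):
        row: List[Position] = []
        for j in range(0, len(x1[0]) - k + 1, k):
            window = [(x1[i + a][j + b], (i + a, j + b)) for a in range(k) for b in range(k)]
            row.append(max(window, key=lambda t: t[0])[1])  # first maximum wins on ties
        picks.append(row)
    return picks


def apply_downsampling(picks: List[List[Position]], x: Matrix) -> Matrix:
    """Non-uniform downsampling: copy x at the recorded positions (linear in x)."""
    return [[x[r][c] for (r, c) in row] for row in picks]


def instance_norm_interpreter(x1: Matrix, eps: float = 1e-5) -> Tuple[float, float]:
    """Freeze the mean and standard deviation computed from x1."""
    vals = [v for row in x1 for v in row]
    mean = sum(vals) / len(vals)
    var = sum((v - mean) ** 2 for v in vals) / len(vals)
    return mean, math.sqrt(var + eps)


def apply_linear_norm(mean: float, std: float, x: Matrix) -> Matrix:
    """Linear normalization: (x - mean) / std with frozen statistics (affine in x)."""
    return [[(v - mean) / std for v in row] for row in x]


x1 = [[1.0, 5.0, 2.0, 0.0], [3.0, 4.0, 8.0, 1.0], [0.0, 0.0, 1.0, 1.0], [2.0, 9.0, 1.0, 0.0]]
picks = maxpool_interpreter(x1)
print(apply_downsampling(picks, x1))  # [[5.0, 8.0], [9.0, 1.0]]
```

At x1 both units reproduce the original layers exactly; for other inputs they keep the positions and statistics chosen by x1, which is what makes them linear.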
  • the neural network processing method may also linearize all the nonlinear layers in the neural network to determine the expressions of the linear functions corresponding to all of them.
  • the linearized neural network corresponding to the neural network can then be determined.
  • for example, after all the nonlinear layers in the neural network are linearized, the linear interpreter units corresponding to all the nonlinear layers can be determined, and each nonlinear layer in the neural network is replaced with its corresponding linear interpreter unit to obtain the linearized neural network.
  • all nonlinear layers in the neural network correspond to all linear interpreter units in the linearized neural network one-to-one. Therefore, the linear interpreter unit is used to interpret the operation of the nonlinear layer (for example, the activation operation), and the linearized neural network is a linear system, so that the linearized neural network can be used to analyze and evaluate the neural network.
  • the “neural network” is a nonlinear system, that is, the “neural network” includes nonlinear layers and linear layers;
  • the “linearized neural network” is a linear system, that is, the “linearized neural network” includes linear interpreter units and linear layers.
  • the linear interpreter unit is not an actual layer structure in the neural network, but a layer structure defined for ease of description.
  • the entire neural network can also be treated as equivalent to a single linear interpreter unit.
  • the processing method of the neural network includes the following steps: processing the input image by the neural network to obtain an output image; and determining the linear interpreter unit corresponding to the neural network according to the input image and the output image.
  • the input image may be various types of images.
  • the input image may be an image captured by an image acquisition device such as a digital camera or a mobile phone, which is not limited in the embodiments of the present disclosure.
  • FIG. 6A is a schematic structural diagram of a neural network provided by some embodiments of the disclosure
  • FIG. 6B is a schematic structural diagram of a linearized neural network provided by some embodiments of the disclosure.
  • x represents the input, for example, an input image.
  • y_NN1 is a nonlinear function; for example, y_NN1 can be a high-order polynomial in x.
  • the neural network shown in FIG. 6A may include five linear layers and five nonlinear layers, with each nonlinear layer located between two adjacent linear layers.
  • a linearized neural network as shown in FIG. 6B can be obtained.
  • the linearized neural network includes five linear layers and five linear interpreter units, and the five linear interpreter units have a one-to-one correspondence with the five nonlinear layers shown in FIG. 6A.
  • the parameters of the linear function corresponding to the linearized neural network are all constants.
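To illustrate why every parameter of the linearized network is a constant, the sketch below represents each linear layer and each linear interpreter unit as an affine map x -> A x + B and folds a chain of them into one pair (A_NN2, B_NN2). It is a minimal sketch, assuming small dense matrices stored as nested Python lists; the layer values and helper names are made up for illustration. Evaluating the folded map on an all-zero input returns its bias term, matching the description of B_NN2 in this section.

```python
from typing import List, Tuple

Matrix = List[List[float]]
Vector = List[float]
Affine = Tuple[Matrix, Vector]  # (A, B) represents x -> A x + B


def matvec(a: Matrix, x: Vector) -> Vector:
    return [sum(w * v for w, v in zip(row, x)) for row in a]


def matmul(a: Matrix, c: Matrix) -> Matrix:
    cols = list(zip(*c))
    return [[sum(w * v for w, v in zip(row, col)) for col in cols] for row in a]


def apply(f: Affine, x: Vector) -> Vector:
    a, b = f
    return [u + v for u, v in zip(matvec(a, x), b)]


def compose(outer: Affine, inner: Affine) -> Affine:
    """outer(inner(x)) = A2 (A1 x + B1) + B2 = (A2 A1) x + (A2 B1 + B2)."""
    a2, b2 = outer
    a1, b1 = inner
    return matmul(a2, a1), [u + v for u, v in zip(matvec(a2, b1), b2)]


def collapse(layers: List[Affine]) -> Affine:
    """Fold layers, listed in processing order, into a single affine map."""
    total = layers[0]
    for layer in layers[1:]:
        total = compose(layer, total)
    return total


# Hypothetical values: linear layer, frozen binary-mask unit, linear layer.
layer1: Affine = ([[1.0, 2.0], [0.0, 1.0]], [0.5, -1.0])
mask_unit: Affine = ([[1.0, 0.0], [0.0, 0.0]], [0.0, 0.0])
layer2: Affine = ([[2.0, 1.0]], [1.0])

net = collapse([layer1, mask_unit, layer2])
x = [1.0, 1.0]
step_by_step = apply(layer2, apply(mask_unit, apply(layer1, x)))
print(net, apply(net, x), step_by_step)
```

The folded map gives the same output as running the layers one by one, so the whole linearized network can be described by one constant matrix and one constant bias.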
  • FIG. 7 is a flowchart of a neural network-based data analysis method provided by some embodiments of the present disclosure.
  • the neural network may be a linearized neural network obtained after processing by the aforementioned neural network processing method.
  • the data analysis method provided by some embodiments of the present disclosure includes the following steps:
  • because the linearized neural network is a linear system, the way the neural network processes various data can be analyzed with linear-system methods, making it possible to visualize the neural network.
  • Linear systems can be fully described by their impulse response. Since the linearized neural network is a linear system, impulse response analysis can be performed on it. When analyzing how a linearized neural network processes an image, the impulse response shows the influence of the input pixels of the input image on the output pixels of the output image. For example, after the linearized neural network processes the input image, the conversion relationship between each input pixel and each output pixel can be determined, such as which input pixels are used to obtain an output pixel and the proportion contributed by each of them. Using standard methods for linear systems, the inverse relationship can also be obtained, that is, the influence of the output pixels on the input pixels, for example, which input pixels a given output pixel corresponds to. Similarly, when analyzing how a particular nonlinear layer in the linearized neural network processes its input matrix, the impulse response shows the influence of the input elements of the input matrix on the output elements of the output matrix.
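The statement that a linear system is fully described by its impulse response can be checked numerically. The sketch below assumes a hypothetical linearized network `toy_linearized_net` that maps a flat list of input elements to a flat list of output elements as an affine function. It probes the network with an all-zero input (giving the bias) and with one-hot inputs (giving one impulse response per input element), then rebuilds the output for an arbitrary input by superposition. Reading each response gives the input-to-output relationship; reading one output position across all responses gives the inverse view.

```python
from typing import Callable, List, Tuple

Vector = List[float]


def impulse_responses(f: Callable[[Vector], Vector], n: int) -> Tuple[Vector, List[Vector]]:
    """Return (bias, responses) where responses[j] = f(e_j) - f(0) for one-hot e_j."""
    bias = f([0.0] * n)
    responses = []
    for j in range(n):
        e = [0.0] * n
        e[j] = 1.0
        responses.append([o - b for o, b in zip(f(e), bias)])
    return bias, responses


def superpose(bias: Vector, responses: List[Vector], x: Vector) -> Vector:
    """Rebuild f(x) = f(0) + sum_j x[j] * responses[j]; valid only if f is affine."""
    y = list(bias)
    for xj, h in zip(x, responses):
        for i, hi in enumerate(h):
            y[i] += xj * hi
    return y


def toy_linearized_net(x: Vector) -> Vector:
    """Hypothetical affine stand-in for a linearized neural network."""
    return [2 * x[0] - x[1] + 1, 0.5 * x[1] + 3 * x[2]]


bias, hs = impulse_responses(toy_linearized_net, 3)
x = [1.0, 2.0, -1.0]
print(superpose(bias, hs, x), toy_linearized_net(x))
```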
  • the data analysis method provided by some embodiments of the present disclosure can be applied to fields such as image recognition, image classification, speech recognition, and speech classification.
  • the input data and the first output data may be images, text, voice, and the like.
  • when the input data and the first output data are images, they may be two-dimensional matrices; when they are text or voice, they may be one-dimensional matrices.
  • the neural network processes the input data 501 to obtain the first output data 502.
  • the processing method of the neural network is the processing method provided according to any of the above-mentioned embodiments of the present disclosure.
  • for the relevant details, refer to the description of steps S11-S12 in the processing method of the neural network; repeated content is omitted here.
  • the linearized neural network is determined based on the input data and the neural network.
  • when the structure and parameters of the neural network are the same but the input data differ, different linearized neural networks can be obtained; when the input data are the same but the structures or parameters of the neural networks differ, inputting the data into the different neural networks also yields different linearized neural networks.
  • step S24 may include: determining a detection data group according to the input data, where the detection data group is a group of binary matrices; processing the detection data group with the linearized neural network to obtain a second output data group; and analyzing the forward or reverse influence between the input data and the first output data based on the detection data group and the second output data group.
  • “forward influence” means the influence of each input element in the input data on each output element in the first output data, for example, which output elements in the first output data each input element corresponds to.
  • “reverse influence” means the influence of each output element in the first output data on each input element in the input data, for example, which input elements in the input data each output element corresponds to.
  • the number and size of the detection data in the detection data group can be determined according to the input data.
  • the size of each detection data is the same as the size of the input data.
  • the detection data group includes at least one detection data
  • the second output data group includes at least one second output data
  • the at least one detection data corresponds to at least one second output data in a one-to-one correspondence.
  • the detection data group may include three detection data
  • the second output data group includes three second output data
  • the first, second, and third detection data correspond to the first, second, and third second output data, respectively.
  • each detection data is a binary matrix. It should be noted that a binary matrix indicates that the value of an element in the binary matrix is 1 or 0.
  • the impulse response refers to the output (for example, second output data) produced for an input (for example, detection data) in which the value of one pixel (for example, the target detection element) is 1 and the values of all other elements (for example, non-target detection elements) are 0.
  • the detection data and the second output data may also be images, text, voice, and so on.
  • when the detection data and the second output data are images, they may be two-dimensional matrices; when they are text or voice, they may be one-dimensional matrices.
  • when the input data are an image, the input elements in the input data represent pixels in the image
  • when the input data are text, the input elements in the input data represent Chinese characters or letters in the text data
  • when the input data are voice, the input elements in the input data represent sound elements in the voice data.
  • the above description uses input data as an example to illustrate the elements in the data, and the above description is also applicable to the first output data, the detection data, and the second output data.
  • analyzing the forward influence between the input data and the first output data includes: using the linearized neural network to separately process at least one detection data to obtain at least one second output data; and determining the forward influence of each input element of the input data on each output element of the first output data by analyzing the element-level correspondence between the at least one detection data and the at least one second output data.
  • each detection data includes a target detection element
  • the value of the target detection element is 1
  • the remaining detection elements in each detection data except the target detection element are non-target detection elements
  • the values of the non-target detection elements are all 0.
  • the detection data group includes detection data 503, the second output data group includes second output data 504, the detection data 503 is represented as x3[n,m], the second output data 504 is represented as y3[p,q], and the detection data 503 corresponds to the second output data 504.
  • the element located in the n0th row and m0th column of the detection data 503 is the target detection element and the remaining elements of the detection data 503 are all non-target detection elements, so the detection data 503 can be expressed as: $x_3[n,m] = 1$ if $n = n_0$ and $m = m_0$, and $x_3[n,m] = 0$ otherwise.
  • n, m, n0 and m0 are all positive integers, $0 < n \le Q_3$, $0 < m \le Q_4$, Q3 and Q4 are positive integers, Q3 represents the total number of rows of the detection data 503, and Q4 represents the total number of columns of the detection data 503.
  • the size of the input data 501 and the size of the detection data 503 are the same.
  • y3[p,q] represents the second output data 504, where p and q are both positive integers, $0 < p \le Q_5$, $0 < q \le Q_6$, Q5 and Q6 are positive integers, Q5 represents the total number of rows of the second output data 504, and Q6 represents the total number of columns of the second output data 504.
  • the size of the second output data 504 and the size of the first output data 502 may be the same.
  • because the target detection element of the detection data 503 is located in the n0th row and m0th column, the contribution of that element to each output element of the second output data 504, that is, its forward influence, can be determined from the detection data 503 and the second output data 504.
  • the size of a one-dimensional matrix represents the number of elements in the one-dimensional matrix
  • the size of a two-dimensional matrix represents the number of rows and columns in the two-dimensional matrix
  • the second parameter B_NN2 may represent the output that the linearized neural network produces for an all-zero input matrix, that is, the bias coefficient.
  • the detection data 503 and the second output data 504 are analyzed at the element level. Then, the contribution of the input element located in the n0th row and m0th column in the input data 501 to all the output elements in the first output data 502 can be obtained.
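The forward-influence analysis with a single detection data x3[n,m] can be sketched as below. It assumes a hypothetical linearized network `toy_net` (a horizontal two-tap filter plus a constant bias, chosen only so the numbers are easy to check) and uses 0-based indices, whereas the text counts rows and columns from 1. Subtracting the response to the all-zero matrix, which plays the role of B_NN2, isolates the contribution of the target element.

```python
from typing import Callable, List

Grid = List[List[float]]


def detection_data(rows: int, cols: int, n0: int, m0: int) -> Grid:
    """Binary matrix whose only nonzero entry is the target element at (n0, m0)."""
    return [[1.0 if (n, m) == (n0, m0) else 0.0 for m in range(cols)] for n in range(rows)]


def toy_net(x: Grid) -> Grid:
    """Hypothetical linearized network: y[p][q] = x[p][q] + 0.5 * x[p][q-1] + 0.25."""
    return [[x[p][q] + 0.5 * (x[p][q - 1] if q > 0 else 0.0) + 0.25
             for q in range(len(x[0]))] for p in range(len(x))]


def forward_influence(f: Callable[[Grid], Grid], rows: int, cols: int, n0: int, m0: int) -> Grid:
    """Contribution of input element (n0, m0) to every output element: y3 - B_NN2."""
    bias = f([[0.0] * cols for _ in range(rows)])  # response to the all-zero matrix
    y3 = f(detection_data(rows, cols, n0, m0))
    return [[a - b for a, b in zip(ry, rb)] for ry, rb in zip(y3, bias)]


for row in forward_influence(toy_net, 2, 3, 0, 1):
    print(row)
```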
  • the positions of the target detection elements in at least part of the detection data in the multiple detection data are different.
  • the input data 501 includes Q3*Q4 input elements
  • the detection data group may include Q3*Q4 detection data
  • the target detection element in each detection data corresponds to one input element in the input data 501.
  • the positions of the Q3*Q4 target detection elements of the Q3*Q4 detection data correspond respectively to the positions of the Q3*Q4 input elements in the input data 501.
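Building the full detection data group for a Q3 x Q4 input can be sketched as follows, assuming 0-based indices and plain nested lists; `detection_group` is a hypothetical helper name. Each member is a binary matrix with exactly one target detection element, and the group is keyed by the input position that the target corresponds to, so the group has Q3*Q4 members.

```python
from typing import Dict, List, Tuple

Grid = List[List[int]]


def detection_group(q3: int, q4: int) -> Dict[Tuple[int, int], Grid]:
    """One binary detection matrix per input position (n0, m0) of a q3 x q4 input."""
    group = {}
    for n0 in range(q3):
        for m0 in range(q4):
            group[(n0, m0)] = [[1 if (n, m) == (n0, m0) else 0 for m in range(q4)]
                               for n in range(q3)]
    return group


group = detection_group(2, 3)
print(len(group))     # Q3 * Q4 detection data
print(group[(0, 0)])  # target in the first row and first column
```

When only some input elements need to be analyzed, the loops can be restricted to those positions so that only the corresponding detection data are stored.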
  • the target detection element of a detection data is located in the first row and first column
  • the target detection element of the detection data corresponds to the input element located in the first row and first column of the input data.
  • the contribution of each input element in the input data 501 to each output element in the first output data 502 can be determined.
  • the present disclosure is not limited to this. Depending on actual application requirements, only some of the input elements in the input data 501 may be analyzed; in that case, only the detection data corresponding to those input elements need to be stored and analyzed, thereby saving storage space and system resources.
  • multiple detection data have the same size.
  • the embodiments above take detection data containing a single target detection element as an example, that is, they analyze the forward influence of one particular input element of the input data (for example, the input element located in the n0th row and m0th column) on the output, but the embodiments of the present disclosure are not limited thereto. The forward influence of multiple specific input elements on the output can also be analyzed, in which case each detection data can include multiple target detection elements (for example, two or three target detection elements). The value of each target detection element is 1, and the values of all the remaining elements in the detection data are 0.
  • the detection data group includes a plurality of detection data
  • the second output data group includes a plurality of second output data
  • the plurality of detection data corresponds to the plurality of second output data in a one-to-one correspondence.
  • in step S24, analyzing the reverse influence between the input data and the first output data based on the detection data group and the second output data group includes: using the linearized neural network to separately process the multiple detection data to obtain multiple second output data; and determining the reverse influence of each output element of the first output data on each input element of the input data by analyzing the element-level correspondence between the multiple detection data and the multiple second output data.
  • each detection data includes a target detection element
  • the value of the target detection element is 1
  • the remaining detection elements in each detection data except the target detection element are non-target detection elements
  • the values of the non-target detection elements are all 0.
  • the number of multiple detection data in the detection data group is the same as the number of elements in each detection data, and the position of the target detection element of any two detection data in the multiple detection data is different.
  • the input data 501 includes multiple input elements
  • the multiple detection data includes multiple target detection elements
  • the multiple input elements correspond one-to-one to the multiple target detection elements. That is, if the input data 501 include Q3*Q4 input elements, the detection data group can include Q3*Q4 detection data, and the target detection element of each detection data corresponds to one input element of the input data 501.
  • the positions of the Q3*Q4 target detection elements of the Q3*Q4 detection data correspond respectively to the positions of the Q3*Q4 input elements in the input data 501.
  • the detection data can be input to the linearized neural network, and the relationship between each input detection data and its corresponding second output data can be analyzed at the element level to obtain the forward or reverse influence between the input data and the first output data. This makes it possible to analyze how the nonlinear layers in the neural network actually process the input data, to determine which input elements of the input data determine a specific output element of the first output data (reverse influence), and to determine the contribution of each input element of the input data to a specific output element of the first output data (forward influence).
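The reverse direction can be sketched by reusing the detection data described above: for a chosen output element (p0, q0), take that element from every bias-corrected second output data and arrange the values by the position of each detection data's target element. The network `toy_net` below is a hypothetical stand-in (a horizontal two-tap filter plus a constant bias), and indices are 0-based; neither comes from the disclosure.

```python
from typing import Callable, List

Grid = List[List[float]]


def toy_net(x: Grid) -> Grid:
    """Hypothetical linearized network: y[p][q] = x[p][q] + 0.5 * x[p][q-1] + 0.25."""
    return [[x[p][q] + 0.5 * (x[p][q - 1] if q > 0 else 0.0) + 0.25
             for q in range(len(x[0]))] for p in range(len(x))]


def reverse_influence(f: Callable[[Grid], Grid], rows: int, cols: int, p0: int, q0: int) -> Grid:
    """Map over input positions: how much each input element feeds output (p0, q0)."""
    bias = f([[0.0] * cols for _ in range(rows)])[p0][q0]
    result = [[0.0] * cols for _ in range(rows)]
    for n0 in range(rows):
        for m0 in range(cols):
            probe = [[1.0 if (n, m) == (n0, m0) else 0.0 for m in range(cols)]
                     for n in range(rows)]
            result[n0][m0] = f(probe)[p0][q0] - bias
    return result


for row in reverse_influence(toy_net, 2, 3, 0, 2):
    print(row)
```

Input positions with a nonzero entry are the input elements that determine the chosen output element, and the magnitudes indicate how strongly each one contributes.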
  • the data analysis method provided in the embodiment of the present disclosure is described above by taking the processing process of the neural network as an example, but the present disclosure is not limited to this.
  • the data processing process of a certain nonlinear layer in the neural network can be analyzed.
  • the data processing process of the nonlinear layer is similar to the process of the above data analysis method, and will not be repeated here.
  • FIG. 8 is a flowchart of a neural network evaluation method provided by some embodiments of the present disclosure
  • FIG. 9 is a flowchart of a neural network training method provided by some embodiments of the present disclosure.
  • the neural network evaluation method may include the following steps:
  • S31 Perform a neural network processing method to determine at least one linear interpreter unit corresponding to at least one nonlinear layer;
  • the processing method of the neural network is the processing method provided according to any of the foregoing embodiments of the present disclosure.
  • for step S31, refer to the relevant descriptions of steps S11-S12 in the processing method of the neural network; repeated content is omitted here.
  • all nonlinear layers in the neural network may be linearized to obtain multiple linear interpreter units corresponding to all nonlinear layers in the neural network.
  • step S32 may include: evaluating at least one linear interpreter unit to determine the evaluation result of the at least one nonlinear layer; and training the neural network based on the evaluation result.
  • detection data may be input to at least one linear interpreter unit to obtain second output data.
  • the forward or reverse influence between the input data and the first output data is then obtained, so as to determine the evaluation result of the at least one nonlinear layer and thereby the contribution of each nonlinear layer of the neural network to the input.
  • training the neural network includes:
  • S44 Calculate the loss value of the loss function of the neural network according to the training output data and the training target data;
  • when the loss function meets the predetermined condition, step S47 is executed to obtain the trained neural network;
  • when the loss function does not meet the predetermined condition, the process returns to step S42 and continues to input training input data and training target data to repeat the above training process.
  • in step S41, when all the nonlinear layers in the neural network are linearized to obtain multiple linear interpreter units corresponding to all of them, these linear interpreter units can be evaluated to determine the training weights of all the nonlinear layers in the neural network. It should be noted that in step S41 only some of the nonlinear layers in the neural network may be linearized, in which case the training weights of those nonlinear layers can be determined during the training process.
  • in step S41, the contribution (i.e., weight) of each linear interpreter unit to the input can be analyzed based on the impulse response, so as to determine the contribution (i.e., weight) of each nonlinear layer to the input data while the neural network processes the input data, and thereby to determine how to improve the number of filters and the parameters of the neural network and optimize the network configuration. It should be noted that the contribution (i.e., weight) of each linear layer in the neural network to the input data can also be analyzed based on the impulse response.
  • layers with a low contribution can be removed directly, thereby reducing the complexity of the neural network and the amount of data involved in training it; alternatively, the parameters of layers with a lower contribution may be left uncorrected during the training process.
  • layers with a higher contribution can be trained during the training process, that is, in step S45 the parameters of the layers with a higher contribution are adjusted to optimize them.
  • the training target data can be used as the target value of the training output data, the parameters of the neural network are continuously optimized, and a trained neural network is finally obtained.
  • the predetermined condition corresponds to convergence of the loss value of the neural network's loss function after a certain amount of training input data and training target data has been input.
  • the predetermined condition is that the number of training times or training cycles of the neural network reaches a predetermined number, and the predetermined number may be millions, as long as the set of training input data and training target data is large enough.
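The training loop and its stopping condition can be sketched as follows. This is a minimal illustration, not the disclosed training procedure: it assumes a toy model y = w * x + b trained by plain gradient descent on mean squared error, treats `w` and `b` as stand-ins for the parameters of two layers, and lets the caller freeze parameters judged to have a low contribution so that they are never corrected. The loop stops when the loss stops changing (convergence) or when a preset iteration count is reached.

```python
from typing import AbstractSet, Dict, List, Tuple


def train(data: List[Tuple[float, float]], params: Dict[str, float],
          frozen: AbstractSet[str] = frozenset(), lr: float = 0.05,
          tol: float = 1e-12, max_iters: int = 100_000) -> Tuple[Dict[str, float], float, int]:
    """Gradient descent on y = w * x + b; parameters named in `frozen` are never updated."""
    params = dict(params)
    prev_loss = None
    n = len(data)
    for it in range(1, max_iters + 1):
        # training output data for every training input
        preds = [params["w"] * x + params["b"] for x, _ in data]
        # loss value against the training target data (analogous to step S44)
        loss = sum((p - t) ** 2 for p, (_, t) in zip(preds, data)) / n
        if prev_loss is not None and abs(prev_loss - loss) < tol:
            break  # predetermined condition met: the loss has converged
        prev_loss = loss
        grads = {
            "w": sum(2 * (p - t) * x for p, (x, t) in zip(preds, data)) / n,
            "b": sum(2 * (p - t) for p, (_, t) in zip(preds, data)) / n,
        }
        # adjust only the parameters that are not frozen (analogous to step S45)
        for name, g in grads.items():
            if name not in frozen:
                params[name] -= lr * g
    return params, loss, it


data = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]  # targets follow y = 2x + 1
trained, final_loss, iters = train(data, {"w": 0.0, "b": 0.0})
print(trained, final_loss, iters)
frozen_b, _, _ = train(data, {"w": 0.0, "b": 0.0}, frozen={"b"})
print(frozen_b)  # b stays 0.0; w absorbs as much of the fit as it can
```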
  • FIG. 10 is a schematic diagram of a neural network processing device according to some embodiments of the disclosure.
  • the processing device 90 of the neural network may include a memory 905 and a processor 910.
  • the memory 905 is used to store computer readable instructions.
  • the processor 910 is configured to run computer-readable instructions. When the computer-readable instructions are executed by the processor 910, the neural network processing method according to any of the above embodiments can be executed.
  • a neural network includes at least one nonlinear layer, and when the computer-readable instructions are executed by the processor 910, the following operations can be performed: processing, with the Nth nonlinear layer of the at least one nonlinear layer, the input matrix input to the Nth nonlinear layer to obtain the output matrix of the Nth nonlinear layer; and linearizing the Nth nonlinear layer according to the input matrix and the output matrix to determine the expression of the Nth linear function corresponding to the Nth nonlinear layer.
  • the expression of the Nth linear function is $f_{LN} = A_N \cdot x + B_N$, where f_LN represents the Nth linear function, A_N represents the first parameter of the Nth linear function, B_N represents the second parameter of the Nth linear function, and x represents the input of the Nth nonlinear layer.
  • A_N and B_N are determined according to the input matrix and output matrix corresponding to the Nth nonlinear layer, where N is a positive integer.
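One way to obtain A_N and B_N from the input matrix x1 and output matrix y1 of a sigmoid layer is a first-order (tangent-line) expansion, sketched below elementwise. This formula is an assumption for illustration; the disclosure does not specify it here. It relies on the identity sigmoid'(x) = y(1 - y), so A_N can be computed from the output y1 alone, and setting B_N = y1 - A_N * x1 guarantees that f_LN reproduces y1 at x1, as required of a linear interpreter unit.

```python
import math
from typing import List, Tuple

Vector = List[float]


def sigmoid(x: Vector) -> Vector:
    return [1.0 / (1.0 + math.exp(-v)) for v in x]


def linearize_sigmoid(x1: Vector, y1: Vector) -> Tuple[Vector, Vector]:
    """Elementwise A_N = y1 * (1 - y1) (the sigmoid slope at x1) and B_N = y1 - A_N * x1."""
    a = [y * (1.0 - y) for y in y1]
    b = [y - s * x for x, y, s in zip(x1, y1, a)]
    return a, b


def f_ln(a: Vector, b: Vector, x: Vector) -> Vector:
    """Nth linear function f_LN(x) = A_N * x + B_N, with A_N applied elementwise."""
    return [s * v + c for s, v, c in zip(a, x, b)]


x1 = [-2.0, 0.0, 1.5]
y1 = sigmoid(x1)
a_n, b_n = linearize_sigmoid(x1, y1)
print(a_n, b_n)
print(f_ln(a_n, b_n, x1), y1)  # equal up to floating-point rounding
```

Because the sigmoid is curved, this tangent-based unit matches the layer exactly only at x1 and approximately for nearby inputs.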
  • the processor 910 may be a central processing unit (CPU), a tensor processor (TPU), or a device with data processing capabilities and/or program execution capabilities, and may control other components in the processing device 90 of the neural network to execute The desired function.
  • the central processing unit (CPU) can be an X86 or ARM architecture.
  • the memory 905 may include one or more computer program products, and the computer program products may include various forms of computer-readable storage media, such as volatile memory and/or nonvolatile memory.
  • the volatile memory may include random access memory (RAM) and/or cache memory (cache), for example.
  • the non-volatile memory may include, for example, read only memory (ROM), hard disk, erasable programmable read only memory (EPROM), portable compact disk read only memory (CD-ROM), USB memory, flash memory, etc.
  • One or more computer-readable instructions may be stored on the computer-readable storage medium, and the processor 910 may run the computer-readable instructions to implement various functions of the processing device 90 of the neural network.
  • the processing device 90 of the neural network further includes an input interface 915 that allows an external device to communicate with the processing device 90 of the neural network.
  • the input interface 915 may be used to receive instructions from external computer devices, from users, or the like.
  • the processing device 90 of the neural network may also include an output interface 920 that interfaces the processing device 90 of the neural network with one or more external devices.
  • the processing device 90 of the neural network may output the first parameter and the second parameter of the linear function corresponding to the nonlinear layer through the output interface 920. It is contemplated that external devices communicating with the processing device 90 of the neural network through the input interface 915 and the output interface 920 may be included in an environment that provides substantially any type of user interface with which the user can interact.
  • Examples of user interface types include graphical user interfaces, natural user interfaces, and so on.
  • a graphical user interface can accept input from a user using an input device such as a keyboard, mouse, remote control, etc., and provide output on an output device such as a display.
  • a natural user interface may enable the user to interact with the processing device 90 of the neural network without being subject to constraints imposed by input devices such as a keyboard, mouse, remote control, and the like.
  • natural user interfaces can rely on speech recognition, touch and stylus recognition, gesture recognition on and near the screen, air gestures, head and eye tracking, voice and speech, vision, touch, gestures, and machine intelligence.
  • data transmission may be implemented between the memory 905 and the processor 910 through a network or a bus system.
  • the memory 905 and the processor 910 may directly or indirectly communicate with each other.
  • FIG. 11 is a schematic diagram of a data analysis device according to some embodiments of the disclosure.
  • the data analysis device 100 may implement a data analysis process based on a neural network, and the data analysis device 100 may include a memory 1001 and a processor 1002.
  • the memory 1001 is used to store computer readable instructions.
  • the processor 1002 is configured to run the computer-readable instructions, and when the computer-readable instructions are executed by the processor 1002, the data analysis method according to any of the above embodiments can be executed.
  • when the computer-readable instructions are executed by the processor 1002, the following operations are performed: obtaining input data; processing the input data with a neural network to obtain first output data; linearizing, according to the input data and the first output data, all the nonlinear layers in the neural network using the neural network processing method described in any of the above embodiments, to determine the linearized neural network corresponding to the neural network; and analyzing, based on the linearized neural network, the correspondence between the input data and the first output data.
  • the memory 1001 may also store training input data and training target data.
  • the processor 1002 may be a central processing unit (CPU), a tensor processor (TPU), or a graphics processing unit (GPU) and other devices with data processing capabilities and/or program execution capabilities, and can control the data analysis device 100 Other components to perform the desired functions.
  • the central processing unit (CPU) can be an X86 or ARM architecture.
  • the GPU can be integrated directly on the motherboard as a separate chip or built into the north bridge chip of the motherboard; the GPU can also be built into the central processing unit (CPU).
  • the memory 1001 may include one or more computer program products, and the computer program products may include various forms of computer-readable storage media, such as volatile memory and/or nonvolatile memory.
  • the volatile memory may include random access memory (RAM) and/or cache memory (cache), for example.
  • the non-volatile memory may include, for example, read only memory (ROM), hard disk, erasable programmable read only memory (EPROM), portable compact disk read only memory (CD-ROM), USB memory, flash memory, etc.
  • One or more computer-readable instructions may be stored on the computer-readable storage medium, and the processor 1002 may run the computer-readable instructions to implement various functions of the data analysis apparatus 100.
  • data transmission between the memory 1001 and the processor 1002 may be implemented through a network or a bus system.
  • the memory 1001 and the processor 1002 may directly or indirectly communicate with each other.
  • the data analysis apparatus 100 further includes an input interface 1003 that allows an external device to communicate with the data analysis apparatus 100.
  • the input interface 1003 can be used to receive instructions from external computer devices, from users, and the like.
  • the data analysis apparatus 100 may also include an output interface 1004 for interfacing the data analysis apparatus 100 with one or more external devices.
  • the data analysis device 100 may output analysis results and the like through the output interface 1004. It is contemplated that external devices that communicate with the data analysis apparatus 100 through the input interface 1003 and the output interface 1004 may be included in an environment that provides substantially any type of user interface with which a user can interact. Examples of user interface types include graphical user interfaces, natural user interfaces, and so on.
  • a graphical user interface can accept input from a user using an input device such as a keyboard, mouse, remote control, etc., and provide output on an output device such as a display.
  • a natural user interface may enable the user to interact with the data analysis apparatus 100 without being subject to constraints imposed by input devices such as a keyboard, mouse, remote control, and the like.
  • natural user interfaces can rely on speech recognition, touch and stylus recognition, gesture recognition on and near the screen, air gestures, head and eye tracking, voice and speech, vision, touch, gestures, and machine intelligence.
  • although the data analysis device 100 is shown as a single system in the figure, it can be understood that the data analysis device 100 may also be a distributed system and may also be deployed as a cloud facility (including a public cloud or a private cloud). Thus, for example, several devices may communicate through a network connection and jointly perform tasks described as being performed by the data analysis apparatus 100.
  • Some embodiments of the present disclosure also provide a non-transitory computer-readable storage medium.
  • one or more first computer-readable instructions may be stored on a non-transitory computer-readable storage medium.
  • when the first computer-readable instructions are executed by a computer, one or more steps of the neural network processing method described above can be performed.
  • one or more second computer-readable instructions may be stored on the non-transitory computer-readable storage medium.
  • when the second computer-readable instructions are executed by a computer, one or more steps of the data analysis method described above can be performed.
  • one or more third computer-readable instructions may be stored on the non-transitory computer-readable storage medium.
  • when the third computer-readable instructions are executed by a computer, one or more steps of the neural network evaluation method described above can be performed.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Biomedical Technology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Molecular Biology (AREA)
  • Data Mining & Analysis (AREA)
  • Multimedia (AREA)
  • Biophysics (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Biodiversity & Conservation Biology (AREA)
  • Neurology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Image Analysis (AREA)

Abstract

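A neural network processing method and processing apparatus, a neural network evaluation method, a data analysis method and apparatus, and a storage medium. The neural network processing method includes: (S11) processing, by the N-th nonlinear layer of at least one nonlinear layer, an input matrix input to the N-th nonlinear layer to obtain an output matrix output by the N-th nonlinear layer; and (S12) linearizing the N-th nonlinear layer according to the input matrix and the output matrix to determine an expression of an N-th linear function corresponding to the N-th nonlinear layer.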

Description

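Neural network processing method and evaluation method, and data analysis method and device
This application claims priority to Chinese Patent Application No. 201910075152.0, filed on January 25, 2019, the entire disclosure of which is incorporated herein by reference as part of this application.
Technical Field
Embodiments of the present disclosure relate to a neural network processing method and processing apparatus, a neural network evaluation method, a data analysis method and data analysis apparatus, and a computer-readable storage medium.
Background
At present, deep learning technology based on artificial neural networks has made great progress in fields such as object classification, text processing, recommendation engines, image search, face recognition, age and speech recognition, human-machine dialogue, and affective computing. Artificial neural networks include the convolutional neural network (CNN), a class of feedforward neural networks with a deep structure that involve convolution operations; the CNN is one of the representative algorithms of deep learning. A convolutional neural network is a nonlinear system that includes multiple layers and nonlinear units connecting these layers; these nonlinear units allow the convolutional neural network to adapt to a variety of inputs.
Summary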
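At least some embodiments of the present disclosure provide a processing method for a neural network, the neural network including at least one nonlinear layer, and the processing method including: processing, by the N-th nonlinear layer of the at least one nonlinear layer, an input matrix input to the N-th nonlinear layer to obtain an output matrix output by the N-th nonlinear layer; and linearizing the N-th nonlinear layer according to the input matrix and the output matrix to determine an expression of an N-th linear function corresponding to the N-th nonlinear layer, where the expression of the N-th linear function is $f_{LN} = A_N * x + B_N$, in which $f_{LN}$ denotes the N-th linear function, $A_N$ denotes the first parameter of the N-th linear function, $B_N$ denotes the second parameter of the N-th linear function, $x$ denotes the input of the N-th nonlinear layer, $A_N$ and $B_N$ are determined according to the input matrix and the output matrix, and N is a positive integer.
For example, in the processing method provided by some embodiments of the present disclosure, the first parameter of the N-th linear function is $A_N = (Df_{NN})(x_1)$, where $Df_{NN}$ denotes the first-order derivative of the nonlinear function corresponding to the N-th nonlinear layer and $x_1$ denotes the input matrix; the second parameter of the N-th linear function is $B_N = f_{NN}(x_1) - A_N * x_1$, where $f_{NN}$ denotes the nonlinear function corresponding to the N-th nonlinear layer and $f_{NN}(x_1)$ denotes the output matrix.
For example, the processing method provided by some embodiments of the present disclosure further includes: performing the linearization on all nonlinear layers in the neural network to determine expressions of the linear functions respectively corresponding to all the nonlinear layers.
For example, in the processing method provided by some embodiments of the present disclosure, the at least one nonlinear layer includes an activation layer, an instance normalization layer, a max pooling layer, or a softmax layer, and the activation function of the activation layer is a ReLU function, a tanh function, or a sigmoid function.
Some embodiments of the present disclosure further provide a neural-network-based data analysis method, including: acquiring input data; processing the input data with the neural network to obtain first output data; according to the input data and the first output data, performing the processing method according to any of the above to linearize all nonlinear layers in the neural network, so as to determine a linearized neural network corresponding to the neural network; and analyzing, based on the linearized neural network, the correspondence between the input data and the first output data.
For example, in the data analysis method provided by some embodiments of the present disclosure, analyzing, based on the linearized neural network, the correspondence between the input data and the first output data includes: determining a probe data group according to the input data, where the probe data group is a group of binary matrices; processing the probe data group with the linearized neural network to obtain a second output data group; and analyzing, based on the probe data group and the second output data group, the forward influence or backward influence between the input data and the first output data.
For example, in the data analysis method provided by some embodiments of the present disclosure, the probe data group includes at least one probe data item, the second output data group includes at least one second output data item, and the at least one probe data item corresponds one-to-one to the at least one second output data item; analyzing, based on the probe data group and the second output data group, the forward influence between the input data and the first output data includes: processing each of the at least one probe data item with the linearized neural network to obtain the at least one second output data item; and determining, by analyzing the element-level correspondence between the at least one probe data item and the at least one second output data item, the forward influence of each input element of the input data on each output element of the first output data, where each probe data item in the probe data group includes a target probe element whose value is 1, and all probe elements in each probe data item other than the target probe element have a value of 0.
For example, in the data analysis method provided by some embodiments of the present disclosure, when the probe data group includes a plurality of probe data items, the target probe elements of at least some of the plurality of probe data items are at different positions.
For example, in the data analysis method provided by some embodiments of the present disclosure, the plurality of probe data items have the same size, and this size is also the same as the size of the input data.
For example, in the data analysis method provided by some embodiments of the present disclosure, the probe data group includes a plurality of probe data items, the second output data group includes a plurality of second output data items, and the plurality of probe data items correspond one-to-one to the plurality of second output data items; analyzing, based on the probe data group and the second output data group, the backward influence between the input data and the first output data includes: processing each of the plurality of probe data items with the linearized neural network to obtain the plurality of second output data items; and determining, by analyzing the element-level correspondence between the plurality of probe data items and the plurality of second output data items, the backward influence of each output element of the first output data on each input element of the input data, where each probe data item in the probe data group includes a target probe element whose value is 1, all probe elements in each probe data item other than the target probe element have a value of 0, the number of probe data items equals the number of probe elements in each probe data item, and the target probe elements of any two probe data items are at different positions.
At least some embodiments of the present disclosure further provide an evaluation method for a neural network, including: performing the processing method according to any of the above embodiments to determine at least one linear interpreter unit corresponding to the at least one nonlinear layer; and evaluating the neural network based on the at least one linear interpreter unit.
For example, in the evaluation method provided by some embodiments of the present disclosure, evaluating the neural network based on the at least one linear interpreter unit includes: evaluating the at least one linear interpreter unit to determine an evaluation result of the at least one nonlinear layer; and training the neural network based on the evaluation result.
For example, in the evaluation method provided by some embodiments of the present disclosure, training the neural network based on the evaluation result includes: determining training weights of the at least one nonlinear layer based on the evaluation result; acquiring training input data and training target data; processing the training input data with the neural network to obtain training output data; calculating a loss value of the loss function of the neural network according to the training output data and the training target data; and correcting the parameters of the neural network based on the training weights of the at least one nonlinear layer and the loss value, where the trained neural network is obtained when the loss function of the neural network satisfies a predetermined condition, and the training input data and training target data continue to be input to repeat the above training process when the loss function does not satisfy the predetermined condition.
At least some embodiments of the present disclosure further provide a processing apparatus for a neural network, including: a memory configured to store computer-readable instructions; and a processor configured to run the computer-readable instructions, where the computer-readable instructions, when run by the processor, can perform the processing method according to any of the above embodiments.
At least some embodiments of the present disclosure further provide a computer-readable storage medium for storing computer-readable instructions which, when executed by a computer, can perform the processing method according to any of the above embodiments.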
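Brief Description of the Drawings
To explain the technical solutions of the embodiments of the present disclosure more clearly, the drawings of the embodiments are briefly introduced below. Obviously, the drawings described below relate only to some embodiments of the present disclosure and do not limit the present disclosure.
FIG. 1 is a schematic diagram of a convolutional neural network;
FIG. 2 is a schematic diagram of the small set of filters to which a convolutional neural network is equivalent, given the activation results of its activation functions;
FIG. 3 is a flowchart of a neural network processing method provided by some embodiments of the present disclosure;
FIG. 4 is a schematic diagram of a nonlinear layer and the linear function corresponding to the nonlinear layer, provided by some embodiments of the present disclosure;
FIG. 5A is a schematic diagram of part of the structure of a neural network provided by some embodiments of the present disclosure;
FIG. 5B is a schematic diagram of part of the structure of a modified neural network provided by some embodiments of the present disclosure;
FIG. 6A is a schematic structural diagram of a neural network provided by some embodiments of the present disclosure;
FIG. 6B is a schematic structural diagram of a linearized neural network provided by some embodiments of the present disclosure;
FIG. 7 is a flowchart of a neural-network-based data analysis method provided by some embodiments of the present disclosure;
FIG. 8 is a flowchart of a neural network evaluation method provided by some embodiments of the present disclosure;
FIG. 9 is a flowchart of a neural network training method provided by some embodiments of the present disclosure;
FIG. 10 is a schematic diagram of a neural network processing apparatus according to some embodiments of the present disclosure;
FIG. 11 is a schematic diagram of a data analysis apparatus according to some embodiments of the present disclosure.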
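Detailed Description
To make the objectives, technical solutions, and advantages of the embodiments of the present disclosure clearer, the technical solutions of the embodiments are described clearly and completely below with reference to the drawings of the embodiments. Obviously, the described embodiments are some, rather than all, of the embodiments of the present disclosure. All other embodiments obtained by a person of ordinary skill in the art based on the described embodiments without creative effort fall within the protection scope of the present disclosure.
Unless otherwise defined, technical or scientific terms used in the present disclosure have the ordinary meaning understood by a person of ordinary skill in the field to which the present disclosure belongs. The words "first", "second", and the like used in the present disclosure do not denote any order, quantity, or importance, but are only used to distinguish different components. Words such as "include" or "comprise" mean that the element or item preceding the word covers the elements or items listed after the word and their equivalents, without excluding other elements or items. Words such as "connect" or "couple" are not limited to physical or mechanical connections but may include electrical connections, whether direct or indirect. "Upper", "lower", "left", "right", and the like are used only to indicate relative positional relationships, which may change accordingly when the absolute position of the described object changes.
To keep the following description of the embodiments of the present disclosure clear and concise, detailed descriptions of some known functions and known components are omitted.
A convolutional neural network is a neural network structure that uses, for example, images as input and output, and replaces scalar weights with filters (convolution kernels). A convolutional neural network includes multiple nonlinear layers; for example, activation layers, instance normalization layers, max pooling layers, and softmax layers are all nonlinear layers.
The convolutional neural network is one of the representative algorithms of deep learning systems. At present, the main drawback of deep learning systems is that it is difficult to explain how a neural network works. In a deep learning system, a network architecture is first selected and then trained to obtain a set of parameters (filter coefficients and biases). If the trained network is good, its output for a given input matches the desired target with high accuracy. However, many questions remain hard to answer, for example: Is the trained network the best choice for solving the problem? Does the trained network have a sufficient number of parameters? How do these parameters work inside the trained network to produce the output? Compared with a network with few layers (a shallow network), how does a network with many layers (a deep network) help improve the accuracy of the output?
Filters in deep neural network architectures are usually small (3×3 or 5×5 convolution kernels, etc.), so visualizing a large number of filters one by one does not provide deep insight into the architecture. In addition, the biases are scalars that provide no clue to the complex mechanisms at work inside a deep neural network architecture. Understanding the parameters of deep learning systems remains, to a large extent, an open problem.
At least some embodiments of the present disclosure provide a neural network processing method and processing apparatus, a neural network evaluation method, a data analysis method and data analysis apparatus, and a computer-readable storage medium. The processing method can linearize a neural network so that it becomes a linear system; classical methods for linear systems (for example, impulse response) can then be used to analyze the neural network, improving the understanding of how the neural network solves a problem, and this understanding can in turn be used to optimize the configuration of the neural network.
FIG. 1 is a schematic diagram of a convolutional neural network. For example, the convolutional neural network can be used to process images, speech, text, and the like. FIG. 1 shows only a convolutional neural network with a three-layer structure, and the embodiments of the present disclosure are not limited thereto. As shown in FIG. 1, the convolutional neural network includes an input layer 101, a hidden layer 102, and an output layer 103. The input layer 101 has 4 inputs 121, the hidden layer 102 has 3 outputs 122, and the output layer 103 has 2 outputs 123, so the convolutional neural network finally outputs 2 images.
For example, the 4 inputs 121 of the input layer 101 may be 4 images or four features of one image. The 3 outputs 122 of the hidden layer 102 may be feature maps of the images or features (for example, the 4 inputs 121) input through the input layer 101.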
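For example, as shown in FIG. 1, each convolutional layer has weights $w_{ij}^k$ and biases $b_i^k$. The weights $w_{ij}^k$ represent convolution kernels, and the biases $b_i^k$ are scalars added to the output of the convolutional layer, where k is a label denoting the input layer 101, and i and j are labels of the units of the input layer 101 and of the hidden layer 102, respectively. For example, the hidden layer 102 may include a first convolutional layer 201 and a second convolutional layer 202. The first convolutional layer 201 includes a first set of convolution kernels ($w_{ij}^1$ in FIG. 1) and a first set of biases ($b_i^1$ in FIG. 1). The second convolutional layer 202 includes a second set of convolution kernels ($w_{ij}^2$ in FIG. 1) and a second set of biases ($b_i^2$ in FIG. 1). Typically, each convolutional layer includes tens or hundreds of convolution kernels, and a deep convolutional neural network may include at least five convolutional layers.
For example, as shown in FIG. 1, the hidden layer 102 further includes a first activation layer 203 and a second activation layer 204. The first activation layer 203 follows the first convolutional layer 201, and the second activation layer 204 follows the second convolutional layer 202. The activation layers (for example, the first activation layer 203 and the second activation layer 204) include activation functions, which introduce nonlinearity into the convolutional neural network so that it can better solve relatively complex problems. The activation function may include a rectified linear unit (ReLU) function, a sigmoid function, a hyperbolic tangent (tanh) function, or the like. For example, an activation layer may be a separate layer of the convolutional neural network, or it may be included in a convolutional layer (for example, the first convolutional layer 201 may include the first activation layer 203, and the second convolutional layer 202 may include the second activation layer 204).
For example, in the first convolutional layer 201, several convolution kernels $w_{ij}^1$ of the first set and several biases $b_i^1$ of the first set are first applied to each input 121 to obtain the output of the first convolutional layer 201; the output of the first convolutional layer 201 may then be processed by the first activation layer 203 to obtain the output of the first activation layer 203. In the second convolutional layer 202, several convolution kernels $w_{ij}^2$ of the second set and several biases $b_i^2$ of the second set are first applied to the output of the first activation layer 203 to obtain the output of the second convolutional layer 202; the output of the second convolutional layer 202 may then be processed by the second activation layer 204 to obtain the output of the second activation layer 204. For example, the output of the first convolutional layer 201 may be the result of applying the convolution kernels $w_{ij}^1$ to its input and then adding the biases $b_i^1$, and the output of the second convolutional layer 202 may be the result of applying the convolution kernels $w_{ij}^2$ to the output of the first activation layer 203 and then adding the biases $b_i^2$. For example, as shown in FIG. 1, the output of the first activation layer 203 is the output 122 of the hidden layer 102, and the output of the second activation layer 204 is transmitted to the output layer 103 as the output 123 of the output layer 103.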
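FIG. 2 is a schematic diagram of the small set of filters to which a convolutional neural network is equivalent, given the activation results of its activation functions. For example, the activation functions prevent the architecture of the entire convolutional neural network from being reduced to a small set of filters acting on each input. A convolutional neural network can be interpreted as an adaptive filter. Suppose an activation layer includes ReLU functions and its input includes a first part and a second part: if the first part is positive, the activation layer passes it unchanged to the next layer; if the second part is negative, it has no effect on the output of the activation layer. Suppose, as shown in FIG. 1, that a particular input activates the second ReLU 2032 in the first activation layer 203 and the first ReLU 2041 in the second activation layer 204, while for the remaining ReLUs in the convolutional neural network this input is negative, so the remaining ReLUs do not affect the output. Therefore, as shown in FIG. 2, the ReLUs other than the second ReLU 2032 in the first activation layer 203 and the first ReLU 2041 in the second activation layer 204 are omitted, yielding a linear convolutional neural network that includes only four different filters and some biases acting on each input. For different inputs, the activation states of the ReLUs differ, which changes the output of the convolutional neural network. For any input, the net effect of the convolutional neural network is always equivalent to a small set of filters and biases (for example, the set shown in FIG. 2), but this set of filters changes with the input, producing an adaptive filter effect.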
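The adaptive-filter view described above can be checked numerically. Below is a minimal NumPy sketch under stated assumptions: dense weight matrices stand in for the convolution kernels of FIG. 1, and the helper names `relu_net` and `effective_affine` are hypothetical. For a given input, it records which ReLUs are active and folds the two layers into a single affine map $A x + B$; a different input can select a different map.

```python
import numpy as np

def relu_net(x, W1, b1, W2, b2):
    """Two layers: linear -> ReLU -> linear -> ReLU (dense stand-in for conv layers)."""
    h = np.maximum(W1 @ x + b1, 0.0)
    return np.maximum(W2 @ h + b2, 0.0)

def effective_affine(x, W1, b1, W2, b2):
    """Collapse the network to (A, B) using the ReLU on/off pattern produced by x."""
    m1 = (W1 @ x + b1 > 0).astype(float)        # which first-layer ReLUs pass their input
    h = m1 * (W1 @ x + b1)
    m2 = (W2 @ h + b2 > 0).astype(float)        # which second-layer ReLUs pass their input
    # y = D2 (W2 D1 (W1 x + b1) + b2), with D = diag(mask): inactive ReLUs simply drop out
    A = (m2[:, None] * W2) @ (m1[:, None] * W1)
    B = m2 * (W2 @ (m1 * b1) + b2)
    return A, B

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(5, 4)), rng.normal(size=5)   # hypothetical weights and biases
W2, b2 = rng.normal(size=(3, 5)), rng.normal(size=3)
x = rng.normal(size=4)
A, B = effective_affine(x, W1, b1, W2, b2)
```

Because `A` and `B` are rebuilt from the activation pattern of each input, they play the role of the input-dependent set of filters and biases shown in FIG. 2.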
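FIG. 3 is a flowchart of a neural network processing method provided by some embodiments of the present disclosure, and FIG. 4 is a schematic diagram of a nonlinear layer and the linear function corresponding to the nonlinear layer, provided by some embodiments of the present disclosure.
For example, the neural network is a nonlinear system and includes at least one nonlinear layer. As shown in FIG. 3, the neural network processing method provided by some embodiments of the present disclosure includes the following steps:
S11: processing, by the N-th nonlinear layer of the at least one nonlinear layer, the input matrix input to the N-th nonlinear layer to obtain the output matrix output by the N-th nonlinear layer;
S12: linearizing the N-th nonlinear layer according to the input matrix and the output matrix to determine the expression of the N-th linear function corresponding to the N-th nonlinear layer.
For example, the neural network may be a convolutional neural network. The at least one nonlinear layer includes an activation layer, an instance normalization layer, a max pooling layer, a softmax layer, or the like. For example, the activation function of the activation layer may be a ReLU function, a tanh function, a sigmoid function, or the like. The embodiments of the present disclosure are described in detail below by taking as an example the case where the N-th nonlinear layer is an activation layer whose activation function is the ReLU function, but the embodiments are not limited to activation layers.
For example, N is a positive integer less than or equal to the number of all nonlinear layers in the neural network. In some embodiments, the neural network includes M nonlinear layers, where M is a positive integer, so that 1 ≤ N ≤ M. It should be noted that although the embodiments of the present disclosure are described in detail by taking the N-th nonlinear layer as an example, the processing method provided by the embodiments is applicable to every nonlinear layer in the neural network.
For example, in step S11, the input matrices input to different nonlinear layers may be the same or different. For example, the neural network includes a first nonlinear layer and a second nonlinear layer, which may be different. A first input matrix is the input of the first nonlinear layer and a second input matrix is the input of the second nonlinear layer. The two input matrices may be the same, for example, the elements, size, and so on of the first input matrix are respectively the same as those of the second input matrix; or they may be different, for example, the two input matrices may differ in size, and at least some elements of the first input matrix may differ from the corresponding elements of the second input matrix.
For example, the output matrices output by different nonlinear layers are different from one another.
For example, in some embodiments, as shown in FIG. 4, the N-th nonlinear function corresponding to the N-th nonlinear layer 420 may be expressed as $y_{NN} = f_{NN}(x)$, where $x$ denotes the input of the N-th nonlinear layer 420. Different nonlinear layers have different function expressions. For example, when the N-th nonlinear layer 420 is an activation layer whose activation function is the ReLU function, the nonlinear function corresponding to the N-th nonlinear layer 420 may be expressed as: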
$$y_{NN} = f_{NN}(x) = \max(x, 0).$$
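As another example, when the N-th nonlinear layer 420 is an activation layer whose activation function is the sigmoid function, the nonlinear function corresponding to the N-th nonlinear layer 420 is expressed as: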
$$y_{NN} = f_{NN}(x) = \frac{1}{1 + e^{-x}}.$$
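For example, in some embodiments, in step S11, the input matrix 401 is the input matrix input to the N-th nonlinear layer, that is, the input matrix 401 is input to the N-th nonlinear layer 420. The N-th nonlinear layer 420 performs the corresponding processing (for example, activation) on the input matrix 401 to obtain the output matrix 402, so that the output matrix 402 is the output matrix output by the N-th nonlinear layer 420. As shown in FIG. 4, with $x_1$ denoting the input matrix 401 and $y_1$ denoting the output matrix 402, $y_1 = f_{NN}(x_1)$.
For example, the input matrix 401 and the output matrix 402 may both be two-dimensional matrices of the same size. The value at row i, column j of the input matrix 401 is denoted $(x_1)_{i,j}$ and the value at row i, column j of the output matrix 402 is denoted $(y_1)_{i,j}$, where i and j are positive integers with $0 < i \le Q1$ and $0 < j \le Q2$; Q1 and Q2 are positive integers, Q1 being the total number of rows and Q2 the total number of columns of the input matrix 401. For example, when the N-th nonlinear layer 420 is an activation layer whose activation function is the ReLU function, $(y_1)_{i,j} = \max((x_1)_{i,j}, 0)$.
For example, a nonlinear activation (such as one including the ReLU function) can be related to the Taylor expansion of the nonlinear layer about its input. For example, based on the input matrix 401, a Taylor expansion of the nonlinear function corresponding to the N-th nonlinear layer 420 is performed to determine the Taylor expansion of the N-th nonlinear layer 420, which is: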
$$f_{NN}(x) = f_{NN}(x_1) + (Df_{NN})(x_1)\cdot(x - x_1) + \cdots = (Df_{NN})(x_1)\cdot x + f_{NN}(x_1) - (Df_{NN})(x_1)\cdot x_1 + \cdots$$
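It should be noted that the higher-order terms of the above Taylor expansion are all nonlinear. To linearize the nonlinear layer, the higher-order terms of the expansion at $x_1$ can all be omitted, yielding the linear function expression corresponding to the N-th nonlinear layer 420.
For example, in step S12, the N-th linear function corresponding to the N-th nonlinear layer 420 includes a first parameter and a second parameter. The expression of the N-th linear function is: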
$$y_{LN} = f_{LN} = A_N * x + B_N,$$
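where $f_{LN}$ denotes the N-th linear function, $A_N$ denotes the first parameter of the N-th linear function, $B_N$ denotes the second parameter of the N-th linear function, and $x$ denotes the input of the N-th nonlinear layer; as shown in FIG. 4, $A_N$ and $B_N$ are determined according to the input matrix 401 and the output matrix 402.
It should be noted that the N-th linear function is jointly determined by the N-th nonlinear layer and the input matrix input to it. When the N-th nonlinear layer is fixed but the input matrices differ, the first and second parameters of the N-th linear function differ; when the input matrix is the same but the N-th nonlinear layer is of a different type, that is, the input matrix is input to different nonlinear layers, the first and second parameters also differ. In other words, inputting different input matrices into the same nonlinear layer can yield different linear functions, and inputting the same input matrix into different nonlinear layers can also yield different linear functions.
For example, as shown in FIG. 4, the operation performed by the linear system based on the N-th linear function approximates the operation performed by the N-th nonlinear layer 420; that is, within an acceptably small error, the two operations can be regarded as the same, so the N-th nonlinear layer 420 can be replaced by a linear system of the form $f_{LN} = A_N * x + B_N$.
For example, the first parameter $A_N$ and the second parameter $B_N$ may be constants. In some examples, $A_N$ and $B_N$ are both matrices whose values are all constants; in other embodiments, $A_N$ and $B_N$ are both constants; in still other embodiments, one of $A_N$ and $B_N$ is a matrix and the other is a constant.
For example, according to the above Taylor expansion, the first parameter $A_N$ of the N-th linear function can be expressed as: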
$$A_N = (Df_{NN})(x_1),$$
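where $Df_{NN}$ denotes the first-order derivative, evaluated at $x_1$, of the nonlinear function corresponding to the N-th nonlinear layer 420, and $x_1$ denotes the input matrix 401.
For example, according to the above Taylor expansion, the second parameter $B_N$ of the N-th linear function can be expressed as: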
$$B_N = f_{NN}(x_1) - (Df_{NN})(x_1) * x_1 = f_{NN}(x_1) - A_N * x_1,$$
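where $f_{NN}$ denotes the nonlinear function corresponding to the N-th nonlinear layer 420, and $f_{NN}(x_1)$ denotes the output matrix 402.
For example, when the N-th nonlinear layer 420 is an activation layer whose activation function is the ReLU function, each element of the first parameter $A_N$ is 1 or 0 (1 where the corresponding input element is positive, 0 otherwise), and the second parameter $B_N$ is 0.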
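A minimal NumPy sketch of this first-order linearization follows. It assumes element-wise activations, so that $A_N$ acts as an element-wise (Hadamard) factor, and takes the ReLU derivative at exactly 0 to be 0 (an assumption; the text does not specify it). The names `ACTIVATIONS` and `linearize_activation` are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Each entry: (f, f') for an element-wise activation function.
ACTIVATIONS = {
    "relu": (lambda x: np.maximum(x, 0.0), lambda x: (x > 0).astype(float)),  # f'(0) taken as 0
    "sigmoid": (sigmoid, lambda x: sigmoid(x) * (1.0 - sigmoid(x))),
    "tanh": (np.tanh, lambda x: 1.0 - np.tanh(x) ** 2),
}

def linearize_activation(name, x1):
    """Return (A_N, B_N) with A_N = f'(x1) and B_N = f(x1) - A_N * x1, element-wise."""
    f, df = ACTIVATIONS[name]
    A = df(x1)
    B = f(x1) - A * x1
    return A, B

x1 = np.array([[-1.5, 0.3], [2.0, -0.2]])   # hypothetical input matrix 401
A, B = linearize_activation("relu", x1)     # A is a 0/1 mask following the sign of x1; B is zero
```

For ReLU the sketch reproduces the binary $A_N$ and zero $B_N$ stated above; for sigmoid and tanh, $A_N$ is a continuous factor between 0 and 1 and $B_N$ is generally nonzero.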
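FIG. 5A is a schematic diagram of part of the structure of a neural network provided by some embodiments of the present disclosure, and FIG. 5B is a schematic diagram of part of the structure of a modified neural network provided by some embodiments of the present disclosure.
For example, the neural network further includes convolutional layers. As shown in FIG. 5A, in some embodiments the neural network includes a first convolutional layer 41, the N-th nonlinear layer 420, and a second convolutional layer 43 connected in sequence. The N-th nonlinear layer 420 may be an activation layer. For example, the input matrix 401 may be the output of the first convolutional layer 41, that is, the first convolutional layer 41 outputs the input matrix 401, and the N-th nonlinear layer 420 processes the input matrix 401 to obtain the output matrix 402, which is the input of the second convolutional layer 43. For example, in the example shown in FIG. 5A, the input matrix 401 and the output matrix 402 may both be feature maps.
It should be noted that the neural network may further include average pooling layers, fully connected layers, and the like.
For example, after the N-th nonlinear layer is linearized, the N-th linear interpreter unit corresponding to the N-th nonlinear layer can be determined, and replacing the N-th nonlinear layer with the N-th linear interpreter unit yields a modified neural network. As shown in FIG. 5B, the modified neural network includes the first convolutional layer 41, the N-th linear interpreter unit 421, and the second convolutional layer 43; the N-th linear interpreter unit 421 corresponds to the N-th nonlinear layer 420 in FIG. 5A, and replacing the N-th nonlinear layer 420 in the neural network of FIG. 5A with the N-th linear interpreter unit 421 yields the modified neural network of FIG. 5B. The function corresponding to the N-th linear interpreter unit 421 is the N-th linear function, that is, $f_{LN} = A_N * x + B_N$. The structures, parameters, and so on of the first convolutional layer 41 and the second convolutional layer 43 remain unchanged.
For example, as shown in FIGS. 5A and 5B, the N-th linear interpreter unit 421 and the N-th nonlinear layer 420 perform similar operations; that is, for the same input matrix $x_1$, inputting $x_1$ into the N-th nonlinear layer 420 yields the output matrix $y_1$, and inputting $x_1$ into the N-th linear interpreter unit 421 also yields the output matrix $y_1$.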
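As a rough illustration of FIGS. 5A and 5B, the following NumPy sketch (single channel, 'valid' cross-correlation; the kernels `k41` and `k43` and the class `LinearInterpreterUnit` are hypothetical) builds the interpreter unit from the actual input matrix $x_1$ of a ReLU layer and checks that swapping it in leaves the network output for that input unchanged.

```python
import numpy as np

def conv2d_valid(x, k, b=0.0):
    """Single-channel 'valid' 2-D cross-correlation plus a scalar bias (stand-in conv layer)."""
    kh, kw = k.shape
    H, W = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    out = np.empty((H, W))
    for i in range(H):
        for j in range(W):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * k) + b
    return out

class LinearInterpreterUnit:
    """Frozen element-wise affine map f_LN(x) = A_N * x + B_N fitted to a ReLU at input x1."""
    def __init__(self, x1):
        self.A = (x1 > 0).astype(float)              # binary mask: ReLU derivative at x1
        self.B = np.maximum(x1, 0.0) - self.A * x1   # equals zero for ReLU

    def __call__(self, x):
        return self.A * x + self.B

rng = np.random.default_rng(1)
k41, k43 = rng.normal(size=(3, 3)), rng.normal(size=(3, 3))   # hypothetical kernels
image = rng.normal(size=(8, 8))

x1 = conv2d_valid(image, k41, 0.1)                    # output of layer 41 = input matrix 401
y_original = conv2d_valid(np.maximum(x1, 0.0), k43)   # FIG. 5A: 41 -> ReLU 420 -> 43
unit_421 = LinearInterpreterUnit(x1)
y_modified = conv2d_valid(unit_421(x1), k43)          # FIG. 5B: 41 -> unit 421 -> 43
```

The unit is only guaranteed to agree with the ReLU at the input it was built from; for another input, the frozen mask generally differs from the one the ReLU would produce.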
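For example, an activation layer whose activation function is ReLU can be replaced by a linear interpreter unit with a binary mask function; an activation layer whose activation function is sigmoid can be replaced by a linear interpreter unit with a continuous mask function; a max pooling layer can be replaced by a linear interpreter unit with a non-uniform downsampling function; and an instance normalization layer can be replaced by a linear interpreter unit with a linear normalization function.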
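The text names these replacements without giving formulas. The sketch below shows one possible reading, stated as an assumption: for max pooling, the interpreter copies the input positions that the pooling selected at $x_1$ (which is the exact first-order linearization almost everywhere, with $B_N = 0$); for instance normalization, it freezes the mean and standard deviation measured at $x_1$. The latter is an affine map that matches the layer at $x_1$ but is not its full Taylor linearization, because it ignores how the statistics themselves change with the input. All names are hypothetical.

```python
import numpy as np

def max_pool(x, s=2):
    """Reference non-overlapping s x s max pooling."""
    H, W = x.shape[0] // s, x.shape[1] // s
    return x[:H * s, :W * s].reshape(H, s, W, s).max(axis=(1, 3))

def instance_norm(x, eps=1e-5):
    """Reference single-channel instance normalization (no learned scale or shift)."""
    return (x - x.mean()) / np.sqrt(x.var() + eps)

class MaxPoolInterpreter:
    """Non-uniform downsampling: reuse the positions that max pooling selected at x1."""
    def __init__(self, x1, s=2):
        H, W = x1.shape[0] // s, x1.shape[1] // s
        self.idx = np.empty((H, W, 2), dtype=int)
        for i in range(H):
            for j in range(W):
                block = x1[i * s:(i + 1) * s, j * s:(j + 1) * s]
                di, dj = np.unravel_index(np.argmax(block), block.shape)
                self.idx[i, j] = (i * s + di, j * s + dj)

    def __call__(self, x):
        # Linear in x: each output element copies one fixed input element.
        return x[self.idx[..., 0], self.idx[..., 1]]

class InstanceNormInterpreter:
    """Linear normalization (assumed form): statistics frozen at x1."""
    def __init__(self, x1, eps=1e-5):
        self.mu = x1.mean()
        self.sigma = np.sqrt(x1.var() + eps)

    def __call__(self, x):
        return (x - self.mu) / self.sigma

rng = np.random.default_rng(2)
x1 = rng.normal(size=(4, 4))
pool_unit = MaxPoolInterpreter(x1)
norm_unit = InstanceNormInterpreter(x1)
```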
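For example, in some embodiments, the processing method may also linearize all nonlinear layers in the neural network to determine the expressions of the linear functions respectively corresponding to all the nonlinear layers, thereby obtaining linear interpreter units respectively corresponding to all the nonlinear layers and thus determining the linearized neural network corresponding to the neural network. For the linearization of each nonlinear layer of the neural network, refer to the above description of steps S11 and S12. For example, after all nonlinear layers in the neural network are linearized, linear interpreter units in one-to-one correspondence with all the nonlinear layers can be determined, and all the nonlinear layers are then replaced with the corresponding linear interpreter units, yielding the linearized neural network. For example, all nonlinear layers in the neural network correspond one-to-one to all linear interpreter units in the linearized neural network. In this way, the linear interpreter units interpret the operations of the nonlinear layers (for example, activation operations); the linearized neural network is a linear system and can therefore be used to analyze and evaluate the neural network. It is worth noting that, in the present disclosure, the "neural network" is a nonlinear system, that is, it includes nonlinear layers and linear layers, whereas the "linearized neural network" is a linear system, that is, it includes linear interpreter units and linear layers. A linear interpreter unit is not a layer that actually exists in the neural network; it is a layer structure defined only for convenience of description.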
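A minimal end-to-end sketch of building a linearized network follows, assuming a 1-D network in which each convolution is followed by a ReLU (the weights and the helpers `forward` and `linearize` are hypothetical). One forward pass on $x_2$ records every ReLU mask, and the returned function applies the same linear layers with those frozen masks.

```python
import numpy as np

def conv1d(x, k, b):
    """'valid' 1-D convolution plus a scalar bias: an affine (linear) layer."""
    return np.convolve(x, k, mode="valid") + b

def forward(x, layers):
    """Original nonlinear network: every conv layer is followed by a ReLU activation layer."""
    for k, b in layers:
        x = np.maximum(conv1d(x, k, b), 0.0)
    return x

def linearize(x2, layers):
    """Run the network once on x2 and freeze every ReLU into a binary-mask interpreter unit."""
    masks, x = [], x2
    for k, b in layers:
        z = conv1d(x, k, b)
        masks.append((z > 0).astype(float))   # A_N of this ReLU at its actual input; B_N = 0
        x = np.maximum(z, 0.0)

    def linearized_net(x):
        for (k, b), m in zip(layers, masks):
            x = m * conv1d(x, k, b)           # linear layer followed by its interpreter unit
        return x
    return linearized_net

rng = np.random.default_rng(3)
layers = [(rng.normal(size=3), 0.1), (rng.normal(size=3), -0.2)]   # hypothetical weights
x2 = rng.normal(size=12)
f_lin = linearize(x2, layers)
```

The linearized network reproduces the original output at $x_2$ and is affine everywhere, so $f(a) + f(b) - f(0) = f(a + b)$ holds for it, which is not true of the original ReLU network in general.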
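It should be noted that it is also possible to replace only some of the nonlinear layers in the neural network with the corresponding linear interpreter units. For example, all activation layers in the neural network are replaced with the corresponding linear interpreter units and all instance normalization layers are replaced with the corresponding linear interpreter units, while the max pooling layers and softmax layers remain unchanged.
For example, in some embodiments, the entire neural network can be regarded as equivalent to a single linear interpreter unit. In this case, the processing method includes the following steps: processing an input image with the neural network to obtain an output image; and determining, according to the input image and the output image, the linear interpreter unit corresponding to the neural network. The input image may be any type of image, for example, an image captured by an image acquisition device such as a digital camera or a mobile phone, and the embodiments of the present disclosure are not limited in this respect.
FIG. 6A is a schematic structural diagram of a neural network provided by some embodiments of the present disclosure, and FIG. 6B is a schematic structural diagram of a linearized neural network provided by some embodiments of the present disclosure.
For example, as shown in FIG. 6A, in some embodiments the neural network can be regarded as a nonlinear system whose function is $y_{NN1} = f_{NN1}(x)$, where $x$ denotes the input of the neural network; for example, $x$ may represent an input image. $f_{NN1}$ is a nonlinear function; for example, $f_{NN1}(x)$ may be a high-order polynomial expression in $x$. The neural network shown in FIG. 6A may include five linear layers and five nonlinear layers, each nonlinear layer being located between two adjacent linear layers.
For example, when all the nonlinear layers in the neural network are replaced with linear interpreter units, the linearized neural network shown in FIG. 6B is obtained. As shown in FIG. 6B, the linearized neural network includes five linear layers and five linear interpreter units, the five linear interpreter units corresponding one-to-one to the five nonlinear layers shown in FIG. 6A. The linearized neural network is a linear system, and the expression of its corresponding linear function is $y_{NN2} = f_{NN2}(x) = A_{NN2} * x + B_{NN2}$, where $A_{NN2}$ and $B_{NN2}$ are the parameters of the linear function corresponding to the linearized neural network, and both are constant terms.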
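FIG. 7 is a flowchart of a neural-network-based data analysis method provided by some embodiments of the present disclosure.
For example, the linearized neural network may be obtained by processing the neural network with the processing method described above. As shown in FIG. 7, the data analysis method provided by some embodiments of the present disclosure includes the following steps:
S21: acquiring input data;
S22: processing the input data with the neural network to obtain first output data;
S23: according to the input data and the first output data, performing the neural network processing method to linearize all nonlinear layers in the neural network, so as to determine the linearized neural network corresponding to the neural network;
S24: analyzing, based on the linearized neural network, the correspondence between the input data and the first output data.
In the data analysis method provided by some embodiments of the present disclosure, because the linearized neural network is a linear system, the way the neural network processes various data can be analyzed based on this linear system, enabling visualization of the neural network.
A linear system can be completely described by its impulse response. Because the linearized neural network is a linear system, impulse response analysis can be performed on it. When analyzing how the linearized neural network processes an image, the impulse response can show the influence of input pixels in the input image on output pixels in the output image; for example, after the input image is processed by the linearized neural network, the transformation relationship between each input pixel and each output pixel can be determined, such as which input pixels are used to obtain an output pixel and the weight of each input pixel. Using standard methods for linear systems, the reverse relationship can also be obtained, that is, the influence of output pixels on input pixels, for example, which input pixels a given output pixel corresponds to. Similarly, when analyzing how a nonlinear layer in the linearized neural network processes an input matrix, the impulse response can show the influence of the input elements of the input matrix on the output elements of the output matrix.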
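The statement that a linear system is fully described by its impulse response can be made concrete with a short NumPy sketch. It assumes the linearized network is available as a black-box affine function `g` (here a hypothetical convolution with a frozen mask; `impulse_response_matrix` is likewise a hypothetical helper) and recovers $B$ as the response to an all-zero input and each column of $A$ as the response to a unit impulse minus $B$.

```python
import numpy as np

def impulse_response_matrix(g, n):
    """Recover (A, B) of an affine map g(x) = A @ x + B on length-n inputs by probing.
    B is the response to the all-zero input; column k of A is the response to the
    unit impulse e_k minus B."""
    B = g(np.zeros(n))
    A = np.column_stack([g(np.eye(n)[k]) - B for k in range(n)])
    return A, B

# Hypothetical linearized network: 'valid' 1-D convolution plus bias, then a frozen ReLU mask.
rng = np.random.default_rng(4)
kernel = rng.normal(size=3)
mask = (rng.random(6) > 0.5).astype(float)
g = lambda x: mask * (np.convolve(x, kernel, mode="valid") + 0.5)

A, B = impulse_response_matrix(g, 8)
x = rng.normal(size=8)
```

Column k of `A` describes how input element k spreads over the output (the forward view), while row p lists the input elements that feed output element p (the reverse view).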
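The data analysis method provided by some embodiments of the present disclosure can be applied to fields such as image recognition, image classification, speech recognition, and speech classification.
For example, in steps S21 and S22, the input data and the first output data may be images, text, speech, and the like. For example, when the input data and the first output data are images, they may be two-dimensional matrices; when they are text or speech, they may be one-dimensional matrices.
For example, as shown in FIG. 6A, in some embodiments the neural network processes the input data 501 to obtain the first output data 502. With the input data 501 denoted $x_2$ and the first output data 502 denoted $y_2$, $y_2 = f_{NN1}(x_2)$.
For example, in step S23, the neural network processing method is the processing method provided by any of the above embodiments of the present disclosure. For a detailed description of linearizing all nonlinear layers in the neural network, refer to the description of steps S11-S12 in the above processing method; repeated details are not described here again.
It should be noted that, in step S23, the linearized neural network is jointly determined by the input data and the neural network. When the structure, parameters, and so on of the neural network are fixed but the input data differ, different linearized neural networks can be obtained; when the input data are the same but the structures, parameters, and so on of the neural networks differ, that is, the input data are input into different neural networks, different linearized neural networks can also be obtained. In other words, inputting different input data into the same neural network can yield different linearized neural networks, and inputting the same input data into different neural networks can also yield different linearized neural networks.
For example, step S24 may include: determining a probe data group according to the input data, where the probe data group is a group of binary matrices; processing the probe data group with the linearized neural network to obtain a second output data group; and analyzing, based on the probe data group and the second output data group, the forward influence or backward influence between the input data and the first output data.
For example, "forward influence" denotes the influence of each input element in the input data on each output element in the first output data, for example, which output elements of the first output data each input element corresponds to; "backward influence" denotes the influence of each output element in the first output data on each input element in the input data, for example, which input elements of the input data each output element corresponds to.
For example, the number and size of the probe data items in the probe data group can be determined according to the input data. Each probe data item has the same size as the input data.
For example, in some embodiments, the probe data group includes at least one probe data item, the second output data group includes at least one second output data item, and the at least one probe data item corresponds one-to-one to the at least one second output data item. For example, if the probe data group includes three probe data items, the second output data group includes three second output data items: the first probe data item corresponds to the first second output data item, the second probe data item to the second second output data item, and the third probe data item to the third second output data item.
For example, each probe data item is a binary matrix. It should be noted that a binary matrix is a matrix whose elements each have a value of 1 or 0.
In the present disclosure, "impulse response" denotes the output (for example, the second output data) for an input (for example, the probe data) in which one pixel (for example, the target probe element) has a value of 1 and all other elements (for example, the non-target probe elements) have a value of 0.
For example, the probe data and the second output data may also be images, text, speech, and the like. For example, when they are images, they may be two-dimensional matrices; when they are text or speech, they may be one-dimensional matrices.
For example, when the input data is an image, an input element of the input data represents a pixel of the image; when the input data is text, an input element represents a Chinese character or a letter in the text data; when the input data is speech, an input element represents a sound unit in the speech data. It should be noted that the above description uses the input data as an example to explain the elements of the data; it applies equally to the first output data, the probe data, and the second output data.
For example, in step S24, analyzing, based on the probe data group and the second output data group, the forward influence between the input data and the first output data includes: processing each of the at least one probe data item with the linearized neural network to obtain the at least one second output data item; and determining, by analyzing the element-level correspondence between the at least one probe data item and the at least one second output data item, the forward influence of each input element of the input data on each output element of the first output data.
For example, each probe data item includes a target probe element with a value of 1; all other probe elements in each probe data item are non-target probe elements, each with a value of 0.
For example, as shown in FIG. 6B, in some embodiments, the probe data group includes probe data 503 and the second output data group includes second output data 504; the probe data 503 is denoted $x_3[n,m]$, the second output data 504 is denoted $y_3[p,q]$, and the probe data 503 corresponds to the second output data 504. In some examples, the element at row $n_0$, column $m_0$ of the probe data 503 is the target probe element and the other elements of the probe data 503 are non-target probe elements, so the probe data 503 can be expressed as: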
$$x_3[n,m] = \begin{cases} 1, & n = n_0 \text{ and } m = m_0,\\ 0, & \text{otherwise,} \end{cases}$$
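where n, m, $n_0$, and $m_0$ are positive integers with $0 < n \le Q3$ and $0 < m \le Q4$; Q3 and Q4 are positive integers, Q3 being the total number of rows and Q4 the total number of columns of the probe data 503. For example, the input data 501 has the same size as the probe data 503.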
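A probe data group of this kind can be generated as in the following sketch (NumPy; `probe_data_group` is a hypothetical helper, and it uses 0-based indices where the text uses 1-based ones).

```python
import numpy as np

def probe_data_group(input_data, positions=None):
    """Build binary probe matrices with the same shape as input_data.
    Each probe has one target probe element equal to 1 at (n0, m0) and 0 elsewhere.
    positions: optional list of (n0, m0) pairs (0-based); defaults to every element."""
    Q3, Q4 = input_data.shape
    if positions is None:
        positions = [(n0, m0) for n0 in range(Q3) for m0 in range(Q4)]
    probes = []
    for n0, m0 in positions:
        x3 = np.zeros((Q3, Q4))
        x3[n0, m0] = 1.0
        probes.append(x3)
    return probes

x2 = np.arange(6.0).reshape(2, 3)   # hypothetical 2x3 input data
group = probe_data_group(x2)        # one probe per input element
```

Passing `positions` restricts the group to the input elements of interest, which corresponds to analyzing only part of the input data.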
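For example, as shown in FIG. 6B, the expression of the linear function corresponding to the linearized neural network is $y_{NN2} = f_{NN2}(x) = A_{NN2} * x + B_{NN2}$, so the second output data 504 is expressed as: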
$$y_3[p,q] = A_{NN2} * x_3[n,m] + B_{NN2},$$
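where $y_3[p,q]$ denotes the second output data 504, and p and q are positive integers with $0 < p \le Q5$ and $0 < q \le Q6$; Q5 and Q6 are positive integers, Q5 being the total number of rows and Q6 the total number of columns of the second output data 504. For example, the second output data 504 may have the same size as the first output data 502.
For example, because the target probe element of the probe data 503 is located at row $n_0$, column $m_0$, the contribution (that is, the forward influence) of the probe element at row $n_0$, column $m_0$ of the probe data 503 to each output element of the second output data 504 can be determined from the probe data 503 and the second output data 504.
It should be noted that, in the embodiments of the present disclosure, the size of a one-dimensional matrix denotes the number of its elements, and the size of a two-dimensional matrix denotes its numbers of rows and columns. For example, when the input data 501 and the probe data 503 are both two-dimensional matrices, "the size of the input data 501 is the same as the size of the probe data 503" may mean that they have the same number of rows and the same number of columns; when they are both one-dimensional matrices, it may mean that they have the same number of elements.
For example, the second parameter $B_{NN2}$ can represent the output obtained when the linearized neural network processes an all-zero matrix, and $B_{NN2}$ can represent a bias coefficient.
Because only the element at row $n_0$, column $m_0$ of the probe data 503 has a value of 1 and all other elements have a value of 0, analyzing the element-level correspondence between the probe data 503 and the second output data 504 yields the contribution of the input element at row $n_0$, column $m_0$ of the input data 501 to all output elements of the first output data 502.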
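The sketch below computes this contribution for a hypothetical 2-D linearized network `f_lin` (a 3×3 'valid' cross-correlation plus bias, followed by a frozen mask; `forward_influence` is also hypothetical). One deliberate choice, labeled as an assumption: it subtracts the response to the all-zero matrix, $B_{NN2}$, from $y_3$ so that only the term $A_{NN2} * x_3$ remains; the text analyzes $y_3$ directly.

```python
import numpy as np

def conv2d_valid(x, k):
    """Single-channel 'valid' 2-D cross-correlation."""
    kh, kw = k.shape
    return np.array([[np.sum(x[i:i + kh, j:j + kw] * k)
                      for j in range(x.shape[1] - kw + 1)]
                     for i in range(x.shape[0] - kh + 1)])

rng = np.random.default_rng(5)
kernel = rng.normal(size=(3, 3))
mask = (rng.random((4, 4)) > 0.3).astype(float)            # frozen ReLU mask (binary A_N)
f_lin = lambda x: mask * (conv2d_valid(x, kernel) + 0.2)   # hypothetical f_NN2 on 6x6 inputs

def forward_influence(f_lin, shape, n0, m0):
    """Contribution of input element (n0, m0) to every output element:
    response to the probe x3 minus the response to an all-zero matrix (B_NN2)."""
    x3 = np.zeros(shape)
    x3[n0, m0] = 1.0
    return f_lin(x3) - f_lin(np.zeros(shape))

influence = forward_influence(f_lin, (6, 6), 2, 3)
```

For a 'valid' 3×3 operation, only a 3×3 window of outputs can receive a nonzero contribution from one input element, and within that window the contribution is the masked kernel coefficient.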
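For example, when the probe data group includes a plurality of probe data items, the target probe elements of at least some of the probe data items are at different positions. For example, in some embodiments, the input data 501 includes Q3×Q4 input elements, and the probe data group may include Q3×Q4 probe data items; the target probe element in each probe data item corresponds to one input element of the input data 501, and the positions of the Q3×Q4 target probe elements of the Q3×Q4 probe data items correspond one-to-one to the positions of the Q3×Q4 input elements of the input data 501. That is, if the target probe element of a probe data item is at row 1, column 1, it corresponds to the input element at row 1, column 1 of the input data; by analyzing this probe data item and its corresponding second output data, the contribution of the input element at row 1, column 1 of the input data 501 to each output element of the first output data 502 can be determined.
For example, by analyzing the Q3×Q4 probe data items, the contribution of every input element of the input data 501 to each output element of the first output data 502 can be determined. However, the present disclosure is not limited thereto: depending on practical requirements, only some input elements of the input data 501 may be analyzed, in which case only the probe data items corresponding to those input elements need to be stored and analyzed, saving storage space and system resources.
For example, the plurality of probe data items have the same size.
It should be noted that the above description takes probe data containing only one target probe element as an example, that is, it analyzes the forward influence of one specific input element of the input data (for example, the input element at row $n_0$, column $m_0$) on the output, but the embodiments of the present disclosure are not limited thereto. The forward influence of multiple specific input elements on the output can also be analyzed, in which case each probe data item may include multiple target probe elements (for example, two or three target probe elements), each with a value of 1, and all other elements of the probe data item have a value of 0.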
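Because the linearized network is affine, a probe with several target elements is expected to give the sum of the single-target contributions once $B_{NN2}$ is removed. A short NumPy check of that superposition property, using a hypothetical random $A_{NN2}$ and $B_{NN2}$ acting on a flattened 3×4 input (the helper `probe` is also hypothetical):

```python
import numpy as np

rng = np.random.default_rng(6)
A_nn2 = rng.normal(size=(5, 12))   # hypothetical A_NN2 acting on a flattened 3x4 input
B_nn2 = rng.normal(size=5)         # hypothetical B_NN2
f_lin = lambda x: A_nn2 @ x.ravel() + B_nn2

def probe(shape, targets):
    """Binary probe with value 1 at every (row, col) in targets and 0 elsewhere."""
    x = np.zeros(shape)
    for n0, m0 in targets:
        x[n0, m0] = 1.0
    return x

shape = (3, 4)
B = f_lin(np.zeros(shape))
joint = f_lin(probe(shape, [(0, 1), (2, 3)])) - B
separate = (f_lin(probe(shape, [(0, 1)])) - B) + (f_lin(probe(shape, [(2, 3)])) - B)
```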
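For example, in other embodiments, the probe data group includes a plurality of probe data items, the second output data group includes a plurality of second output data items, the plurality of probe data items correspond one-to-one to the plurality of second output data items, and each probe data item is a binary matrix. The plurality of probe data items have the same size.
In this case, in step S24, analyzing, based on the probe data group and the second output data group, the backward influence between the input data and the first output data includes: processing each of the plurality of probe data items with the linearized neural network to obtain the plurality of second output data items; and determining, by analyzing the element-level correspondence between the plurality of probe data items and the plurality of second output data items, the backward influence of each output element of the first output data on each input element of the input data.
For example, each probe data item includes a target probe element with a value of 1, and all other probe elements in the probe data item are non-target probe elements with a value of 0. The number of probe data items in the probe data group equals the number of elements in each probe data item, and the target probe elements of any two probe data items are at different positions.
For example, the input data 501 includes a plurality of input elements, the plurality of probe data items include a plurality of target probe elements, and the input elements correspond one-to-one to the target probe elements. That is, if the input data 501 includes Q3×Q4 input elements, the probe data group may include Q3×Q4 probe data items, the target probe element in each probe data item corresponds to one input element of the input data 501, and the positions of the Q3×Q4 target probe elements correspond one-to-one to the positions of the Q3×Q4 input elements of the input data 501.
When analyzing the backward influence, that is, the influence of the first output data 502 on the input data 501, the goal is to find out how the first output data 502 is affected by the input data 501. For a particular output element of the first output data 502, it is not known in advance from which one or several input elements of the input data 501 that output element is obtained. Therefore, a plurality of probe data items in one-to-one correspondence with all input elements of the input data 501 can be input, so as to analyze the influence of every input element of the input data 501 on that particular output element.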
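A NumPy sketch of this backward analysis follows, assuming a hypothetical linearized network made of two 3×3 'valid' cross-correlations with a frozen mask between them and no biases (`backward_influence` is a hypothetical helper). For a chosen output element $(p_0, q_0)$, it feeds one probe per input element and records how much each input element contributes to that single output.

```python
import numpy as np

def conv2d_valid(x, k):
    """Single-channel 'valid' 2-D cross-correlation."""
    kh, kw = k.shape
    return np.array([[np.sum(x[i:i + kh, j:j + kw] * k)
                      for j in range(x.shape[1] - kw + 1)]
                     for i in range(x.shape[0] - kh + 1)])

rng = np.random.default_rng(7)
k1, k2 = rng.normal(size=(3, 3)), rng.normal(size=(3, 3))
mask = (rng.random((6, 6)) > 0.3).astype(float)   # frozen ReLU mask between the two layers
f_lin = lambda x: conv2d_valid(mask * conv2d_valid(x, k1), k2)   # hypothetical: 8x8 -> 4x4

def backward_influence(f_lin, shape, p0, q0):
    """Weight of every input element in output element (p0, q0): one probe per input
    element (all Q3*Q4 positions), reading entry (p0, q0) of each response minus B."""
    B = f_lin(np.zeros(shape))
    weights = np.zeros(shape)
    for n in range(shape[0]):
        for m in range(shape[1]):
            x3 = np.zeros(shape)
            x3[n, m] = 1.0
            weights[n, m] = f_lin(x3)[p0, q0] - B[p0, q0]
    return weights

w = backward_influence(f_lin, (8, 8), 0, 1)
```

The nonzero entries of `w` trace the receptive field of the chosen output element, and `np.sum(w * x)` reproduces that output for any input `x` because this particular sketch has no bias term.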
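In summary, probe data can be input into the linearized neural network, and the relationship between the input probe data and the corresponding second output data can be analyzed at the element level to obtain the forward or backward influence between the input data and the first output data. This makes it possible to analyze in detail how the nonlinear layers of the neural network process the input data, to determine which input elements of the input data determine a particular output element of the first output data (backward influence), and to determine the contribution of each input element of the input data to a particular output element of the first output data (forward influence).
It should be noted that the data analysis method provided by the embodiments of the present disclosure is described above by taking the analysis of the processing of the whole neural network as an example, but the present disclosure is not limited thereto. In some examples, the data processing of a single nonlinear layer in the neural network can be analyzed; this is similar to the above data analysis method and is not repeated here.
FIG. 8 is a flowchart of a neural network evaluation method provided by some embodiments of the present disclosure; FIG. 9 is a flowchart of a neural network training method provided by some embodiments of the present disclosure.
For example, as shown in FIG. 8, the neural network evaluation method provided by some embodiments of the present disclosure may include the following steps:
S31: performing the neural network processing method to determine at least one linear interpreter unit corresponding to the at least one nonlinear layer;
S32: evaluating the neural network based on the at least one linear interpreter unit.
For example, in step S31, the neural network processing method is the processing method provided by any of the above embodiments of the present disclosure. For a detailed description of linearizing at least one nonlinear layer in the neural network, refer to the description of steps S11-S12 in the above processing method; repeated details are not described here again.
For example, in step S31, all nonlinear layers in the neural network may be linearized to obtain a plurality of linear interpreter units in one-to-one correspondence with all the nonlinear layers in the neural network.
For example, in some embodiments, step S32 may include: evaluating the at least one linear interpreter unit to determine an evaluation result of the at least one nonlinear layer; and training the neural network based on the evaluation result.
For example, in step S32, probe data can be input into the at least one linear interpreter unit to obtain second output data. By analyzing the relationship between the probe data and the second output data, the forward or backward influence between the input data and the first output data is obtained, so as to determine the evaluation result of the at least one nonlinear layer and, in turn, the contribution of each nonlinear layer in the neural network to the input.
For example, as shown in FIG. 9, in some embodiments, training the neural network based on the evaluation result includes:
S41: determining training weights of the at least one nonlinear layer based on the evaluation result;
S42: acquiring training input data and training target data;
S43: processing the training input data with the neural network to obtain training output data;
S44: calculating the loss value of the loss function of the neural network according to the training output data and the training target data;
S45: correcting the parameters of the neural network based on the training weights of the at least one nonlinear layer and the loss value;
S46: determining whether the loss function of the neural network satisfies a predetermined condition;
When the loss function of the neural network satisfies the predetermined condition, step S47 is performed, that is, the trained neural network is obtained;
When the loss function of the neural network does not satisfy the predetermined condition, the process returns to step S42, and the training input data and training target data continue to be input to repeat the above training process.
For example, in step S41, when all nonlinear layers in the neural network are linearized to obtain a plurality of linear interpreter units in one-to-one correspondence with them, all nonlinear layers in the neural network can be evaluated according to the plurality of linear interpreter units to determine the training weights of all the nonlinear layers. It should be noted that, in step S41, it is also possible to linearize only some of the nonlinear layers in the neural network, so that the training weights of those nonlinear layers can be determined for the training process.
For example, in step S41, the contribution (that is, the weight) of each linear interpreter unit to the input can be analyzed based on the impulse response, thereby determining the contribution (that is, the weight) of each nonlinear layer in the neural network to the input data while the neural network processes the input data, and determining how to improve the number of filters, the parameters, and so on of the neural network to optimize the network configuration. It should be noted that the contribution (that is, the weight) of each linear layer in the neural network to the input data can also be analyzed based on the impulse response. For example, layers with low contribution (nonlinear and/or linear) can be removed directly, reducing the complexity of the neural network and the amount of data involved in training; alternatively, during training, the parameters of a low-contribution layer may be left uncorrected in step S45. Layers with high contribution (nonlinear and/or linear) can be trained with emphasis, that is, during training, the parameters of the high-contribution layers are adjusted in step S45 to make them optimal.
For example, in step S45, the training target data can be used as the target value of the training output data, and the parameters of the neural network are continuously optimized to finally obtain the trained neural network.
For example, in step S46, in one example, the predetermined condition corresponds to convergence of the loss of the neural network's loss function after a certain amount of training input data and training target data has been input. In another example, the predetermined condition is that the number of training iterations or training epochs reaches a predetermined number, which may be in the millions, provided that the set of training input data and training target data is sufficiently large.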
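The training loop of steps S42 to S47 might be sketched as follows. This is a hypothetical illustration, not the method of the disclosure: it uses a tiny two-layer ReLU regression model with hand-written gradients, interprets each training weight from step S41 as a factor on the step size for the parameters feeding one nonlinear layer (a weight of 0 leaves that layer uncorrected), and uses loss convergence or a maximum iteration count as the predetermined condition. All names and data are hypothetical.

```python
import numpy as np

def train(W1, W2, X, Y, layer_weights, lr=0.05, tol=1e-10, max_iters=5000):
    """Gradient-descent loop for steps S42-S47 (hypothetical sketch).
    layer_weights[i] scales the correction applied to layer i in step S45."""
    prev_loss = np.inf
    for it in range(max_iters):
        H_pre = X @ W1                     # S43: forward pass, linear layer 1
        H = np.maximum(H_pre, 0.0)         #      ReLU nonlinear layer
        out = H @ W2                       #      linear output layer
        err = out - Y
        loss = np.mean(err ** 2)           # S44: mean-squared-error loss value
        if abs(prev_loss - loss) < tol:    # S46: predetermined condition = loss converged
            break                          # S47: trained network obtained
        prev_loss = loss
        g_out = 2.0 * err / err.size       # dLoss/d(out)
        g_W2 = H.T @ g_out
        g_W1 = X.T @ ((g_out @ W2.T) * (H_pre > 0))
        W1 = W1 - lr * layer_weights[0] * g_W1   # S45: weighted parameter correction
        W2 = W2 - lr * layer_weights[1] * g_W2
    return W1, W2, loss, it

rng = np.random.default_rng(8)
X = rng.normal(size=(64, 3))                                                # training input data
Y = np.maximum(X @ rng.normal(size=(3, 4)), 0.0) @ rng.normal(size=(4, 1))  # training target data
W1_init = 0.5 * rng.normal(size=(3, 4))
W2_init = 0.5 * rng.normal(size=(4, 1))
W1, W2, final_loss, iters = train(W1_init, W2_init, X, Y, layer_weights=[1.0, 1.0])
```

Setting a layer's weight to 0.0 reproduces the option of leaving a low-contribution layer uncorrected; a weight above 1.0 would emphasize a high-contribution layer.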
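FIG. 10 is a schematic diagram of a neural network processing apparatus according to some embodiments of the present disclosure. As shown in FIG. 10, the neural network processing apparatus 90 may include a memory 905 and a processor 910. The memory 905 is configured to store computer-readable instructions. The processor 910 is configured to run the computer-readable instructions, which, when run by the processor 910, can perform the neural network processing method according to any of the above embodiments. For example, the neural network includes at least one nonlinear layer, and the computer-readable instructions, when run by the processor 910, can perform the following operations: processing, by the N-th nonlinear layer of the at least one nonlinear layer, the input matrix input to the N-th nonlinear layer to obtain the output matrix output by the N-th nonlinear layer; and linearizing the N-th nonlinear layer according to the input matrix and the output matrix to determine the expression of the N-th linear function corresponding to the N-th nonlinear layer.
For example, the expression of the N-th linear function is: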
$$f_{LN} = A_N * x + B_N,$$
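where $f_{LN}$ denotes the N-th linear function, $A_N$ denotes the first parameter of the N-th linear function, $B_N$ denotes the second parameter of the N-th linear function, $x$ denotes the input of the N-th nonlinear layer, $A_N$ and $B_N$ are determined according to the input matrix and the output matrix corresponding to the N-th nonlinear layer, and N is a positive integer.
For example, the processor 910 may be a central processing unit (CPU), a tensor processing unit (TPU), or another device having data processing and/or program execution capabilities, and may control other components in the neural network processing apparatus 90 to perform desired functions. The CPU may be of an X86 or ARM architecture, for example.
For example, the memory 905 may include one or more computer program products, which may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory. The volatile memory may include, for example, random access memory (RAM) and/or cache memory. The non-volatile memory may include, for example, read-only memory (ROM), a hard disk, erasable programmable read-only memory (EPROM), portable compact disc read-only memory (CD-ROM), USB memory, flash memory, and the like. One or more computer-readable instructions may be stored on the computer-readable storage media, and the processor 910 may run them to implement various functions of the neural network processing apparatus 90.
For example, as shown in FIG. 10, the neural network processing apparatus 90 further includes an input interface 915 that allows external devices to communicate with the neural network processing apparatus 90. For example, the input interface 915 may be used to receive instructions from external computer devices, from users, and the like. The neural network processing apparatus 90 may also include an output interface 920 that interfaces the neural network processing apparatus 90 with one or more external devices. For example, the neural network processing apparatus 90 may output, through the output interface 920, the first and second parameters of the linear functions corresponding to the nonlinear layers, and the like. It is contemplated that external devices communicating with the neural network processing apparatus 90 through the input interface 915 and the output interface 920 may be included in an environment that provides substantially any type of user interface with which a user can interact. Examples of user interface types include graphical user interfaces, natural user interfaces, and the like. For example, a graphical user interface may accept input from a user via input devices such as a keyboard, mouse, or remote control, and provide output on output devices such as a display. In addition, a natural user interface may enable a user to interact with the neural network processing apparatus 90 free from the constraints imposed by input devices such as a keyboard, mouse, or remote control. Instead, a natural user interface may rely on speech recognition, touch and stylus recognition, gesture recognition on and near the screen, air gestures, head and eye tracking, voice and speech, vision, touch, gestures, machine intelligence, and the like.
For example, data transmission between the memory 905 and the processor 910 may be implemented through a network or a bus system. The memory 905 and the processor 910 may communicate with each other directly or indirectly.
It should be noted that, for a detailed description of how the neural network processing apparatus 90 performs the neural network processing method, refer to the related description in the embodiments of the processing method; repeated details are not described again.
FIG. 11 is a schematic diagram of a data analysis apparatus according to some embodiments of the present disclosure. For example, as shown in FIG. 11, the data analysis apparatus 100 can implement a data analysis process based on a neural network and may include a memory 1001 and a processor 1002. The memory 1001 is configured to store computer-readable instructions. The processor 1002 is configured to run the computer-readable instructions, which, when run by the processor 1002, can perform the data analysis method according to any of the above embodiments. For example, the computer-readable instructions, when run by the processor 1002, perform the following operations: acquiring input data; processing the input data with the neural network to obtain first output data; performing, according to the input data and the first output data, the neural network processing method according to any of the above embodiments to linearize all nonlinear layers in the neural network, so as to determine a linearized neural network corresponding to the neural network; and analyzing, based on the linearized neural network, the correspondence between the input data and the first output data.
For example, in addition to the computer-readable instructions, the memory 1001 may also store training input data, training target data, and the like.
For example, the processor 1002 may be a central processing unit (CPU), a tensor processing unit (TPU), a graphics processing unit (GPU), or another device having data processing and/or program execution capabilities, and may control other components in the data analysis apparatus 100 to perform desired functions. The CPU may be of an X86 or ARM architecture, for example. The GPU may be integrated directly on the motherboard on its own or built into the northbridge chip of the motherboard; it may also be built into the CPU.
For example, the memory 1001 may include one or more computer program products, which may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory. The volatile memory may include, for example, random access memory (RAM) and/or cache memory. The non-volatile memory may include, for example, read-only memory (ROM), a hard disk, erasable programmable read-only memory (EPROM), portable compact disc read-only memory (CD-ROM), USB memory, flash memory, and the like. One or more computer-readable instructions may be stored on the computer-readable storage media, and the processor 1002 may run them to implement various functions of the data analysis apparatus 100.
For example, data transmission between the memory 1001 and the processor 1002 may be implemented through a network or a bus system. The memory 1001 and the processor 1002 may communicate with each other directly or indirectly.
For example, as shown in FIG. 11, the data analysis apparatus 100 further includes an input interface 1003 that allows external devices to communicate with the data analysis apparatus 100. For example, the input interface 1003 may be used to receive instructions from external computer devices, from users, and the like. The data analysis apparatus 100 may also include an output interface 1004 that interfaces the data analysis apparatus 100 with one or more external devices. For example, the data analysis apparatus 100 may output analysis results and the like through the output interface 1004. It is contemplated that external devices communicating with the data analysis apparatus 100 through the input interface 1003 and the output interface 1004 may be included in an environment that provides substantially any type of user interface with which a user can interact. Examples of user interface types include graphical user interfaces, natural user interfaces, and the like. For example, a graphical user interface may accept input from a user via input devices such as a keyboard, mouse, or remote control, and provide output on output devices such as a display. In addition, a natural user interface may enable a user to interact with the data analysis apparatus 100 free from the constraints imposed by input devices such as a keyboard, mouse, or remote control. Instead, a natural user interface may rely on speech recognition, touch and stylus recognition, gesture recognition on and near the screen, air gestures, head and eye tracking, voice and speech, vision, touch, gestures, machine intelligence, and the like.
In addition, although the data analysis apparatus 100 is shown as a single system in the figure, it can be understood that the data analysis apparatus 100 may also be a distributed system, and may also be deployed as a cloud facility (including a public or private cloud). Thus, for example, several devices may communicate through a network connection and jointly perform tasks described as being performed by the data analysis apparatus 100.
It should be noted that, for a detailed description of the data analysis process performed by the data analysis apparatus 100, refer to the related description in the embodiments of the data analysis method; repeated details are not described again.
Some embodiments of the present disclosure further provide a non-transitory computer-readable storage medium.
For example, in some embodiments, one or more first computer-readable instructions may be stored on the non-transitory computer-readable storage medium. For example, when the first computer-readable instructions are executed by a computer, one or more steps of the neural network processing method described above can be performed.
For example, in other embodiments, one or more second computer-readable instructions may also be stored on the non-transitory computer-readable storage medium. When the second computer-readable instructions are executed by a computer, one or more steps of the data analysis method described above can be performed.
For example, in still other embodiments, one or more third computer-readable instructions may also be stored on the non-transitory computer-readable storage medium. When the third computer-readable instructions are executed by a computer, one or more steps of the neural network evaluation method described above can be performed.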
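The following points should also be noted regarding the present disclosure:
(1) The drawings of the embodiments of the present disclosure relate only to the structures involved in the embodiments; for other structures, refer to common designs.
(2) Where no conflict arises, the embodiments of the present disclosure and the features in the embodiments may be combined with each other to obtain new embodiments.
The above are only specific implementations of the present disclosure, but the protection scope of the present disclosure is not limited thereto; the protection scope of the present disclosure shall be subject to the protection scope of the claims.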

Claims (15)

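  1. A processing method for a neural network, wherein the neural network comprises at least one nonlinear layer,
    the processing method comprising:
    processing, by an N-th nonlinear layer of the at least one nonlinear layer, an input matrix input to the N-th nonlinear layer to obtain an output matrix output by the N-th nonlinear layer;
    linearizing the N-th nonlinear layer according to the input matrix and the output matrix to determine an expression of an N-th linear function corresponding to the N-th nonlinear layer,
    wherein the expression of the N-th linear function is:
    $f_{LN} = A_N * x + B_N$
    wherein $f_{LN}$ denotes the N-th linear function, $A_N$ denotes a first parameter of the N-th linear function, $B_N$ denotes a second parameter of the N-th linear function, $x$ denotes an input of the N-th nonlinear layer, $A_N$ and $B_N$ are determined according to the input matrix and the output matrix, and N is a positive integer.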
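  2. The processing method according to claim 1, wherein the expression of the first parameter of the N-th linear function is:
    $A_N = (Df_{NN})(x_1)$,
    wherein $Df_{NN}$ denotes a first-order derivative of a nonlinear function corresponding to the N-th nonlinear layer, and $x_1$ denotes the input matrix;
    the expression of the second parameter of the N-th linear function is:
    $B_N = f_{NN}(x_1) - A_N * x_1$,
    wherein $f_{NN}$ denotes the nonlinear function corresponding to the N-th nonlinear layer, and $f_{NN}(x_1)$ denotes the output matrix.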
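  3. The processing method according to claim 1 or 2, further comprising:
    performing the linearization on all nonlinear layers in the neural network to determine expressions of the linear functions respectively corresponding to all the nonlinear layers.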
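  4. The processing method according to any one of claims 1-3, wherein the at least one nonlinear layer comprises an activation layer, an instance normalization layer, a max pooling layer, or a softmax layer,
    and an activation function of the activation layer is a ReLU function, a tanh function, or a sigmoid function.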
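  5. A neural-network-based data analysis method, comprising:
    acquiring input data;
    processing the input data with the neural network to obtain first output data;
    performing, according to the input data and the first output data, the processing method according to any one of claims 1-4 to linearize all nonlinear layers in the neural network, so as to determine a linearized neural network corresponding to the neural network;
    analyzing, based on the linearized neural network, a correspondence between the input data and the first output data.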
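  6. The data analysis method according to claim 5, wherein analyzing, based on the linearized neural network, the correspondence between the input data and the first output data comprises:
    determining a probe data group according to the input data, wherein the probe data group is a group of binary matrices;
    processing the probe data group with the linearized neural network to obtain a second output data group;
    analyzing, based on the probe data group and the second output data group, a forward influence or a backward influence between the input data and the first output data.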
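  7. The data analysis method according to claim 6, wherein the probe data group comprises at least one probe data item, the second output data group comprises at least one second output data item, and the at least one probe data item corresponds one-to-one to the at least one second output data item,
    analyzing, based on the probe data group and the second output data group, the forward influence between the input data and the first output data comprises:
    processing each of the at least one probe data item with the linearized neural network to obtain the at least one second output data item;
    determining, by analyzing an element-level correspondence between the at least one probe data item and the at least one second output data item, the forward influence of each input element of the input data on each output element of the first output data,
    wherein each probe data item in the probe data group comprises a target probe element, a value of the target probe element is 1, and values of all probe elements in each probe data item other than the target probe element are 0.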
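  8. The data analysis method according to claim 7, wherein, in a case where the probe data group comprises a plurality of probe data items, the target probe elements in at least some of the plurality of probe data items are at different positions.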
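  9. The data analysis method according to claim 8, wherein the plurality of probe data items have the same size, and the size of the plurality of probe data items is also the same as a size of the input data.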
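  10. The data analysis method according to claim 6, wherein the probe data group comprises a plurality of probe data items, the second output data group comprises a plurality of second output data items, and the plurality of probe data items correspond one-to-one to the plurality of second output data items,
    analyzing, based on the probe data group and the second output data group, the backward influence between the input data and the first output data comprises:
    processing each of the plurality of probe data items with the linearized neural network to obtain the plurality of second output data items;
    determining, by analyzing an element-level correspondence between the plurality of probe data items and the plurality of second output data items, the backward influence of each output element of the first output data on each input element of the input data,
    wherein each probe data item in the probe data group comprises a target probe element, a value of the target probe element is 1, values of all probe elements in each probe data item other than the target probe element are 0, a number of the plurality of probe data items is the same as a number of all probe elements in each probe data item, and the target probe elements of any two of the plurality of probe data items are at different positions.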
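  11. An evaluation method for a neural network, comprising:
    performing the processing method according to any one of claims 1-4 to determine at least one linear interpreter unit corresponding to the at least one nonlinear layer;
    evaluating the neural network based on the at least one linear interpreter unit.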
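  12. The evaluation method according to claim 11, wherein evaluating the neural network based on the at least one linear interpreter unit comprises:
    evaluating the at least one linear interpreter unit to determine an evaluation result of the at least one nonlinear layer;
    training the neural network based on the evaluation result.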
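  13. The evaluation method according to claim 12, wherein training the neural network based on the evaluation result comprises:
    determining training weights of the at least one nonlinear layer based on the evaluation result;
    acquiring training input data and training target data;
    processing the training input data with the neural network to obtain training output data;
    calculating a loss value of a loss function of the neural network according to the training output data and the training target data;
    correcting parameters of the neural network based on the training weights of the at least one nonlinear layer and the loss value, obtaining the trained neural network when the loss function of the neural network satisfies a predetermined condition, and continuing to input the training input data and the training target data to repeat the above training process when the loss function of the neural network does not satisfy the predetermined condition.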
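  14. A processing apparatus for a neural network, comprising:
    a memory configured to store computer-readable instructions; and
    a processor configured to run the computer-readable instructions, wherein the computer-readable instructions, when run by the processor, perform the processing method according to any one of claims 1-4.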
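  15. A computer-readable storage medium for storing computer-readable instructions, wherein the computer-readable instructions, when executed by a computer, perform the processing method according to any one of claims 1-4.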
PCT/CN2019/127431 2019-01-25 2019-12-23 Neural network processing method and evaluation method, and data analysis method and device WO2020151438A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US17/042,265 US20210049447A1 (en) 2019-01-25 2019-12-23 Neural network processing method and evaluation method, and data analysis method and device

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910075152.0A CN109816098B (zh) 2019-01-25 2019-01-25 Neural network processing method and evaluation method, and data analysis method and device
CN201910075152.0 2019-01-25

Publications (1)

Publication Number Publication Date
WO2020151438A1 true WO2020151438A1 (zh) 2020-07-30

Family

ID=66605230

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/127431 WO2020151438A1 (zh) 2019-01-25 2019-12-23 Neural network processing method and evaluation method, and data analysis method and device

Country Status (3)

Country Link
US (1) US20210049447A1 (zh)
CN (1) CN109816098B (zh)
WO (1) WO2020151438A1 (zh)

Also Published As

Publication number Publication date
CN109816098A (zh) 2019-05-28
CN109816098B (zh) 2021-09-07
US20210049447A1 (en) 2021-02-18

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19910945

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19910945

Country of ref document: EP

Kind code of ref document: A1

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 240322)
