EP3590076A1 - Neural network data processing apparatus and method - Google Patents

Neural network data processing apparatus and method

Info

Publication number
EP3590076A1
Authority
EP
European Patent Office
Prior art keywords
array
data values
input data
neural network
position dependent
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
EP17713634.8A
Other languages
German (de)
French (fr)
Inventor
Jacek Konieczny
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Application filed by Huawei Technologies Co Ltd filed Critical Huawei Technologies Co Ltd
Publication of EP3590076A1 publication Critical patent/EP3590076A1/en

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/04 - Architecture, e.g. interconnection topology
    • G06N 3/045 - Combinations of networks

Definitions

  • the present invention relates to the field of machine learning or deep learning based on neural networks. More specifically, the present invention relates to a neural network data processing apparatus and method, in particular for processing data in the fields of audio processing, computer vision, image or video processing, classification, detection and/or recognition.
  • Weighted aggregation, which is commonly used in many signal processing applications, such as image processing methods for image quality improvement, depth or disparity estimation and many other applications [Kaiming He, Jian Sun, Xiaoou Tang, "Guided Image Filtering", ECCV 2010], is a process in which input data is combined to pack information present in a larger spatial area into one single spatial position, with additional input in the form of aggregation weights that control the influence of each input data value on the result.
  • In deep learning, a common approach recently used in many application fields is the utilization of convolutional neural networks. Generally, a specific part of such convolutional neural networks is at least one convolution (or convolutional) layer which performs a convolution of input data values with a learned kernel K, producing one output data value per convolution kernel for each output position [J. Long, E. Shelhamer, T. Darrell, "Fully Convolutional Networks for Semantic Segmentation", CVPR 2015].
  • the convolution using the learned kernel K can be expressed mathematically as follows:

    $$out(x, y) = \sum_{i=-r}^{r} \sum_{j=-r}^{r} K(i, j)\, in(x - i, y - j) + B$$

    wherein out(x, y) denotes the array of output data values, in(x - i, y - j) denotes a sub-array of an array of input data values and K(i, j) denotes the kernel comprising an array of kernel weights or kernel values of size (2r+1)×(2r+1). B denotes an optional learned bias term, which can be added for obtaining each output data value.
  • the weights of the kernel K are the same for the whole array of input data values in(x, y) and are generally learned during a learning phase of the neural network which, in the case of 1st-order methods, consists of iteratively back-propagating the gradients of the neural network output back to the input layers and updating the weights of all the network layers using the partial derivatives computed in this way.
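  • For illustration, a minimal NumPy sketch of such a position-independent convolution (single channel, zero padding; the function and variable names are illustrative assumptions and not taken from the patent):

```python
import numpy as np

def conv2d_fixed_kernel(inp, K, B=0.0):
    """Position-independent convolution: one learned kernel K of size
    (2r+1) x (2r+1) is applied with the same weights at every position."""
    r = K.shape[0] // 2
    padded = np.pad(inp, r, mode="constant")      # zero padding (an assumption)
    out = np.zeros_like(inp, dtype=np.float64)
    for y in range(inp.shape[0]):
        for x in range(inp.shape[1]):
            # sub-array of input values centred at (x, y); the i/j sign
            # convention (convolution vs. cross-correlation) is ignored here,
            # which makes no difference for symmetric kernels
            patch = padded[y:y + 2 * r + 1, x:x + 2 * r + 1]
            out[y, x] = np.sum(K * patch) + B
    return out

# usage: a 3x3 averaging kernel (r = 1) applied to a random "image"
img = np.random.rand(8, 8)
K = np.ones((3, 3)) / 9.0
print(conv2d_fixed_kernel(img, K).shape)          # (8, 8)
```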
  • Generally, embodiments of the invention provide a new approach for weighted aggregation of data for neural networks that is implemented into a neural network as a new type of neural network layer.
  • the neural network layer can compute aggregated data using individual aggregation weights that are learned for each individual spatial position.
  • Aggregation weights can be computed as a function of similarity features and learned weight kernels, resulting in individual aggregation weights for each output spatial position.
  • the invention relates to a data processing apparatus comprising one or more processors configured to provide a neural network.
  • the data to be processed by the data processing apparatus can be, for instance, two-dimensional image or video data or one-dimensional audio data.
  • the neural network provided by the one or more processors of the data processing apparatus comprises a neural network layer being configured to process an array of input data values, such as a two-dimensional array of input data values in(x, y), into an array of output data values, such as a two-dimensional array of output data values out(x, y).
  • the neural network layer can be a first layer or an intermediate layer of the neural network.
  • the array of input data values can be one-dimensional (i.e. a vector, e.g. audio or other e.g. temporal sequence), two-dimensional (i.e. a matrix, e.g. an image or other temporal or spatial sequence), or N-dimensional (e.g. any kind of N-dimensional feature array, e.g. provided by a conventional pre-processing or feature extraction and/or by other layers of the neural network).
  • the array of input data values can have one or more channels, e.g. for an RGB image one R-channel, one G-channel and one B-channel, or for a black/white image only one grey-scale or intensity channel.
  • The term "channel" can refer to any "feature", e.g. features obtained from conventional pre-processing or feature extraction or from other neural networks or neural network layers of the same neural network.
  • the array of input data values can comprise, for instance, two-dimensional RGB or grey scale image or video data representing at least a part of an image, or a one-dimensional audio signal.
  • the array of input data values can be, for instance, any kind of array of features generated by previous layers of the neural network on the basis of an initial, e.g. original array of input data values, e.g. by means of a feature extraction.
  • the neural network layer is configured to generate from the array of input data values the array of output data values on the basis of a plurality of position dependent, i.e. spatially variable kernels and a plurality of different sub-arrays at different positions of the array of input data values.
  • Each kernel comprises a plurality of kernel values or kernel weights.
  • a respective kernel is applied to a respective sub-array of the array of input data values to generate a single output data value.
  • a "position dependent kernel” as used herein means a kernel whose kernel weights depend on the respective position, e.g. (x,y) for two-dimensional arrays, of the sub-array of input data values.
  • the kernel values applied to a first sub-array of the array of input data values can differ from the kernel values of a second kernel applied to a second sub-array of the array of input data values.
  • the position could be a spatial position defined, for instance, by two spatial coordinates.
  • In a one-dimensional array the position could be a temporal position defined, for instance, by a time coordinate.
  • the data processing apparatus allows to aggregate the input data in a way that can better reflect mutual data similarity, i.e. the resultant output data value is more strongly influenced by input data values that are closer and more similar to input data in the center position of the kernel. Moreover, the data processing apparatus allows adapting the kernel weights for different spatial positions of the array of input data values. This, in turn, allows, for instance, minimizing the influence of some of the input data values on the result, for instance the input data values that are associated with another part of the scene (as determined by semantic segmentation) or a different object that is being analysed.
  • the neural network comprises at least one additional network layer configured to generate the plurality of position dependent kernels on the basis of an original array of original input values of the neural network, wherein the original array of original input values of the neural network comprises the array of input values or another array of input values associated to the array of input values.
  • the original array of original input values can be the array of input data values or a different array.
  • the neural network is configured to generate the plurality of position dependent kernels based on a plurality of learned position independent kernels and a plurality of position dependent weights.
  • the position independent kernels can be learned by the neural network and the position dependent weights or similarity features can be computed, for instance, by a further preceding layer of the neural network.
  • This implementation form allows minimizing the amount of data being transferred to the neural network layer in order to obtain the kernel values. This is because the kernel values are not transferred directly, but computed from the plurality of position dependent weights and/or similarity features substantially reducing the amount of data for each element of the array of output data values. This can minimize the amount of data being stored and transferred by the neural network between the different network layers, which is especially important during the learning process on the basis of the mini-batch approach as the memory of the data processing apparatus (GPU) is currently the main bottleneck.
  • the neural network is configured to generate a kernel of the plurality of position dependent kernels by adding the learned position independent kernels each weighted by the associated non-learned position dependent weights (i.e. similarity features).
  • This implementation form provides a very efficient representation of the plurality of position dependent kernels using a linear combination of position independent "base kernels”.
  • the plurality of position independent kernels are predetermined or learned, and wherein the neural network comprises at least one additional neural network layer or "conventional" pre-processing layer configured to generate the plurality of position dependent weights (i.e. similarity features) based on an original array of original input values of the neural network, wherein the original array of original input values of the neural network comprises the array of input values or another array of input values associated to the array of input values.
  • the original array of original input values can be the array of input data values or a different array.
  • the at least one additional neural network layer or "conventional" pre-processing layer can generate the plurality of position dependent weights (i.e. similarity features) using, for instance, bilateral filtering, semantic segmentation, per-instance object detection, and data importance indicators like ROI (region of interest).
  • the array of input data values and the array of output data values are two-dimensional arrays, and the convolutional neural network layer is configured to generate the plurality of position dependent kernels w_L(x, y, i, j) on the basis of the following equation:

    $$w_L(x, y, i, j) = \sum_{f=1}^{N_f} F_f(x, y)\, K_f(i, j)$$

    wherein F_f(x, y) denotes the set of N_f position dependent weights (i.e. similarity features) and K_f(i, j) denotes the plurality of position independent kernels.
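  • A minimal sketch of this construction, assuming N_f position independent base kernels K_f and per-position similarity features F_f(x, y) (shapes and names are illustrative assumptions):

```python
import numpy as np

def position_dependent_kernels(F, K_base):
    """Combine N_f position-independent base kernels K_f(i, j) into one
    position dependent kernel w_L(x, y, i, j) per spatial position,
    weighted by the similarity features F_f(x, y)."""
    # F: (H, W, N_f) similarity features, K_base: (N_f, 2r+1, 2r+1) base kernels
    # result: (H, W, 2r+1, 2r+1), i.e. one kernel for every output position
    return np.einsum("hwf,fij->hwij", F, K_base)

# usage: 4 base kernels of size 3x3 combined over an 8x8 feature map
F = np.random.rand(8, 8, 4)
K_base = np.random.randn(4, 3, 3)
w_L = position_dependent_kernels(F, K_base)
print(w_L.shape)   # (8, 8, 3, 3)
```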
  • the neural network layer is a convolutional network layer or an aggregation network layer.
  • the array of input data values and the array of output data values are two-dimensional arrays, wherein the array of input data values in(x, y, c_i) has C_i different channels and wherein the neural network layer is a convolutional network layer configured to generate the array of output data values out(x, y, c_o) on the basis of the following equations:

    $$out(x, y, c_o) = \frac{1}{W_L(x, y, c_o)} \sum_{c_i=1}^{C_i} \sum_{i=-r}^{r} \sum_{j=-r}^{r} w_L(x, y, c_o, c_i, i, j)\, in(x - i, y - j, c_i)$$

    $$W_L(x, y, c_o) = \sum_{c_i=1}^{C_i} \sum_{i=-r}^{r} \sum_{j=-r}^{r} w_L(x, y, c_o, c_i, i, j)$$

  • wherein r denotes a size of each kernel of the plurality of position dependent kernels and W_L(x, y, c_o) denotes a normalization factor.
  • The normalization factor can be set equal to 1.
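  • A sketch of such a position dependent convolution for a multi-channel input (loops kept explicit for readability; the zero padding, the shapes and the optional normalization by the sum of kernel values are assumptions for illustration):

```python
import numpy as np

def position_dependent_conv(inp, w, normalize=False):
    """inp: (H, W, C_i) input array; w: (H, W, C_o, C_i, 2r+1, 2r+1)
    position dependent kernels; returns out: (H, W, C_o)."""
    H, W, C_i = inp.shape
    _, _, C_o, _, k, _ = w.shape
    r = k // 2
    padded = np.pad(inp, ((r, r), (r, r), (0, 0)))   # zero padding (assumption)
    out = np.zeros((H, W, C_o))
    for y in range(H):
        for x in range(W):
            patch = padded[y:y + k, x:x + k, :]      # (k, k, C_i) sub-array at (x, y)
            for c_o in range(C_o):
                kern = w[y, x, c_o]                  # (C_i, k, k) kernel for this position
                acc = np.sum(kern * patch.transpose(2, 0, 1))
                if normalize:                        # optional normalization factor W_L
                    acc /= (np.sum(kern) + 1e-12)
                out[y, x, c_o] = acc
    return out

# usage
inp = np.random.rand(6, 6, 3)
w = np.random.rand(6, 6, 2, 3, 3, 3)
print(position_dependent_conv(inp, w).shape)         # (6, 6, 2)
```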
  • the array of input data values and the array of output data values are two-dimensional arrays, wherein the array of input data values in(x, y) has only a single channel and wherein the neural network layer is an aggregation network layer configured to generate the array of output data values out(x, y) on the basis of the following equations:

    $$out(x, y) = \frac{1}{W_L(x, y)} \sum_{i=-r}^{r} \sum_{j=-r}^{r} w_L(x, y, i, j)\, in(x - i, y - j)$$

    $$W_L(x, y) = \sum_{i=-r}^{r} \sum_{j=-r}^{r} w_L(x, y, i, j)$$

  • wherein r denotes a size of each kernel of the plurality of position dependent kernels w_L(x, y, i, j) and W_L(x, y) denotes a normalization factor.
  • W_L(x, y) can be set equal to 1.
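  • A single-channel sketch of the weighted aggregation, assuming the normalization factor W_L(x, y) is the sum of the kernel values, which preserves the mean (DC) component; padding and names are illustrative:

```python
import numpy as np

def weighted_aggregation(inp, w, normalize=True):
    """inp: (H, W) single-channel input; w: (H, W, 2r+1, 2r+1) position
    dependent kernels; returns the (H, W) aggregated output."""
    H, W = inp.shape
    k = w.shape[2]
    r = k // 2
    padded = np.pad(inp, r)
    out = np.zeros((H, W))
    for y in range(H):
        for x in range(W):
            patch = padded[y:y + k, x:x + k]
            acc = np.sum(w[y, x] * patch)
            if normalize:
                # W_L(x, y): sum of the kernel values, keeps the mean (DC) component
                acc /= (np.sum(w[y, x]) + 1e-12)
            out[y, x] = acc
    return out

inp = np.random.rand(8, 8)
w = np.random.rand(8, 8, 3, 3)
print(weighted_aggregation(inp, w).shape)   # (8, 8)
```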
  • the neural network layer is a correlation network layer configured to generate the array of output data values from the array of input data values and a further array of input data values by correlating the array of input data values with the further array of input data values and applying a position dependent kernel of the plurality of position dependent kernels, or by correlating the array of input data values with the further array of input data values and applying a position dependent kernel of the plurality of position dependent kernels associated to the array of input data values and a further position dependent kernel of a plurality of further position dependent kernels associated to the further array of input data values.
  • the array of input data values in1(x, y), the further array of input data values in2(x, y) and the plurality of position dependent kernels w_L1(x, y, i, j) are two-dimensional arrays and wherein the correlation neural network layer is configured to generate the array of output data values out(x, y) on the basis of the following equations:

    $$out(x, y) = \frac{1}{W_L(x, y)} \sum_{i=-r}^{r} \sum_{j=-r}^{r} w_{L1}(x, y, i, j)\, in1(x - i, y - j)\, in2(x - i, y - j)$$

    $$W_L(x, y) = \sum_{i=-r}^{r} \sum_{j=-r}^{r} w_{L1}(x, y, i, j)$$

  • wherein W_L(x, y) denotes a normalization factor.
  • W_L(x, y) can be set equal to 1.
  • the array of input data values in1(x, y), the further array of input data values in2(x, y), the plurality of position dependent kernels w_L1(x, y, i, j) and the plurality of further position dependent kernels w_L2(x, y, i, j) are two-dimensional arrays and wherein the correlation neural network layer is configured to generate the array of output data values out(x, y) on the basis of the following equations:

    $$out(x, y) = \frac{1}{W_{L12}(x, y)} \sum_{i=-r}^{r} \sum_{j=-r}^{r} \big( w_{L1}(x, y, i, j)\, in1(x - i, y - j) \big) \big( w_{L2}(x, y, i, j)\, in2(x - i, y - j) \big)$$

    $$W_{L12}(x, y) = \sum_{i=-r}^{r} \sum_{j=-r}^{r} w_{L1}(x, y, i, j)\, w_{L2}(x, y, i, j)$$

  • wherein r denotes a size of each kernel of the plurality of position dependent kernels w_L1(x, y, i, j) and of each kernel of the plurality of further position dependent kernels w_L2(x, y, i, j) and W_L12(x, y) denotes a normalization factor.
  • The normalization factor can be set equal to 1.
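  • A sketch of the two-kernel variant of the guided correlation (single channel, zero padding; the normalization factor W_L12 is assumed here to be the sum of the products of the two kernels, which is one plausible choice and not necessarily the patent's):

```python
import numpy as np

def guided_correlation(in1, in2, w1, w2, normalize=False):
    """Correlate two single-channel arrays under two sets of position
    dependent kernels; in1, in2: (H, W); w1, w2: (H, W, 2r+1, 2r+1)."""
    H, W = in1.shape
    k = w1.shape[2]
    r = k // 2
    p1 = np.pad(in1, r)
    p2 = np.pad(in2, r)
    out = np.zeros((H, W))
    for y in range(H):
        for x in range(W):
            a = w1[y, x] * p1[y:y + k, x:x + k]   # weighted sub-array of in1
            b = w2[y, x] * p2[y:y + k, x:x + k]   # weighted sub-array of in2
            acc = np.sum(a * b)
            if normalize:                          # assumed form of W_L12(x, y)
                acc /= (np.sum(w1[y, x] * w2[y, x]) + 1e-12)
            out[y, x] = acc
    return out

in1, in2 = np.random.rand(8, 8), np.random.rand(8, 8)
w1, w2 = np.random.rand(8, 8, 3, 3), np.random.rand(8, 8, 3, 3)
print(guided_correlation(in1, in2, w1, w2).shape)   # (8, 8)
```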
  • the neural network layer is configured to generate a respective output data value of the array of output data values by determining a respective input data value of a respective sub-array of input data values of the plurality of sub-arrays of input data values being associated with a maximum or minimum kernel value of a position dependent kernel and using the respective determined input data value as the respective output data value.
  • the invention relates to a corresponding data processing method comprising the step of generating by a neural network layer of a neural network from an array of input data values an array of output data values based on a plurality of position dependent kernels and a plurality of sub-arrays of the array of input values.
  • the method comprises the further step of generating the position dependent kernel of the plurality of position dependent kernels by an additional neural network layer of the neural network based on an original array of original input values of the neural network, wherein the original array of original input values of the neural network comprises the array of input values or another array of input values associated to the array of input values.
  • the step of generating the position dependent kernel of the plurality of position dependent kernels comprises generating the position dependent kernel of the plurality of position dependent kernels based on a plurality of position independent kernels and a plurality of position dependent weights.
  • the step of generating a kernel of the plurality of position dependent kernels comprises the step of adding, i.e. summing the position independent kernels weighted by the associated position dependent weights.
  • the step of generating the plurality of position dependent weights comprises the step of generating the plurality of position dependent weights by an additional neural network layer or a processing layer of the neural network based on an original array of original input values of the neural network, wherein the original array of original input values of the neural network comprises the array of input values or another array of input values associated to the array of input values.
  • the array of input data values and the array of output data values are two-dimensional arrays, and the step of generating a kernel of the plurality of position dependent kernels is based on the following equation:

    $$w_L(x, y, i, j) = \sum_{f=1}^{N_f} F_f(x, y)\, K_f(i, j)$$

  • wherein F_f(x, y) denotes the plurality of N_f position dependent weights (i.e. similarity features) and K_f(i, j) denotes the plurality of position independent kernels.
  • the neural network layer is a convolutional network layer or an aggregation network layer.
  • the array of input data values and the array of output data values are two-dimensional arrays and the neural network layer is a convolutional network layer, wherein the step of generating the array of output data values is based on the following equations:

    $$out(x, y, c_o) = \frac{1}{W_L(x, y, c_o)} \sum_{c_i=1}^{C_i} \sum_{i=-r}^{r} \sum_{j=-r}^{r} w_L(x, y, c_o, c_i, i, j)\, in(x - i, y - j, c_i)$$

    $$W_L(x, y, c_o) = \sum_{c_i=1}^{C_i} \sum_{i=-r}^{r} \sum_{j=-r}^{r} w_L(x, y, c_o, c_i, i, j)$$

  • the neural network layer is an aggregation network layer and wherein the step of generating the array of output data values is based on the following equations:

    $$out(x, y) = \frac{1}{W_L(x, y)} \sum_{i=-r}^{r} \sum_{j=-r}^{r} w_L(x, y, i, j)\, in(x - i, y - j)$$

    $$W_L(x, y) = \sum_{i=-r}^{r} \sum_{j=-r}^{r} w_L(x, y, i, j)$$

  • The normalization factors can be set equal to 1.
  • the neural network layer is a correlation network layer and the step of generating the array of output data values comprises generating the array of output data values from the array of input data values and a further array of input data values (a) by correlating the array of input data values with the further array of input data values and applying a position dependent kernel of the plurality of position dependent kernels, or (b) by correlating the array of input data values with the further array of input data values and applying a position dependent kernel of the plurality of position dependent kernels associated to the array of input data values and a further position dependent kernel of a plurality of further position dependent kernels associated to the further array of input data values.
  • the array of input data values, the further array of input data values and the kernel are two-dimensional arrays and the step of generating the array of output data values by the correlation neural network layer is based on the following equations:

    $$out(x, y) = \frac{1}{W_L(x, y)} \sum_{i=-r}^{r} \sum_{j=-r}^{r} w_{L1}(x, y, i, j)\, in1(x - i, y - j)\, in2(x - i, y - j)$$

    or

    $$out(x, y) = \frac{1}{W_{L12}(x, y)} \sum_{i=-r}^{r} \sum_{j=-r}^{r} \big( w_{L1}(x, y, i, j)\, in1(x - i, y - j) \big) \big( w_{L2}(x, y, i, j)\, in2(x - i, y - j) \big)$$

  • The normalization factors W_L or W_L12 can be set equal to 1.
  • the step of generating an output data value of the array of output data values by the convolutional neural network layer comprises the steps of determining an input data value of a sub-array of input data values of the plurality of sub-arrays of input data values being associated with a maximum or minimum kernel value of a position dependent kernel and using the determined input value as the output value.
  • the invention relates to a computer program comprising program code for performing the method according to the second aspect, when executed on a processor or a computer.
  • the invention can be implemented in hardware and/or software or in any combination thereof.
  • Fig. 1 shows a schematic diagram illustrating a data processing apparatus based on a neural network according to an embodiment
  • Fig. 2 shows a schematic diagram illustrating a neural network provided by a data processing apparatus according to an embodiment
  • Fig. 3 shows a schematic diagram illustrating the concept of down-stepping or aggregation of data implemented in a data processing apparatus according to an embodiment
  • Fig. 4 shows a schematic diagram illustrating different aspects of a neural network provided by a data processing apparatus according to an embodiment
  • Fig. 5 shows a schematic diagram illustrating different aspects of a neural network provided by a data processing apparatus according to an embodiment
  • Fig. 6 shows a schematic diagram illustrating different processing steps of a data processing apparatus according to an embodiment
  • Fig. 7 shows a schematic diagram illustrating a neural network provided by a data processing apparatus according to an embodiment
  • Fig. 8 shows a schematic diagram illustrating different aspects of a neural network provided by a data processing apparatus according to an embodiment
  • Fig. 9 shows a schematic diagram illustrating different processing steps of a data processing apparatus according to an embodiment
  • Fig. 10 shows a flow diagram illustrating a neural network data processing method according to an embodiment.
  • Figure 1 shows a schematic diagram illustrating a data processing apparatus 100 according to an embodiment configured to process data on the basis of a neural network.
  • the data processing apparatus 100 shown in figure 1 comprises a processor 101.
  • the data processing apparatus 100 can be implemented as a distributed data processing apparatus 100 comprising more than the one processor 101 shown in figure 1.
  • the processor 101 of the data processing apparatus 100 is configured to provide a neural network 110.
  • the neural network 110 comprises a neural network layer being configured to generate from an array of input data values an array of output data values based on a plurality of sub-arrays of the array of input data values and a plurality of position dependent kernels comprising a plurality of kernel values or kernel weights.
  • the data processing apparatus 100 can further comprise a memory 103 for storing and/or retrieving the input data values, the output data values and/or the kernel values.
  • Each position dependent kernel comprises a plurality of kernel values or kernel weights. For a respective position or element of the array of input data values a respective kernel is applied to a respective sub-array of the array of input data values to generate a single output data value.
  • a "position dependent kernel” as used herein means a kernel whose kernel weights depend on the respective position of the sub-array of the array of input data values to which the kernel is applied. In other words, for a first kernel applied to a first sub-array of the plurality of input data values the kernel values can differ from the kernel values of a second kernel applied to a second sub-array of the plurality of input data values forming a different sub-array of the same array of input values.
  • the position could be a spatial position defined, for instance, by two spatial coordinates (x,y).
  • the position could be a temporal position defined, for instance, by a time coordinate (t).
  • the array of input data values can be one-dimensional (i.e. a vector, e.g. audio or other e.g. temporal sequence), two-dimensional (i.e. a matrix, e.g. an image or other temporal or spatial sequence), or N-dimensional (e.g. any kind of N-dimensional feature array, e.g. provided by a conventional pre-processing or feature extraction and/or by other layers of the neural network 110).
  • the array of input data values can have one or more channels, e.g. for an RGB image one R-channel, one G-channel and one B-channel, or for a black/white image only one grey-scale or intensity channel.
  • The term "channel" can refer to any "feature", e.g. features obtained from conventional pre-processing or feature extraction or from other neural networks or neural network layers of the same neural network.
  • the array of input data values can comprise, for instance, two-dimensional RGB or grey scale image or video data representing at least a part of an image, or a one-dimensional audio signal.
  • the array of input data values can be, for instance, an array of similarity features generated by previous layers of the neural network on the basis of an initial, e.g. original array of input data values, e.g. by means of a feature extraction, as will be described in more detail further below.
  • the neural network layer 120 can be, for instance, an aggregation layer, a convolutional layer or a correlation layer, as described in the following.
  • the neural network layer 120 can be implemented as a convolution (or convolutional) layer configured to "mix" all channels of the array of input data values. For instance, in case the array of input data values is an RGB image, i.e. an image having an R-channel, a G-channel and a B-channel, all three channels can contribute to each output data value.
  • the position dependent kernels may be channel-specific or common for all channels.
  • the position dependent kernels are generally multi-channel kernels.
  • the neural network layer can be implemented as a correlation layer providing a combination of aggregation or convolution (input image and weighted kernel) and an additional image (i.e. correlation of two, e.g. of the same or at the same position, sub-arrays in the two images with each other and additional application of the position dependent kernel on the correlation result).
  • the position dependent kernels may be channel-specific or common for all channels.
  • Figure 2 shows a schematic diagram illustrating elements of the neural network 110 provided by the data processing apparatus 100 according to an embodiment.
  • the neural network layer 120 is implemented as a weighted aggregation layer 120.
  • the neural network layer 120 can be implemented as a convolution network layer 120 (also referred to as convolutional network layer 120) or as a correlation network layer 120, as will be described in more detail further below.
  • the aggregation layer 120 is configured to generate a two-dimensional array of output data values out(x, y) 121 on the basis of a respective sub-array of the two-dimensional array of input data values in(x, y) 117 and a plurality of position dependent kernels 118 comprising a plurality of kernel values or kernel weights.
  • the weighted aggregation layer 120 of the neural network 110 shown in figure 2 is configured to generate the array of output data values out(x, y) 121 on the basis of the plurality of sub-arrays of the two-dimensional array of input data values in(x, y) 117 and the plurality of position dependent kernels 118 comprising the kernel values w_L(x, y, i, j) using the following equations:

    $$out(x, y) = \frac{1}{W_L(x, y)} \sum_{i=-r}^{r} \sum_{j=-r}^{r} w_L(x, y, i, j)\, in(x - i, y - j)$$

    $$W_L(x, y) = \sum_{i=-r}^{r} \sum_{j=-r}^{r} w_L(x, y, i, j)$$

    wherein r denotes a size of each kernel of the plurality of position dependent kernels 118 (in this example, each kernel and each sub-array of the array of input values has (2r+1)×(2r+1) values) and W_L(x, y) denotes a normalization factor.
  • the normalization factor can be omitted, i.e. set to one.
  • the normalization factor allows to keep the mean value or DC component. This can be advantageous, when the weighted aggregation layer 120 is used to aggregate stereo matching costs of a stereo image, because the normalization is beneficial for making the output values for different sub-arrays of the array of input data values comparable. This is usually not necessary in the case of the convolutional network layer 120.
  • the above equations for a two-dimensional input array and a kernel having a quadratic shape can be easily adapted to the case of an array of input values 117 having one dimension or more than two dimensions and/or a kernel having a rectangular shape, e.g. a non-square rectangular shape with different horizontal and vertical dimensions.
  • the neural network layer 120 can be configured to generate the array of output data values out(x, y, c_o) 121 having one or more channels on the basis of the plurality of sub-arrays of the two-dimensional array of input data values in(x, y, c_i) 117 in the different channels and the plurality of position dependent kernels 118 comprising the kernel values w_L(x, y, c_o, c_i, i, j) using the following equation:

    $$out(x, y, c_o) = \sum_{c_i=1}^{C_i} \sum_{i=-r}^{r} \sum_{j=-r}^{r} w_L(x, y, c_o, c_i, i, j)\, in(x - i, y - j, c_i)$$

    wherein C_i denotes the number of channels of the array of input data values.
  • the neural network layer 120 is configured to generate the array of output data values 121 with a smaller size than the array of input data values 117.
  • the neural network 110 is configured to perform a down-step operation on the basis of the plurality of position dependent kernels 118.
  • Figure 3 illustrates a down-step operation provided by the neural network 110 of the data processing apparatus 100 according to an embodiment.
  • Using a down-step operation allows increasing the receptive field, enables processing the data with a cascade of smaller filters as compared with a single layer with a kernel covering an equal receptive field, and also enables the neural network 110 to better analyze the data by finding more sophisticated relationships among the data and adding more non-linearities to the processing chain by separating each convolution layer with a non-linear element like a sigmoid or a Rectified Linear Unit (ReLU).
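  • As an aside, the receptive-field argument can be illustrated with a tiny sketch: two stacked 3×3 convolutions separated by a ReLU cover the same 5×5 receptive field as a single 5×5 kernel while adding a non-linearity (ordinary fixed kernels and SciPy are used here purely for illustration, not the position dependent kernels of the patent):

```python
import numpy as np
from scipy.signal import convolve2d

def relu(x):
    return np.maximum(x, 0.0)

img = np.random.rand(16, 16)
k3a, k3b = np.random.randn(3, 3), np.random.randn(3, 3)

# cascade of two 3x3 convolutions with a ReLU in between:
# each output value depends on a 5x5 neighbourhood of the input
h = relu(convolve2d(img, k3a, mode="same"))
out = convolve2d(h, k3b, mode="same")
print(out.shape)   # (16, 16)
```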
  • the neural network layer 120 can combine the input data values to produce the array of output data values with a reduced resolution. This can be achieved by convolving the array of input data values 117 with the position dependent kernels 118 with a stride S greater than 1.
  • the stride S specifies the spacing between neighboring input spatial positions for which convolutions are computed. If the stride S is equal to 1, the convolution is performed for each spatial position. If the stride S is greater than 1, the neural network layer 120 is configured to perform a convolution for every S-th spatial position of the array of input data values 117, thereby reducing the output resolution (i.e. the dimensions of the array of output data values 121) by a factor of S for each spatial dimension.
  • the horizontal and the vertical stride may be the same or different.
  • the neural network layer 120 combines the array of input data values 117 from the spatial area of size (2r+1)×(2r+1) to produce a respective output data value of the array of output data values 121.
  • the input data values 117 can be aggregated to pack information present in a larger spatial area into one single spatial position.
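  • A sketch of the down-step operation, applying the position dependent kernels with a stride S greater than 1 so that an output value is computed only for every S-th spatial position (single channel, zero padding; names are illustrative assumptions):

```python
import numpy as np

def strided_aggregation(inp, w, S=2):
    """inp: (H, W); w: (H_out, W_out, 2r+1, 2r+1) position dependent kernels,
    one per *output* position; returns an (H_out, W_out) array with
    H_out = ceil(H / S) and W_out = ceil(W / S)."""
    H, W = inp.shape
    k = w.shape[2]
    r = k // 2
    padded = np.pad(inp, r)
    H_out, W_out = -(-H // S), -(-W // S)        # ceiling division
    out = np.zeros((H_out, W_out))
    for yo, y in enumerate(range(0, H, S)):      # every S-th spatial position
        for xo, x in enumerate(range(0, W, S)):
            patch = padded[y:y + k, x:x + k]
            out[yo, xo] = np.sum(w[yo, xo] * patch)
    return out

inp = np.random.rand(8, 8)
w = np.random.rand(4, 4, 3, 3)                   # one kernel per output position (stride 2)
print(strided_aggregation(inp, w, S=2).shape)    # (4, 4)
```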
  • the neural network 110 comprises one or more preceding layers 115 preceding the neural network layer 120 and one or more following layers 125 following the neural network layer 120.
  • the neural network layer 120 could be the first and/or the last data processing layer of the neural network 110, i.e. in an embodiment there could be no preceding layers 115 and/or no following layers 125.
  • the one or more preceding layers 115 can be further neural network layers and/or "conventional" pre-processing layers, such as a feature extraction layer.
  • the one or more following layers 125 can be further neural network layers, such as a deconvolutional layer, and/or "conventional" post-processing layers.
  • one or more of the preceding layers 115 can be configured to provide, i.e. to generate the plurality of position dependent kernels 118 (see the bottom signal path of the preceding layers 115 from guiding data 113 to the position dependent kernels 118 in Fig. 2).
  • the one or more layers of the preceding layers 115 can generate the plurality of position dependent kernels 118 on the basis of an original array 111 of original input data values, e.g. an original image as 2D example.
  • the original array 111 of original input data values can be an array of input data 111 being the original input of the neural network 110.
  • the one or more preceding layers 115 could be configured to generate just the plurality of position dependent kernels 118 on the basis of the original input data 111 of the neural network 110 and to provide the original input data 111 of the neural network 110 as the array of input data values 117 to the neural network layer 120 (no preceding layers in the top signal path of the preceding layers 115 from the original input array 111 to the input array 117 according to an embodiment of the neural network layer 120, see Fig. 2).
  • the original array 111 may form the input array 117.
  • the one or more preceding layers 115 of the neural network 110 are configured to generate the plurality of position dependent kernels 118 on the basis of an array of guiding data 113.
  • a more detailed view of the processing steps of the neural network 110 of the data processing apparatus 100 according to such an embodiment is shown in figure 4 for the exemplary case of two-dimensional input and output arrays.
  • the array of guiding data 113 is used by the one or more preceding layers 115 of the neural network 110 to generate the plurality of position dependent kernels w_L(x, y) 118 on the basis of the array of guiding data g(x, y) 113.
  • the neural network layer 120 is configured to generate the two-dimensional array of output data values out(x, y) 121 on the basis of the two-dimensional array of input data values in(x, y) 117 and the plurality of position dependent kernels w_L(x, y) 118, which, in turn, are based on the array of guiding data g(x, y) 113.
  • the one or more preceding layers 115 of the neural network 110 are neural network layers configured to learn the plurality of position dependent kernels w_L(x, y) 118 on the basis of the array of guiding data g(x, y) 113.
  • the one or more preceding layers 115 of the neural network 110 are pre-processing layers configured to generate the plurality of position dependent kernels w_L(x, y) 118 on the basis of the array of guiding data 113 using one or more pre-processing schemes, such as feature extraction.
  • the one or more preceding layers 115 of the neural network 110 are configured to generate the plurality of position dependent kernels w_L(x, y) 118 on the basis of the array of guiding data g(x, y) 113 in a way analogous to bilateral filtering, as illustrated in figure 5.
  • Bilateral filtering is known in the field of image processing for performing a weighted aggregation of input data, while decreasing the influence of some input values and amplifying the influence of other input values on the aggregation result [M. Elad, "On the origin of bilateral filter and ways to improve it", IEEE Transactions on Image Processing, vol. 11, no. 10, pp. 1141-1151, October 2002].
  • the weights 518 utilized for aggregating the array of input data values 517 adapt to input data 517 using the guiding image data g 513 which provides additional information to control the aggregation process.
  • the array of guiding image data 513 can be equal to the array of input data values for generating the array of output data values 521 by the layer 520 on the basis of weights 518.
  • the bilateral filter weights 518 take into consideration the distance of the value within the kernel from the center of the kernel and, additionally, the similarity of the data values with data in the center of the kernel, as mathematically described by the following equation:

    $$out(x, y) = \frac{1}{W(x, y)} \sum_{i=-r}^{r} \sum_{j=-r}^{r} w(x, y, i, j)\, in(x - i, y - j)$$

    wherein the normalization factor is based on the following equation:

    $$W(x, y) = \sum_{i=-r}^{r} \sum_{j=-r}^{r} w(x, y, i, j)$$

  • the bilateral filter weights 518 are defined by the following equation:

    $$w(x, y, i, j) = \exp\!\left(-\frac{i^2 + j^2}{2\sigma_s^2}\right) \exp\!\left(-\frac{d\big(g(x, y),\, g(x - i, y - j)\big)^2}{2\sigma_r^2}\right)$$

    wherein d(·, ·) denotes a distance function.
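  • A sketch of bilateral-style weights computed from a guiding array g; the Gaussian spatial and range terms and the absolute-difference distance are assumptions for illustration, the surrounding text only states that the spatial offset and a distance function d(·, ·) enter the weights:

```python
import numpy as np

def bilateral_weights(g, r=1, sigma_s=1.0, sigma_r=0.1):
    """g: (H, W) guiding data; returns (H, W, 2r+1, 2r+1) weights that decay
    with spatial distance from the kernel centre and with dissimilarity of
    the guiding values to the guiding value at the centre."""
    H, W = g.shape
    k = 2 * r + 1
    padded = np.pad(g, r, mode="edge")
    ii, jj = np.meshgrid(np.arange(-r, r + 1), np.arange(-r, r + 1), indexing="ij")
    spatial = np.exp(-(ii ** 2 + jj ** 2) / (2 * sigma_s ** 2))   # distance to centre
    w = np.zeros((H, W, k, k))
    for y in range(H):
        for x in range(W):
            patch = padded[y:y + k, x:x + k]
            d = patch - g[y, x]               # d(.,.): absolute difference (assumption)
            w[y, x] = spatial * np.exp(-(d ** 2) / (2 * sigma_r ** 2))
    return w

g = np.random.rand(8, 8)
print(bilateral_weights(g).shape)   # (8, 8, 3, 3)
```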
  • Figure 6 shows a schematic diagram highlighting the main processing stage 601 of the data processing apparatus 100 according to an embodiment, for instance, the data processing apparatus 100 providing the neural network 110 shown in figure 2.
  • the neural network 110 in a first processing step 603 can generate the plurality of position dependent kernels w_L(x, y) 118 on the basis of the array of guiding data g(x, y) 113.
  • in a second processing step 605 the neural network 110 can generate the array of output data values out(x, y) 121 on the basis of the array of input data values in(x, y) 117 and the plurality of position dependent kernels w_L(x, y, i, j) 118.
  • Figure 7 shows a schematic diagram illustrating the neural network 110 provided by the data processing apparatus 100 according to a further embodiment.
  • the neural network 110 is configured to generate the plurality of position dependent kernels based on a plurality of position independent kernels 119b (shown in figure 8) and a plurality of position dependent weights F_f(x, y) 119a (also referred to as similarity features 119a).
  • the similarity features 119a are obtained based on the guiding data 113 and could indicate higher-level knowledge about the input data 111, including e.g. information about object or region borders or regions of interest.
  • the neural network 110 of figure 7 is configured to generate the plurality of position dependent kernels 118 by adding the position independent kernels 119b each weighted by the associated position dependent weights F_f(x, y) 119a.
  • the plurality of position independent kernels 119b can be predetermined or learned by the neural network 110.
  • the neural network 110 can comprise one or more preceding layers 115, which precede the neural network layer 120 and which can be implemented as an additional neural network layer or a pre-processing layer.
  • one or more layers of the preceding layers 115 are configured to generate the plurality of position dependent weights F_f(x, y) 119a on the basis of an original array of original input data values or the guiding data 113.
  • the original array of original input data values of the neural network 110 can comprise the array of input data values 117 to be processed by the neural network layer 120 or another array of input data values 111 associated to the array of input data values 117, for instance, the initial or original array of input data 111.
  • the array of input data values in(x, y) 117 and the array of output data values out(x, y) 121 are two-dimensional arrays and the neural network layer 120 is configured to generate a respective kernel of the plurality of position dependent kernels w_L(x, y, i, j) 118 on the basis of the following equation:

    $$w_L(x, y, i, j) = \sum_{f=1}^{N_f} F_f(x, y)\, K_f(i, j)$$

    wherein F_f(x, y) denotes the set of N_f position dependent weights (or similarity features) 119a and K_f denotes the plurality of position independent kernels 119b, as also illustrated in figure 8.
  • Figure 9 shows a schematic diagram highlighting the main processing stage 901 of the data processing apparatus 100 according to an embodiment, for instance, the data processing apparatus 100 providing the neural network 110 illustrated in figures 7 and 8.
  • the neural network 110 in a first processing step 903 can generate the plurality of position dependent weights or similarity features F_f(x, y) 119a on the basis of the array of guiding data g(x, y) 113.
  • the neural network 110 can generate the plurality of position dependent kernels w_L(x, y, i, j) 118 on the basis of the plurality of position dependent weights or similarity features F_f(x, y) 119a and the plurality of position independent kernels K_f 119b.
  • the neural network layer 120 can generate the array of output data values out(x, y) 121 on the basis of the array of input data values in(x, y) 117 and the plurality of position dependent kernels w_L(x, y, i, j) 118.
  • the neural network layer 120 of the neural network 110 can be implemented in the form of a correlation network layer 120 configured to generate the array of output data values 121 from the array of input data values 117 and a further array of input data values by correlating the array of input data values 117 with the further array of input data values and by applying a respective position dependent kernel of the plurality of position dependent kernels 118 to a respective sub-array of the array of input data values 117 and a corresponding sub-array of the further array of input data values.
  • the correlation neural network layer 120 can be configured to generate the array of output data values out(x, y) 121 on the basis of the following equation:

    $$out(x, y) = \sum_{i=-r}^{r} \sum_{j=-r}^{r} w_{L1}(x, y, i, j)\, in1(x - i, y - j)\, in2(x - i, y - j)$$

    wherein in1(x - i, y - j) denotes the array of input data values 117, in2(x - i, y - j) denotes the further array of input data values, w_L1(x, y, i, j) denotes the plurality of position dependent kernels 118 and r denotes the size of each kernel of the plurality of position dependent kernels 118 (in this example, each kernel has (2r+1)×(2r+1) kernel values).
  • the output data values 121 can be normalized using the following normalization factor:

    $$W_L(x, y) = \sum_{i=-r}^{r} \sum_{j=-r}^{r} w_{L1}(x, y, i, j)$$
  • the normalization factor can be omitted, i.e. set to one.
  • the above equations for a two-dimensional input array and a kernel having a quadratic shape can be easily adapted to the case of an array of input values 117 having one dimension or more than two dimensions and/or a kernel having a non-square rectangular shape, i.e. different horizontal and vertical dimensions.
  • the correlation network layer 120 is configured to generate the array of output data values 121 from the array of input data values 117 and the further array of input data values by correlating the array of input data values 117 with the further array of input data values and by applying a respective position dependent kernel of the plurality of position dependent kernels 118 associated to the array of input data values 117 and a further position dependent kernel of a plurality of further position dependent kernels associated to the further array of input data values to a respective sub-array of the array of input data values 117 and a corresponding sub-array of the further array of input data values.
  • the correlation neural network layer 120 can be configured to generate the array of output data values out(x, y) 121 on the basis of the following equation:

    $$out(x, y) = \sum_{i=-r}^{r} \sum_{j=-r}^{r} \big( w_{L1}(x, y, i, j)\, in1(x - i, y - j) \big) \big( w_{L2}(x, y, i, j)\, in2(x - i, y - j) \big)$$

  • wherein in1(x - i, y - j) denotes the array of input data values 117, in2(x - i, y - j) denotes the further array of input data values, w_L1(x, y, i, j) denotes the plurality of position dependent kernels 118 and w_L2(x, y, i, j) denotes the plurality of further position dependent kernels associated to the further array of input data values.
  • the output data values 121 can be normalized using the following normalization factor:

    $$W_{L12}(x, y) = \sum_{i=-r}^{r} \sum_{j=-r}^{r} w_{L1}(x, y, i, j)\, w_{L2}(x, y, i, j)$$
  • the neural network layer 120 is configured to process the array of input data values 117 on the basis of the plurality of position dependent kernels 118 using a maximum or minimum pooling scheme. More specifically, in such an embodiment, the neural network layer 120 is configured to generate a respective output data value of the array of output data values 121 by determining a respective input data value of a respective sub-array of the plurality of sub-arrays of the array of input data values 117 being associated with a maximum or minimum kernel value of a respective position dependent kernel of the plurality of position dependent kernels 118 and using the respective determined input data value as the respective output data value.
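  • A sketch of the described maximum pooling variant: for each output position, the input value at the location of the largest kernel value within the sub-array is taken over directly (single channel; padding and names are illustrative assumptions):

```python
import numpy as np

def kernel_guided_max_pooling(inp, w):
    """inp: (H, W); w: (H, W, 2r+1, 2r+1) position dependent kernels.
    For each position, output the input value associated with the
    maximum kernel value of the corresponding kernel."""
    H, W = inp.shape
    k = w.shape[2]
    r = k // 2
    padded = np.pad(inp, r)
    out = np.zeros((H, W))
    for y in range(H):
        for x in range(W):
            patch = padded[y:y + k, x:x + k]
            idx = np.unravel_index(np.argmax(w[y, x]), (k, k))  # location of max kernel value
            out[y, x] = patch[idx]                              # take that input value as output
    return out

inp = np.random.rand(8, 8)
w = np.random.rand(8, 8, 3, 3)
print(kernel_guided_max_pooling(inp, w).shape)   # (8, 8)
```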
  • the neural network layer 120 according to one of the embodiments described above can be used, for instance, for aggregating stereo matching costs of a stereo image.
  • the neural network 110 can use e.g. object features derived from semantic segmentation as the guiding data 113 in order to determine the object borders in the scene and guide the aggregation process of the input stereo matching cost producing the aggregated stereo matching cost as an output.
  • Figure 10 shows a flow diagram illustrating a data processing method 1000 based on a neural network 110 according to an embodiment.
  • the data processing method 1000 can be performed by the data processing apparatus 100 shown in figure 1 and its different embodiments.
  • the data processing method 1000 comprises the step 1001 of generating by the neural network layer 120 of the neural network 110 from the array of input data values 117 the array of output data values 121 based on the plurality of position dependent kernels 118 and the plurality of sub-arrays of the array of input data values 117.
  • Embodiments of the data processing methods may be implemented and/or performed by one or more processors as described above.
  • a first kernel is considered different to a second kernel if a kernel value of the array of kernel values of the first kernel at at least one position (or of at least one element) of the first kernel is different from a kernel value of the array of kernel values of the second kernel at the same position (or of the same element) of the kernel.
  • the different sub-arrays of the array of input values have all the same size and dimension. Accordingly, the different kernels typically have the same size and dimension.
  • a first sub-array of the array of input values is considered different to a second sub-array of the array of input values if the first sub-array of the array of input values comprises at least one element of the array of input values which is not comprised by the second sub-array of the array of input values.
  • the different sub-arrays of the array of input values differ at least by one column or row of elements of the array of input values.
  • the different sub-arrays may partially overlap or not overlap as shown in Fig. 3.
  • Embodiments of the proposed guided aggregation can be applied for guided feature maps down-scaling.
  • With input position dependent kernels as the guiding data, input values which are features of the feature map are grouped to form input data sub-arrays of the input data array and can be further aggregated in a controlled way, producing an output feature value representative for the whole sub-array.
  • the guiding data represents information about object or region borders, obtained by e.g. color-based segmentation, semantic segmentation using preceding neural network layers or an edge map of a texture image corresponding to the processed feature map.
  • Embodiments of the proposed guided convolution can be applied for switchable feature extraction.
  • Input values which are features of the feature map are convolved with adaptable feature extraction filters which are formed from the input guiding data in the form of position dependent kernels.
  • each selected area of the input feature map can be processed with feature extraction filters producing only features desired for these regions.
  • the guiding data in the form of similarity features represents information about object or region borders, obtained by e.g. color-based segmentation, semantic segmentation using preceding neural network layers, an edge map of a texture image corresponding to the processed feature map or a ROI (region of interest) binary map.
  • Embodiments of the proposed guided correlation can be applied for guided correlation of input feature maps.
  • With input position dependent kernels as the guiding data, input values which are features of the two or more feature maps are correlated together in a controlled way enabling amplification or attenuation of some features within a correlation region. This way, features that correspond to some other objects/regions in the feature map can be excluded or taken with smaller impact to compute the result. Also, some of the features characteristic for a selected region can be amplified.
  • the guiding data represents information about object or region borders, obtained by e.g. color-based segmentation, semantic segmentation using preceding neural network layers, an edge map of a texture image corresponding to the processed feature map or a ROI (region of interest) binary map.
  • Normalization is advantageous if the output values obtained for different spatial positions are going to be compared to each other per-value, without any intermediate step, because it preserves the mean (DC) component. If such a comparison is not performed, normalization is not required but just increases complexity. Additionally, one can omit normalization in order to simplify the computations and compute only an approximate result.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to a data processing apparatus (100) comprising a processor (101) configured to provide a neural network (110), wherein the neural network (110) comprises a neural network layer (120) being configured to generate from an array of input data values (117) an array of output data values (121) based on a plurality of position dependent kernels (118) and a plurality of sub-arrays of the array of input data values (117). Moreover, the invention relates to a corresponding data processing method.

Description

DESCRIPTION
Neural network data processing apparatus and method
TECHNICAL FIELD
Generally, the present invention relates to the field of machine learning or deep learning based on neural networks. More specifically, the present invention relates to a neural network data processing apparatus and method, in particular for processing data in the fields of audio processing, computer vision, image or video processing, classification, detection and/or recognition.
BACKGROUND
Weighted aggregation, which is commonly used in many signal processing applications, such as image processing methods for image quality improvement, depth or disparity estimation and many other applications [Kaiming He, Jian Sun, Xiaoou Tang, "Guided Image Filtering", ECCV 2010], is a process in which input data is combined to pack information present in a larger spatial area into one single spatial position, with additional input in the form of aggregation weights that control the influence of each input data value on the result.
In deep learning, a common approach recently used in many application fields is the utilization of convolutional neural networks. Generally, a specific part of such convolutional neural networks is at least one convolution (or convolutional) layer which performs a convolution of input data values with a learned kernel K, producing one output data value per convolution kernel for each output position [J. Long, E. Shelhamer, T. Darrell, "Fully Convolutional Networks for Semantic Segmentation", CVPR 2015]. For the two-dimensional case used, for instance, in image processing, the convolution using the learned kernel K can be expressed mathematically as follows:

$$out(x, y) = \sum_{i=-r}^{r} \sum_{j=-r}^{r} K(i, j)\, in(x - i, y - j) + B$$

wherein out(x, y) denotes the array of output data values, in(x - i, y - j) denotes a sub-array of an array of input data values and K(i, j) denotes the kernel comprising an array of kernel weights or kernel values of size (2r+1)×(2r+1). B denotes an optional learned bias term, which can be added for obtaining each output data value. The weights of the kernel K are the same for the whole array of input data values in(x, y) and are generally learned during a learning phase of the neural network which, in the case of 1st-order methods, consists of iteratively back-propagating the gradients of the neural network output back to the input layers and updating the weights of all the network layers using the partial derivatives computed in this way.
SUMMARY
It is an object of the invention to provide an improved data processing apparatus and method based on neural networks.
The foregoing and other objects are achieved by the subject matter of the independent claims. Further implementation forms are apparent from the dependent claims, the description and the figures.
Generally, embodiments of the invention provide a new approach for weighted
aggregation of data for neural networks that is implemented into a neural network as a new type of neural network layer. The neural network layer can compute aggregated data using individual aggregation weights that are learned for each individual spatial position.
Aggregation weights can be computed as a function of similarity features and learned weight kernels, resulting in individual aggregation weights for each output spatial position.
In this way a variety of sophisticated position dependent or position adaptive kernels learned by the neural network can be utilized for better adaptation of the aggregation weights to input data.
More specifically, according to a first aspect the invention relates to a data processing apparatus comprising one or more processors configured to provide a neural network. The data to be processed by the data processing apparatus can be, for instance, two-dimensional image or video data or one-dimensional audio data.
The neural network provided by the one or more processors of the data processing apparatus comprises a neural network layer being configured to process an array of input data values, such as a two-dimensional array of input data values in(x, y), into an array of output data values, such as a two-dimensional array of output data values out(x, y). The neural network layer can be a first layer or an intermediate layer of the neural network. The array of input data values can be one-dimensional (i.e. a vector, e.g. audio or other e.g. temporal sequence), two-dimensional (i.e. a matrix, e.g. an image or other temporal or spatial sequence), or N-dimensional (e.g. any kind of N-dimensional feature array, e.g. provided by a conventional pre-processing or feature extraction and/or by other layers of the neural network).
The array of input data values can have one or more channels, e.g. for an RGB image one R-channel, one G-channel and one B-channel, or for a black/white image only one grey-scale or intensity channel. The term "channel" can refer to any "feature", e.g.
features obtained from conventional pre-processing or feature extraction or from other neural networks or neural network layers of the same neural network. The array of input data values can comprise, for instance, two-dimensional RGB or grey scale image or video data representing at least a part of an image, or a one-dimensional audio signal. In case the neural network layer is implemented as an intermediate layer of the neural network, the array of input data values can be, for instance, any kind of array of features generated by previous layers of the neural network on the basis of an initial, e.g. original array of input data values, e.g. by means of a feature extraction.
The neural network layer is configured to generate from the array of input data values the array of output data values on the basis of a plurality of position dependent, i.e. spatially variable kernels and a plurality of different sub-arrays at different positions of the array of input data values. Each kernel comprises a plurality of kernel values or kernel weights. A respective kernel is applied to a respective sub-array of the array of input data values to generate a single output data value.
A "position dependent kernel" as used herein means a kernel whose kernel weights depend on the respective position, e.g. (x,y) for two-dimensional arrays, of the sub-array of input data values. In other words, for a first kernel the kernel values applied to a first sub-array of the array of input data values can differ from the kernel values of a second kernel applied to a second sub-array of the array of input data values. In a two- dimensional array the position could be a spatial position defined, for instance, by two spatial coordinates. In a one-dimensional array the position could be a temporal position defined, for instance, by a time coordinate. Thus, an improved data processing apparatus based on neural networks is provided. The data processing apparatus allows to aggregate the input data in a way that can better reflect mutual data similarity, i.e. the resultant output data value is more strongly influenced by input data values that are closer and more similar to input data in the center position of the kernel. Moreover, the data processing apparatus allows adapting the kernel weights for different spatial positions of the array of input data values. This, in turn, allows, for instance, minimizing the influence of some of the input data values on the result, for instance the input data values that are associated with another part of the scene (as determined by semantic segmentation) or a different object that is being analysed.
In a further implementation form of the first aspect, the neural network comprises at least one additional network layer configured to generate the plurality of position dependent kernels on the basis of an original array of original input values of the neural network, wherein the original array of original input values of the neural network comprises the array of input values or another array of input values associated to the array of input values. The original array of original input values can be the array of input data values or a different array.
In a further implementation form of the first aspect, the neural network is configured to generate the plurality of position dependent kernels based on a plurality of learned position independent kernels and a plurality of position dependent weights. Generally, the position independent kernels can be learned by the neural network and the position dependent weights or similarity features can be computed, for instance, by a further preceding layer of the neural network. This implementation form allows minimizing the amount of data being transferred to the neural network layer in order to obtain the kernel values. This is because the kernel values are not transferred directly, but computed from the plurality of position dependent weights and/or similarity features substantially reducing the amount of data for each element of the array of output data values. This can minimize the amount of data being stored and transferred by the neural network between the different network layers, which is especially important during the learning process on the basis of the mini-batch approach as the memory of the data processing apparatus (GPU) is currently the main bottleneck.
In a further implementation form of the first aspect, the neural network is configured to generate a kernel of the plurality of position dependent kernels by adding the learned position independent kernels each weighted by the associated non-learned position dependent weights (i.e. similarity features). This implementation form provides a very efficient representation of the plurality of position dependent kernels using a linear combination of position independent "base kernels".
In a further implementation form of the first aspect, the plurality of position independent kernels are predetermined or learned, and wherein the neural network comprises at least one additional neural network layer or "conventional" pre-processing layer configured to generate the plurality of position dependent weights (i.e. similarity features) based on an original array of original input values of the neural network, wherein the original array of original input values of the neural network comprises the array of input values or another array of input values associated to the array of input values. The original array of original input values can be the array of input data values or a different array. In an
implementation form the at least one additional neural network layer or "conventional" pre- processing layer can generate the plurality of position dependent weights (i.e. similarity features) using, for instance, bilateral filtering, semantic segmentation, per-instance object detection, and data importance indicators like ROI (region of interest).
In a further implementation form of the first aspect, the array of input data values and the array of output data values are two-dimensional arrays, and the convolutional neural network layer is configured to generate the plurality of position dependent kernels w_L(x, y, i, j) on the basis of the following equation:

w_L(x, y, i, j) = \sum_{f=1}^{N_f} F_f(x, y) \cdot K_f(i, j)

wherein F_f(x, y) denotes the set of N_f position dependent weights (i.e. similarity features) and K_f(i, j) denotes the plurality of position independent kernels.
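As an illustration of this implementation form, the following sketch (function and variable names are hypothetical, not taken from the application) forms each position dependent kernel as a linear combination of the N_f position independent base kernels K_f, weighted by the position dependent similarity features F_f(x, y):

```python
import numpy as np

def build_position_dependent_kernels(features, base_kernels):
    """features:     array of shape (H, W, Nf) -- F_f(x, y), e.g. similarity features
       base_kernels: array of shape (Nf, k, k) -- K_f, learned or predetermined
       returns:      array of shape (H, W, k, k) holding w_L(x, y, i, j)"""
    # w_L(x, y, i, j) = sum over f of F_f(x, y) * K_f(i, j)
    return np.einsum('hwf,fij->hwij', features, base_kernels)

# Example: Nf = 3 base kernels of size 5x5 combined per position on a 32x32 grid.
H, W, Nf, k = 32, 32, 3, 5
F = np.random.rand(H, W, Nf)
K = np.random.randn(Nf, k, k)
wL = build_position_dependent_kernels(F, K)   # shape (32, 32, 5, 5)
```

In this representation only the N_f scalar weights per position have to be stored or transferred between layers, rather than a full (2r+1)x(2r+1) kernel per position, which corresponds to the data-reduction argument made above.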
In a further implementation form of the first aspect, the neural network layer is a convolutional network layer or an aggregation network layer.
In a further implementation form of the first aspect, the array of input data values and the array of output data values are two-dimensional arrays, wherein the array of input data values in(x, y, c_i) has C_i different channels and wherein the neural network layer is a convolutional network layer configured to generate the array of output data values out(x, y, c_o) on the basis of the following equations:

out(x, y, c_o) = \frac{1}{W_L(x, y, c_o)} \sum_{c_i=1}^{C_i} \sum_{i=-r}^{r} \sum_{j=-r}^{r} w_L(x, y, c_o, c_i, i, j) \cdot in(x - i, y - j, c_i)

W_L(x, y, c_o) = \sum_{c_i=1}^{C_i} \sum_{i=-r}^{r} \sum_{j=-r}^{r} w_L(x, y, c_o, c_i, i, j)

wherein r denotes a size of each kernel of the plurality of position dependent kernels w_L(x, y, c_o, c_i, i, j) and W_L(x, y, c_o) denotes a normalization factor. In an implementation form the normalization factor can be set equal to 1.
In a further implementation form of the first aspect, the array of input data values and the array of output data values are two-dimensional arrays, wherein the array of input data values in(x, y) has only a single channel and wherein the neural network layer is an aggregation network layer configured to generate the array of output data values out(x, y) on the basis of the following equations:

out(x, y) = \frac{1}{W_L(x, y)} \sum_{i=-r}^{r} \sum_{j=-r}^{r} w_L(x, y, i, j) \cdot in(x - i, y - j)

W_L(x, y) = \sum_{i=-r}^{r} \sum_{j=-r}^{r} w_L(x, y, i, j)

wherein r denotes a size of each kernel of the plurality of position dependent kernels w_L(x, y, i, j) and W_L(x, y) denotes a normalization factor. In an implementation form the normalization factor W_L(x, y) can be set equal to 1.
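A direct, unoptimized NumPy rendering of the aggregation equations above could look as follows; the edge padding, the function name and the epsilon guard are assumptions made only for this sketch:

```python
import numpy as np

def guided_aggregation(inp, wL, normalize=True, eps=1e-12):
    """inp: (H, W) single-channel input array in(x, y)
       wL:  (H, W, 2r+1, 2r+1) position dependent kernels w_L(x, y, i, j)
       returns the aggregated output out(x, y) of the same size."""
    H, W = inp.shape
    k = wL.shape[2]
    r = k // 2
    padded = np.pad(inp, r, mode='edge')          # simple border handling (assumption)
    out = np.empty_like(inp, dtype=float)
    for x in range(H):
        for y in range(W):
            sub = padded[x:x + k, y:y + k]        # sub-array around position (x, y)
            sub = sub[::-1, ::-1]                 # flip so that in(x - i, y - j) is addressed
            acc = np.sum(wL[x, y] * sub)
            if normalize:
                acc /= (np.sum(wL[x, y]) + eps)   # divide by W_L(x, y); may be set to 1
            out[x, y] = acc
    return out
```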
In a further implementation form of the first aspect, the neural network layer is a correlation network layer configured to generate the array of output data values from the array of input data values and a further array of input data values by correlating the array of input data values with the further array of input data values and applying a position dependent kernel of the plurality of position dependent kernels, or by correlating the array of input data values with the further array of input data values and applying a position dependent kernel of the plurality of position dependent kernels associated to the array of input data values and a further position dependent kernel of a plurality of further position dependent kernels associated to the further array of input data values. In a further implementation form of the first aspect, the array of input data values in1(x, y), the further array of input data values in2(x, y) and the plurality of position dependent kernels w_L1(x, y, i, j) are two-dimensional arrays and wherein the correlation neural network layer is configured to generate the array of output data values out(x, y) on the basis of the following equations:

out(x, y) = \frac{1}{W_L(x, y)} \sum_{i=-r}^{r} \sum_{j=-r}^{r} w_{L1}(x, y, i, j) \cdot in1(x - i, y - j) \cdot in2(x - i, y - j)

W_L(x, y) = \sum_{i=-r}^{r} \sum_{j=-r}^{r} w_{L1}(x, y, i, j)

wherein r denotes a size of each kernel of the plurality of position dependent kernels w_L1(x, y, i, j) and W_L(x, y) denotes a normalization factor. In an implementation form the normalization factor W_L(x, y) can be set equal to 1.
In a further implementation form of the first aspect, the array of input data values in1(x, y), the further array of input data values in2(x, y), the plurality of position dependent kernels w_L1(x, y, i, j) and the plurality of further position dependent kernels w_L2(x, y, i, j) are two-dimensional arrays and wherein the correlation neural network layer is configured to generate the array of output data values out(x, y) on the basis of the following equations:

out(x, y) = \frac{1}{W_{L12}(x, y)} \sum_{i=-r}^{r} \sum_{j=-r}^{r} w_{L1}(x, y, i, j) \cdot in1(x - i, y - j) \cdot w_{L2}(x, y, i, j) \cdot in2(x - i, y - j)

W_{L12}(x, y) = \sum_{i=-r}^{r} \sum_{j=-r}^{r} w_{L1}(x, y, i, j) \cdot w_{L2}(x, y, i, j)

wherein r denotes a size of each kernel of the plurality of position dependent kernels w_L1(x, y, i, j) and of each kernel of the plurality of further position dependent kernels w_L2(x, y, i, j) and W_L12(x, y) denotes a normalization factor. In an implementation form the normalization factor can be set equal to 1.
In a further implementation form of the first aspect, the neural network layer is configured to generate a respective output data value of the array of output data values by determining a respective input data value of a respective sub-array of input data values of the plurality of sub-arrays of input data values being associated with a maximum or minimum kernel value of a position dependent kernel and using the respective determined input data value as the respective output data value.
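The following minimal sketch illustrates one possible reading of this maximum/minimum implementation form; the selection rule, the names and the border handling are assumptions, not the application's definitive procedure:

```python
import numpy as np

def kernel_guided_pooling(inp, wL, mode='max'):
    """For every position, return the input value located where the position
       dependent kernel takes its largest (or smallest) value.
       inp: (H, W) input array; wL: (H, W, 2r+1, 2r+1) position dependent kernels."""
    H, W = inp.shape
    k = wL.shape[2]
    r = k // 2
    padded = np.pad(inp, r, mode='edge')          # border handling is an assumption
    out = np.empty_like(inp)
    for x in range(H):
        for y in range(W):
            kern = wL[x, y]
            idx = np.argmax(kern) if mode == 'max' else np.argmin(kern)
            a, b = np.unravel_index(idx, kern.shape)
            out[x, y] = padded[x + a, y + b]      # input value at the selected kernel position
    return out
```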
According to a second aspect the invention relates to a corresponding data processing method comprising the step of generating by a neural network layer of a neural network from an array of input data values an array of output data values based on a plurality of position dependent kernels and a plurality of sub-arrays of the array of input values.
In a further implementation form of the second aspect, the method comprises the further step of generating the position dependent kernel of the plurality of position dependent kernels by an additional neural network layer of the neural network based on an original array of original input values of the neural network, wherein the original array of original input values of the neural network comprises the array of input values or another array of input values associated to the array of input values.
In a further implementation form of the second aspect, the step of generating the position dependent kernel of the plurality of position dependent kernels comprises generating the position dependent kernel of the plurality of position dependent kernels based on a plurality of position independent kernels and a plurality of position dependent weights.
In a further implementation form of the second aspect, the step of generating a kernel of the plurality of position dependent kernels comprises the step of adding, i.e. summing the position independent kernels weighted by the associated position dependent weights.
In a further implementation form of the second aspect, the plurality of position
independent kernels are predetermined or learned and the step of generating the plurality of position dependent weights comprises the step of generating the plurality of position dependent weights by an additional neural network layer or a processing layer of the neural network based on an original array of original input values of the neural network, wherein the original array of original input values of the neural network comprises the array of input values or another array of input values associated to the array of input values. In a further implementation form of the second aspect, the array of input data values and the array of output data values are two-dimensional arrays, and the step of generating a kernel of the plurality of position dependent kernels is based on the following
equation:

w_L(x, y, i, j) = \sum_{f=1}^{N_f} F_f(x, y) \cdot K_f(i, j)

wherein F_f(x, y) denotes the plurality of N_f position dependent weights (i.e. similarity features) and K_f(i, j) denotes the plurality of position independent kernels.
In a further implementation form of the second aspect, the neural network layer is a convolutional network layer or an aggregation network layer.
In a further implementation form of the second aspect, the array of input data values and the array of output data values are two-dimensional arrays and the neural network layer is a convolutional network layer, wherein the step of generating the array of output data values is based on the following equations:

out(x, y, c_o) = \frac{1}{W_L(x, y, c_o)} \sum_{c_i=1}^{C_i} \sum_{i=-r}^{r} \sum_{j=-r}^{r} w_L(x, y, c_o, c_i, i, j) \cdot in(x - i, y - j, c_i)

W_L(x, y, c_o) = \sum_{c_i=1}^{C_i} \sum_{i=-r}^{r} \sum_{j=-r}^{r} w_L(x, y, c_o, c_i, i, j)

or, wherein the neural network layer is an aggregation network layer and wherein the step of generating the array of output data values is based on the following equations:

out(x, y) = \frac{1}{W_L(x, y)} \sum_{i=-r}^{r} \sum_{j=-r}^{r} w_L(x, y, i, j) \cdot in(x - i, y - j)

W_L(x, y) = \sum_{i=-r}^{r} \sum_{j=-r}^{r} w_L(x, y, i, j)

In implementation forms, the normalization factors W_L(x, y, c_o) and W_L(x, y) can be set
equal to 1. In a further implementation form of the second aspect, the neural network layer is a correlation network layer and the step of generating the array of output data values comprises generating the array of output data values from the array of input data values and a further array of input data values (a) by correlating the array of input data values with the further array of input data values and applying a position dependent kernel of the plurality of position dependent kernels, or (b) by correlating the array of input data values with the further array of input data values and applying a position dependent kernel of the plurality of position dependent kernels associated to the array of input data values and a further position dependent kernel of a plurality of further position dependent kernels associated to the further array of input data values.
In a further implementation form of the second aspect, the array of input data values, the further array of input data values and the kernel are two-dimensional arrays and the step of generating the array of output data values by the correlation neural network layer is based on the following equations:

out(x, y) = \frac{1}{W_L(x, y)} \sum_{i=-r}^{r} \sum_{j=-r}^{r} w_{L1}(x, y, i, j) \cdot in1(x - i, y - j) \cdot in2(x - i, y - j), \quad W_L(x, y) = \sum_{i=-r}^{r} \sum_{j=-r}^{r} w_{L1}(x, y, i, j)

or

out(x, y) = \frac{1}{W_{L12}(x, y)} \sum_{i=-r}^{r} \sum_{j=-r}^{r} w_{L1}(x, y, i, j) \cdot in1(x - i, y - j) \cdot w_{L2}(x, y, i, j) \cdot in2(x - i, y - j), \quad W_{L12}(x, y) = \sum_{i=-r}^{r} \sum_{j=-r}^{r} w_{L1}(x, y, i, j) \cdot w_{L2}(x, y, i, j)

In any of the above implementation forms, the normalization factors W_L or W_L12 can be set equal to 1.
In a further implementation form of the second aspect, the step of generating an output data value of the array of output data values by the convolutional neural network layer comprises the steps of determining an input data value of a sub-array of input data values of the plurality of sub-arrays of input data values being associated with a maximum or minimum kernel value of a position dependent kernel and using the determined input value as the output value.
According to a third aspect the invention relates to a computer program comprising program code for performing the method according to the second aspect, when executed on a processor or a computer.
The invention can be implemented in hardware and/or software or in any combination thereof.
BRIEF DESCRIPTION OF THE DRAWINGS
Further embodiments of the invention will be described with respect to the following figures, wherein:
Fig. 1 shows a schematic diagram illustrating a data processing apparatus based on a neural network according to an embodiment;
Fig. 2 shows a schematic diagram illustrating a neural network provided by a data processing apparatus according to an embodiment;
Fig. 3 shows a schematic diagram illustrating the concept of down-stepping or aggregation of data implemented in a data processing apparatus according to an embodiment;
Fig. 4 shows a schematic diagram illustrating different aspects of a neural network provided by a data processing apparatus according to an embodiment;
Fig. 5 shows a schematic diagram illustrating different aspects of a neural network provided by a data processing apparatus according to an embodiment;
Fig. 6 shows a schematic diagram illustrating different processing steps of a data processing apparatus according to an embodiment; Fig. 7 shows a schematic diagram illustrating a neural network provided by a data processing apparatus according to an embodiment;
Fig. 8 shows a schematic diagram illustrating different aspects of a neural network provided by a data processing apparatus according to an embodiment;
Fig. 9 shows a schematic diagram illustrating different processing steps of a data processing apparatus according to an embodiment; and Fig. 10 shows a flow diagram illustrating a neural network data processing method according to an embodiment.
In the various figures, identical reference signs will be used for identical or at least functionally equivalent features.
DETAILED DESCRIPTION OF EMBODIMENTS
In the following description, reference is made to the accompanying drawings, which form part of the disclosure, and in which are shown, by way of illustration, specific aspects in which the present invention may be placed. It is understood that other aspects may be utilized and structural or logical changes may be made without departing from the scope of the present invention. The following detailed description, therefore, is not to be taken in a limiting sense, as the scope of the present invention is defined by the appended claims. For instance, it is understood that a disclosure in connection with a described method may also hold true for a corresponding device or system configured to perform the method and vice versa. For example, if a specific method step is described, a corresponding device may include a unit to perform the described method step, even if such unit is not explicitly described or illustrated in the figures. Further, it is understood that the features of the various exemplary aspects described herein may be combined with each other, unless specifically noted otherwise.
Figure 1 shows a schematic diagram illustrating a data processing apparatus 100 according to an embodiment configured to process data on the basis of a neural network. To this end, the data processing apparatus 100 shown in figure 1 comprises a processor 101. In an embodiment, the data processing apparatus 100 can be implemented as a distributed data processing apparatus 100 comprising more than the one processor 101 shown in figure 1. The processor 101 of the data processing apparatus 100 is configured to provide a neural network 110. As will be described in more detail further below, the neural network 110 comprises a neural network layer being configured to generate from an array of input data values an array of output data values based on a plurality of sub-arrays of the array of input data values and a plurality of position dependent kernels comprising a plurality of kernel values or kernel weights. As shown in figure 1, the data processing apparatus 100 can further comprise a memory 103 for storing and/or retrieving the input data values, the output data values and/or the kernel values.
Each position dependent kernel comprises a plurality of kernel values or kernel weights. For a respective position or element of the array of input data values a respective kernel is applied to a respective sub-array of the array of input data values to generate a single output data value. A "position dependent kernel" as used herein means a kernel whose kernel weights depend on the respective position of the sub-array of the array of input data values to which the kernel is applied. In other words, for a first kernel applied to a first sub-array of the plurality of input data values the kernel values can differ from the kernel values of a second kernel applied to a second sub-array of the plurality of input data values forming a different sub-array of the same array of input values.
In a two-dimensional array the position could be a spatial position defined, for instance, by two spatial coordinates (x,y). In a one-dimensional array the position could be a temporal position defined, for instance, by a time coordinate (t).
The array of input data values can be one-dimensional (i.e. a vector, e.g. an audio signal or another temporal sequence), two-dimensional (i.e. a matrix, e.g. an image or other temporal or spatial sequence), or N-dimensional (e.g. any kind of N-dimensional feature array, e.g. provided by a conventional pre-processing or feature extraction and/or by other layers of the neural network 110). The array of input data values can have one or more channels, e.g. for an RGB image one R-channel, one G-channel and one B-channel, or for a black/white image only one grey-scale or intensity channel. The term "channel" can refer to any "feature", e.g. features obtained from conventional pre-processing or feature extraction or from other neural networks or neural network layers of the neural network 110. The array of input data values can comprise, for instance, two-dimensional RGB or grey scale image or video data representing at least a part of an image, or a one-dimensional audio signal. In case the neural network layer 120 is implemented as an intermediate layer of the neural network 110, the array of input data values can be, for instance, an array of similarity features generated by previous layers of the neural network on the basis of an initial, e.g. original array of input data values, e.g. by means of a feature extraction, as will be described in more detail further below. As will be described in more detail below, the neural network layer 120 can be implemented as an aggregation layer 120 configured to process each channel of the array of input data values separately, e.g. for a sub-array of the input array of R-values one (scalar) R-output value is generated. The position dependent kernels may be channel-specific or common for all channels. Moreover, the neural network layer 120 can be implemented as a convolution (or convolutional) layer configured to "mix" all channels of the array of input data values. For instance, in case the array of input data values is an RGB image, i.e. a multi-channel array, based on the three corresponding sub-arrays of the three input arrays (R, G and B) only one (scalar) output value is generated for the three channels (R, G and B) of the multi-channel array of input data values. The position dependent kernels may be channel-specific or common for all channels. In the case of a convolution layer 120 the position dependent kernels are generally multi-channel kernels. Furthermore, the neural network layer can be implemented as a correlation layer providing a combination of aggregation or convolution (input image and weighted kernel) and an additional image (i.e. correlation of two, e.g. of the same or at the same position, sub-arrays in the two images with each other and additional application of the position dependent kernel on the correlation result). Also in this case the position dependent kernels may be channel-specific or common for all channels.
Figure 2 shows a schematic diagram illustrating elements of the neural network 110 provided by the data processing apparatus 100 according to an embodiment. In the embodiment shown in figure 2, the neural network layer 120 is implemented as a weighted aggregation layer 120. In a further embodiment, the neural network layer 120 can be implemented as a convolution network layer 120 (also referred to as convolutional network layer 120) or as a correlation network layer 120, as will be described in more detail further below. As indicated in figure 2, in this embodiment the aggregation layer 120 is configured to generate a two-dimensional array of output data values out(x, y) 121 on the basis of a respective sub-array of the two-dimensional array of input data values in(x, y) 117 and a plurality of position dependent kernels 118 comprising a plurality of kernel values or kernel weights.
In an embodiment, the weighted aggregation layer 120 of the neural network 110 shown in figure 2 is configured to generate the array of output data values out(x, y) 121 on the basis of the plurality of sub-arrays of the two-dimensional array of input data values in(x, y) 117 and the plurality of position dependent kernels 118 comprising the kernel values w_L(x, y, i, j) using the following equation:

out(x, y) = \sum_{i=-r}^{r} \sum_{j=-r}^{r} w_L(x, y, i, j) \cdot in(x - i, y - j)

wherein r denotes a size of each kernel of the plurality of position dependent kernels 118 (in this example, each kernel and each sub-array of the array of input values has (2r+1)*(2r+1) kernel values respectively input values) and the output data values can be normalized using the following normalization factor:

W_L(x, y) = \sum_{i=-r}^{r} \sum_{j=-r}^{r} w_L(x, y, i, j)
In other embodiments, the normalization factor can be omitted, i.e. set to one. For instance, in case the neural network layer 120 is implemented as a convolutional network layer the normalization factor can be omitted. For weighted aggregation the normalization factor allows the mean value or DC component to be preserved. This can be advantageous when the weighted aggregation layer 120 is used to aggregate stereo matching costs of a stereo image, because the normalization is beneficial for making the output values for different sub-arrays of the array of input data values comparable. This is usually not necessary in the case of the convolutional network layer 120. As will be appreciated, the above equations for a two-dimensional input array and a kernel having a quadratic shape can be easily adapted to the case of an array of input values 117 having one dimension or more than two dimensions and/or a kernel having a rectangular shape, e.g. a non-square rectangular shape with different horizontal and vertical dimensions. For an embodiment, where the neural network layer 120 is implemented as a convolution network layer and the array of input data values in(x, y, c_i) 117 is a two-dimensional array of input data values having more than one channel, such as in the case of RGB image data, the neural network layer 120 can be configured to generate the array of output data values out(x, y, c_o) 121 having one or more channels on the basis of the plurality of sub-arrays of the two-dimensional array of input data values in(x, y, c_i) 117 in the different channels and the plurality of position dependent kernels 118 comprising the kernel values w_L(x, y, c_o, c_i, i, j) using the following equation:

out(x, y, c_o) = \sum_{c_i=1}^{C_i} \sum_{i=-r}^{r} \sum_{j=-r}^{r} w_L(x, y, c_o, c_i, i, j) \cdot in(x - i, y - j, c_i)

wherein C_i denotes the number of channels of the array of input data values 117 and the output data values can be normalized using the following normalization factor:

W_L(x, y, c_o) = \sum_{c_i=1}^{C_i} \sum_{i=-r}^{r} \sum_{j=-r}^{r} w_L(x, y, c_o, c_i, i, j)
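A straightforward (and deliberately unoptimized) sketch of such a multi-channel, position dependent convolution is given below; shapes, names and the edge padding are assumptions introduced for illustration only:

```python
import numpy as np

def guided_convolution(inp, wL, normalize=False, eps=1e-12):
    """inp: (H, W, Ci) multi-channel input array
       wL:  (H, W, Co, Ci, 2r+1, 2r+1) position dependent multi-channel kernels
       returns out: (H, W, Co)"""
    H, W, Ci = inp.shape
    Co, k = wL.shape[2], wL.shape[4]
    r = k // 2
    padded = np.pad(inp, ((r, r), (r, r), (0, 0)), mode='edge')
    out = np.zeros((H, W, Co))
    for x in range(H):
        for y in range(W):
            # sub-array in(x - i, y - j, c_i) for i, j in [-r, r]
            sub = padded[x:x + k, y:y + k, :][::-1, ::-1, :]
            for co in range(Co):
                acc = np.einsum('cij,ijc->', wL[x, y, co], sub)
                if normalize:
                    acc /= (np.sum(wL[x, y, co]) + eps)   # W_L(x, y, c_o); often set to 1
                out[x, y, co] = acc
    return out
```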
In an embodiment, the neural network layer 120 is configured to generate the array of output data values 121 with a smaller size than the array of input data values 117. In other words, in an embodiment, the neural network 110 is configured to perform a down-step operation on the basis of the plurality of position dependent kernels 118. Figure 3 illustrates a down-step operation provided by the neural network 110 of the data processing apparatus 100 according to an embodiment. Using a down-step operation allows increasing the receptive field, enables processing the data with a cascade of smaller filters as compared with a single layer with a kernel covering an equal receptive field, and also enables the neural network 110 to better analyze the data by finding more sophisticated relationships among the data and adding more non-linearities to the processing chain by separating each convolution layer with a non-linear element like a sigmoid or a Rectified Linear Unit (ReLU).
In the down-step operation illustrated in figure 3 the neural network layer 120 can combine the input data values to produce the array of output data values with a reduced resolution. This can be achieved by convolving the array of input data values 117 with the position dependent kernels 118 with a stride S greater than 1. The stride S specifies the spacing between neighboring input spatial positions for which convolutions are computed. If the stride S is equal to 1, the convolution is performed for each spatial position. If the stride S is greater than 1, the neural network layer 120 is configured to perform a convolution for every S-th spatial position of the array of input data values 117, thereby reducing the output resolution (i.e. the dimensions of the array of output data values 121) by a factor of S for each spatial dimension. The horizontal and the vertical stride may be the same or different. In the exemplary embodiment shown in figure 3, the neural network layer 120 combines the array of input data values 117 from the spatial area of size (2r+1)x(2r+1) to produce a respective output data value of the array of output data values 121. In this way, the input data values 117 can be aggregated to pack information present in a larger spatial area into one single spatial position.
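By way of a hedged illustration, a strided variant of the aggregation sketch above could implement such a down-step operation as follows (the stride handling, output size and names are assumptions made for this example):

```python
import numpy as np

def strided_guided_aggregation(inp, wL, stride=2):
    """Down-step operation: aggregate with a stride S > 1 so that the output has
       roughly 1/S of the input resolution per spatial dimension.
       inp: (H, W) input array; wL: (H_out, W_out, 2r+1, 2r+1) kernels, one per
       *output* position, where H_out = ceil(H / stride), W_out = ceil(W / stride)."""
    H, W = inp.shape
    k = wL.shape[2]
    r = k // 2
    padded = np.pad(inp, r, mode='edge')
    xs = list(range(0, H, stride))
    ys = list(range(0, W, stride))
    out = np.empty((len(xs), len(ys)))
    for ox, x in enumerate(xs):
        for oy, y in enumerate(ys):
            sub = padded[x:x + k, y:y + k][::-1, ::-1]
            w = wL[ox, oy]
            out[ox, oy] = np.sum(w * sub) / (np.sum(w) + 1e-12)  # normalized aggregation
    return out
```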
In the embodiment shown in figure 2, the neural network 110 comprises one or more preceding layers 115 preceding the neural network layer 120 and one or more following layers 125 following the neural network layer 120. In an embodiment, the neural network layer 120 could be the first and/or the last data processing layer of the neural network 110, i.e. in an embodiment there could be no preceding layers 115 and/or no following layers 125.
In an embodiment, the one or more preceding layers 115 can be further neural network layers and/or "conventional" pre-processing layers, such as a feature extraction layer. Likewise, in an embodiment, the one or more following layers 125 can be further neural network layers, such as a deconvolutional layer, and/or "conventional" post-processing layers.
As shown in the embodiment illustrated in figure 2, one or more of the preceding layers 115 can be configured to provide, i.e. to generate the plurality of position dependent kernels 118 (see the bottom signal path of the preceding layers 115 from the guiding data 113 to the position dependent kernels 118 in Fig. 2). In an embodiment, the one or more layers of the preceding layers 115 can generate the plurality of position dependent kernels 118 on the basis of an original array 111 of original input data values, e.g. an original image as a 2D example. As indicated in figure 2, in an embodiment, the original array 111 of original input data values can be an array of input data 111 being the original input of the neural network 110. In another embodiment, the one or more preceding layers 115 could be configured to generate just the plurality of position dependent kernels 118 on the basis of the original input data 111 of the neural network 110 and to provide the original input data 111 of the neural network 110 as the array of input data values 117 to the neural network layer 120 (no preceding layers in the top signal path of the preceding layers 115 from the original input array 111 to the input array 117 according to an embodiment of the neural network layer 120, see Fig. 2). In other words, according to an embodiment, the original array 111 may form the input array 117.
As indicated in figure 2, in a further embodiment the one or more preceding layers 115 of the neural network 110 are configured to generate the plurality of position dependent kernels 118 on the basis of an array of guiding data 113. A more detailed view of the processing steps of the neural network 110 of the data processing apparatus 100 according to such an embodiment is shown in figure 4 for the exemplary case of two-dimensional input and output arrays. The array of guiding data 113 is used by the one or more preceding layers 115 of the neural network 110 to generate the plurality of position dependent kernels w_L(x, y) 118 on the basis of the array of guiding data g(x, y) 113. As already described in the context of figure 2, the neural network layer 120 is configured to generate the two-dimensional array of output data values out(x, y) 121 on the basis of the two-dimensional array of input data values in(x, y) 117 and the plurality of position dependent kernels w_L(x, y) 118, which, in turn, are based on the array of guiding data g(x, y) 113. In an embodiment, the one or more preceding layers 115 of the neural network 110 are neural network layers configured to learn the plurality of position dependent kernels w_L(x, y) 118 on the basis of the array of guiding data g(x, y) 113. In another embodiment, the one or more preceding layers 115 of the neural network 110 are pre-processing layers configured to generate the plurality of position dependent kernels w_L(x, y) 118 on the basis of the array of guiding data 113 using one or more pre-processing schemes, such as feature extraction.
In an embodiment, the one or more preceding layers 115 of the neural network 110 are configured to generate the plurality of position dependent kernels w_L(x, y) 118 on the basis of the array of guiding data g(x, y) 113 in a way analogous to bilateral filtering, as illustrated in figure 5. Bilateral filtering is known in the field of image processing for performing a weighted aggregation of input data, while decreasing the influence of some input values and amplifying the influence of other input values on the aggregation result [M. Elad, "On the origin of bilateral filter and ways to improve it", IEEE Transactions on Image Processing, vol. 11, no. 10, pp. 1141-1151, October 2002]. As illustrated in figure 5, the weights 518 utilized for aggregating the array of input data values 517 adapt to the input data 517 using the guiding image data g 513, which provides additional information to control the aggregation process. In an embodiment, the array of guiding image data 513 can be equal to the array of input data values for generating the array of output data values 521 by the layer 520 on the basis of the weights 518. The bilateral filter weights 518 take into consideration the distance of the value within the kernel from the center of the kernel and, additionally, the similarity of the data values with the data in the center of the kernel, as mathematically described by the following equation:

out(x, y) = \frac{1}{W_b(x, y)} \sum_{i=-r}^{r} \sum_{j=-r}^{r} w_b(x, y, i, j) \cdot in(x - i, y - j)

wherein the normalization factor is based on the following equation:

W_b(x, y) = \sum_{i=-r}^{r} \sum_{j=-r}^{r} w_b(x, y, i, j)

In an embodiment, the bilateral filter weights 518 are defined by the following equation:

w_b(x, y, i, j) = \exp\left(-\frac{i^2 + j^2}{2\sigma_d^2}\right) \cdot \exp\left(-\frac{d\big(g(x, y), g(x - i, y - j)\big)^2}{2\sigma_r^2}\right)

wherein d(·, ·) denotes a distance function.
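A possible, purely illustrative way to compute such bilateral-style position dependent kernels from a guiding array is sketched below; the Gaussian form, the parameters sigma_d and sigma_r and all names are assumptions of this example rather than the embodiment's prescribed formula:

```python
import numpy as np

def bilateral_style_kernels(guide, r, sigma_d=2.0, sigma_r=0.1):
    """guide: (H, W) guiding data g(x, y).
       Returns (H, W, 2r+1, 2r+1) kernels that decay with the spatial distance
       from the kernel centre and with the dissimilarity between the guiding
       value at the centre and the guiding value at the neighbouring position."""
    H, W = guide.shape
    k = 2 * r + 1
    # Spatial term: depends only on the offset (i, j) from the kernel centre.
    offsets = np.arange(-r, r + 1)
    spatial = np.exp(-(offsets[:, None] ** 2 + offsets[None, :] ** 2) / (2.0 * sigma_d ** 2))
    padded = np.pad(guide, r, mode='edge')
    kernels = np.empty((H, W, k, k))
    for x in range(H):
        for y in range(W):
            neigh = padded[x:x + k, y:y + k]
            diff = neigh - guide[x, y]            # distance d(g(x, y), g(x - i, y - j))
            rng = np.exp(-(diff ** 2) / (2.0 * sigma_r ** 2))
            kernels[x, y] = spatial * rng         # w_b(x, y, i, j)
    return kernels
```

These kernels can then be passed to an aggregation routine such as the guided_aggregation sketch given earlier.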
Figure 6 shows a schematic diagram highlighting the main processing stage 601 of the data processing apparatus 100 according to an embodiment, for instance, the data processing apparatus 100 providing the neural network 110 shown in figure 2. As already described above, in a first processing step 603 the neural network 110 can generate the plurality of position dependent kernels w_L(x, y) 118 on the basis of the array of guiding data g(x, y) 113. In a second processing step 605 the neural network 110 can generate the array of output data values out(x, y) 121 on the basis of the array of input data values in(x, y) 117 and the plurality of position dependent kernels w_L(x, y, i, j) 118.
Figure 7 shows a schematic diagram illustrating the neural network 110 provided by the data processing apparatus 100 according to a further embodiment. As will be described in more detail in the following, the main difference to the embodiment shown in figure 2 is that in the embodiment shown in figure 7 the neural network 110 is configured to generate the plurality of position dependent kernels based on a plurality of position independent kernels 119b (shown in figure 8) and a plurality of position dependent weights F_f(x, y) 119a (also referred to as similarity features 119a). In an embodiment, the similarity features 119a are obtained based on the guiding data 113 and could indicate higher-level knowledge about the input data 111, including e.g. semantic segmentation, per-instance object detection, data importance indicators like ROI (Region of Interest) and many others - all learned by the neural network 110 itself or being an additional input to the neural network 110. In an embodiment, the neural network 110 of figure 7 is configured to generate the plurality of position dependent kernels 118 by adding the position independent kernels 119b each weighted by the associated position dependent weights F_f(x, y) 119a.
In an embodiment, the plurality of position independent kernels 119b can be predetermined or learned by the neural network 110. As illustrated in figure 7, also in this embodiment the neural network 110 can comprise one or more preceding layers 115, which precede the neural network layer 120 and which can be implemented as an additional neural network layer or a pre-processing layer. In an embodiment, one or more layers of the preceding layers 115 are configured to generate the plurality of position dependent weights F_f(x, y) 119a on the basis of an original array of original input data values or the guiding data 113. The original array of original input data values of the neural network 110 can comprise the array of input data values 117 to be processed by the neural network layer 120 or another array of input data values 111 associated to the array of input data values 117, for instance, the initial or original array of input data 111. In the exemplary embodiment shown in figure 7, the array of input data values in(x, y) 117 and the array of output data values out(x, y) 121 are two-dimensional arrays and the neural network layer 120 is configured to generate a respective kernel of the plurality of position dependent kernels w_L(x, y, i, j) 118 on the basis of the following equation:

w_L(x, y, i, j) = \sum_{f=1}^{N_f} F_f(x, y) \cdot K_f(i, j)

wherein F_f(x, y) denotes the set of N_f position dependent weights (or similarity features) 119a and K_f(i, j) denotes the plurality of position independent kernels 119b, as also illustrated in figure 8.
Figure 9 shows a schematic diagram highlighting the main processing stage 901 of the data processing apparatus 100 according to an embodiment, for instance, the data processing apparatus 100 providing the neural network 110 illustrated in figures 7 and 8. As already described above, in a first processing step 903 the neural network 110 can generate the plurality of position dependent weights or similarity features F_f(x, y) 119a on the basis of the array of guiding data g(x, y) 113. In a second processing step 905 the neural network 110 can generate the plurality of position dependent kernels w_L(x, y, i, j) 118 on the basis of the plurality of position dependent weights or similarity features F_f(x, y) 119a and the plurality of position independent kernels K_f 119b. In a further step (not shown in figure 9, but similar to the processing step 605 shown in figure 6) the neural network layer 120 can generate the array of output data values out(x, y) 121 on the basis of the array of input data values in(x, y) 117 and the plurality of position dependent kernels w_L(x, y, i, j) 118.
As already mentioned above, in an embodiment the neural network layer 120 of the neural network 110 can be implemented in the form of a correlation network layer 120 configured to generate the array of output data values 121 from the array of input data values 117 and a further array of input data values by correlating the array of input data values 117 with the further array of input data values and by applying a respective position dependent kernel of the plurality of position dependent kernels 118 to a respective sub-array of the array of input data values 117 and a corresponding sub-array of the further array of input data values. In case the array of input data values 117, the further array of input data values and the plurality of position dependent kernels 118 are respective two-dimensional arrays (as in the embodiments shown in figures 2 and 7), the correlation neural network layer 120 can be configured to generate the array of output data values out(x, y) 121 on the basis of the following equation:

out(x, y) = \sum_{i=-r}^{r} \sum_{j=-r}^{r} w_{L1}(x, y, i, j) \cdot in1(x - i, y - j) \cdot in2(x - i, y - j)

wherein in1(x - i, y - j) denotes the array of input data values 117, in2(x - i, y - j) denotes the further array of input data values, w_L1(x, y, i, j) denotes the plurality of position dependent kernels 118 and r denotes the size of each kernel of the plurality of position dependent kernels 118 (in this example, each kernel has (2r+1)*(2r+1) kernel values). The output data values 121 can be normalized using the following normalization factor:

W_L(x, y) = \sum_{i=-r}^{r} \sum_{j=-r}^{r} w_{L1}(x, y, i, j)

In other embodiments, the normalization factor can be omitted, i.e. set to one. As will be appreciated, the above equations for a two-dimensional input array and a kernel having a quadratic shape can be easily adapted to the case of an array of input values 117 having one dimension or more than two dimensions and/or a kernel having a non-square rectangular shape, i.e. different horizontal and vertical dimensions.
In a further embodiment, the correlation network layer 120 is configured to generate the array of output data values 121 from the array of input data values 117 and the further array of input data values by correlating the array of input data values 117 with the further array of input data values and by applying a respective position dependent kernel of the plurality of position dependent kernels 118 associated to the array of input data values 117 and a further position dependent kernel of a plurality of further position dependent kernels associated to the further array of input data values to a respective sub-array of the array of input data values 117 and a corresponding sub-array of the further array of input data values. In case the array of input data values 117 and the further array of input data values are respective two-dimensional arrays (as in the embodiments shown in figures 2 and 7), the correlation neural network layer 120 can be configured to generate the array of output data values out(x, y) 121 on the basis of the following equation:

out(x, y) = \sum_{i=-r}^{r} \sum_{j=-r}^{r} w_{L1}(x, y, i, j) \cdot in1(x - i, y - j) \cdot w_{L2}(x, y, i, j) \cdot in2(x - i, y - j)

wherein in1(x - i, y - j) denotes the array of input data values 117, in2(x - i, y - j) denotes the further array of input data values, w_L1(x, y, i, j) denotes the plurality of position dependent kernels 118 and w_L2(x, y, i, j) denotes the plurality of further position dependent kernels associated to the further array of input data values. The output data values 121 can be normalized using the following normalization factor:

W_{L12}(x, y) = \sum_{i=-r}^{r} \sum_{j=-r}^{r} w_{L1}(x, y, i, j) \cdot w_{L2}(x, y, i, j)
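Both correlation variants can be illustrated with the following hedged sketch (single-channel inputs; the names, the shared window size and the border handling are assumptions of this example):

```python
import numpy as np

def guided_correlation(in1, in2, wL1, wL2=None, normalize=True, eps=1e-12):
    """Correlates two (H, W) arrays in1 and in2 inside a (2r+1)x(2r+1) window and
       weights the products with position dependent kernels: wL1 only in the first
       variant, wL1 and wL2 in the second variant."""
    H, W = in1.shape
    k = wL1.shape[2]
    r = k // 2
    p1 = np.pad(in1, r, mode='edge')
    p2 = np.pad(in2, r, mode='edge')
    out = np.empty((H, W))
    for x in range(H):
        for y in range(W):
            s1 = p1[x:x + k, y:y + k][::-1, ::-1]       # in1(x - i, y - j)
            s2 = p2[x:x + k, y:y + k][::-1, ::-1]       # in2(x - i, y - j)
            if wL2 is None:
                w = wL1[x, y]
                norm = np.sum(w)                        # W_L(x, y)
            else:
                w = wL1[x, y] * wL2[x, y]
                norm = np.sum(w)                        # W_L12(x, y)
            val = np.sum(w * s1 * s2)
            out[x, y] = val / (norm + eps) if normalize else val
    return out
```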
In a further embodiment, the neural network layer 120 is configured to process the array of input data values 117 on the basis of the plurality of position dependent kernels 118 using a maximum or minimum pooling scheme. More specifically, in such an embodiment, the neural network layer 120 is configured to generate a respective output data value of the array of output data values 121 by determining a respective input data value of a respective sub-array of the plurality of sub-arrays of the array of input data values 117 being associated with a maximum or minimum kernel value of a respective position dependent kernel of the plurality of position dependent kernels 118 and using the respective determined input data value as the respective output data value.
In a further embodiment, the neural network layer 120 according to one of the embodiments described above is used by the neural network 110 to perform weighted aggregation of stereo matching costs in order to obtain a depth map from a stereo image. Cost aggregation is a commonly used method to minimize noise and improve the depth estimation results. Without additional weighting, object borders at depth discontinuities would normally be over-smoothed. Consequently, a much desired feature is to preserve these borders by taking into account some additional knowledge about object borders in the scene. Thus, advantageously, the neural network layer 120 can use e.g. object features derived from semantic segmentation as the guiding data 113 in order to determine the object borders in the scene and guide the aggregation process of the input stereo matching cost, producing the aggregated stereo matching cost as an output.
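The following usage-level sketch indicates how such a segmentation-guided cost aggregation might look; the cost-volume shape, the simple "same label" weighting and all names are assumptions made only for illustration, not the embodiment's exact procedure:

```python
import numpy as np

# Aggregate a stereo matching cost volume with kernels derived from a semantic
# segmentation map, so that costs are not smoothed across object borders.
H, W, D, r = 48, 64, 16, 2
k = 2 * r + 1
cost_volume = np.random.rand(H, W, D)          # matching cost per pixel and disparity
labels = np.random.randint(0, 4, size=(H, W))  # segmentation map used as guiding data

padded_labels = np.pad(labels, r, mode='edge')
padded_cost = np.pad(cost_volume, ((r, r), (r, r), (0, 0)), mode='edge')
aggregated = np.empty_like(cost_volume)
for x in range(H):
    for y in range(W):
        # Position dependent kernel: 1 inside the same segment as the centre pixel, 0 elsewhere.
        w = (padded_labels[x:x + k, y:y + k] == labels[x, y]).astype(float)
        w_sum = w.sum()                                       # normalization factor W_L(x, y)
        sub = padded_cost[x:x + k, y:y + k, :]                # (2r+1, 2r+1, D) cost sub-array
        aggregated[x, y, :] = np.tensordot(w, sub, axes=([0, 1], [0, 1])) / w_sum
```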
Figure 10 shows a flow diagram illustrating a data processing method 1000 based on a neural network 110 according to an embodiment. The data processing method 1000 can be performed by the data processing apparatus 100 shown in figure 1 and its different embodiments. The data processing method 1000 comprises the step 1001 of generating by the neural network layer 120 of the neural network 110 from the array of input data values 117 the array of output data values 121 based on the plurality of position dependent kernels 118 and the plurality of sub-arrays of the array of input data values 117. Embodiments of the data processing methods may be implemented and/or performed by one or more processors as described above.
Referring back to the various embodiments described above, a first kernel is considered different to a second kernel if a kernel value of the array of kernel values of the first kernel at at least one position (or of at least one element) of the first kernel is different from a kernel value of the array of kernel values of the second kernel at the same position (or of the same element) of the kernel. Typically a kernel has the same size (number of elements, positions or values per dimension) and dimension (number of dimensions N, N>=1) as the sub-array of the array of input values it is applied to. Typically the different sub-arrays of the array of input values all have the same size and dimension. Accordingly, the different kernels typically have the same size and dimension.
A first sub-array of the array of input values is considered different to a second sub-array of the array of input values if the first sub-array of the array of input values comprises at least one element of the array of input values which is not comprised by the second sub-array of the array of input values. Typically the different sub-arrays of the array of input values differ at least by one column or row of elements of the array of input values. The different sub-arrays may partially overlap or not overlap as shown in Fig. 3.
In the following some further details about various aspects and embodiments (aggregation network layer, convolution network layer, correlation network layer and normalization) are provided.
Aggregation
Embodiments of the proposed guided aggregation can be applied for guided feature map down-scaling. By using input position dependent kernels as the guiding data, input values which are features of the feature map are grouped to form input data sub-arrays of the input data array and can be further aggregated in a controlled way producing an output feature value representative for the whole sub-array. This way, when changing the resolution of the input feature map, object borders and other details that are normally lost while down-scaling can be better preserved. In such cases, the guiding data represents information about object or region borders, obtained by e.g. color-based segmentation, semantic segmentation using preceding neural network layers or an edge map of a texture image corresponding to the processed feature map.
Convolution
Embodiments of the proposed guided convolution can be applied for switchable feature extraction. Input values which are features of the feature map are convolved with adaptable feature extraction filters which are formed from the input guiding data in the form of position dependent kernels. This way, each selected area of the input feature map can be processed with feature extraction filters producing only features desired for these regions. Here, guiding data in the form of similarity features represents information about object or region borders, obtained by e.g. color-based segmentation, semantic segmentation using preceding neural network layers, an edge map of a texture image corresponding to the processed feature map or a ROI (region of interest) binary map.
Correlation
Embodiments of the proposed guided correlation can be applied for guided correlation of input feature maps. By using input position dependent kernels as the guiding data, input values which are features of the two or more feature maps are correlated together in a controlled way enabling amplification or attenuation of some features within a correlation region. This way, features that correspond to some other objects/regions in the feature map can be excluded or taken with smaller impact to compute the result. Also, some of the features characteristic for a selected region can be amplified. Here, guiding data represents information about object or region borders, obtained by e.g. color-based segmentation, semantic segmentation using preceding neural network layers, an edge map of a texture image corresponding to the processed feature map or a ROI (region of interest) binary map.
Normalization
In general, normalization is advantageous if the output values obtained for different spatial positions are going to be compared to each other per-value, without any intermediate step. As a result, preservation of the mean (DC) component is appreciated. If such comparison is not performed, normalization is not required but just increases complexity. Additionally, one can omit normalization in order to simplify the computations and compute only an approximate result.
While a particular feature or aspect of the disclosure may have been disclosed with respect to only one of several implementations or embodiments, such feature or aspect may be combined with one or more other features or aspects of the other implementations or embodiments as may be desired and advantageous for any given or particular application. Furthermore, to the extent that the terms "include", "have", "with", or other variants thereof are used in either the detailed description or the claims, such terms are intended to be inclusive in a manner similar to the term "comprise". Also, the terms "exemplary", "for example" and "e.g." are merely meant as an example, rather than the best or optimal. The terms "coupled" and "connected", along with derivatives, may have been used. It should be understood that these terms may have been used to indicate that two elements cooperate or interact with each other regardless of whether they are in direct physical or electrical contact, or they are not in direct contact with each other.
Although specific aspects have been illustrated and described herein, it will be
appreciated by those of ordinary skill in the art that a variety of alternate and/or equivalent implementations may be substituted for the specific aspects shown and described without departing from the scope of the present disclosure. This application is intended to cover any adaptations or variations of the specific aspects discussed herein.
Although the elements in the following claims are recited in a particular sequence with corresponding labeling, unless the claim recitations otherwise imply a particular sequence for implementing some or all of those elements, those elements are not necessarily intended to be limited to being implemented in that particular sequence.
Many alternatives, modifications, and variations will be apparent to those skilled in the art in light of the above teachings. Of course, those skilled in the art readily recognize that there are numerous applications of the invention beyond those described herein. While the present invention has been described with reference to one or more particular embodiments, those skilled in the art recognize that many changes may be made thereto without departing from the scope of the present invention. It is therefore to be understood that within the scope of the appended claims and their equivalents, the invention may be practiced otherwise than as specifically described herein.

Claims

1. A data processing apparatus (100) comprising: a processor (101) configured to provide a neural network (110), wherein the neural network (110) comprises a neural network layer (120) being configured to generate from an array of input data values (117) an array of output data values (121) based on a plurality of position dependent kernels (118; 119) and a plurality of sub-arrays of the array of input data values (117).

2. The data processing apparatus (100) of claim 1, wherein the neural network (110) comprises an additional neural network layer (115) configured to generate the plurality of position dependent kernels (118) based on an original array of original input data values (111, 117) of the neural network (110), wherein the original array of original input data values (111, 117) of the neural network (110) comprises the array of input data values (117) or another array of input data values (117) associated to the array of input data values (111).

3. The data processing apparatus (100) of claim 1 or 2, wherein the neural network (110) is configured to generate the plurality of position dependent kernels (118) based on a plurality of position independent kernels (119b) and a plurality of position dependent weights (119a).

4. The data processing apparatus (100) of claim 3, wherein the neural network (110) is configured to generate a kernel of the plurality of position dependent kernels (118) by adding the position independent kernels (119b) weighted by the associated position dependent weights (119a).

5. The data processing apparatus (100) of claim 3 or 4, wherein the plurality of position independent kernels (118) are predetermined or learned and wherein the neural network (110) comprises an additional neural network layer (115) or processing layer (115) configured to generate the plurality of position dependent weights (119a) based on an original array of original input data values (111, 117) of the neural network (110), wherein the original array of original input data values (111, 117) of the neural network (110) comprises the array of input data values (117) or another array of input data values (111) associated to the array of input data values (117).

6. The data processing apparatus (100) of any one of claims 3 to 5, wherein the array of input data values (117) and the array of output data values (121) are two-dimensional arrays and the neural network layer (120) is configured to generate a kernel of the plurality of position dependent kernels (118) on the basis of the following equation:

w_L(x, y, i, j) = \sum_{f=1}^{N_f} F_f(x, y) \cdot K_f(i, j)

wherein F_f(x, y) denotes the plurality of N_f position dependent weights (119a) and K_f(i, j) denotes the plurality of position independent kernels (119b).
7. The data processing apparatus (100) of any one of the preceding claims, wherein the neural network layer (120) is a convolutional network layer or an aggregation network layer.
8. The data processing apparatus (100) of any one of the preceding claims, wherein the array of input data values (117) and the array of output data values (121) are two-dimensional arrays, wherein the neural network layer (120) is a convolutional network layer configured to generate the array of output data values (121) on the basis of the following equation:

out(x, y, c_o) = \frac{1}{W_L(x, y, c_o)} \sum_{c_i=1}^{C_i} \sum_{i=-r}^{r} \sum_{j=-r}^{r} w_L(x, y, c_o, c_i, i, j) \cdot in(x - i, y - j, c_i)

with

W_L(x, y, c_o) = \sum_{c_i=1}^{C_i} \sum_{i=-r}^{r} \sum_{j=-r}^{r} w_L(x, y, c_o, c_i, i, j)

wherein out(x, y, c_o) denotes the array of output data values (121), in(x, y, c_i) denotes the array of input data values (117), r denotes a size of each kernel of the plurality of position dependent kernels w_L(x, y, c_o, c_i, i, j) and W_L(x, y, c_o) denotes a normalization factor, or, wherein the neural network layer (120) is an aggregation network layer configured to generate the array of output data values (121) on the basis of the following equation:

out(x, y) = \frac{1}{W_L(x, y)} \sum_{i=-r}^{r} \sum_{j=-r}^{r} w_L(x, y, i, j) \cdot in(x - i, y - j)

with:

W_L(x, y) = \sum_{i=-r}^{r} \sum_{j=-r}^{r} w_L(x, y, i, j)

wherein out(x, y) denotes the array of output data values (121), in(x, y) denotes the array of input data values (117), r denotes a size of each kernel of the plurality of position dependent kernels w_L(x, y, i, j) and W_L(x, y) denotes a normalization factor.
9. The data processing apparatus (100) of any one of claims 1 to 7, wherein the neural network layer (120) is a correlation network layer configured to generate the array of output data values (121) from the array of input data values (117) and a further array of input data values by: correlating the array of input data values (117) with the further array of input data values and applying a position dependent kernel of the plurality of position dependent kernels (118), or correlating the array of input data values (117) with the further array of input data values and applying a position dependent kernel of the plurality of position dependent kernels (118) associated to the array of input data values (117) and a further position dependent kernel of a plurality of further position dependent kernels associated to the further array of input data values.
10. The data processing apparatus (100) of claim 9, wherein the array of input data values (117), the further array of input data values and the plurality of position dependent kernels (118) are respective two-dimensional arrays and wherein the correlation neural network layer (120) is configured to generate the array of output data values (121) on the basis of the following equation:

out(x, y) = \frac{1}{W_L(x, y)} \sum_{i=-r}^{r} \sum_{j=-r}^{r} w_{L1}(x, y, i, j) \cdot in1(x - i, y - j) \cdot in2(x - i, y - j)

with:

W_L(x, y) = \sum_{i=-r}^{r} \sum_{j=-r}^{r} w_{L1}(x, y, i, j)

wherein out(x, y) denotes the array of output data values (121), in1(x, y) denotes the array of input data values (117), in2(x, y) denotes the further array of input data values, r denotes a size of each kernel of the plurality of position dependent kernels w_L1(x, y, i, j) and W_L(x, y) denotes a normalization factor, or

out(x, y) = \frac{1}{W_{L12}(x, y)} \sum_{i=-r}^{r} \sum_{j=-r}^{r} w_{L1}(x, y, i, j) \cdot in1(x - i, y - j) \cdot w_{L2}(x, y, i, j) \cdot in2(x - i, y - j)

with:

W_{L12}(x, y) = \sum_{i=-r}^{r} \sum_{j=-r}^{r} w_{L1}(x, y, i, j) \cdot w_{L2}(x, y, i, j)

wherein out(x, y) denotes the array of output data values (121), in1(x, y) denotes the array of input data values (117), in2(x, y) denotes the further array of input data values, r denotes a size of each kernel of the plurality of position dependent kernels w_L1(x, y, i, j) and of each kernel of the plurality of further position dependent kernels w_L2(x, y, i, j) and W_L12(x, y) denotes a normalization factor.
11. The data processing apparatus (100) of any one of claims 1 to 7, wherein the neural network layer (120) is configured to generate a respective output data value of the array of output data values (121) by determining a respective input data value of a respective sub-array of input data values of the plurality of sub-arrays of input data values being associated with a maximum or minimum kernel value of a position dependent kernel and using the respective determined input data value as the respective output data value.
12. A data processing method (1000) comprising:
generating (1001) by a neural network layer (120) of a neural network (110) from an array of input data values (117) an array of output data values (121) based on a plurality of position dependent kernels (118) and a plurality of sub-arrays of the array of input data values (117).
13. A computer program comprising program code for performing the method (1000) of claim 12, when executed on a computer and/or a processor.
EP17713634.8A 2017-03-24 2017-03-24 Neural network data processing apparatus and method Ceased EP3590076A1 (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/EP2017/057088 WO2018171899A1 (en) 2017-03-24 2017-03-24 Neural network data processing apparatus and method

Publications (1)

Publication Number Publication Date
EP3590076A1 true EP3590076A1 (en) 2020-01-08

Family

ID=58413093

Family Applications (1)

Application Number Title Priority Date Filing Date
EP17713634.8A Ceased EP3590076A1 (en) 2017-03-24 2017-03-24 Neural network data processing apparatus and method

Country Status (3)

Country Link
EP (1) EP3590076A1 (en)
CN (1) CN110462637B (en)
WO (1) WO2018171899A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10929665B2 (en) 2018-12-21 2021-02-23 Samsung Electronics Co., Ltd. System and method for providing dominant scene classification by semantic segmentation

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106156845A (en) * 2015-03-23 2016-11-23 日本电气株式会社 Method and apparatus for building a neural network
CN106156807B (en) * 2015-04-02 2020-06-02 华中科技大学 Training method and device of convolutional neural network model
US20160358069A1 (en) * 2015-06-03 2016-12-08 Samsung Electronics Co., Ltd. Neural network suppression
CN105096279A (en) * 2015-09-23 2015-11-25 成都融创智谷科技有限公司 Digital image processing method based on convolutional neural network
CN106447035B (en) * 2015-10-08 2019-02-26 上海兆芯集成电路有限公司 Processor with variable rate execution unit
CN107506828B (en) * 2016-01-20 2020-11-03 中科寒武纪科技股份有限公司 Artificial neural network computing device and method for sparse connection
CN105913117A (en) * 2016-04-04 2016-08-31 北京工业大学 Intelligent related neural network computer identification method
CN106066783A (en) * 2016-06-02 2016-11-02 华为技术有限公司 Neural network forward operation hardware architecture based on power weight quantization
CN106407903A (en) * 2016-08-31 2017-02-15 四川瞳知科技有限公司 Real-time human abnormal behavior recognition method based on a multi-scale convolutional neural network

Also Published As

Publication number Publication date
CN110462637A (en) 2019-11-15
CN110462637B (en) 2022-07-19
WO2018171899A1 (en) 2018-09-27

Similar Documents

Publication Publication Date Title
US11687775B2 (en) Neural network data processing apparatus and method
US9418458B2 (en) Graph image representation from convolutional neural networks
US20190108618A1 (en) Image signal processor for processing images
US20220215588A1 (en) Image signal processor for processing images
JP6961640B2 (en) Data processing system and method
US8417047B2 (en) Noise suppression in low light images
Liu et al. Image de-hazing from the perspective of noise filtering
CN111340732B (en) Low-illumination video image enhancement method and device
US11893710B2 (en) Image reconstruction method, electronic device and computer-readable storage medium
CN110809126A (en) Video frame interpolation method and system based on adaptive deformable convolution
US9672447B2 (en) Segmentation based image transform
EP3590076A1 (en) Neural network data processing apparatus and method
CN111932472A (en) Image edge-preserving filtering method based on soft clustering
Silverman et al. Segmentation of hyperspectral images based on histograms of principal components
CN110647898B (en) Image processing method, image processing device, electronic equipment and computer storage medium
CN108986052B (en) Self-adaptive image illumination removing method and system
Ishida et al. Shadow detection by three shadow models with features robust to illumination changes
CN116917954A (en) Image detection method and device and electronic equipment
US20190188512A1 (en) Method and image processing entity for applying a convolutional neural network to an image
Girish et al. One network doesn't rule them all: Moving beyond handcrafted architectures in self-supervised learning
KR20200023154A (en) Method and apparatus for processing convolution neural network
EP3023934A1 (en) Method and apparatus for filtering an array of pixels
WO2023061465A1 (en) Methods, systems, and media for computer vision using 2d convolution of 4d video data tensors
CN115190226B (en) Parameter adjustment method, neural network model training method and related devices
CN118135389B (en) Underwater acoustic target identification method based on effective receptive field regulation and control

Legal Events

Date Code Title Description
STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: UNKNOWN

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: THE INTERNATIONAL PUBLICATION HAS BEEN MADE

PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

Free format text: ORIGINAL CODE: 0009012

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: REQUEST FOR EXAMINATION WAS MADE

17P Request for examination filed

Effective date: 20191002

AK Designated contracting states

Kind code of ref document: A1

Designated state(s): AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO RS SE SI SK SM TR

AX Request for extension of the european patent

Extension state: BA ME

DAV Request for validation of the european patent (deleted)
DAX Request for extension of the european patent (deleted)
STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: EXAMINATION IS IN PROGRESS

17Q First examination report despatched

Effective date: 20210609

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: EXAMINATION IS IN PROGRESS

REG Reference to a national code

Ref country code: DE

Ref legal event code: R003

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: THE APPLICATION HAS BEEN REFUSED

18R Application refused

Effective date: 20220711