WO2023006170A1 - Devices and methods for using computationally efficient neural networks - Google Patents
Devices and methods for using computationally efficient neural networks (original French title: Dispositifs et procédés d'utilisation de réseaux neuronaux efficaces sur le plan informatique)
- Publication number
- WO2023006170A1 (PCT/EP2021/070803)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- feature values
- input
- output
- tensor
- input feature
- Prior art date
Links
- 238000013528 artificial neural network Methods 0.000 title claims abstract description 45
- 238000000034 method Methods 0.000 title claims abstract description 28
- 238000012545 processing Methods 0.000 claims abstract description 104
- 230000008569 process Effects 0.000 claims abstract description 9
- 239000011159 matrix material Substances 0.000 claims description 31
- 239000013598 vector Substances 0.000 claims description 14
- 238000003672 processing method Methods 0.000 claims description 9
- 230000004913 activation Effects 0.000 claims description 8
- 238000010606 normalization Methods 0.000 claims description 4
- 238000004590 computer program Methods 0.000 claims description 3
- 238000010586 diagram Methods 0.000 description 11
- 238000013527 convolutional neural network Methods 0.000 description 9
- 230000006870 function Effects 0.000 description 6
- 238000004891 communication Methods 0.000 description 5
- 230000006835 compression Effects 0.000 description 5
- 238000007906 compression Methods 0.000 description 5
- 230000008878 coupling Effects 0.000 description 3
- 238000012549 training Methods 0.000 description 2
- 230000001133 acceleration Effects 0.000 description 1
- 238000013459 approach Methods 0.000 description 1
- 238000003491 array Methods 0.000 description 1
- 238000013473 artificial intelligence Methods 0.000 description 1
- 230000001419 dependent effect Effects 0.000 description 1
- 238000004146 energy storage Methods 0.000 description 1
- 230000003993 interaction Effects 0.000 description 1
- 230000005055 memory storage Effects 0.000 description 1
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/06—Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
- G06N3/063—Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F7/00—Methods or arrangements for processing data by operating upon the order or content of the data handled
- G06F7/38—Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation
- G06F7/48—Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using non-contact-making devices, e.g. tube, solid state device; using unspecified devices
- G06F7/544—Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using non-contact-making devices, e.g. tube, solid state device; using unspecified devices for evaluating functions by calculation
- G06F7/5443—Sum of products
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/0495—Quantised networks; Sparse networks; Compressed networks
Definitions
- the present invention relates to data processing. More specifically, the present invention relates to devices and methods for providing computationally efficient convolutional neural networks, i.e. neural networks with convolutional layers that are less demanding with respect to computational resources than conventional convolutional neural networks.
- CNNs: convolutional neural networks
- FLOPs: floating-point operations
- this computational demand makes it challenging to implement CNNs with convolutional layers on electronic devices with reduced hardware capabilities in terms of processing power, memory and energy storage, such as smartphones, intelligent cameras or other types of IoT devices.
- moreover, such models are costly to train, fine-tune and deploy, due to the cost of renting cloud computing time.
- embodiments disclosed herein provide devices and methods implementing convolutional neural networks with one or more convolutional layers that allow more efficient processing, in particular of sparse input tensors, i.e. input tensors comprising a substantial number of zero input feature values, and thus significantly reduce the required FLOPs, power consumption and training time.
- a data processing apparatus comprising a processing circuitry configured to implement a neural network.
- the data processing apparatus implementing the neural network may be, for instance, a mobile phone with limited computational and/or memory resources.
- the neural network comprises a plurality of processing layers and each processing layer is configured to process a respective input tensor of input feature values into a respective output tensor of output feature values.
- the plurality of processing layers comprises at least one convolutional layer including a convolution operator, wherein the convolution operator is configured to perform a convolution operation of the input tensor using one or more filter tensors (also known as kernel tensors or kernels).
- the plurality of input feature values, which may be provided by a preceding activation layer, may comprise a plurality of zero input feature values, i.e. input feature values having the value zero, and a plurality of non-zero input feature values, i.e. input feature values having a non-zero value.
- the input tensor and/or the output tensor may be based, for instance, on one or more images having a width, a height and a plurality of channels.
- for determining a respective output feature value of the output tensor, the convolution operator is configured to determine one or more indexes identifying the one or more non-zero input feature values of a subset of the plurality of input feature values and to determine the respective output feature value of the output tensor using the one or more non-zero input feature values of the subset of the plurality of input feature values identified by the one or more indexes.
- the subset of the plurality of input feature values of the input tensor may be, for instance, a one-, two- or three-dimensional sub-array of the input tensor.
- the input tensor comprises for each of a plurality of input channels a respective matrix of the input feature values at a plurality of first matrix positions and the output tensor comprises for each of a plurality of output channels a respective matrix of the output feature values at a plurality of second matrix positions.
- the plurality of first matrix positions and the plurality of second matrix positions may be defined, for instance, by a row index and/or a column index.
- the convolution operation is a pointwise convolution operation and the convolution operator is configured for each first matrix position to determine the input channels having a non-zero input feature value and to determine one or more output feature values of the output tensor by representing one or more matrix multiplications as a linear combination of one or more columns of the one or more filter tensors with the matrices of input feature values of the input channels having a non-zero input feature value.
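- purely for illustration, a minimal C sketch of such a pointwise (1x1) convolution at a single first matrix position may look as follows; the function name, the scalar loops and the weight layout (the C_out weights of each input channel stored contiguously) are assumptions of this sketch, not the claimed implementation:

```c
#include <stddef.h>

/* Sparse 1x1 (pointwise) convolution at one first matrix position: the output
 * vector is a linear combination of the filter columns selected by the
 * non-zero input channels; zero input channels are skipped entirely. */
void pointwise_sparse(const float *x,  /* C_in input feature values       */
                      const float *W,  /* C_in rows of C_out weights each */
                      int C_in, int C_out,
                      float *out)      /* C_out output feature values     */
{
    for (int co = 0; co < C_out; co++)
        out[co] = 0.0f;
    for (int ci = 0; ci < C_in; ci++) {
        float v = x[ci];
        if (v == 0.0f)
            continue;                            /* skip zero input channels */
        const float *col = &W[(size_t)ci * C_out];
        for (int co = 0; co < C_out; co++)
            out[co] += v * col[co];  /* accumulate v times the selected column */
    }
}
```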
- a number of the plurality of input feature values is equal to a number of the plurality of output feature values.
- the convolution operation is a depthwise convolution operation and the convolution operator is configured for each first matrix position to determine the input channels having a non-zero input feature value and to determine one or more output feature values of the output tensor by representing one or more matrix multiplications as a linear combination of one or more columns of the one or more filter tensors with the matrices of input feature values of the input channels having a non-zero input feature value and by taking into account a respective offset of the one or more filter tensors from the first matrix position.
- a number of the plurality of input feature values is larger than a number of the plurality of output feature values.
- the convolution operator is further configured to unfold the input tensor so that for each of the plurality of input channels the plurality of input feature values define a one-dimensional array.
- the convolution operator is further configured to determine the one or more output feature values of the output tensor based on one or more single instruction, multiple data (SIMD) instructions.
- the convolution operator is configured to represent the one or more indexes identifying the one or more non-zero input feature values of the subset of the plurality of input feature values as one or more components of a vector of bits.
- the input channels having a non-zero input feature value are represented by a first bit value and the input channels having a zero input feature value are represented by a second bit value.
- the convolution operator is further configured to encode for each first matrix position the vector of bits as an integer number.
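- as a minimal scalar illustration of this encoding (assuming at most 32 channels per block; the function name is hypothetical):

```c
#include <stdint.h>

/* Encode which input channels at one first matrix position are non-zero as a
 * vector of bits packed into an integer: bit i is set if and only if channel i
 * holds a non-zero input feature value. */
uint32_t nonzero_key(const float *x, int channels)
{
    uint32_t key = 0;
    for (int i = 0; i < channels; i++)
        if (x[i] != 0.0f)
            key |= (uint32_t)1 << i;
    return key;
}
```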
- the plurality of processing layers further comprises at least one sparsifying activation layer, for instance, a ReLU layer.
- the plurality of processing layers further comprises at least one batch normalization layer, wherein the at least one sparsifying activation layer precedes or follows the at least one convolutional layer and/or the at least one batch normalization layer.
- the plurality of input feature values is based on image, text, audio, video, and/or numerical data.
- the data processing apparatus further comprises an image capturing device, such as a camera, configured to capture the image and/or video data.
- a computer-implemented data processing method comprising the steps of: implementing a neural network, wherein the neural network comprises a plurality of processing layers, each processing layer configured to process an input tensor of input feature values into an output tensor of output feature values, wherein the plurality of processing layers comprises at least one convolutional layer including a convolution operator, wherein the convolution operator is configured to perform a convolution operation of the input tensor using one or more filter tensors; determining one or more indexes identifying one or more non-zero input feature values of a subset of the plurality of input feature values; and determining the respective output feature value of the output tensor using the one or more non-zero input feature values of the subset of the plurality of input feature values identified by the one or more indexes.
- the data processing method according to the second aspect can be performed by the data processing apparatus according to the first aspect.
- further features of the data processing method according to the second aspect or the third aspect result directly from the functionality of the data processing apparatus according to the first aspect and its different implementation forms described above and below.
- a computer program or a computer program product comprising a computer-readable storage medium carrying program code which causes a computer or a processor to perform the method according to the second aspect when the program code is executed by the computer or the processor.
- Fig. 1 is a schematic diagram illustrating a data processing apparatus according to an embodiment implementing a neural network with a convolutional layer according to an embodiment
- Fig. 2 is a schematic diagram illustrating a convolutional layer operating on an input tensor for generating an output tensor implemented by a data processing apparatus according to an embodiment
- Fig. 3 is a schematic diagram illustrating an unfolding operation implemented by a data processing apparatus according to an embodiment
- Figs. 4, 5 and 6 are schematic diagrams illustrating a matrix multiplication operation implemented by a convolutional layer of a data processing apparatus according to an embodiment
- Fig. 7 is a schematic diagram illustrating a depthwise convolutional layer operating on an input tensor for generating an output tensor implemented by a data processing apparatus according to an embodiment
- Fig. 8 is a flow diagram illustrating processing stages performed by a pointwise convolutional layer of a neural network implemented by a data processing apparatus according to an embodiment
- Fig. 9 is a flow diagram illustrating processing stages performed by a depthwise convolutional layer of a neural network implemented by a data processing apparatus according to an embodiment
- Figs. 10a-c show exemplary source code for implementing different aspects of a convolutional layer of a neural network implemented by a data processing apparatus according to an embodiment
- Fig. 11 is a flow diagram illustrating a data processing method according to an embodiment.
- Figs. 12 and 13 show tables illustrating the performance of a neural network implemented by a data processing apparatus and method according to an embodiment in comparison with a conventional neural network.
- a disclosure in connection with a described method may also hold true for a corresponding device or system configured to perform the method and vice versa.
- a corresponding device may include one or a plurality of units, e.g. functional units, to perform the described one or plurality of method steps (e.g. one unit performing the one or plurality of steps, or a plurality of units each performing one or more of the plurality of steps), even if such one or more units are not explicitly described or illustrated in the figures.
- if a specific apparatus is described based on one or a plurality of units, e.g. functional units, a corresponding method may include one step to perform the functionality of the one or plurality of units (e.g. one step performing the functionality of the one or plurality of units, or a plurality of steps each performing the functionality of one or more of the plurality of units), even if such one or plurality of steps are not explicitly described or illustrated in the figures. Further, it is understood that the features of the various exemplary embodiments and/or aspects described herein may be combined with each other, unless specifically noted otherwise.
- Fig. 1 is a schematic diagram illustrating a data processing apparatus 100 according to an embodiment.
- the data processing apparatus 100 is, by way of example, a smartphone 100, i.e. a data processing apparatus 100 with reduced hardware capabilities with respect to computational power, memory storage and/or battery capacity.
- the data processing apparatus 100 may be implemented, for instance, as a server, a desktop computer, a laptop computer, a tablet computer or another device having the computational resources for implementing a neural network.
- the smartphone 100 may comprise a processing circuitry 101, such as one or more processors 101 for processing data, a memory 103 for storing and retrieving data and a battery 105 for providing an energy supply.
- the smartphone 100 may comprise a camera 107 for capturing image and/or video data, a user interface, such as a touch button 109 for allowing user interaction with the smartphone 100 and a display 111, e.g. a touch screen 111.
- the smartphone 100 may comprise a communication interface (not shown in figure 1), including, for instance, an antenna for exchanging data with other communication devices of a wireless communication network, such as a base station and/or a cloud server.
- the processing circuitry 101 of the smartphone 100 may be implemented in hardware and/or software.
- the hardware may comprise digital circuitry, or both analog and digital circuitry.
- Digital circuitry may comprise components such as application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), digital signal processors (DSPs), or general-purpose processors.
- the memory 103 may store executable program code which, when executed by the processing circuitry 101, causes the smartphone 100 to perform the functions and methods described herein.
- the processing circuitry 101 is configured to implement a neural network, wherein the neural network comprises a plurality of processing layers configured to process a respective input tensor 201 of input feature values into a respective output tensor 203 of output feature values.
- the plurality of processing layers comprises at least one convolutional layer including a convolution operator, wherein the convolution operator is configured to perform a convolution operation of the input tensor 201 using one or more filter tensors 205 (also referred to as kernels 205).
- the three-dimensional input tensor 201 (referred to as input tensor x) shown in figure 2 comprises a plurality of matrices 201a-n of input feature values having a width W_in and a height H_in for a plurality of input channels C_in.
- the three-dimensional output tensor 203 shown in figure 2 comprises a plurality of matrices 203a-m of output feature values having a width W_out and a height H_out for a plurality of output channels C_out.
- the convolution operator implemented by the data processing apparatus 100 may be defined at layer $\ell$ as $y^{(\ell)} = W^{(\ell)} * x^{(\ell)} + b^{(\ell)}$, where $W^{(\ell)}$ denotes a structured multi-channel convolution (including the one or more filter tensors 205) applied to the input tensor 201 and $b_c^{(\ell)}$ denotes the bias of output channel $c$.
- a convolution operation may be represented using lower-dimensional data structures.
- An approach for performing 2D convolutions is the GEneral Matrix Multiplication (GEMM) scheme.
- a 3D input tensor, e.g. an image array, is transformed or unfolded into a 2D array and, hence, can be treated like a matrix (a transformation also known as the "im2col" process), which allows the Multiple Channel Multiple Kernel (MCMK) problem to be solved as a single matrix multiplication, as illustrated by the sketch below.
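- for illustration, a minimal single-channel im2col in C may look as follows (stride 1, no padding; the names and layout are assumptions of this sketch):

```c
/* Unfold an H x W single-channel image into a column matrix: each row of col
 * gathers the K x K input patch feeding one output pixel, so the 2D
 * convolution becomes a plain matrix multiplication with the flattened kernel. */
void im2col(const float *img, int H, int W, int K, float *col)
{
    int Ho = H - K + 1, Wo = W - K + 1;          /* output height and width */
    for (int i = 0; i < Ho; i++)
        for (int j = 0; j < Wo; j++)
            for (int ki = 0; ki < K; ki++)
                for (int kj = 0; kj < K; kj++)
                    col[((i * Wo + j) * K + ki) * K + kj] =
                        img[(i + ki) * W + (j + kj)];
}
```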
- the im2col operation of the GEMM scheme requires loading and storing the image data, as well as another memory block for storing the intermediate data.
- sequential memory access is essential for GEMM performance. That is, when the input is sparse, the sequential accesses of GEMM will encounter many zeros, which results in a plurality of unnecessary memory accesses.
- the convolution operator implemented by the data processing apparatus 100 is configured to determine one or more indexes identifying one or more non-zero input feature values of a respective subset of the plurality of input feature values of the input tensor 201 and to determine the respective output feature value of the output tensor 203 using the one or more non-zero input feature values of the subset of the plurality of input feature values identified by the one or more indexes.
- the subset of the plurality of input feature values of the input tensor may be, for instance, a one-dimensional sub-array of the input tensor in case of a pointwise convolution, a two-dimensional sub-array of the input tensor in case of a depthwise convolution or a three-dimensional sub-array of the input tensor in case of a more general convolution.
- Embodiments disclosed herein implement a sparse convolution scheme (also referred to as “SpConv”) that may utilize the sparsity introduced by an activation function to accelerate both grouped and ungrouped convolution operations and reduce their complexity. As will be described in more detail below, a substantial acceleration can be achieved, especially when the activation function introduces significant sparsity levels into the input tensor 201 of the convolution operation.
- both the weights defined by the one or more filter tensors 205 and the input tensor 201 may be encoded in a standard dense layout. Such a dense representation may be more suitable for cases where the weights defined by the one or more filter tensors 205 are not (sufficiently) sparse, as the convolution outcome is dense.
- embodiments disclosed herein may implement a fast on-the-fly compression scheme using single instruction, multiple data (SIMD) instructions for operating on the input tensor 201 and may efficiently perform a direct convolution.
- embodiments disclosed herein may utilize a linear combination representation for performing the convolution using SIMD instructions without folding and unfolding the input tensor 201, as illustrated in figure 5.
- embodiments disclosed herein may utilize a transposed convolution to facilitate an implementation based on SIMD instructions.
- embodiments disclosed herein utilize locality to find and process non-zero input feature values over the input channels, requiring only a single index corresponding to the non-zero location.
- the convolutional operator of the neural network implemented by the processing circuitry 101 implements a kind of on-the-fly compression scheme that locates the non-zero entries in each pixel, i.e. at each first matrix position, using SIMD (single instruction, multiple data) instructions and represents these locations using a single index. More specifically, in an embodiment, the convolutional operator of the neural network implemented by the processing circuitry 101 may read sequential memory of the sparse input and compare it with a vector of zeros. In an embodiment, this comparison may yield a new vector, where the locations with a non-zero input feature value are marked with a '1' bit and the locations with a zero input feature value are marked with a '0' bit.
- the convolutional operator of the neural network implemented by the processing circuitry 101 may translate this binary vector into an integer, which encodes the locations of the non-zeros. Further, in an embodiment, this integer may point to the byte shuffle required in order to obtain a sequence of non-zero entries (using, e.g., a lookup table). In an embodiment, only the non-zero indexes that will be used in further processing steps are stored to obtain the input feature value and the corresponding filter weight column. As will be appreciated, this on-the-fly compression scheme implemented by the convolutional operator of the neural network may be used for both grouped and ungrouped convolutions.
- the convolutional operator of the neural network implemented by the processing circuitry 101 may utilize a linear combination representation for a 1x1 ungrouped convolution to facilitate sequential memory access to the filter weights, and hence enables utilizing SIMD (single instruction, multiple data) instructions. More specifically, in an embodiment, the indexes of the non-zero feature values are identical to the column indexes required for the convolution operation, as illustrated in figure 6. Thus, as will be appreciated, in an embodiment the convolutional operator of the neural network implemented by the processing circuitry 101 may efficiently skip multiplication operations involving zero input feature values.
- the convolutional operator of the neural network implemented by the processing circuitry 101 may adopt a transposed convolution technique that facilitates dense multiplications when the input is sparse. More specifically, in an embodiment, the convolutional operator of the neural network implemented by the processing circuitry 101 may be configured to multiply each non-zero input feature value entry by all the kernel weights of the corresponding channel index, while considering its offset shift in the output location, as illustrated in figure 7.
- in a processing step 801 of figure 8, given the potentially sparse input tensor 201 in dense format, for each pixel, i.e. first matrix position, the non-zero channels (i.e. input feature values) are determined by the convolutional operator of the neural network implemented by the processing circuitry 101 and these indexes are stored in the array nz_idx of size C_in.
- the convolutional operator of the neural network implemented by the processing circuitry 101 then uses these indexes to linearly combine the filter weight vectors, whose column index and multiplying coefficient correspond to the stored index; a sketch of this combination step follows.
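- a minimal C sketch of this combination step, consuming the stored nz_idx array (the function name and the weight layout, with the C_out weights of each input channel stored contiguously, are assumptions of this sketch):

```c
#include <stddef.h>

/* Step 803 (sketch): accumulate the filter columns selected by the stored
 * non-zero indexes, each scaled by the corresponding input feature value. */
void combine_columns(const float *x, const float *W,
                     const int *nz_idx, int nnz,  /* non-zero channel indexes */
                     int C_out, float *out)
{
    for (int co = 0; co < C_out; co++)
        out[co] = 0.0f;
    for (int i = 0; i < nnz; i++) {
        int ci = nz_idx[i];
        const float *col = &W[(size_t)ci * C_out];  /* ci-th filter column */
        for (int co = 0; co < C_out; co++)
            out[co] += x[ci] * col[co];
    }
}
```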
- in a processing step 901 of figure 9, given the potentially sparse input tensor 201 in dense format, for each pixel, i.e. first matrix position, the non-zero channels (i.e. input feature values) are determined by the convolutional operator of the neural network implemented by the processing circuitry 101 and these indexes are stored in the array nz_idx of size C_in. These indexes correspond to the kernel indexes.
- in a further processing step 903 of figure 9, for generating the output feature values of the output tensor 203, the convolutional operator of the neural network implemented by the processing circuitry 101 is configured to multiply the non-zero values with the kernel weights (e.g., for a 3x3 grouped convolution there are 9 weights in each kernel), taking into account the respective offsets of the kernel relative to the currently processed pixel, i.e. first matrix position. Furthermore, the results are scattered according to the offset; a sketch of this scatter step is given below.
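- a minimal C sketch of this scatter step for a 3x3 grouped (depthwise) convolution, assuming stride 1 and zero padding of 1 (tensor layout and names are assumptions of this sketch):

```c
/* Transposed (scatter) form of a 3x3 depthwise convolution: every non-zero
 * input feature value is multiplied by all 9 kernel weights of its channel and
 * the products are added to the output positions shifted by the kernel offset. */
void depthwise3x3_scatter(const float *x,  /* C x H x W input, mostly zeros */
                          const float *k,  /* C x 3 x 3 kernel weights      */
                          int C, int H, int W,
                          float *out)      /* C x H x W output, zeroed      */
{
    for (int c = 0; c < C; c++)
        for (int h = 0; h < H; h++)
            for (int w = 0; w < W; w++) {
                float v = x[(c * H + h) * W + w];
                if (v == 0.0f)
                    continue;                    /* sparse input: skip zeros */
                for (int dy = -1; dy <= 1; dy++)
                    for (int dx = -1; dx <= 1; dx++) {
                        int ho = h + dy, wo = w + dx;
                        if (ho < 0 || ho >= H || wo < 0 || wo >= W)
                            continue;            /* stay inside the output   */
                        /* tap (1-dy, 1-dx) is the weight mapping input (h, w)
                         * onto output (ho, wo) in the gather formulation */
                        out[(c * H + ho) * W + wo] +=
                            v * k[(c * 3 + (1 - dy)) * 3 + (1 - dx)];
                    }
            }
}
```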
- steps 901 and 903 are repeated for each pixel, i.e. each of the plurality of first matrix positions of the input tensor 201 for providing the complete output tensor 203 (see processing steps 905 and 907 of figure 9).
- Figures 10a-c show exemplary source code for implementing different aspects of a convolutional layer of a neural network implemented by the data processing apparatus 100 according to an embodiment. More specifically, figures 10a-c show exemplary source code for implementing the on-the-fly compression scheme already described above that utilizes SIMD instructions for compressing 256-bit vectors.
- the compression relies on a look-up table, where each entry (key) indicates the required shuffle that would pick the non-zero indexes, and the resulting number of non-zeros corresponding to that key.
- the generation of the look-up table may be based on the exemplary code shown in figure 10a; a plausible C equivalent is sketched below.
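- a plausible C equivalent of such a table generation for 8-lane keys (the array names lut and cnt are assumptions of this sketch; the exact code is the one shown in figure 10a):

```c
#include <stdint.h>

/* lut[key] lists the positions of the set bits of key (the non-zero lanes);
 * cnt[key] holds how many there are; one entry per possible 8-bit key. */
uint8_t lut[256][8];
uint8_t cnt[256];

void build_lut(void)
{
    for (int key = 0; key < 256; key++) {
        int n = 0;
        for (int bit = 0; bit < 8; bit++)
            if (key & (1 << bit))
                lut[key][n++] = (uint8_t)bit;  /* record this non-zero index  */
        cnt[key] = (uint8_t)n;                 /* number of non-zeros for key */
    }
}
```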
- the convolutional operator of the neural network implemented by the processing circuitry 101 may be configured to compare a vector of input feature values with a zero vector and to translate this comparison into an integer number (key), where a '1' bit in this number indicates that the corresponding index is a non-zero one.
- This number is used as a key in the look-up table, from which only the bits of interest (i.e., non-zero indexes) are shuffled.
- the generation of the vector of non-zero indexes may be based on the exemplary code shown in figure 10b.
- the exemplary code shown in figure 10c may be implemented by the convolutional operator; it receives the input tensor 201, a pointer to the index array, and a pointer to the number of non-zero input feature values.
- the input tensor 201 is used to generate the vector which is compared with a zero vector.
- the key that indicates the non-zero indexes is computed, the required shuffle for retrieving the non-zero indexes is performed, and the number of non-zeros is updated; an illustrative intrinsics-level sketch of this step follows.
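- on x86, this step may for instance be sketched with AVX intrinsics as follows (8 float lanes per 256-bit vector; lut and cnt as in the sketch above; the scalar emit loop stands in for the byte shuffle of figures 10b-c, so this is an illustrative assumption rather than the code of figure 10c):

```c
#include <immintrin.h>
#include <stdint.h>

extern uint8_t lut[256][8];  /* from the look-up-table sketch above */
extern uint8_t cnt[256];

/* Compress one block of 8 input feature values: compare against zero, pack the
 * lane-wise result into an 8-bit key via movemask, then look up which lanes
 * are non-zero and emit their global channel indexes. Returns the count. */
static inline int compress8(const float *x, int base_channel, int *nz_idx)
{
    __m256 v = _mm256_loadu_ps(x);                   /* load 8 feature values */
    __m256 m = _mm256_cmp_ps(v, _mm256_setzero_ps(),
                             _CMP_NEQ_OQ);           /* lane i: x[i] != 0     */
    int key = _mm256_movemask_ps(m);                 /* pack sign bits -> key */
    int n = cnt[key];
    for (int j = 0; j < n; j++)
        nz_idx[j] = base_channel + lut[key][j];      /* global channel index  */
    return n;
}
```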
- Figure 11 is a flow diagram illustrating a computer-implemented data processing method 1100 according to an embodiment, which may be implemented by the data processing apparatus 100 or the embodiments thereof described above.
- the computer-implemented data processing method 1100 comprises a first step 1101 of implementing or operating a neural network, wherein the neural network comprises a plurality of processing layers, each processing layer configured to process the input tensor 201 of input feature values into the output tensor 203 of output feature values.
- the plurality of processing layers comprises at least one convolutional layer including a convolution operator, wherein the convolution operator is configured to perform a convolution operation using the one or more filter tensors 205.
- the computer-implemented data processing method 1100 comprises a further step 1103 of determining one or more indexes identifying one or more non-zero input feature values of a subset of the plurality of input feature values.
- the computer-implemented data processing method 1100 comprises a step 1105 of determining the respective output feature value of the output tensor 203 using the one or more non-zero input feature values of the subset of the plurality of input feature values identified by the one or more indexes.
- Figures 12 and 13 show tables illustrating the performance of a neural network implemented by a data processing apparatus and method according to an embodiment in comparison with a conventional neural network.
- embodiments of the convolutional operator of the data processing apparatus 100 have been implemented in C for both ARM64 and Intel x86 platforms (columns "SpConvARM64" and "SPConvx86" of the table shown in figure 12) and the latency results thereof are compared with the conventional MNN library and pytorch-1.8-dev (columns "MNN" and "Pytorch 1.8-dev" of the table shown in figure 12), respectively.
- the latency has been estimated for a representative complex example, with an input of 128x128 pixels, each having 64 channels, at a sparsity level of 90%, convolved into 128x128 pixels with 128 channels.
- a conventional MobilenetV2 architecture has been chosen, in which the conventional convolution has been replaced by the convolution according to one of the embodiments described herein, for both an Intel x86 and an ARM64 implementation, where the convolutions consecutive to an activation function exhibited a 90% sparsity level.
- the neural network implemented by the data processing apparatus 100 provides improved results for both ungrouped as well as grouped convolutions.
- the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one position, or may be distributed on a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
- functional units in the embodiments of the invention may be integrated into one processing unit, or each of the units may exist alone physically, or two or more units are integrated into one unit.
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Health & Medical Sciences (AREA)
- Life Sciences & Earth Sciences (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- General Engineering & Computer Science (AREA)
- Computing Systems (AREA)
- Molecular Biology (AREA)
- Software Systems (AREA)
- Evolutionary Computation (AREA)
- Data Mining & Analysis (AREA)
- Computational Linguistics (AREA)
- Artificial Intelligence (AREA)
- Mathematical Physics (AREA)
- General Health & Medical Sciences (AREA)
- Neurology (AREA)
- Computational Mathematics (AREA)
- Mathematical Analysis (AREA)
- Mathematical Optimization (AREA)
- Pure & Applied Mathematics (AREA)
- Image Analysis (AREA)
Abstract
A data processing apparatus (100) comprises a processing circuitry (101) configured to implement a neural network having one or more processing layers configured to process an input tensor of input feature values into an output tensor of output feature values. The processing layers comprise at least one convolutional layer including a convolution operator, the convolution operator being configured to perform a convolution operation of the input tensor using one or more filter tensors. For determining a respective output feature value of the output tensor, the convolution operator is configured to determine one or more indexes identifying one or more non-zero input feature values of a subset of the input feature values and to determine the respective output feature value of the output tensor using the one or more non-zero input feature values of the subset of the input feature values identified by the one or more indexes.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/EP2021/070803 WO2023006170A1 (fr) | 2021-07-26 | 2021-07-26 | Devices and methods for using computationally efficient neural networks |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/EP2021/070803 WO2023006170A1 (fr) | 2021-07-26 | 2021-07-26 | Devices and methods for using computationally efficient neural networks |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2023006170A1 (fr) | 2023-02-02 |
Family
ID=77071582
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/EP2021/070803 WO2023006170A1 (fr) | 2021-07-26 | 2021-07-26 | Dispositifs et procédés d'utilisation de réseaux neuronaux efficaces sur le plan informatique |
Country Status (1)
Country | Link |
---|---|
WO (1) | WO2023006170A1 (fr) |
- 2021-07-26: PCT/EP2021/070803 filed, published as WO2023006170A1 (active, Application Filing)
Patent Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20140185687A1 (en) * | 2012-12-27 | 2014-07-03 | Pexip AS | Simultaneous and loopless vector calculation of all run-level pairs in video compression |
US20150039852A1 (en) * | 2013-07-31 | 2015-02-05 | Oracle International Corporation | Data compaction using vectorized instructions |
Non-Patent Citations (3)
Title |
---|
A. G. HOWARD ET AL.: "MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications", ARXIV.ORG, CORNELL UNIVERSITY LIBRARY, 201 OLIN LIBRARY CORNELL UNIVERSITY ITHACA, NY 14853, 17 April 2017 (2017-04-17), XP080763381 * |
S. KANG ET AL.: "GANPU: An Energy-Efficient Multi-DNN Training Processor for GANs With Speculative Dual-Sparsity Exploitation", IEEE JOURNAL OF SOLID-STATE CIRCUITS, vol. 56, no. 9, 22 April 2021 (2021-04-22), pages 2845 - 2857, XP011874477, ISSN: 0018-9200, [retrieved on 20210825], DOI: 10.1109/JSSC.2021.3066572 * |
SHEN-FU HSIAO ET AL.: "Design of a Sparsity-Aware Reconfigurable Deep Learning Accelerator Supporting Various Types of Operations", IEEE JOURNAL ON EMERGING AND SELECTED TOPICS IN CIRCUITS AND SYSTEMS, IEEE, PISCATAWAY, NJ, USA, vol. 10, no. 3, 11 August 2020 (2020-08-11), pages 376 - 387, XP011810149, ISSN: 2156-3357, [retrieved on 20200918], DOI: 10.1109/JETCAS.2020.3015238 * |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US10096134B2 (en) | Data compaction and memory bandwidth reduction for sparse neural networks | |
US20210224125A1 (en) | Operation Accelerator, Processing Method, and Related Device | |
US20190340488A1 (en) | Compression of kernel data for neural network operations | |
- CN107832845A (zh) | An information processing method and related product | |
- WO2018107383A1 (fr) | Method and device for convolution computation of an artificial neural network, and computer-readable storage medium | |
- CN105453132B (zh) | Information processing device for performing image processing, and image processing method | |
- KR20190107766A (ko) | Computing device and method | |
US11604975B2 (en) | Ternary mode of planar engine for neural processor | |
- CN111831844A (zh) | Image retrieval method, image retrieval apparatus, image retrieval device and medium | |
US11822900B2 (en) | Filter processing device and method of performing convolution operation at filter processing device | |
US12079724B2 (en) | Texture unit circuit in neural network processor | |
- JP2023541350A (ja) | Table convolution and acceleration | |
- CN114978189A (zh) | A data encoding method and related device | |
- CN112416433A (zh) | A data processing apparatus, data processing method and related product | |
US11853868B2 (en) | Multi dimensional convolution in neural network processor | |
US11630991B2 (en) | Broadcasting mode of planar engine for neural processor | |
- JP2024028901A (ja) | Sparse matrix multiplication in hardware | |
- CN116049691A (zh) | Model conversion method and apparatus, electronic device and storage medium | |
- CN109740729B (zh) | Operation method and apparatus, and related product | |
- CN112784951A (zh) | Winograd convolution operation method and related product | |
- CN109740730B (zh) | Operation method and apparatus, and related product | |
- WO2023006170A1 (fr) | Devices and methods for using computationally efficient neural networks | |
- CN111382835B (zh) | Neural network compression method, electronic device and computer-readable medium | |
- CN113469333A (zh) | Artificial intelligence processor and method for executing a neural network model, and related product | |
- CN111125627A (zh) | Method for pooling multi-dimensional matrices and related product | |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 21746496; Country of ref document: EP; Kind code of ref document: A1 |
| NENP | Non-entry into the national phase | Ref country code: DE |
| 122 | Ep: pct application non-entry in european phase | Ref document number: 21746496; Country of ref document: EP; Kind code of ref document: A1 |