CN114897135A - Arithmetic device, chip, deconvolution method, electronic apparatus, and storage medium


Info

Publication number
CN114897135A
Authority
CN
China
Prior art keywords
target
original
input data
data
convolution
Prior art date
Legal status
Pending
Application number
CN202210474265.XA
Other languages
Chinese (zh)
Inventor
勾志宏
胡英俊
徐宁仪
田志仲
Current Assignee
Shanghai Power Tensors Intelligent Technology Co Ltd
Original Assignee
Shanghai Power Tensors Intelligent Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Shanghai Power Tensors Intelligent Technology Co Ltd filed Critical Shanghai Power Tensors Intelligent Technology Co Ltd
Priority to CN202210474265.XA priority Critical patent/CN114897135A/en
Publication of CN114897135A publication Critical patent/CN114897135A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10Complex mathematical operations
    • G06F17/15Correlation function computation including computation of convolution operations
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/06Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/063Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means


Abstract

The present disclosure relates to an arithmetic device, a chip, a deconvolution operation method, an electronic apparatus, and a storage medium. The device includes: a control unit configured to acquire original parameters of a deconvolution operation, determine target parameters of an equivalent convolution operation corresponding to the deconvolution operation according to the original parameters, send the target parameters to a convolution operation unit, and determine the result of the equivalent convolution operation as the result of the deconvolution operation; and the convolution operation unit, configured to perform the equivalent convolution operation according to the target parameters and send its result to the control unit.

Description

Arithmetic device, chip, deconvolution method, electronic apparatus, and storage medium
Technical Field
The present disclosure relates to the field of artificial intelligence technologies, and in particular, to an arithmetic device, a chip, a deconvolution method, an electronic apparatus, and a storage medium.
Background
In recent years, artificial intelligence technology has developed rapidly and is increasingly widely applied in the fields of data recognition and processing. Fundamentally, artificial intelligence technology is built on basic operations such as convolution, deconvolution, pooling, and sampling, so the processing capability for these basic operations also determines the overall computing capability of an artificial intelligence system.
Taking the deconvolution operation as an example, in the related art it can only be implemented on general-purpose chips that support matrix operations and random copy-and-store, such as a GPU (graphics processing unit) or a CPU (central processing unit), and cannot be implemented on other chips, especially on acceleration chips dedicated to artificial intelligence. This not only greatly reduces the efficiency of the deconvolution operation, but also limits the range of hardware on which it can run.
Disclosure of Invention
The present disclosure provides an arithmetic device, a chip, a deconvolution arithmetic method, an electronic apparatus, and a storage medium to solve the drawbacks of the related art.
According to a first aspect of the embodiments of the present disclosure, there is provided an arithmetic device including:
a control unit configured to acquire original parameters of a deconvolution operation, determine target parameters of an equivalent convolution operation corresponding to the deconvolution operation according to the original parameters, send the target parameters to a convolution operation unit, and determine the result of the equivalent convolution operation as the result of the deconvolution operation;
and the convolution operation unit, configured to perform the equivalent convolution operation according to the target parameters and send the result of the equivalent convolution operation to the control unit.
In one embodiment, the original parameters include at least one of:
the original input data, the original convolution kernel, the original convolution step size, the original padding value, and the size of the output data.
In one embodiment, the target parameter comprises target input data;
the control unit is configured to, when determining a target parameter of equivalent convolution operation corresponding to the deconvolution operation according to the original parameter of the deconvolution operation, specifically: determining the size of target input data of the equivalent convolution operation and the position of each original element in the original input data in the target input data according to the size of the original input data, the size of the original convolution kernel, the original convolution step size, the original filling value and the size of the output data; and adding each original element in the original input data to a corresponding position of the target input data according to the position of each original element in the original input data in the target input data, and adding preset values to other positions in the target input data to obtain the target input data.
In one embodiment, the arithmetic device further comprises a vector operation unit and a storage unit;
the control unit is configured to, according to a position of each original element in the original input data in the target input data, add each original element in the original input data to a corresponding position of the target input data, and add a preset value to other positions in the target input data, to obtain the target input data, specifically configured to: according to the size of the target input data, acquiring a target data space with a corresponding size on a storage unit, and initializing a target element at each position in the target data space to the preset value; determining the position of each original element in the target data space according to the position of each original element in the original input data in the target input data; and sending the address of the original input data and the position of each original element in the target data space to the vector operation unit;
the storage unit is used for: storing the target input data;
the vector operation unit is configured to: and acquiring each original element in the original input data according to the address of the original input data, and updating the target element at the corresponding position in the target data space by using each original element.
In an embodiment, when the control unit is configured to send the address of the original input data and the position of each original element in the target data space to the vector operation unit, the control unit is specifically configured to: determine configuration information of the vector operation unit according to the address of the original input data, the address of the target data space, and the position of each original element in the target data space, and send the configuration information to the vector operation unit, where the configuration information includes at least one of the following: a first address, an operation step size, and a size value of each data dimension of the input data; and a first address, an operation step size, a size value of each data dimension, and a cycle value of each data dimension of the output data;
the vector operation unit is configured to, when acquiring each original element in the original input data according to the address of the original input data and updating the target element at the corresponding position in the target data space using each original element, specifically: and updating the target element of each original element at the position in the target data space to the sum of the corresponding original element and the preset value according to the configuration information.
In one embodiment, the target parameters include a target convolution kernel;
the control unit is configured to, when determining a target parameter of equivalent convolution operation corresponding to the deconvolution operation according to the original parameter of the deconvolution operation, specifically: and performing central symmetry rotation on the data in the original convolution kernel to obtain a target convolution kernel of the equivalent convolution operation.
In one embodiment, the original convolution kernel includes a plurality of channels;
the control unit is configured to, when determining a target parameter of equivalent convolution operation corresponding to the deconvolution operation according to the original parameter of the deconvolution operation, specifically: and respectively carrying out central symmetry rotation on the data of each channel in the original convolution kernel to obtain the data of each channel of the target convolution kernel.
In one embodiment, the target parameters comprise a target convolution step size and/or a target padding value;
the control unit is configured to, when determining the target parameters of the equivalent convolution operation corresponding to the deconvolution operation according to the original parameters of the deconvolution operation, specifically: determine the target convolution step size of the equivalent convolution operation as a preset step size; and/or determine the target padding value of the equivalent convolution operation as a preset padding value.
In one embodiment, when the control unit is configured to send the target parameters to the convolution operation unit, the control unit is specifically configured to: configure the first address and the per-dimension size values of the target input data, the first address and the per-dimension size values of the target convolution kernel, the target convolution step size, and the target padding value into the register corresponding to the convolution operation unit;
the convolution operation unit is specifically configured to, when performing the equivalent convolution operation according to the target parameters: perform the equivalent convolution operation using the configuration information in the register to obtain the result of the equivalent convolution operation.
According to a second aspect of the embodiments of the present disclosure, there is provided a chip including the arithmetic device of the first aspect.
According to a third aspect of the embodiments of the present disclosure, there is provided a deconvolution operation method, including:
acquiring original parameters of deconvolution operation;
determining a target parameter of equivalent convolution operation corresponding to the deconvolution operation according to the original parameter of the deconvolution operation;
and performing the equivalent convolution operation according to the target parameter by using a convolution operation unit, and determining the result of the equivalent convolution operation as the result of the deconvolution operation.
In one embodiment, the original parameters include at least one of:
the original input data, the original convolution kernel, the original convolution step size, the original padding value, and the size of the output data.
In one embodiment, the determining, according to the original parameter of the deconvolution operation, a target parameter of an equivalent convolution operation corresponding to the deconvolution operation includes:
determining the size of the target input data of the equivalent convolution operation, and the position of each original element of the original input data in the target input data, according to the size of the original input data, the size of the original convolution kernel, the original convolution step size, the original padding value, and the size of the output data;
and adding each original element of the original input data at its corresponding position in the target input data and adding the preset value at the other positions in the target input data, to obtain the target input data.
In one embodiment, the adding each original element in the original input data to a corresponding position of the target input data according to a position of the original element in the original input data in the target input data, and adding preset values to other positions of the target input data to obtain the target input data includes:
acquiring a target data space with a corresponding size on a shared memory according to the size of the target input data, and initializing a target element at each position in the target data space to the preset value;
determining the position of each original element in the target data space according to the position of each original element in the original input data in the target input data;
and utilizing a vector operation unit to acquire each original element in the original input data according to the address of the original input data, and updating the target element at the corresponding position in the target data space by using each original element.
In one embodiment, the using a vector operation unit, acquiring each original element in the original input data according to an address of the original input data, and updating a target element at a corresponding position in the target data space using each original element, includes:
determining configuration information of the vector operation unit according to the address of the original input data, the address of the target data space, and the position of each original element in the target data space, wherein the configuration information includes at least one of: a first address, an operation step size, and a size value of each data dimension of the input data; and a first address, an operation step size, a size value of each data dimension, and a cycle value of each data dimension of the output data;
and the vector operation unit updates the target element of each original element at the position in the target data space into the sum of the corresponding original element and the preset value according to the configuration information.
In one embodiment, the determining, according to the original parameter of the deconvolution operation, a target parameter of an equivalent convolution operation corresponding to the deconvolution operation includes:
and performing central symmetry rotation on the data in the original convolution kernel to obtain a target convolution kernel of the equivalent convolution operation.
In one embodiment, the original convolution kernel includes a plurality of channels;
the performing central symmetry rotation on the data in the original convolution kernel to obtain the target convolution kernel of the equivalent convolution operation includes:
and respectively carrying out central symmetry rotation on the data of each channel in the original convolution kernel to obtain the data of each channel of the target convolution kernel.
In one embodiment, the determining, according to the original parameter of the deconvolution operation, a target parameter of an equivalent convolution operation corresponding to the deconvolution operation includes:
determining the target convolution step size of the equivalent convolution operation as a preset step size; and/or,
determining the target padding value of the equivalent convolution operation as a preset padding value.
In one embodiment, the performing, by using a convolution operation unit, the equivalent convolution operation according to the target parameter, and determining a result of the equivalent convolution operation as a result of the deconvolution operation includes:
and configuring the first address and the per-dimension size values of the target input data, the first address and the per-dimension size values of the target convolution kernel, the target convolution step size, and the target padding value into the register corresponding to the convolution operation unit, the convolution operation unit then performing the equivalent convolution operation using the configuration information in the register to obtain the result of the equivalent convolution operation.
According to a fourth aspect of embodiments of the present disclosure, there is provided an electronic device, the device comprising a memory for storing computer instructions executable on a processor, the processor being configured to implement the method of the third aspect when executing the computer instructions; alternatively,
the apparatus comprises a chip according to the second aspect.
According to a fifth aspect of embodiments of the present disclosure, there is provided a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the method of the third aspect.
According to the above embodiments, the control unit can obtain the original parameters of the deconvolution operation, determine the target parameters of the equivalent convolution operation corresponding to the deconvolution operation according to those original parameters, send the target parameters to the convolution operation unit, and determine the result of the equivalent convolution operation as the result of the deconvolution operation; the convolution operation unit can perform the equivalent convolution operation according to the target parameters and send its result to the control unit. In other words, the arithmetic device uses the control unit to convert the deconvolution operation into a corresponding equivalent convolution operation and then executes that convolution with the convolution operation unit, so that a convolution operation unit that does not itself support deconvolution can still compute the result of a deconvolution operation. This not only improves the efficiency of the deconvolution operation but also enlarges the range of hardware on which it can run: for example, a dedicated artificial intelligence acceleration chip can execute the deconvolution operation, fully exploiting the chip's parallel computing capability, extending the set of operators it supports, and allowing the deconvolution operation to be completed on an acceleration chip that lacks random-access copy capability.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and together with the description, serve to explain the principles of the disclosure.
Fig. 1 is a schematic structural diagram of an arithmetic unit shown in an exemplary embodiment of the present disclosure;
FIG. 2 is a schematic diagram illustrating a centrosymmetric rotation of a deconvolution kernel in accordance with an exemplary embodiment of the present disclosure;
FIG. 3 is a schematic diagram illustrating a convolution process of target input data according to an exemplary embodiment of the present disclosure;
FIG. 4 is a flow chart illustrating a method of deconvolution operation in accordance with an exemplary embodiment of the present disclosure;
fig. 5 is a schematic structural diagram of an electronic device shown in an embodiment of the present disclosure.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The implementations described in the exemplary embodiments below are not intended to represent all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present disclosure, as detailed in the appended claims.
The terminology used in the present disclosure is for the purpose of describing particular embodiments only and is not intended to be limiting of the disclosure. As used in this disclosure and the appended claims, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items.
It is to be understood that although the terms first, second, third, etc. may be used herein to describe various information, such information should not be limited to these terms. These terms are only used to distinguish one type of information from another. For example, first information may also be referred to as second information, and similarly, second information may also be referred to as first information, without departing from the scope of the present disclosure. The word "if," as used herein, may be interpreted as "upon," "when," or "in response to determining," depending on the context.
In the related art, when a deep learning framework that supports deconvolution (e.g., Caffe) performs a deconvolution operation, it is generally implemented by calling the backward propagation of convolution; that is, the deconvolution is split into two steps, a matrix multiplication and col2im function processing. This requires the hardware to support matrix multiplication and to provide random copy-and-store capability. Acceleration chips dedicated to artificial intelligence, such as AI accelerators, lack matrix operation units and have no random copy-and-store capability, so such chips cannot perform the deconvolution operation directly.
In a first aspect, at least one embodiment of the present disclosure provides an arithmetic device; referring to fig. 1, it includes: a control unit 101 configured to obtain the original parameters of a deconvolution operation, determine the target parameters of the equivalent convolution operation corresponding to the deconvolution operation according to those original parameters, send the target parameters to a convolution operation unit, and determine the result of the equivalent convolution operation as the result of the deconvolution operation; and a convolution operation unit (MPU) 102 configured to perform the equivalent convolution operation according to the target parameters and send its result to the control unit.
The arithmetic device may be an acceleration chip dedicated to artificial intelligence, for example, an AI accelerator, or may be a partial structure of the chip.
The original parameters may include at least one of: the original input data, the original convolution kernel, the original convolution step size, the original padding value, and the size of the output data. The original input data may be the input feature map of the deconvolution operation and may be described by a first address, a size value of each data dimension, and so on, where the address is the storage address of the original input data in the storage unit and the data dimensions may include number (n), channel (c), width (w), and height (h). The original input data can then be read from the first address together with the size value of each dimension. For example, if the first address of the original input data is address A and the sizes of the number, channel, width, and height dimensions are 2, 2, 3, and 3 respectively, then the first 3 data items starting from address A are the first row of the first channel of the first feature map, the 4th to 6th items are its second row, and the 7th to 9th items are its third row; the 10th to 12th, 13th to 15th, and 16th to 18th items are the first, second, and third rows of the second channel of the first feature map; the 19th to 21st items are the first row of the first channel of the second feature map; and so on.
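For clarity, the offset arithmetic implied by this layout can be written out. It is the standard row-major NCHW formula (w fastest, then h, c, n); the function is an illustration, not taken verbatim from the patent:

```python
def nchw_offset(n, c, h, w, C, H, W):
    """Flat offset of element (n, c, h, w) from the first address in the
    row-major NCHW layout described above."""
    return ((n * C + c) * H + h) * W + w

# sizes from the example: 2 feature maps, 2 channels, 3x3 each
# first element of the second channel of the first map -> offset 9 (10th item)
print(nchw_offset(0, 1, 0, 0, C=2, H=3, W=3))
# first element of the second feature map -> offset 18 (19th item)
print(nchw_offset(1, 0, 0, 0, C=2, H=3, W=3))
```

The printed offsets, 9 and 18, match the 10th and 19th data items in the worked example above.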
The original convolution kernel may include a first address of the original convolution kernel, a size of each data dimension, and the like, the address may be a storage address of the original convolution kernel in a storage unit, and the dimensions may include a channel (c), a width (w), a height (h), and the like; the size of the output data may include the size of the output data in each data dimension, such as number (n), channel (c), width (w), and height (h).
The original parameters of the deconvolution operation may be input by a user or generated by an upstream computational process.
The equivalent convolution operation corresponding to the deconvolution operation means a convolution operation having the same operation result as that of the deconvolution operation. The target parameters of the equivalent convolution operation may include target input data, target convolution kernel, target convolution step size, target fill values, and the like.
The target input data varies with the original input data, the size of the output data, and so on, so it must be generated online. The weights in the original convolution kernel of the deconvolution operation, by contrast, are fixed parameters of the deconvolution layer in the neural network, so they can be processed offline: the chip's compiler (toolchain) implements a dedicated function for converting the original convolution kernel into the target convolution kernel, and when compiling the neural network it converts the original convolution kernel (the old weights), then generates and stores the target convolution kernel (the new weights), which becomes a parameter of the neural network (the weights the hardware uses for the convolution operation).
In one embodiment, the control unit may determine the target input data based on the original input data, the original convolution step size, the original padding value, and the size of the output data in the original parameters. Further, the arithmetic device may include a storage unit, and the control unit may generate the target input data in the storage unit and determine its first address, its size value in each dimension, and so on.
In another embodiment, the control unit may generate the target convolution kernel from the original convolution kernel. Illustratively, according to the relationship between a deconvolution operation and its equivalent convolution operation, the control unit may perform a centrally symmetric rotation on the data in the original convolution kernel to obtain the target convolution kernel of the equivalent convolution operation. It can be understood that, when the original convolution kernel includes a plurality of channels, the data of each channel may be rotated centrally symmetrically in turn to obtain the data of each channel of the target convolution kernel. Referring to fig. 2, the centrally symmetric rotation of the data can be realized by a 180° rotation.
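The per-channel centrally symmetric rotation of fig. 2 is simply a 180° rotation of each 2-D kernel slice, which can be sketched as:

```python
def rotate_kernel_180(kernel):
    """Centrally symmetric rotation (180 degrees) of one channel of the
    original deconvolution kernel: reverse the rows, then reverse each row."""
    return [row[::-1] for row in kernel[::-1]]

k = [[1, 2, 3],
     [4, 5, 6],
     [7, 8, 9]]
print(rotate_kernel_180(k))   # [[9, 8, 7], [6, 5, 4], [3, 2, 1]]
```

For a multi-channel kernel, the same function would be applied to each channel slice independently, matching the embodiment above.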
In yet another embodiment, the control unit may determine a target convolution step size of the equivalent convolution operation as a preset step size. Illustratively, the preset step size may be 1.
In yet another embodiment, the control unit may determine the target padding value of the equivalent convolution operation as a preset padding value. For example, the preset padding value may be 0.
The convolution operation unit is provided with a corresponding register. Data and parameters can be configured into this register through its configuration interface, so that the convolution operation unit reads data from the storage unit according to the configuration information in the register and performs the convolution operation. On this basis, when the control unit sends the target parameters to the convolution operation unit, it may configure the first address and the per-dimension size values of the target input data, the first address and the per-dimension size values of the target convolution kernel, the target convolution step size, and the target padding value into the register corresponding to the convolution operation unit; when the convolution operation unit performs the equivalent convolution operation according to the target parameters, it may do so using the configuration information in the register to obtain the result of the equivalent convolution operation.
The convolution operation unit may read the target input data from a memory (e.g., a storage unit) according to a first address of the target input data and a size value of each data dimension, may read a target convolution kernel from a parameter of the neural network according to the first address of the target convolution kernel and the size value of each data dimension, and may convolve the target input data by using the target convolution kernel according to a target convolution step and a target fill value, thereby obtaining a result of the convolution operation.
According to the above embodiment, by providing the control unit, the original parameters of the deconvolution operation can be acquired, the target parameters of the equivalent convolution operation corresponding to the deconvolution operation can be determined from those original parameters, the target parameters can be sent to the convolution operation unit, and the result of the equivalent convolution operation can be determined as the result of the deconvolution operation; by providing the convolution operation unit, the equivalent convolution operation can be performed according to the target parameters and its result sent to the control unit. The arithmetic device converts the deconvolution operation into a corresponding equivalent convolution operation with the control unit and then executes the equivalent convolution operation with the convolution operation unit, so that a convolution operation unit that does not support deconvolution can still compute the result of a deconvolution operation. This not only improves the efficiency of the deconvolution operation but also enlarges the range of hardware on which deconvolution can run: for example, an acceleration chip dedicated to artificial intelligence can be used to execute the deconvolution operation, which fully exerts the parallel computing capability of the acceleration chip, expands the operator types supported by the acceleration chip, and allows the deconvolution operation to be completed on an acceleration chip that does not support random memory access.
In some embodiments of the present disclosure, when the control unit is configured to determine, according to the original parameter of the deconvolution operation, a target parameter of an equivalent convolution operation corresponding to the deconvolution operation, the control unit is specifically configured to: determining the size of target input data of the equivalent convolution operation and the position of each original element in the original input data in the target input data according to the size of the original input data, the size of the original convolution kernel, the original convolution step size, the original filling value and the size of the output data; and adding each original element in the original input data to a corresponding position of the target input data according to the position of each original element in the original input data in the target input data, and adding preset values to other positions in the target input data to obtain the target input data.
The following briefly describes the size relationship between the deconvolution operation and its corresponding equivalent convolution operation.
From the perspective of the forward convolution operation, assuming that the size of the input data is i and the convolution kernel size, convolution step size, and padding value (pad value) are k, s, and p, respectively, the formula for the size o of the output data of the convolution operation is as follows:
o = ⌊(i + 2p - k)/s⌋ + 1
assuming that the size of the input data of the deconvolution operation is o (the output size of the forward convolution above), and the convolution kernel size, convolution step size, and padding value (pad value) are likewise k, s, and p, respectively, the formula for the size o′ of the output data of the deconvolution operation is as follows:
o′=(o-1)*s+k-2p+adj
adj=(i+2p-k)%s (where i, the input size of the forward convolution above, equals the desired output size of the deconvolution operation)
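These two size formulas can be checked numerically in plain Python (an illustrative sketch using the example parameters that appear below: i = 6, k = 3, s = 2, p = 1); the adj term makes the deconvolution recover the forward convolution's input size exactly:

```python
def conv_out_size(i, k, s, p):
    # o = floor((i + 2p - k) / s) + 1
    return (i + 2 * p - k) // s + 1

def deconv_out_size(o, k, s, p, i):
    # o' = (o - 1) * s + k - 2p + adj, with adj = (i + 2p - k) % s,
    # where i is the forward-convolution input size (the desired deconvolution output).
    adj = (i + 2 * p - k) % s
    return (o - 1) * s + k - 2 * p + adj

i, k, s, p = 6, 3, 2, 1
o = conv_out_size(i, k, s, p)                 # 3
assert deconv_out_size(o, k, s, p, i) == i    # deconvolution recovers the original size
```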
based on the above size relationship, the target input data can be obtained in the following manner: first, s-1 preset values (for example, 0) are inserted between adjacent original elements of the original input data; then p′ = k - p - 1 preset values are padded on the edges in all directions; finally, adj preset values are padded on the right and lower sides, where s is the original convolution step size, k is the size value of each channel of the original convolution kernel, p is the original padding value, adj = (o + 2p - k) % s, and o is the size value of each channel of the output data. For example, if the size of the original input data of the deconvolution operation is 3 × 3, the original convolution kernel size k, the original convolution step size s, and the original padding value p are 3, 2, and 1, respectively, and the size of the output data of the deconvolution operation is 6 × 6, then adj = (6 + 2 × 1 - 3) % 2 = 1. Fig. 4 shows the convolution process on the target input data obtained under the above parameters, where the lower data is the target input data (i.e., the bottom tensor) and the upper data is the output data of the deconvolution operation (i.e., the top tensor).
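The three padding steps just described can be sketched in NumPy (an illustrative sketch, not the patent's implementation; variable names are mine), using the example parameters i = 3, k = 3, s = 2, p = 1, adj = 1:

```python
import numpy as np

def build_target_input(x, k, s, p, adj, fill=0):
    """Insert s-1 fill values between adjacent elements, pad p' = k - p - 1
    on every edge, then pad adj extra rows/columns on the bottom and right."""
    h, w = x.shape
    m = s - 1
    # step 1: interleave fill values between original elements
    y = np.full((h + (h - 1) * m, w + (w - 1) * m), fill, dtype=x.dtype)
    y[::s, ::s] = x
    # step 2: pad p' = k - p - 1 on all four edges
    pp = k - p - 1
    y = np.pad(y, ((pp, pp), (pp, pp)), constant_values=fill)
    # step 3: pad adj on the bottom and right only
    y = np.pad(y, ((0, adj), (0, adj)), constant_values=fill)
    return y

x = np.arange(1, 10).reshape(3, 3)          # the 3x3 original input
t = build_target_input(x, k=3, s=2, p=1, adj=1)
# t.shape == (8, 8); a stride-1 3x3 convolution over t then yields the 6x6 output
```

With these parameters the 3 × 3 input becomes an 8 × 8 target input, and a stride-1 valid convolution with the 3 × 3 kernel produces 8 - 3 + 1 = 6 rows and columns, matching the 6 × 6 output size.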
Further, the size of the target input data may be calculated according to the following formula: denote the size of the original input data of the deconvolution operation in each dimension as n × c × h × w and the size of the target input data in each dimension as n × c × h_itm × w_itm; denote the number of rows and columns of inserted 0s as m (m = s - 1) and the numbers of preset values padded on the top, bottom, left, and right as pad_u, pad_d, pad_l, and pad_r, respectively; then w_itm = w + pad_l + pad_r + (w - 1) × m and h_itm = h + pad_u + pad_d + (h - 1) × m.
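A quick sanity check of this size formula under the running example (h = w = 3, s = 2, so m = 1; taking pad_l = pad_u = k - p - 1 = 1 and pad_r = pad_d = k - p - 1 + adj = 2, which is my reading of the padding steps above):

```python
def target_size(n, m, pad_a, pad_b):
    # size along one dimension: n + pad_a + pad_b + (n - 1) * m
    return n + pad_a + pad_b + (n - 1) * m

k, p, s, adj = 3, 1, 2, 1
m = s - 1
pad_l = pad_u = k - p - 1            # 1
pad_r = pad_d = k - p - 1 + adj      # 2
w_itm = target_size(3, m, pad_l, pad_r)   # 8
h_itm = target_size(3, m, pad_u, pad_d)   # 8
```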
Further, the location of each original element in the target input data may be determined according to the above-described filling process.
The arithmetic device further includes a storage unit, configured to store the target input data, and a vector operation unit (VPU). When the control unit adds each original element in the original input data to the corresponding position of the target input data according to the position of each original element in the original input data in the target input data, and adds the preset value to the other positions in the target input data to obtain the target input data, the control unit may proceed as follows:
firstly, according to the size of the target input data, a target data space of the corresponding size is acquired on the storage unit, and the target element at each position in the target data space is initialized to the preset value; illustratively, a bottom tensor of the corresponding size may be allocated on the shared memory of the AI chip, and the bottom tensor may be filled entirely with 0 using a FillZero operator developed on the AI chip.
Next, the position of each original element in the target data space is determined according to the position of each original element in the original input data in the target input data. For example, the relative positional relationship between each original element and the head data of the target input data may be converted into the relative positional relationship between that original element and the head position of the target data space, thereby obtaining the position of the original element in the target data space.
Finally, the address of the original input data and the position of each original element in the target data space are sent to the vector arithmetic unit. For example, the configuration information of the vector operation unit may be determined according to the address of the original input data, the address of the target data space, and the position of each original element in the target data space, and sent to the vector operation unit, where the configuration information of the vector operation unit includes at least one of the following: the first address of the input data, the operation step size and the size value of each data dimension, and the first address of the output data, the operation step size, the size value of each data dimension and the cycle value of each data dimension.
Based on this, the vector operation unit may acquire each original element in the original input data according to the address of the original input data, and update the target element at the corresponding position in the target data space using each original element. Illustratively, the vector operation unit updates a target element at a position of each original element in the target data space to a sum of the corresponding original element and the preset value according to the configuration information.
Reference may be made to the target input data (i.e., the data located below) shown in fig. 3, where the corresponding original input data is the contiguously placed 3 × 3 dark-framed data; the target input data differs from the dark-framed data only in that the address intervals of the dimensions differ, while the cycle values of the dimensions (i.e., a cycle value of 3 in the width dimension and 9 in the channel dimension) remain unchanged.
The operations in this example can be performed by a Move operator implemented on the AI chip. The Move operator copies data on the on-chip SRAM (Static Random-Access Memory) from the location of a src tensor to the storage space corresponding to a dst tensor, and supports discontinuous access to the data. The vector operation unit may determine the address of each original element according to the first address of the input data and the size value of each data dimension; since each original element needs to be read and added to the target data space, the operation step size of the input data may be set to 1, and the vector operation unit may then read each original element in sequence according to the first address of the input data and the size value of each data dimension. Taking the target input data in fig. 3 as an example, the cycle value of the width dimension is 3, that is, after 3 data are output consecutively in a row, the unit switches to the next row and starts the next output from the first data position of that row; the cycle value of the channel dimension is 9, that is, after 9 data are output consecutively in a channel, the next output starts from the first data position of the first row of the next channel. The operation step size of the output data refers to the distance between adjacent data in each dimension within a channel; taking the target input data in fig. 4 as an example, if the step size is 2, then in the width dimension the distance between two consecutively output data in a row is 2, and in the height dimension the distance between two adjacent rows containing output data is 2. The size value of a data dimension of the output data refers to the number of data positions in that dimension and may be used to switch between data dimensions; taking the target input data in fig. 4 as an example, if the size value of the width dimension is 8, then the first address of a row may be increased by 8 to obtain the first address of the next row when switching rows, and the first address of a channel may be increased by 64 to obtain the first address of the next channel when switching channels. Therefore, the vector operation unit can determine each output position in the target input data in sequence according to the first address of the output data, the operation step size, the cycle value of each data dimension, and the size value of each data dimension. In summary, the vector operation unit may read each original datum in sequence, add it to the preset value (e.g., 0), and output the resulting sum to the corresponding output position to update the preset value at that position; when all original data have been read, operated on, and output, the data in the target data space on the storage unit is the target input data.
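The read-then-scatter access pattern described above can be mimicked in NumPy (an illustrative sketch of the access pattern only; the Move operator itself is chip-specific, and the offset of 1 below corresponds to the left/top padding p′ = 1 of the running example):

```python
import numpy as np

def scatter_with_stride(src, dst, step, preset=0):
    """Read src elements sequentially (input step 1) and write each, summed
    with the preset value, into dst at positions spaced `step` apart."""
    h, w = src.shape
    for r in range(h):
        for c in range(w):
            # output position: stride `step` in both height and width
            dst[r * step, c * step] = src[r, c] + preset
    return dst

src = np.arange(1, 10).reshape(3, 3)
dst = np.zeros((8, 8), dtype=src.dtype)   # target data space, initialized to 0
# offset the destination view by the left/top padding before scattering
scatter_with_stride(src, dst[1:, 1:], step=2)
```

Because `dst[1:, 1:]` is a view, the writes land in the full 8 × 8 target data space at positions (1 + 2r, 1 + 2c), exactly the dark-framed element positions of the example.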
In this embodiment, the size of the target input data and the position of each original element within it are first determined; each original element is then added, as a target element, to the corresponding position of the target input data, and the target elements at the other positions of the target input data are set to the preset value (for example, 0), thereby obtaining the target input data; the method is simple and accurate. In particular, the vector operation unit is used to read each original element in sequence, perform the operation, and output the result to a target data space that has been applied for in advance and whose target elements are initialized to the preset value; this is simple and convenient, and exploits the characteristics and advantages of the vector operation unit, improving both the efficiency and the accuracy of the operation.
The deconvolution operation method provided by the present disclosure realizes deconvolution, in the form of a convolution operation, on acceleration chips dedicated to artificial intelligence such as AI chips; it does not depend on general matrix multiplication or random memory read/write functions, is not limited by the deconvolution parameters, can support larger-scale deconvolution operations through splitting by the tool chain, and fully exerts the parallel computing capability of the AI chip in the main computation during execution, so the computing efficiency is high. Based on this method, a deconvolution operator can be constructed in the compiler corresponding to the AI chip, so that the AI chip has a deconvolution function.
In a second aspect, at least one embodiment of the present disclosure provides a chip including the arithmetic device of the first aspect.
In a third aspect, at least one embodiment of the present disclosure provides a deconvolution method, please refer to fig. 4, which shows a flow of the method, including steps S401 to S403.
The method can be applied to the computing device of the first aspect, the chip of the second aspect, or other hardware structures, such as an artificial intelligence dedicated acceleration chip (AI accelerator).
In step S401, the original parameters of the deconvolution operation are acquired.
In step S402, a target parameter of an equivalent convolution operation corresponding to the deconvolution operation is determined according to the original parameter of the deconvolution operation.
In step S403, the equivalent convolution operation is performed according to the target parameter by a convolution operation unit, and a result of the equivalent convolution operation is determined as a result of the deconvolution operation.
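Steps S401 to S403 can be exercised end to end with a small NumPy sketch (illustrative only, not the patent's implementation: `conv2d` is a naive stride-1 "valid" convolution playing the role of the equivalent convolution, the reference result is a direct transposed convolution computed by scatter-accumulation, and adj is taken as 0):

```python
import numpy as np

def conv2d(x, k):
    """Naive stride-1, padding-0 convolution (the equivalent convolution)."""
    H, W = x.shape
    K = k.shape[0]
    out = np.zeros((H - K + 1, W - K + 1))
    for r in range(out.shape[0]):
        for c in range(out.shape[1]):
            out[r, c] = (x[r:r + K, c:c + K] * k).sum()
    return out

def deconv_via_equiv_conv(x, k, s, p):
    # S402: build the target parameters of the equivalent convolution
    h, w = x.shape
    K = k.shape[0]
    t = np.zeros((h + (h - 1) * (s - 1), w + (w - 1) * (s - 1)))
    t[::s, ::s] = x                 # insert s-1 zeros between elements
    t = np.pad(t, K - p - 1)        # pad p' = k - p - 1 on all edges (adj = 0)
    k_rot = k[::-1, ::-1]           # centrally rotated target kernel
    # S403: run the equivalent convolution with step 1 and padding 0
    return conv2d(t, k_rot)

def deconv_direct(x, k, s, p):
    """Reference: scatter-accumulate transposed convolution, then crop p."""
    h, w = x.shape
    K = k.shape[0]
    big = np.zeros(((h - 1) * s + K, (w - 1) * s + K))
    for r in range(h):
        for c in range(w):
            big[r * s:r * s + K, c * s:c * s + K] += x[r, c] * k
    return big[p:big.shape[0] - p, p:big.shape[1] - p]

rng = np.random.default_rng(0)
x = rng.standard_normal((3, 3))
k = rng.standard_normal((3, 3))
assert np.allclose(deconv_via_equiv_conv(x, k, s=2, p=1),
                   deconv_direct(x, k, s=2, p=1))
```

With s = 2 and p = 1, both paths produce the same 5 × 5 result, matching o′ = (3 - 1) × 2 + 3 - 2 × 1 = 5.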
In some embodiments of the present disclosure, the raw parameters include at least one of:
original input data, original convolution kernel, original convolution step size, original pad value, and size of output data.
In some embodiments of the present disclosure, the determining, according to the original parameter of the deconvolution operation, a target parameter of an equivalent convolution operation corresponding to the deconvolution operation includes:
determining the size of target input data of the equivalent convolution operation and the position of each original element in the original input data in the target input data according to the size of the original input data, the size of the original convolution kernel, the original convolution step size, the original filling value and the size of the output data;
and adding each original element in the original input data to a corresponding position of the target input data according to the position of each original element in the original input data in the target input data, and adding preset values to other positions in the target input data to obtain the target input data.
In some embodiments of the present disclosure, the adding, according to a position of an original element in the original input data in the target input data, each original element in the original input data to a corresponding position of the target input data, and adding a preset value to other positions of the target input data to obtain the target input data includes:
acquiring a target data space with a corresponding size on a shared memory according to the size of the target input data, and initializing a target element at each position in the target data space to the preset value;
determining the position of each original element in the target data space according to the position of each original element in the original input data in the target input data;
and utilizing a vector operation unit to acquire each original element in the original input data according to the address of the original input data, and updating the target element at the corresponding position in the target data space by using each original element.
In some embodiments of the present disclosure, the using a vector operation unit, obtaining each original element in the original input data according to an address of the original input data, and updating a target element at a corresponding position in the target data space using each original element, includes:
determining configuration information of the vector operation unit according to the address of the original input data, the address of the target data space, and the position of each original element in the target data space, wherein the configuration information of the vector operation unit comprises at least one of the following items: inputting a first address, an operation step size and a size value of each data dimension of data, and outputting the first address, the operation step size, the size value of each data dimension and a cycle value of each data dimension of the data;
and the vector operation unit updates the target element of each original element at the position in the target data space into the sum of the corresponding original element and the preset value according to the configuration information.
In some embodiments of the present disclosure, the determining, according to the original parameter of the deconvolution operation, a target parameter of an equivalent convolution operation corresponding to the deconvolution operation includes:
and performing central symmetry rotation on the data in the original convolution kernel to obtain a target convolution kernel of the equivalent convolution operation.
In some embodiments of the present disclosure, the original convolution kernel includes a plurality of channels;
the performing central symmetry rotation on the data in the original convolution kernel to obtain the target convolution kernel of the equivalent convolution operation includes:
and respectively carrying out central symmetry rotation on the data of each channel in the original convolution kernel to obtain the data of each channel of the target convolution kernel.
In some embodiments of the present disclosure, the determining, according to the original parameter of the deconvolution operation, a target parameter of an equivalent convolution operation corresponding to the deconvolution operation includes:
determining the target convolution step length of the equivalent convolution operation as a preset step length; and/or,
and determining the target filling value of the equivalent convolution operation as a preset filling value.
In some embodiments of the present disclosure, the performing, by using a convolution operation unit, the equivalent convolution operation according to the target parameter, and determining a result of the equivalent convolution operation as a result of the deconvolution operation includes:
and configuring the initial address and the size value of each data dimension of the target input data, the initial address and the size value of each data dimension of the target convolution kernel, the target convolution step length and the target filling value into a register corresponding to the convolution operation unit, and performing equivalent convolution operation by the convolution operation unit by using configuration information in the register to obtain a result of the equivalent convolution operation.
The details of the steps involved in the above method have been described in detail in the first aspect for the corresponding parts of the computing device, and are not repeated here.
According to the above embodiment, by obtaining the original parameters of the deconvolution operation, the target parameters of the equivalent convolution operation corresponding to the deconvolution operation can be determined from those original parameters; finally, the convolution operation unit can be used to perform the equivalent convolution operation according to the target parameters and to determine the result of the equivalent convolution operation as the result of the deconvolution operation. The method converts the deconvolution operation into a corresponding equivalent convolution operation and then executes the equivalent convolution operation with the convolution operation unit, so that a convolution operation unit that does not support deconvolution can still compute the result of a deconvolution operation. This not only improves the efficiency of the deconvolution operation but also enlarges the range of hardware on which deconvolution can run: for example, an acceleration chip dedicated to artificial intelligence can be used to execute the deconvolution operation, which fully exerts the parallel computing capability of the acceleration chip, expands the operator types supported by the acceleration chip, and allows the deconvolution operation to be completed on an acceleration chip that does not support random memory access.
The deconvolution operation method provided by the present disclosure realizes deconvolution, in the form of a convolution operation, on acceleration chips dedicated to artificial intelligence such as AI chips; it does not depend on general matrix multiplication or random memory read/write functions, is not limited by the deconvolution parameters, can support larger-scale deconvolution operations through splitting by the tool chain, and exerts the parallel computing capability of the AI chip during execution, so the computing efficiency is high. Based on this method, a deconvolution operator can be constructed in the compiler corresponding to the AI chip, so that the AI chip has a deconvolution function.
In a fourth aspect, at least one embodiment of the present disclosure provides an apparatus, please refer to fig. 5, which illustrates a structure of the apparatus, the apparatus includes a memory for storing computer instructions executable on a processor, and the processor is configured to perform an operation based on the method according to any one of the third aspect when the computer instructions are executed; alternatively, the device comprises a chip according to the second aspect.
In a fifth aspect, at least one embodiment of the disclosure provides a computer readable storage medium having a computer program stored thereon, which when executed by a processor, implements the method of any of the third aspects.
In this disclosure, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance. The term "plurality" means two or more unless expressly limited otherwise.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This disclosure is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice in the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.
It will be understood that the present disclosure is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.

Claims (13)

1. An arithmetic device, comprising:
the control unit is used for acquiring original parameters of deconvolution operation, determining target parameters of equivalent convolution operation corresponding to the deconvolution operation according to the original parameters of the deconvolution operation, sending the target parameters to the convolution operation unit, and determining the results of the equivalent convolution operation as the results of the deconvolution operation;
and the convolution operation unit is used for performing the equivalent convolution operation according to the target parameter and sending the result of the equivalent convolution operation to the control unit.
2. The computing device of claim 1, wherein the raw parameters comprise at least one of:
original input data, original convolution kernel, original convolution step size, original pad value, and size of output data.
3. The computing device of claim 2, wherein the target parameter comprises target input data;
the control unit is configured to, when determining a target parameter of equivalent convolution operation corresponding to the deconvolution operation according to the original parameter of the deconvolution operation, specifically: determining the size of target input data of the equivalent convolution operation and the position of each original element in the original input data in the target input data according to the size of the original input data, the size of the original convolution kernel, the original convolution step size, the original filling value and the size of the output data; and adding each original element in the original input data to a corresponding position of the target input data according to the position of each original element in the original input data in the target input data, and adding preset values to other positions in the target input data to obtain the target input data.
4. The arithmetic device of claim 3, wherein the arithmetic device further comprises a vector operation unit and a storage unit;
the control unit is configured to add each original element in the original input data to a corresponding position of the target input data according to a position of each original element in the original input data in the target input data, and add preset values to other positions in the target input data, to obtain the target input data, and specifically configured to: according to the size of the target input data, acquiring a target data space with a corresponding size on a storage unit, and initializing a target element at each position in the target data space to the preset value; determining the position of each original element in the target data space according to the position of each original element in the original input data in the target input data; and sending the address of the original input data and the position of each original element in the target data space to the vector operation unit;
the storage unit is used for: storing the target input data;
the vector operation unit is configured to: and acquiring each original element in the original input data according to the address of the original input data, and updating the target element at the corresponding position in the target data space by using each original element.
5. The arithmetic device according to claim 4, wherein the control unit, when sending the address of the original input data and the position of each original element in the target data space to the vector arithmetic unit, is specifically configured to: determining configuration information of the vector operation unit according to the address of the original input data, the address of the target data space and the position of each original element in the target data space, and sending the configuration information to the vector operation unit, wherein the configuration information of the vector operation unit comprises at least one of the following items: inputting a first address, an operation step size and a size value of each data dimension of data, and outputting the first address, the operation step size, the size value of each data dimension and a cycle value of each data dimension of the data;
the vector operation unit is configured to, when acquiring each original element in the original input data according to the address of the original input data and updating the target element at the corresponding position in the target data space using each original element, specifically: and updating the target element of each original element at the position in the target data space to the sum of the corresponding original element and the preset value according to the configuration information.
6. The arithmetic device of any of claims 2 to 5, wherein the target parameter comprises a target convolution kernel;
the control unit is configured to, when determining a target parameter of equivalent convolution operation corresponding to the deconvolution operation according to the original parameter of the deconvolution operation, specifically: and performing central symmetry rotation on the data in the original convolution kernel to obtain a target convolution kernel of the equivalent convolution operation.
7. The computing device of claim 6, wherein the original convolution kernel comprises a plurality of channels;
the control unit is configured to, when determining a target parameter of equivalent convolution operation corresponding to the deconvolution operation according to the original parameter of the deconvolution operation, specifically: and respectively carrying out central symmetry rotation on the data of each channel in the original convolution kernel to obtain the data of each channel of the target convolution kernel.
8. The arithmetic device of claim 2 wherein the target parameters comprise a target convolution step size and/or a target fill value;
the control unit is configured to, when determining a target parameter of equivalent convolution operation corresponding to the deconvolution operation according to the original parameter of the deconvolution operation, specifically: determining the target convolution step length of the equivalent convolution operation as a preset step length; and/or determining the target filling value of the equivalent convolution operation as a preset filling value.
9. The computing device according to claim 1, wherein the control unit, when sending the target parameter to the convolution computing unit, is specifically configured to: configuring the first address of the target input data and the size value of each data dimension, the first address of the target convolution kernel and the size value of each data dimension, the target convolution step length and the target filling value into a register corresponding to the convolution operation unit;
the convolution operation unit is specifically configured to, when performing the equivalent convolution operation according to the target parameter: perform the equivalent convolution operation by using the configuration information in the register to obtain the result of the equivalent convolution operation.
10. A chip comprising the arithmetic device of any one of claims 1 to 9.
11. A deconvolution operation method, comprising:
acquiring original parameters of a deconvolution operation;
determining target parameters of an equivalent convolution operation corresponding to the deconvolution operation according to the original parameters of the deconvolution operation;
performing, by a convolution operation unit, the equivalent convolution operation according to the target parameters, and determining the result of the equivalent convolution operation as the result of the deconvolution operation.
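The method of claim 11 can be checked end to end in one dimension: a direct transposed convolution (scatter-add) should equal an ordinary convolution over the dilated, re-padded input with the centrally rotated kernel. A minimal NumPy sketch under the standard equivalence assumptions named above (all function names are illustrative):

```python
import numpy as np

def deconv_direct(x, k, stride, pad):
    """Reference 1-D transposed convolution in scatter-add form."""
    m = len(k)
    out = np.zeros((len(x) - 1) * stride + m)
    for i, v in enumerate(x):
        out[i * stride : i * stride + m] += v * k
    return out[pad : len(out) - pad] if pad else out

def deconv_as_conv(x, k, stride, pad):
    """The same deconvolution realised as an equivalent ordinary
    convolution: dilate the input, pad with (m - 1 - pad), convolve
    at stride 1 with the centrally rotated kernel."""
    m = len(k)
    dilated = np.zeros((len(x) - 1) * stride + 1)
    dilated[::stride] = x                       # insert stride-1 zeros
    padded = np.pad(dilated, m - 1 - pad)       # target fill value
    k_rot = k[::-1]                             # central symmetry rotation (1-D)
    return np.array([padded[i:i + m] @ k_rot
                     for i in range(len(padded) - m + 1)])

x = np.array([1.0, 2.0, 3.0])
k = np.array([1.0, 0.0, -1.0])
assert np.allclose(deconv_direct(x, k, 2, 1), deconv_as_conv(x, k, 2, 1))
print(deconv_as_conv(x, k, 2, 1))
```

Both paths produce the same output, which is the point of the claim: the deconvolution result is obtained by running only an ordinary convolution unit.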
12. An electronic device, comprising a memory and a processor, the memory being configured to store computer instructions executable on the processor, the processor being configured to implement the method of claim 11 when executing the computer instructions; or,
the electronic device comprising the chip of claim 10.
13. A computer-readable storage medium, on which a computer program is stored, wherein the program, when executed by a processor, implements the method of claim 11.
CN202210474265.XA 2022-04-29 2022-04-29 Arithmetic device, chip, deconvolution method, electronic apparatus, and storage medium Pending CN114897135A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210474265.XA CN114897135A (en) 2022-04-29 2022-04-29 Arithmetic device, chip, deconvolution method, electronic apparatus, and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210474265.XA CN114897135A (en) 2022-04-29 2022-04-29 Arithmetic device, chip, deconvolution method, electronic apparatus, and storage medium

Publications (1)

Publication Number Publication Date
CN114897135A true CN114897135A (en) 2022-08-12

Family

ID=82719699

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210474265.XA Pending CN114897135A (en) 2022-04-29 2022-04-29 Arithmetic device, chip, deconvolution method, electronic apparatus, and storage medium

Country Status (1)

Country Link
CN (1) CN114897135A (en)

Similar Documents

Publication Publication Date Title
KR101959376B1 (en) Systems and methods for a multi-core optimized recurrent neural network
Agullo et al. Task‐based FMM for heterogeneous architectures
CN110149802A (en) Compiler for being translated between the target hardware with two-dimensional shift array structure in Virtual Image Processor instruction set architecture (ISA)
US10984500B1 (en) Inline image preprocessing for convolution operations using a matrix multiplier on an integrated circuit
Meister et al. Parallel memory-efficient adaptive mesh refinement on structured triangular meshes with billions of grid cells
US10120717B2 (en) Method for optimizing the size of a data subset of a processing space for improved execution performance
CN110516316B (en) GPU acceleration method for solving Euler equation by interrupted Galerkin method
Rafique et al. Communication optimization of iterative sparse matrix-vector multiply on GPUs and FPGAs
CN109726441B (en) Body and surface mixed GPU parallel computing electromagnetism DGTD method
Amorim et al. Comparing CUDA and OpenGL implementations for a Jacobi iteration
JP2021022362A (en) Parallel extraction method of image data in plural convolution windows, device, equipment and computer-readable storage medium
Bernaschi et al. A factored sparse approximate inverse preconditioned conjugate gradient solver on graphics processing units
Mingas et al. Population-based mcmc on multi-core cpus, gpus and fpgas
US6792585B1 (en) Method and apparatus of relative datapath cell placement with structure bonding
CN114897135A (en) Arithmetic device, chip, deconvolution method, electronic apparatus, and storage medium
CN113554164A (en) Neural network model optimization method, neural network model data processing method, neural network model optimization device, neural network model data processing device and storage medium
Ibrahim et al. Implementing Wilson-Dirac operator on the cell broadband engine
Chen et al. GPU-MEME: Using graphics hardware to accelerate motif finding in DNA sequences
Li et al. Accelerating force-directed graph layout with processing-in-memory architecture
Gissler et al. Efficient Uniform Grids for Collision Handling in Medical Simulators.
CN111274023B (en) Data processing method, device, computer system and storage medium
Al-Mouhamed et al. SpMV and BiCG-Stab optimization for a class of hepta-diagonal-sparse matrices on GPU
Schmidtke et al. Chunked bounding volume hierarchies for fast digital prototyping using volumetric meshes
TWI844116B (en) Exploiting data sparsity at a machine-learning hardware accelerator
Markall et al. Accelerating unstructured mesh computational fluid dynamics on the NVidia Tesla GPU architecture

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination