WO2021114903A1 - Data processing method and apparatus, computer device, and storage medium - Google Patents

Data processing method and apparatus, computer device, and storage medium

Info

Publication number
WO2021114903A1
Authority
WO
WIPO (PCT)
Prior art keywords: data, sub, result, convolution, elements
Application number
PCT/CN2020/123832
Other languages: English (en), Chinese (zh)
Inventors: 刘道福, 黄迪, 周诗怡
Original Assignee
中科寒武纪科技股份有限公司
Application filed by 中科寒武纪科技股份有限公司
Publication of WO2021114903A1


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 20/00 Machine learning
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G06N 3/08 Learning methods

Definitions

  • the present disclosure relates to the field of data processing technology, and in particular to a data processing method, device, computer equipment, and storage medium.
  • the neural network algorithm is a recently very popular machine learning algorithm and has achieved very good results in various fields, such as image recognition, speech recognition, and natural language processing.
  • the complexity of the algorithm is getting higher and higher.
  • the scale of the model is gradually increasing.
  • the embodiments of the present disclosure provide a data processing method, device, computer equipment, and storage medium that can improve hardware energy efficiency ratio, reduce computing time, and improve computing efficiency.
  • a data processing method applied to a processor includes: splitting the first data according to a preset splitting method to obtain multiple second data; for any of the second data, performing a winograd convolution operation on the second data and the weight to obtain multiple first convolution results; and merging the multiple first convolution results according to a preset merging manner to obtain a hole convolution result of the first data and the weight, wherein the preset merging manner is the reverse process of the preset splitting manner.
  • a data processing device for a processor including:
  • the splitting module is used to split the first data according to a preset splitting method to obtain multiple second data;
  • the convolution module is configured to, for any of the second data, perform a winograd convolution operation on the second data and the weight to obtain multiple first convolution results;
  • a merging module configured to merge the multiple first convolution results according to a preset merging manner to obtain a hole convolution result of the first data and the weight
  • the preset merge mode is the reverse process of the preset split mode.
  • an artificial intelligence chip is provided, and the chip includes the data processing device as described in any one of the foregoing.
  • an electronic device including the aforementioned artificial intelligence chip.
  • a board card comprising: a storage device, an interface device, a control device, and the aforementioned artificial intelligence chip;
  • the artificial intelligence chip is connected to the storage device, the control device, and the interface device respectively;
  • the storage device is used to store data
  • the interface device is used to implement data transmission between the artificial intelligence chip and external equipment
  • the control device is used to monitor the state of the artificial intelligence chip.
  • an electronic device including:
  • a memory for storing processor executable instructions
  • the processor is configured to call instructions stored in the memory to execute the method described in any one of the foregoing.
  • a computer-readable storage medium having computer program instructions stored thereon, wherein the computer program instructions implement the method described in any one of the foregoing when the computer program instructions are executed by a processor .
  • in this way, after the first data is split according to the preset splitting method, the multiple obtained second data are respectively subjected to the winograd convolution operation with the weight, and the multiple obtained first convolution results are combined according to the preset combining method. Since the preset combining method is the inverse process of the preset splitting method, the hole convolution result of the first data and the weight can be obtained after the combination.
  • the energy efficiency ratio of hardware can be improved, computing time can be reduced, and computing efficiency can be improved.
  • Fig. 1 shows a schematic diagram of an exemplary hole convolution according to the present disclosure
  • Fig. 2 shows a flowchart of a data processing method according to an embodiment of the present disclosure
  • Fig. 3 shows a schematic diagram of a data processing method according to an embodiment of the present disclosure
  • Fig. 4 shows a schematic diagram of a data processing method according to an embodiment of the present disclosure
  • Fig. 5 shows a block diagram of a data processing device according to an embodiment of the present disclosure
  • Figure 6 shows a structural block diagram of a board according to an embodiment of the present disclosure
  • FIG. 7 shows a block diagram of an electronic device 800 according to an embodiment of the present disclosure
  • FIG. 8 shows a block diagram of an electronic device 1900 according to an embodiment of the present disclosure.
  • the term “if” can be interpreted as “when” or “once” or “in response to determination” or “in response to detection” depending on the context.
  • the phrase “if determined” or “if [the described condition or event] is detected” can be interpreted, depending on the context, as meaning “once determined”, “in response to determining”, “once [the described condition or event] is detected”, or “in response to detecting [the described condition or event]”.
  • Hole convolution (also known as dilated convolution) can increase the receptive field of the convolution, but it also affects the hardware energy efficiency ratio and the computing time, reducing the hardware energy efficiency ratio and increasing the computing time.
  • the present disclosure provides a data processing method.
  • the data processing method of the embodiments of the present disclosure can be applied to a processor, and the processor may be a general-purpose processor, such as a CPU (Central Processing Unit), or an artificial intelligence processor (IPU) for performing artificial intelligence operations.
  • Artificial intelligence operations can include machine learning operations, brain-like operations, and so on. Among them, machine learning operations include neural network operations, k-means operations, support vector machine operations, and so on.
  • the artificial intelligence processor may include, for example, one of, or a combination of, GPU (Graphics Processing Unit), NPU (Neural-Network Processing Unit), DSP (Digital Signal Processing unit), and Field-Programmable Gate Array (FPGA) chips.
  • the present disclosure does not limit the specific types of processors.
  • the processor mentioned in the present disclosure may include multiple processing units, and each processing unit can independently run various tasks assigned to it, such as convolution computing tasks, pooling tasks, or fully connected tasks.
  • the present disclosure does not limit the processing unit and the tasks run by the processing unit.
  • Fig. 2 shows a flowchart of a data processing method according to an embodiment of the present disclosure. The method may be applied to a processor. As shown in Fig. 2, the method may include:
  • step S21 the first data is split according to a preset split mode to obtain a plurality of second data.
  • the foregoing preset splitting method may be a preset method for splitting the first data.
  • for example, the preset splitting method may split the first data into four second data: for the rows and columns of the first data, the first data can be split on the principle of taking elements one apart, to obtain four second data. After splitting, the elements in the first convolution results of all the second data and the weight are consistent with the elements in the hole convolution result of the first data and the weight.
  • the plurality of second data may include the first sub-data, the second sub-data, the third sub-data, and the fourth sub-data.
  • splitting the first data in the preset manner to obtain the multiple second data may include:
  • the element corresponding to the even-numbered row and the even-numbered column in the first data is determined to form the fourth sub-data.
  • the odd-numbered rows in the first data can be determined; the elements corresponding to the odd-numbered columns in each odd-numbered row can be determined to form the first sub-data, and the elements corresponding to the even-numbered columns in each odd-numbered row can be determined to form the second sub-data. The even-numbered rows in the first data can be determined; the elements corresponding to the odd-numbered columns in each even-numbered row can be determined to form the third sub-data, and the elements corresponding to the even-numbered columns in each even-numbered row can be determined to form the fourth sub-data.
  • for example, the elements corresponding to the odd columns of the odd rows in the first data (identified as "1" in the first data shown in Figure 3) may form the first sub-data; the elements corresponding to the even columns of the odd rows (identified as "2") form the second sub-data; the elements corresponding to the odd columns of the even rows (identified as "3" in the first data shown in Figure 3) form the third sub-data; and the elements corresponding to the even columns of the even rows (identified as "4" in the first data shown in Figure 3) form the fourth sub-data.
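  • As a minimal sketch of this parity-based split (illustrative only; the function name and the use of NumPy strided slicing are assumptions, not part of the disclosure), the four second data can be extracted as follows, counting rows and columns from 1 so that array index 0 corresponds to an "odd" row or column:

```python
import numpy as np

def split_first_data(first_data: np.ndarray):
    """Split the first data into four second data by row/column parity.

    Rows and columns are counted from 1, so array index 0 corresponds to an
    odd row/column in the description above.
    """
    first_sub  = first_data[0::2, 0::2]  # odd rows,  odd columns  ("1")
    second_sub = first_data[0::2, 1::2]  # odd rows,  even columns ("2")
    third_sub  = first_data[1::2, 0::2]  # even rows, odd columns  ("3")
    fourth_sub = first_data[1::2, 1::2]  # even rows, even columns ("4")
    return first_sub, second_sub, third_sub, fourth_sub
```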
  • step S22 for any of the second data, a winograd convolution operation is performed on the second data and the weight to obtain multiple first convolution results.
  • the plurality of second data may be respectively subjected to a winograd convolution operation with weights to obtain a plurality of first convolution results.
  • the first sub-data and the weight can be subjected to a winograd convolution operation to obtain the first convolution result corresponding to the first sub-data
  • the second sub-data and the weight can be subjected to winograd convolution.
  • winograd convolution is a convolution acceleration implementation based on a polynomial interpolation algorithm. The two inputs of the convolution operation, the input data and the convolution kernel, are each divided into tiles of a certain size and then subjected to a linear transformation (the winograd forward transformation); the transformed input data and convolution kernel are then subjected to bitwise multiplication; finally, a linear transformation (the winograd inverse transformation) is performed on the result of the bitwise multiplication to obtain a convolution result equivalent to the original convolution operation.
  • expressed as a formula, the winograd convolution can be written as S = A^T[(G g G^T) ⊙ (B^T d B)]A, where g represents the convolution kernel, G represents the left-multiplication forward transformation matrix corresponding to the convolution kernel, G^T represents the right-multiplication forward transformation matrix corresponding to the convolution kernel, d represents the input data, B^T represents the left-multiplication forward transformation matrix corresponding to the input data, B represents the right-multiplication forward transformation matrix corresponding to the input data, ⊙ represents the bitwise (element-wise) multiplication operation, A represents the right-multiplication inverse transformation matrix, and A^T represents the left-multiplication inverse transformation matrix.
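  • The following sketch illustrates this formula for a single tile. The F(2×2, 3×3) transformation matrices used here are the ones commonly quoted in the winograd minimal-filtering literature and are only an illustrative assumption; the disclosure does not fix particular matrices:

```python
import numpy as np

# Commonly used F(2x2, 3x3) winograd matrices (illustrative choice only).
B_T = np.array([[1,  0, -1,  0],
                [0,  1,  1,  0],
                [0, -1,  1,  0],
                [0,  1,  0, -1]], dtype=float)    # forward transform of the data
G = np.array([[1.0,  0.0, 0.0],
              [0.5,  0.5, 0.5],
              [0.5, -0.5, 0.5],
              [0.0,  0.0, 1.0]])                   # forward transform of the kernel
A_T = np.array([[1, 1,  1,  0],
                [0, 1, -1, -1]], dtype=float)      # inverse transform

def winograd_tile(d: np.ndarray, g: np.ndarray) -> np.ndarray:
    """Compute a 2x2 output tile as A^T[(G g G^T) ⊙ (B^T d B)]A."""
    U = G @ g @ G.T         # winograd forward transformation of the 3x3 kernel
    V = B_T @ d @ B_T.T     # winograd forward transformation of the 4x4 data tile
    M = U * V               # bitwise (element-wise) multiplication
    return A_T @ M @ A_T.T  # winograd inverse transformation -> 2x2 result
```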
  • replacing the original convolution operation with winograd convolution can bring considerable benefits in hardware energy efficiency and computing time, and higher neural network performance can be achieved with little or no increase in hardware overhead.
  • step S23 the multiple first convolution results are merged according to a preset merging manner to obtain a hole convolution result of the first data and the weight value
  • the preset merge mode is the reverse process of the preset split mode.
  • the foregoing preset merging method is the inverse process of the foregoing preset splitting method; that is, if the hole convolution result of the first data and the weight obtained after merging is split according to the preset splitting method, each first convolution result can be obtained.
  • the foregoing combining the multiple first convolution results according to a preset combining manner to obtain a hole convolution result of the first data and the weight includes:
  • the elements in the first convolution result corresponding to the fourth sub-data are sequentially used as the elements corresponding to the even-numbered rows and the even-numbered columns in the hole convolution result.
  • each row in the first convolution result corresponding to the first sub-data is used in turn as an odd row of the hole convolution result, and each element in each such row is used in turn as an odd column of that odd row in the hole convolution result;
  • each row in the first convolution result corresponding to the second sub-data is used as an odd row of the hole convolution result in turn
  • each element in each row is used as an even column of the odd row in the hole convolution result in turn.
  • the first convolution result corresponding to the first sub-data is 2*2
  • the first convolution result corresponding to the second sub-data is 1*2
  • the first convolution result corresponding to the fourth sub-data is 1*1.
  • take each row of the first convolution result corresponding to the first sub-data in turn as the odd-numbered rows of the hole convolution result, and each element of each row in turn as the odd-numbered columns of that odd row (that is, the two elements of the first row of the first convolution result corresponding to the first sub-data are used as the elements in the first column and the third column of the first row of the hole convolution result, and the two elements of the second row are used as the elements in the first column and the third column of the third row of the hole convolution result, marked as "1" in Figure 4).
  • the elements in the first convolution result corresponding to the second sub-data are sequentially used as the even-numbered columns of the odd rows of the hole convolution result (marked as "2" in Figure 4); the elements in the first convolution result corresponding to the third sub-data are sequentially used as the odd-numbered columns of the even rows of the hole convolution result (marked as "3" in Figure 4); and the elements in the first convolution result corresponding to the fourth sub-data are sequentially used as the even-numbered columns of the even rows of the hole convolution result (that is, the element in the first row of the first convolution result corresponding to the fourth sub-data is used as the element in the second row and second column of the hole convolution result, marked as "4" in Figure 4).
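  • A minimal sketch of this merging step (the inverse of the split shown earlier; the function name and shapes are illustrative assumptions): counting rows and columns of the hole convolution result from 1, the result for the first sub-data fills the odd rows/odd columns, the second the odd rows/even columns, the third the even rows/odd columns, and the fourth the even rows/even columns:

```python
import numpy as np

def merge_results(r1, r2, r3, r4) -> np.ndarray:
    """Interleave the four first convolution results into the hole convolution
    result (rows/columns counted from 1, as in the description above)."""
    rows = r1.shape[0] + r3.shape[0]
    cols = r1.shape[1] + r2.shape[1]
    out = np.empty((rows, cols), dtype=r1.dtype)
    out[0::2, 0::2] = r1   # odd rows,  odd columns  ("1")
    out[0::2, 1::2] = r2   # odd rows,  even columns ("2")
    out[1::2, 0::2] = r3   # even rows, odd columns  ("3")
    out[1::2, 1::2] = r4   # even rows, even columns ("4")
    return out
```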
  • in this way, after the first data is split according to the preset splitting method, the multiple obtained second data are respectively subjected to the winograd convolution operation with the weight, and the multiple obtained first convolution results are combined according to the preset combining method. Since the preset combining method is the inverse process of the preset splitting method, the hole convolution result of the first data and the weight can be obtained after the combination.
  • the energy efficiency ratio of hardware can be improved, computing time can be reduced, and computing efficiency can be improved.
  • the convolution scale that the processor can handle is: 5*5 neurons and 3*3 weights
  • that is, the processor can only process the convolution of 5*5 neurons with a 3*3 weight at a time and output one convolution result.
  • in this case, the processor needs to perform 9 operations to complete this hole convolution.
  • with the data processing method of the present disclosure, the processor can obtain 9 results through one operation, so that the hole convolution of the first data and the weight can be completed.
  • the data processing method provided by the present disclosure improves the energy efficiency ratio of hardware, reduces computing time, and improves computing efficiency.
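  • To see why this equals the hole convolution, the following self-contained sketch (assumptions: dilation 2, "valid" boundaries, and convolution in the cross-correlation sense usual for neural networks; ordinary convolution is used here for clarity where the disclosure would use winograd convolution) splits a 7×7 first data, convolves each second data with a 3×3 weight, merges the four results, and checks the merged result against a directly computed hole convolution:

```python
import numpy as np

def correlate_valid(x, w):
    """Plain 'valid' cross-correlation (the usual neural-network convolution)."""
    oh, ow = x.shape[0] - w.shape[0] + 1, x.shape[1] - w.shape[1] + 1
    return np.array([[np.sum(x[i:i + w.shape[0], j:j + w.shape[1]] * w)
                      for j in range(ow)] for i in range(oh)])

def hole_conv(x, w, d=2):
    """Direct hole (dilated) convolution with dilation d, as a reference."""
    kh, kw = (w.shape[0] - 1) * d + 1, (w.shape[1] - 1) * d + 1
    oh, ow = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    return np.array([[np.sum(x[i:i + kh:d, j:j + kw:d] * w)
                      for j in range(ow)] for i in range(oh)])

x, w = np.random.rand(7, 7), np.random.rand(3, 3)

# Split into four second data by row/column parity (rows/columns counted from 1).
subs = [x[0::2, 0::2], x[0::2, 1::2], x[1::2, 0::2], x[1::2, 1::2]]
# Four first convolution results (2x2, 2x1, 1x2 and 1x1 -> 9 elements in total).
r = [correlate_valid(s, w) for s in subs]
# Merge by parity: together the four results form the 3x3 hole convolution result.
merged = np.empty((3, 3))
merged[0::2, 0::2], merged[0::2, 1::2] = r[0], r[1]
merged[1::2, 0::2], merged[1::2, 1::2] = r[2], r[3]

assert np.allclose(merged, hole_conv(x, w))
```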
  • the foregoing first data may include neurons and/or gradients.
  • the hole convolution can be performed through the first gradient and the weight of the current convolution layer to determine the second gradient of the next convolution layer.
  • the gradient of the current convolutional layer can be split according to the preset splitting method to obtain four first sub-gradients, and the four first sub-gradients and weights can be subjected to winograd convolution processing respectively to obtain Four convolution results, the four convolution results are combined according to a preset combination method, and the second gradient of the next convolution layer can be obtained.
  • the foregoing first data may include a first neuron and a first gradient
  • the splitting of the first data according to a preset splitting manner to obtain a plurality of second data may include:
  • the first gradient is split according to the preset split mode to obtain a plurality of second gradients.
  • the first neuron can be split according to a preset splitting method to obtain multiple second neurons.
  • the second neurons may include: a first sub-neuron, a second sub-neuron, a third sub-neuron, and a fourth sub-neuron. Then the elements corresponding to the odd-numbered columns of the odd-numbered rows of the first neuron can be determined to form the first sub-neuron, the elements corresponding to the even-numbered columns of the odd-numbered rows in the first neuron can be determined to form the second sub-neuron, the elements corresponding to the odd-numbered columns of the even-numbered rows in the first neuron can be determined to form the third sub-neuron, and the elements corresponding to the even-numbered columns of the even-numbered rows in the first neuron can be determined to form the fourth sub-neuron.
  • the first gradient can be split according to a preset split mode to obtain multiple second gradients.
  • the second gradients may include: a first sub-gradient, a second sub-gradient, a third sub-gradient, and a fourth sub-gradient. Then the elements corresponding to the odd-numbered columns of the odd-numbered rows of the first gradient may be determined to form the first sub-gradient, the elements corresponding to the even-numbered columns of the odd-numbered rows in the first gradient may be determined to form the second sub-gradient, the elements corresponding to the odd-numbered columns of the even-numbered rows in the first gradient may be determined to form the third sub-gradient, and the elements corresponding to the even-numbered columns of the even-numbered rows in the first gradient may be determined to form the fourth sub-gradient.
  • the above method may further include:
  • the parity of the row and column corresponding to the position of an element of the second neuron in the first neuron is consistent with the parity of the row and column corresponding to the position of the element of the corresponding second gradient in the first gradient.
  • in other words, the parity of the rows and columns of the elements of the second gradient corresponding to a second neuron, with respect to their positions in the first gradient, is consistent with the parity of the rows and columns of the elements of that second neuron with respect to their positions in the first neuron. For example: if all elements in the second neuron are in odd rows and odd columns of the first neuron, then all elements in the corresponding second gradient are in odd rows and odd columns of the first gradient; if all elements in the second neuron are in odd rows and even columns of the first neuron, then all elements in the corresponding second gradient are in odd rows and even columns of the first gradient; if all elements in the second neuron are in even rows and odd columns of the first neuron, then all elements in the corresponding second gradient are in even rows and odd columns of the first gradient; and if all elements in the second neuron are in even rows and even columns of the first neuron, then all elements in the corresponding second gradient are in even rows and even columns of the first gradient.
  • the second neuron includes: a first sub-neuron, a second sub-neuron, a third sub-neuron, and a fourth sub-neuron
  • the second gradient includes: the first sub-gradient, The second sub-gradient, the third sub-gradient and the fourth sub-gradient
  • the first sub-neuron corresponds to the first sub-gradient
  • the first sub-neuron is subjected to convolution processing with the first sub-gradient to obtain the convolution result corresponding to the first sub-neuron;
  • the second sub-neuron corresponds to the second sub-gradient
  • the second sub-neuron performs convolution processing with the second sub-gradient to obtain the convolution result corresponding to the second sub-neuron
  • the third sub-neuron corresponds to the third sub-gradient, and the third sub-neuron is subjected to convolution processing with the third sub-gradient to obtain the convolution result corresponding to the third sub-neuron; similarly, the fourth sub-neuron corresponds to the fourth sub-gradient and is subjected to convolution processing with it to obtain the convolution result corresponding to the fourth sub-neuron.
  • after the third convolution results corresponding to the second neurons are obtained, the third convolution results corresponding to each second neuron are added, and the obtained sum is determined as the residual of the weight.
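  • A minimal sketch of this residual computation (illustrative assumptions: dilation 2, "valid" boundaries, cross-correlation semantics, and ordinary convolution in place of the winograd convolution the disclosure would use; all names are hypothetical):

```python
import numpy as np

def correlate_valid(x, y):
    """Plain 'valid' cross-correlation of x with y."""
    oh, ow = x.shape[0] - y.shape[0] + 1, x.shape[1] - y.shape[1] + 1
    return np.array([[np.sum(x[i:i + y.shape[0], j:j + y.shape[1]] * y)
                      for j in range(ow)] for i in range(oh)])

neuron = np.random.rand(7, 7)   # first neuron (input of the current layer)
grad   = np.random.rand(3, 3)   # first gradient (gradient w.r.t. the layer output)

# Split both by the same row/column parity rule (rows/columns counted from 1).
n_subs = [neuron[0::2, 0::2], neuron[0::2, 1::2],
          neuron[1::2, 0::2], neuron[1::2, 1::2]]
g_subs = [grad[0::2, 0::2], grad[0::2, 1::2],
          grad[1::2, 0::2], grad[1::2, 1::2]]

# Third convolution result for each matching pair; their sum is the residual
# of the 3x3 weight of the hole convolution.
residual = sum(correlate_valid(n, g) for n, g in zip(n_subs, g_subs))
print(residual.shape)  # (3, 3), the same shape as the weight
```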
  • the energy efficiency ratio of the hardware can be improved, the calculation time can be reduced, and the calculation efficiency can be improved.
  • the above method may further include:
  • the weight value is adjusted according to the residual error of the weight value.
  • the weight of the current convolutional layer can be adjusted according to the residual of the weight. For example, it is determined that the sum of the residual of the weight and the weight is the new weight.
  • in a possible implementation, performing a winograd convolution operation on the second data and the weight to obtain the first convolution result may include: disassembling the winograd forward transformation of the second data into a summation operation and performing calculation to obtain the winograd forward transformation result of the second data; disassembling the winograd forward transformation of the weight into a summation operation and performing calculation to obtain the winograd forward transformation result of the weight; performing a bitwise multiplication of the winograd forward transformation result of the second data and the winograd forward transformation result of the weight to obtain a bitwise multiplication result; and disassembling the winograd inverse transformation of the bitwise multiplication result into a summation operation to obtain the first convolution result of the second data and the weight.
  • the above-mentioned disassembling of the winograd forward transformation of the second data into a summation operation, and performing calculation to obtain the winograd forward transformation result of the second data, may include: disassembling the second data into a plurality of first sub-tensors, performing winograd forward transformation on the plurality of first sub-tensors, and summing the results to obtain the winograd forward transformation result of the second data;
  • wherein the number of the plurality of first sub-tensors is the same as the number of non-zero elements of the second data, and each first sub-tensor in the plurality of first sub-tensors has one element that is the same as the element at the corresponding position in the second data, while other elements are all 0.
  • the second data is a 4 ⁇ 4 matrix including 16 elements. Therefore, the second data can be decomposed into 16 first sub-tensors.
  • the 16 first sub-tensors are the 4×4 tensors in each of which one element is the same as the element at the corresponding position in the second data and all other elements are 0.
  • for example, the non-zero element of the first of these first sub-tensors is the same as the element of the second data in the first row and first column, and all other elements are 0; the other first sub-tensors have the same property.
  • the above disassembly methods are only some examples of the present disclosure, and do not limit the present disclosure in any way.
  • for example, when the second data contains elements whose value is 0, the number of first sub-tensors obtained by the disassembly may be less than the number of elements of the second data; that is, the number of the first sub-tensors is the same as the number of non-zero elements of the second data.
  • performing winograd forward transformation on the multiple first subtensors and summing them to obtain the winograd forward transformation result of the second data may include the following process:
  • for each first sub-tensor, the winograd forward transformation result of the first-element sub-tensor corresponding to the first sub-tensor is obtained, where the value of the element at the first position in the first-element sub-tensor is 1, and the first position in the first-element sub-tensor is the same as the position of the non-zero element in the first sub-tensor;
  • winograd positive transformation results of the multiple first subtensors are added to obtain the winograd positive transformation result of the second data.
  • for example, the first sub-tensor corresponding to d 00 can be regarded as d 00 multiplied by the corresponding first-element sub-tensor; in other words, the value of the non-zero element is extracted from the first sub-tensor and used as a coefficient of the first-element sub-tensor.
  • the winograd forward transformation result of the first-element sub-tensor corresponding to each first sub-tensor can be obtained in advance through the following process: for each first sub-tensor, the corresponding first-element sub-tensor is multiplied on the left by the forward transformation left-multiplication matrix and multiplied on the right by the forward transformation right-multiplication matrix to obtain the winograd forward transformation result of the first-element sub-tensor.
  • the form of the corresponding first-element sub-tensor is determined, and the corresponding positive transformation left-multiplication matrix and forward transformation right-multiplication matrix are also determined.
  • the winograd positive transformation result of the first sub-tensor can be calculated in advance, and the specific process is as described above.
  • for each position of the non-zero element, the winograd forward transformation result of the corresponding first-element sub-tensor is a fixed matrix that can be calculated in advance.
  • the matrix multiplication operation can be broken down into an addition operation.
  • the process of calculating the winograd forward transformation result of a first-element sub-tensor involves a relatively large number of multiplication operations.
  • the pre-calculated winograd forward transformation results of the first element subtensor of various scales can be saved. In this way, in the actual calculation process, it can be directly obtained without repeated calculations, thereby shortening calculation time and saving calculation resources.
  • the value of the non-zero element in the first sub-tensor can be multiplied by the winograd forward transformation result of the corresponding first-element sub-tensor to obtain the winograd forward transformation result of the first sub-tensor.
  • for example, for the first sub-tensor corresponding to d 00 , the corresponding winograd forward transformation result is d 00 multiplied by the pre-computed winograd forward transformation result of its first-element sub-tensor.
  • the winograd forward transformation results of all the first sub-tensors are calculated through the above process, and the winograd forward transformation results of the multiple first sub-tensors are added to obtain the winograd forward transformation result of the second data.
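  • A sketch of this decomposition idea (the 4×4 forward transformation matrix B^T below is again the commonly used F(2×2, 3×3) choice and only an assumption): B^T d B equals the sum, over the non-zero elements of d, of each element value multiplied by the pre-computed forward transformation result of its first-element sub-tensor, so the per-tile work reduces to scaling and adding constant matrices:

```python
import numpy as np

B_T = np.array([[1,  0, -1,  0],
                [0,  1,  1,  0],
                [0, -1,  1,  0],
                [0,  1,  0, -1]], dtype=float)  # illustrative F(2x2, 3x3) choice

# Pre-compute the winograd forward transformation of every first-element
# sub-tensor, i.e. the 4x4 tensor with a 1 at position (i, j) and 0 elsewhere.
element_transform = {}
for i in range(4):
    for j in range(4):
        e = np.zeros((4, 4))
        e[i, j] = 1.0
        element_transform[(i, j)] = B_T @ e @ B_T.T   # entries are 0 or ±1

d = np.random.rand(4, 4)   # a second data tile

# Disassemble d into first sub-tensors: each non-zero element d[i, j] scales the
# pre-computed transformation of its first-element sub-tensor; summing them
# reproduces the full forward transformation without new matrix multiplications.
decomposed = sum(d[i, j] * element_transform[(i, j)]
                 for i in range(4) for j in range(4) if d[i, j] != 0)

assert np.allclose(decomposed, B_T @ d @ B_T.T)
```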
  • the foregoing disassembling the winograd positive transformation of the weight into a summation operation, and performing calculation to obtain the winograd positive transformation result of the weight may include:
  • the number of the plurality of second sub-tensors is the same as the number of elements of the weight, and each second sub-tensor in the plurality of second sub-tensors has one element that is the same as the element at the corresponding position in the weight, while other elements are all 0.
  • the weight is disassembled into multiple second sub-tensors, and winograd forward transformation is performed on the multiple second sub-tensors and the results are summed to obtain the winograd forward transformation result of the weight.
  • for example, the weight is a 3×3 matrix including 9 elements; therefore, the weight can be disassembled into 9 second sub-tensors.
  • the 9 second sub-tensors are the 3×3 tensors in each of which one element is the same as the element at the corresponding position in the weight and all other elements are 0.
  • for the process of performing winograd forward transformation on the plurality of second sub-tensors and summing to obtain the winograd forward transformation result of the weight, reference may be made to the foregoing process of performing winograd forward transformation on the plurality of first sub-tensors and summing to obtain the winograd forward transformation result of the second data, which is not repeated here in the present disclosure.
  • after the winograd forward transformation result of the second data and the winograd forward transformation result of the weight are obtained, a bitwise (alignment) multiplication of the two can be performed to obtain the bitwise multiplication result.
  • bitwise multiplication may refer to taking the product of the data at the corresponding positions of the two tensors as the value of the corresponding position in the bitwise multiplication result.
  • after the bitwise multiplication result is obtained, the present disclosure can disassemble A^T (G 4×4 ⊙ D 4×4 ) A, where G 4×4 is the 4×4 winograd forward transformation result of the weight and D 4×4 is the 4×4 winograd forward transformation result of the second data, into a summation operation and perform the calculation to obtain the winograd convolution result of the second data, thereby further saving calculation time and reducing energy consumption.
  • the above-mentioned disassembling the winograd inverse transform of the alignment multiplication result into a summation operation to obtain the first convolution result of the second data and the weight may include:
  • the result of the bitwise multiplication is disassembled into a plurality of third sub-tensors, and winograd inverse transformation is performed on the plurality of third sub-tensors and the results are summed to obtain the first convolution result of the second data and the weight;
  • wherein the number of the plurality of third sub-tensors is the same as the number of non-zero elements of the bitwise multiplication result, and each third sub-tensor in the plurality of third sub-tensors has one element that is the same as the element at the corresponding position in the bitwise multiplication result, while other elements are all 0.
  • the result of the bitwise multiplication is disassembled into multiple third sub-tensors; for example, it can be disassembled into 16 third sub-tensors, in each of which one element is the same as the element at the corresponding position in the bitwise multiplication result and all other elements are 0.
  • winograd inverse transformation may be performed on the plurality of third sub-tensors and summed to obtain the first convolution result of the second data.
  • performing winograd inverse transformation on the multiple third subtensors and summing them to obtain the first convolution result of the second data may include the following process:
  • for each third sub-tensor, the winograd inverse transformation result of the third-element sub-tensor corresponding to the third sub-tensor is obtained, where the value of the element at the second position in the third-element sub-tensor is 1, and the second position in the third-element sub-tensor is the same as the position of the non-zero element in the third sub-tensor;
  • the winograd inverse transform results of the multiple third subtensors are added to obtain the first convolution result of the second data.
  • the method for determining the third-element sub-tensor corresponding to the third sub-tensor is the same as the method for determining the first-element sub-tensor above, and will not be repeated here.
  • the winograd inverse transform result of the third sub-tensor is obtained in advance through the following process: For each third sub-tensor, the left side of the third sub-tensor corresponding to the third sub-tensor is multiplied by the inverse transform Multiplying the matrix on the left, multiplying by the inverse transformation on the right, and multiplying the matrix on the right to get the winograd inverse transformation result of the third-element subtensor.
  • the form of the corresponding third-element sub-tensor is determined, and the corresponding inverse transform left multiplication matrix and inverse transform right multiplication matrix are also determined. Therefore, the winograd inverse transformation result of the third-element subtensor can be calculated in advance, and the specific process is as described above.
  • for example, the inverse transformation left-multiplication matrix is a 2×4 matrix, and the inverse transformation right-multiplication matrix is a 4×2 matrix.
  • the dimension of the inverse transformation matrix can be determined according to the dimension of the second data, the dimension of the weight value, and the convolution step length. The above is only an example and does not limit the present disclosure in any way.
  • because of the form of the inverse transformation matrices, the matrix multiplication operations of the inverse transformation can be broken down into addition and shift operations. The inverse transformation matrices are multiplied by the third-element sub-tensor to obtain the winograd inverse transformation result of the third-element sub-tensor; the fractional element values in this result can be calculated by simple shift operations, which still saves calculation time compared with multiplication operations.
  • the winograd inverse transformation result of each third sub-tensor is obtained by multiplying the non-zero element value of the third sub-tensor, taken as a coefficient, by the winograd inverse transformation result of the corresponding third-element sub-tensor, and the winograd inverse transformation results of the multiple third sub-tensors are added to obtain the first convolution result of the second data. This is similar to the way the winograd forward transformation results of the first sub-tensors are obtained and summed to obtain the forward transformation result of the input data; the difference is that the winograd inverse transformation result of a third-element sub-tensor is not composed entirely of 0 and ±1, but its fractional values can be calculated by simple shift operations, so the present disclosure can still save calculation time, compared with multiplication operations, after disassembling the ordinary inverse transformation process.
  • in summary, multiple third sub-tensors are obtained by disassembling the bitwise multiplication result, and the first convolution result of the second data can be obtained by summation using the winograd inverse transformation results of the third-element sub-tensors obtained in advance together with the non-zero element values of the third sub-tensors.
  • disassembling the multiplication operation into a summation operation can save calculation time and reduce energy consumption.
  • although the steps in the flowcharts of Figs. 1-4 are displayed in sequence as indicated by the arrows, these steps are not necessarily executed in the order indicated by the arrows. Unless specifically stated herein, the execution order of these steps is not strictly limited, and the steps may be executed in other orders. Moreover, at least some of the steps in Figs. 1-4 may include multiple sub-steps or multiple stages; these sub-steps or stages are not necessarily executed at the same time, but may be executed at different times, and their execution order is not necessarily sequential, but may be performed in turn or alternately with at least part of the other steps, or with sub-steps or stages of other steps.
  • Fig. 5 shows a block diagram of a data processing device according to an embodiment of the present disclosure. As shown in Figure 5, the device may include:
  • the splitting module 501 can be used to split the first data according to a preset splitting manner to obtain multiple second data;
  • the convolution module 502 can be used to perform a winograd convolution operation on any of the second data and the weight to obtain multiple first convolution results;
  • the merging module 503 may be used to merge the multiple first convolution results according to a preset merging manner to obtain a hole convolution result of the first data and the weight value,
  • the preset merge mode is the reverse process of the preset split mode.
  • the multiple obtained second data are respectively subjected to the winograd convolution operation with the weight, and the multiple obtained first convolution results are combined according to the preset
  • the preset combining method is the inverse process of the preset splitting method, the result of the hole convolution of the first data and the weight can be obtained after the combination.
  • the energy efficiency ratio of hardware can be improved, computing time can be reduced, and computing efficiency can be improved.
  • the plurality of second data includes first sub-data, second sub-data, third sub-data, and fourth sub-data, and the above-mentioned splitting module may also be used for:
  • the element corresponding to the even-numbered row and the even-numbered column in the first data is determined to form the fourth sub-data.
  • the above-mentioned merging module can also be used for:
  • the elements in the first convolution result corresponding to the fourth sub-data are sequentially used as the elements corresponding to the even rows and the even columns in the hole convolution result.
  • the foregoing first data may include neurons and/or gradients.
  • the above convolution module can also be used for:
  • the winograd inverse transform of the result of the alignment multiplication is disassembled into a summation operation to obtain a first convolution result of the second data and the weight.
  • the above convolution module can also be used for:
  • the number of the plurality of first sub-tensors is the same as the number of non-zero elements of the second data, and each first sub-tensor in the plurality of first sub-tensors has one element that is the same as the element at the corresponding position in the second data, while other elements are all 0.
  • the above convolution module can also be used for:
  • the number of the plurality of second sub-tensors is the same as the number of elements of the weight, and each second sub-tensor in the plurality of second sub-tensors has one element that is the same as the element at the corresponding position in the weight, while other elements are all 0.
  • the above convolution module can also be used for:
  • the result of the bitwise multiplication is disassembled into a plurality of third sub-tensors, and winograd inverse transformation is performed on the plurality of third sub-tensors and the results are summed to obtain the first convolution result of the second data and the weight;
  • wherein the number of the plurality of third sub-tensors is the same as the number of non-zero elements of the bitwise multiplication result, and each third sub-tensor in the plurality of third sub-tensors has one element that is the same as the element at the corresponding position in the bitwise multiplication result, while other elements are all 0.
  • the first data includes a first neuron and a first gradient
  • the above-mentioned splitting module may also be used for:
  • the first gradient is split according to the preset split mode to obtain a plurality of second gradients.
  • the device may further include:
  • a processing module configured to perform a convolution operation on any of the second neuron and the corresponding second gradient to obtain a third convolution result
  • a determining module configured to determine that the sum of the third convolution results corresponding to each of the second neurons is the residual of the weight
  • wherein the parity of the row and column corresponding to the position of an element of the second neuron in the first neuron is consistent with the parity of the row and column corresponding to the position of the element of the corresponding second gradient in the first gradient.
  • the device may further include:
  • the adjustment module is configured to adjust the weight value according to the residual error of the weight value.
  • the functions or modules contained in the device provided in the embodiments of the present disclosure can be used to execute the methods described in the above method embodiments.
  • the foregoing device embodiments are only illustrative, and the device of the present disclosure may also be implemented in other ways.
  • the division of units/modules in the above-mentioned embodiments is only a logical function division, and there may be other division methods in actual implementation.
  • multiple units, modules or components may be combined or integrated into another system, or some features may be omitted or not implemented.
  • the functional units/modules in the various embodiments of the present disclosure may be integrated into one unit/module, or each unit/module may exist alone physically, or two or more units/modules may be integrated together.
  • the above-mentioned integrated unit/module can be realized in the form of hardware or software program module.
  • the hardware may be a digital circuit, an analog circuit, and so on.
  • the physical realization of the hardware structure includes but is not limited to transistors, memristors and so on.
  • the artificial intelligence processor may be any appropriate hardware processor, such as CPU, GPU, FPGA, DSP, ASIC, and so on.
  • the storage unit may be any suitable magnetic storage medium or magneto-optical storage medium, such as resistive random access memory (RRAM), dynamic random access memory (DRAM), static random access memory (SRAM), enhanced dynamic random access memory (EDRAM), high-bandwidth memory (HBM), hybrid memory cube (HMC), and so on.
  • the integrated unit/module is implemented in the form of a software program module and sold or used as an independent product, it can be stored in a computer readable memory.
  • the technical solution of the present disclosure essentially or the part that contributes to the prior art or all or part of the technical solution can be embodied in the form of a software product, and the computer software product is stored in a memory, A number of instructions are included to enable a computer device (which may be a personal computer, a server, or a network device, etc.) to perform all or part of the steps of the methods described in the various embodiments of the present disclosure.
  • the aforementioned memory includes: a USB flash drive, a read-only memory (ROM), a random access memory (RAM), a mobile hard disk, a magnetic disk, an optical disk, or other media that can store program codes.
  • an artificial intelligence chip is also disclosed, which includes the above-mentioned data processing device.
  • in a possible implementation, a board card is disclosed, which includes a storage device, an interface device, a control device, and the above-mentioned artificial intelligence chip; wherein the artificial intelligence chip is respectively connected to the storage device, the control device, and the interface device; the storage device is used to store data; the interface device is used to realize data transmission between the artificial intelligence chip and an external device; and the control device is used to monitor the state of the artificial intelligence chip.
  • Fig. 6 shows a structural block diagram of a board card according to an embodiment of the present disclosure.
  • the board card may include other supporting components in addition to the chip 389 described above.
  • the supporting components include, but are not limited to: a storage device 390, Interface device 391 and control device 392;
  • the storage device 390 is connected to the artificial intelligence chip through a bus for storing data.
  • the storage device may include multiple groups of storage units 393. Each group of the storage unit and the artificial intelligence chip are connected through a bus. It can be understood that each group of the storage units may be DDR SDRAM (English: Double Data Rate SDRAM, double-rate synchronous dynamic random access memory).
  • in one embodiment, the storage device may include 4 groups of the storage units, and each group of the storage units may include a plurality of DDR4 chips.
  • in one embodiment, the artificial intelligence chip may include four 72-bit DDR4 controllers; in each 72-bit DDR4 controller, 64 bits are used for data transmission and 8 bits are used for ECC checking. It can be understood that when DDR4-3200 chips are used in each group of the storage units, the theoretical bandwidth of data transmission can reach 25600 MB/s.
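  • For reference, this figure follows from the data width under the assumption that only the 64 data bits (and not the 8 ECC bits) count toward usable bandwidth: 3200 MT/s × 64 bits = 204800 Mbit/s ≈ 25600 MB/s per group of storage units.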
  • each group of the storage unit includes a plurality of double-rate synchronous dynamic random access memories arranged in parallel.
  • DDR can transmit data twice in one clock cycle.
  • a controller for controlling the DDR is provided in the chip for controlling the data transmission and data storage of each storage unit.
  • the interface device is electrically connected with the artificial intelligence chip.
  • the interface device is used to implement data transmission between the artificial intelligence chip and an external device (such as a server or a computer).
  • the interface device may be a standard PCIE interface.
  • the data to be processed is transferred from the server to the chip through a standard PCIE interface to realize data transfer.
  • the interface device may also be other interfaces. The present disclosure does not limit the specific manifestations of the above other interfaces, as long as the interface unit can realize the switching function.
  • the calculation result of the artificial intelligence chip is still transmitted by the interface device back to an external device (such as a server).
  • the control device is electrically connected with the artificial intelligence chip.
  • the control device is used to monitor the state of the artificial intelligence chip.
  • the artificial intelligence chip and the control device may be electrically connected through an SPI interface.
  • the control device may include a single-chip microcomputer (Micro Controller Unit, MCU).
  • MCU Micro Controller Unit
  • the artificial intelligence chip may include multiple processing chips, multiple processing cores, or multiple processing circuits, and can drive multiple loads. Therefore, the artificial intelligence chip can be in different working states such as multi-load and light-load.
  • the control device can regulate and control the working states of the multiple processing chips, multiple processing cores, and/or multiple processing circuits in the artificial intelligence chip.
  • an electronic device which includes the aforementioned artificial intelligence chip.
  • Electronic equipment includes data processing devices, robots, computers, printers, scanners, tablets, smart terminals, mobile phones, driving recorders, navigators, sensors, webcams, servers, cloud servers, cameras, video cameras, projectors, watches, headsets, mobile storage, wearable devices, vehicles, household appliances, and/or medical equipment.
  • the transportation means include airplanes, ships, and/or vehicles;
  • the household appliances include TVs, air conditioners, microwave ovens, refrigerators, rice cookers, humidifiers, washing machines, electric lights, gas stoves, and range hoods;
  • the medical equipment includes nuclear magnetic resonance, B-ultrasound and/or electrocardiograph.
  • the embodiments of the present disclosure also provide a computer-readable storage medium on which computer program instructions are stored, and the computer program instructions implement the above-mentioned method when executed by a processor.
  • the computer-readable storage medium may be a non-volatile computer-readable storage medium.
  • An embodiment of the present disclosure also provides an electronic device, including: a processor; a memory for storing executable instructions of the processor; wherein the processor is configured to call the instructions stored in the memory to execute the above method.
  • the electronic device can be provided as a terminal, server or other form of device.
  • FIG. 7 shows a block diagram of an electronic device 800 according to an embodiment of the present disclosure.
  • the electronic device 800 may be a mobile phone, a computer, a digital broadcasting terminal, a messaging device, a game console, a tablet device, a medical device, a fitness device, a personal digital assistant, and other terminals.
  • the electronic device 800 may include one or more of the following components: a processing component 802, a memory 804, a power supply component 806, a multimedia component 808, an audio component 810, an input/output (I/O) interface 812, and a sensor component 814 , And communication component 816.
  • the processing component 802 generally controls the overall operations of the electronic device 800, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations.
  • the processing component 802 may include one or more processors 820 to execute instructions to complete all or part of the steps of the foregoing method.
  • the processing component 802 may include one or more modules to facilitate the interaction between the processing component 802 and other components.
  • the processing component 802 may include a multimedia module to facilitate the interaction between the multimedia component 808 and the processing component 802.
  • the memory 804 is configured to store various types of data to support operations in the electronic device 800. Examples of these data include instructions for any application or method to operate on the electronic device 800, contact data, phone book data, messages, pictures, videos, etc.
  • the memory 804 can be implemented by any type of volatile or non-volatile storage device or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic disk, or optical disk.
  • the power supply component 806 provides power for various components of the electronic device 800.
  • the power supply component 806 may include a power management system, one or more power supplies, and other components associated with the generation, management, and distribution of power for the electronic device 800.
  • the multimedia component 808 includes a screen that provides an output interface between the electronic device 800 and the user.
  • the screen may include a liquid crystal display (LCD) and a touch panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive input signals from the user.
  • the touch panel includes one or more touch sensors to sense touch, sliding, and gestures on the touch panel. The touch sensor may not only sense the boundary of a touch or slide action, but also detect the duration and pressure related to the touch or slide operation.
  • the multimedia component 808 includes a front camera and/or a rear camera. When the electronic device 800 is in an operation mode, such as a shooting mode or a video mode, the front camera and/or the rear camera can receive external multimedia data. Each front camera and rear camera can be a fixed optical lens system or have focal length and optical zoom capabilities.
  • the audio component 810 is configured to output and/or input audio signals.
  • the audio component 810 includes a microphone (MIC), and when the electronic device 800 is in an operation mode, such as a call mode, a recording mode, and a voice recognition mode, the microphone is configured to receive an external audio signal.
  • the received audio signal may be further stored in the memory 804 or transmitted via the communication component 816.
  • the audio component 810 further includes a speaker for outputting audio signals.
  • the I/O interface 812 provides an interface between the processing component 802 and a peripheral interface module.
  • the above-mentioned peripheral interface module may be a keyboard, a click wheel, a button, and the like. These buttons may include, but are not limited to: home button, volume button, start button, and lock button.
  • the sensor component 814 includes one or more sensors for providing the electronic device 800 with various aspects of state evaluation.
  • the sensor component 814 can detect the on/off status of the electronic device 800 and the relative positioning of components, for example, the display and the keypad of the electronic device 800. The sensor component 814 can also detect a change in the position of the electronic device 800 or of a component of the electronic device 800, the presence or absence of contact between the user and the electronic device 800, the orientation or acceleration/deceleration of the electronic device 800, and a temperature change of the electronic device 800.
  • the sensor component 814 may include a proximity sensor configured to detect the presence of nearby objects when there is no physical contact.
  • the sensor component 814 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications.
  • the sensor component 814 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
  • the communication component 816 is configured to facilitate wired or wireless communication between the electronic device 800 and other devices.
  • the electronic device 800 can access a wireless network based on a communication standard, such as WiFi, 2G, or 3G, or a combination thereof.
  • the communication component 816 receives a broadcast signal or broadcast related information from an external broadcast management system via a broadcast channel.
  • the communication component 816 further includes a near field communication (NFC) module to facilitate short-range communication.
  • the NFC module can be implemented based on radio frequency identification (RFID) technology, infrared data association (IrDA) technology, ultra-wideband (UWB) technology, Bluetooth (BT) technology and other technologies.
  • the electronic device 800 may be implemented by one or more application-specific integrated circuits (ASIC), digital signal processors (DSP), digital signal processing devices (DSPD), programmable logic devices (PLD), field-programmable gate arrays (FPGA), controllers, microcontrollers, microprocessors, or other electronic components, for performing the above methods.
  • a non-volatile computer-readable storage medium is also provided, such as the memory 804 including computer program instructions, which can be executed by the processor 820 of the electronic device 800 to complete the foregoing method.
  • FIG. 8 shows a block diagram of an electronic device 1900 according to an embodiment of the present disclosure.
  • the electronic device 1900 may be provided as a server.
  • the electronic device 1900 includes a processing component 1922, which further includes one or more processors, and a memory resource represented by a memory 1932 for storing instructions executable by the processing component 1922, such as application programs.
  • the application program stored in the memory 1932 may include one or more modules each corresponding to a set of instructions.
  • the processing component 1922 is configured to execute instructions to perform the above-described methods.
  • the electronic device 1900 may also include a power supply component 1926 configured to perform power management of the electronic device 1900, a wired or wireless network interface 1950 configured to connect the electronic device 1900 to a network, and an input/output (I/O) interface 1958.
  • the electronic device 1900 can operate based on an operating system stored in the memory 1932, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, or the like.
  • a non-volatile computer-readable storage medium is also provided, such as the memory 1932 including computer program instructions, which can be executed by the processing component 1922 of the electronic device 1900 to complete the foregoing method.
  • the preset merge mode is the reverse process of the preset split mode.
  • the plurality of second data includes first sub-data, second sub-data, third sub-data, and fourth sub-data, and splitting the first data according to the preset split mode to obtain the multiple second data includes (a Python sketch of this parity split and merge is given after this clause list):
  • the elements corresponding to the even-numbered rows and even-numbered columns in the first data are determined to form the fourth sub-data.
  • combining the multiple first convolution results according to the preset combining manner to obtain the hole convolution result of the first data and the weight includes:
  • the elements in the first convolution result corresponding to the fourth sub-data are sequentially used as the elements corresponding to the even rows and the even columns in the hole convolution result.
  • the first data includes neurons and/or gradients.
  • Clause A5: for any of the second data, performing a winograd convolution operation on the second data and the weight to obtain multiple first convolution results includes (see the winograd summation sketch after this clause list):
  • the winograd inverse transform of the result of the alignment multiplication is disassembled into a summation operation to obtain a first convolution result of the second data and the weight.
  • disassembling the winograd positive transformation of the second data into a summation operation and performing the calculation to obtain the winograd positive transformation result of the second data includes:
  • the number of the plurality of first sub-tensors is the same as the number of non-zero elements of the second data, and each first sub-tensor in the plurality of first sub-tensors has one element that is the same as the element at the corresponding position in the second data, and the other elements are all 0.
  • disassembling the winograd positive transformation of the weight into a summation operation and performing the calculation to obtain the winograd positive transformation result of the weight includes:
  • the number of the plurality of second sub-tensors is the same as the number of elements of the weight, and each second sub-tensor in the plurality of second sub-tensors has one element that is the same as the element at the corresponding position in the weight, and the other elements are all 0.
  • the result of the alignment multiplication is disassembled into a plurality of third sub-tensors, and the winograd inverse transformation is performed on the plurality of third sub-tensors and the results are summed to obtain the first convolution result of the second data and the weight;
  • the number of the plurality of third sub-tensors is the same as the number of non-zero elements of the result of the alignment multiplication, and each third sub-tensor in the plurality of third sub-tensors has one element that is the same as the element at the corresponding position in the result of the alignment multiplication, and the other elements are all 0.
  • the first data includes a first neuron and a first gradient
  • splitting the first data according to the preset split mode to obtain the multiple second data includes:
  • the first gradient is split according to the preset split mode to obtain a plurality of second gradients.
  • the method further includes:
  • the parity of the row and column corresponding to the position of an element of the second neuron in the first neuron is consistent with the parity of the row and column corresponding to the position of an element of the corresponding second gradient in the first gradient.
  • Clause A11: according to the method described in clause A10, the weight is adjusted according to the residual of the weight (see the weight-residual sketch after this clause list).
  • the splitting module is used to split the first data according to a preset splitting method to obtain multiple second data;
  • the convolution module is configured to perform a winograd convolution operation on the second data and the weight for any of the second data to obtain multiple first convolution results;
  • a merging module configured to merge the multiple first convolution results according to a preset merging manner to obtain a hole convolution result of the first data and the weight
  • the preset merge mode is the reverse process of the preset split mode.
  • the splitting module is further configured to:
  • the elements corresponding to the even-numbered rows and even-numbered columns in the first data are determined to form the fourth sub-data.
  • the merging module is further used for:
  • the elements in the first convolution result corresponding to the fourth sub-data are sequentially used as the elements corresponding to the even rows and the even columns in the hole convolution result.
  • in the device according to any one of clauses A12 to A14, the first data includes neurons and/or gradients.
  • the convolution module is further used for:
  • the winograd inverse transform of the result of the alignment multiplication is disassembled into a summation operation to obtain a first convolution result of the second data and the weight.
  • the convolution module is further used for:
  • the number of the plurality of first sub-tensors is the same as the number of non-zero elements of the second data, and each first sub-tensor in the plurality of first sub-tensors has one element that is the same as the element at the corresponding position in the second data, and the other elements are all 0.
  • the convolution module is further used for:
  • the number of the plurality of second sub-tensors is the same as the number of elements of the weight, and each second sub-tensor in the plurality of second sub-tensors has one element that is the same as the element at the corresponding position in the weight, and the other elements are all 0.
  • the convolution module is further used for:
  • the result of the alignment multiplication is disassembled into a plurality of third sub-tensors, and the winograd inverse transformation is performed on the plurality of third sub-tensors and the results are summed to obtain the first convolution result of the second data and the weight;
  • the number of the plurality of third sub-tensors is the same as the number of non-zero elements of the result of the alignment multiplication, and each third sub-tensor in the plurality of third sub-tensors has one element that is the same as the element at the corresponding position in the result of the alignment multiplication, and the other elements are all 0.
  • the splitting module is further used for:
  • the first gradient is split according to the preset split mode to obtain multiple second gradients.
  • a processing module, configured to perform a convolution operation on any of the second neurons and the corresponding second gradient to obtain a third convolution result;
  • a determining module configured to determine that the sum of the third convolution results corresponding to each of the second neurons is the residual of the weight
  • the parity of the row and column corresponding to the position of an element of the second neuron in the first neuron is consistent with the parity of the row and column corresponding to the position of an element of the corresponding second gradient in the first gradient.
  • the adjustment module is configured to adjust the weight value according to the residual error of the weight value.
  • Clause A24: an electronic device, including the artificial intelligence chip as described in clause A23.
  • a board card comprising: a storage device, an interface device, a control device, and the artificial intelligence chip as described in clause A23;
  • the artificial intelligence chip is connected to the storage device, the control device, and the interface device respectively;
  • the storage device is used to store data
  • the interface device is used to implement data transmission between the artificial intelligence chip and external equipment
  • the control device is used to monitor the state of the artificial intelligence chip.
  • the storage device includes: multiple groups of storage units, wherein each group of storage units is connected to the artificial intelligence chip through a bus, and the storage units are DDR SDRAM;
  • the chip includes: a DDR controller, which is used to control the data transmission and data storage of each of the storage units;
  • the interface device is: a standard PCIE interface.
  • an electronic device including: a processor
  • a memory for storing processor executable instructions
  • the processor is configured to call instructions stored in the memory to execute the method described in any one of clauses A1 to A11.
  • Clause A28 a computer-readable storage medium with computer program instructions stored thereon, which, when executed by a processor, implement the method described in any one of clauses A1 to A11.
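By way of a non-authoritative illustration of clauses A2 to A4 (and the corresponding splitting and merging modules), the following Python/NumPy sketch splits 2-D first data into four sub-data by the parity of row and column indices, convolves each sub-data with the weight, and interleaves the four partial results back by the same parity to form the hole convolution result. The function names parity_split, parity_merge and hole_conv2d_via_split are hypothetical, and scipy.signal.correlate2d is used as a stand-in for the per-sub-data winograd convolution; none of these names appear in the original text.

    import numpy as np
    from scipy.signal import correlate2d  # ordinary 'valid' convolution, used here in place of the winograd step

    def parity_split(x):
        # Split 2-D data into four sub-data by the parity of row/column indices
        # (odd/even in the 1-based numbering of the clauses; 0-based slices below).
        return {
            ("odd", "odd"):   x[0::2, 0::2],
            ("odd", "even"):  x[0::2, 1::2],
            ("even", "odd"):  x[1::2, 0::2],
            ("even", "even"): x[1::2, 1::2],  # fourth sub-data: even rows and even columns
        }

    def parity_merge(parts, out_shape):
        # Reverse process of the split: place each partial result back on the
        # rows/columns of matching parity in the hole convolution result.
        out = np.zeros(out_shape)
        out[0::2, 0::2] = parts[("odd", "odd")]
        out[0::2, 1::2] = parts[("odd", "even")]
        out[1::2, 0::2] = parts[("even", "odd")]
        out[1::2, 1::2] = parts[("even", "even")]
        return out

    def hole_conv2d_via_split(x, w):
        # Rate-2 hole (dilated) convolution computed as split -> convolve each sub-data -> merge.
        kh, kw = w.shape
        out_shape = (x.shape[0] - 2 * (kh - 1), x.shape[1] - 2 * (kw - 1))
        results = {k: correlate2d(v, w, mode="valid") for k, v in parity_split(x).items()}
        return parity_merge(results, out_shape)

    # Check against a directly "holed" (rate-2 dilated) kernel.
    x = np.random.rand(9, 10)
    w = np.random.rand(3, 3)
    w_holed = np.zeros((5, 5))
    w_holed[::2, ::2] = w
    assert np.allclose(hole_conv2d_via_split(x, w), correlate2d(x, w_holed, mode="valid"))

The final assert checks the sketch against a kernel with inserted zeros, which is the hole convolution that the split-convolve-merge scheme is meant to reproduce.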
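As a hedged sketch of clauses A5 to A8 (mirrored by clauses A16 to A19 for the device), the code below computes one winograd convolution tile by disassembling each transform into a summation over one-hot sub-tensors: every sub-tensor keeps a single element of the original tensor at its original position, so its transform is that element times a fixed pattern and the matrix products reduce to additions, sign changes, and one scalar multiply per element. The F(2x2, 3x3) transform matrices used here are the commonly published ones and are an assumption; the original text does not fix particular matrices, tile sizes, or function names.

    import numpy as np
    from scipy.signal import correlate2d

    # Commonly used F(2x2, 3x3) winograd transform matrices (assumed; not specified in the text).
    B_T = np.array([[1,  0, -1,  0],
                    [0,  1,  1,  0],
                    [0, -1,  1,  0],
                    [0,  1,  0, -1]], dtype=float)
    G = np.array([[1.0,  0.0, 0.0],
                  [0.5,  0.5, 0.5],
                  [0.5, -0.5, 0.5],
                  [0.0,  0.0, 1.0]])
    A_T = np.array([[1, 1,  1,  0],
                    [0, 1, -1, -1]], dtype=float)

    def transform_by_summation(t, left):
        # Disassemble the transform left @ t @ left.T into a summation over sub-tensors:
        # each sub-tensor keeps one non-zero element of t at the corresponding position
        # (other elements are 0), so the whole transform becomes a sum of simple terms.
        out = np.zeros((left.shape[0], left.shape[0]))
        for i, j in zip(*np.nonzero(t)):
            sub = np.zeros_like(t)
            sub[i, j] = t[i, j]
            out += left @ sub @ left.T
        return out

    def winograd_tile_f2x2_3x3(d, g):
        # One tile: forward transforms, alignment (element-wise) multiplication, inverse transform.
        V = transform_by_summation(d, B_T)     # data transform  B^T d B, by summation
        U = transform_by_summation(g, G)       # weight transform G g G^T, by summation
        M = U * V                              # alignment multiplication
        return transform_by_summation(M, A_T)  # inverse transform A^T M A, by summation

    # A 4x4 data tile and a 3x3 weight give a 2x2 output tile, matching a direct 'valid' correlation.
    d = np.random.rand(4, 4)
    g = np.random.rand(3, 3)
    assert np.allclose(winograd_tile_f2x2_3x3(d, g), correlate2d(d, g, mode="valid"))

Skipping zero elements in transform_by_summation matches the non-zero-element counts of clauses A6 and A8; for the weight (clause A7) the text enumerates every element, but zero elements contribute nothing to the sum, so skipping them does not change the result.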
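Finally, a sketch of clauses A9 to A11 (and A20 to A22), reusing parity_split and correlate2d from the first sketch: the first neuron and the first gradient are split with the same parity split, each second neuron is convolved with the second gradient whose row/column parity matches it, and the sum of these third convolution results is taken as the weight residual. The array shapes, the SGD-style update, and the 0.01 learning rate in the usage lines are assumptions for illustration only.

    # Reuses parity_split and correlate2d from the first sketch above.
    def weight_residual_via_split(first_neuron, first_gradient, weight_shape):
        # Split the first neuron and the first gradient with the same parity split,
        # convolve each second neuron with the parity-matched second gradient
        # (the third convolution results), and sum them to obtain the weight residual.
        n_parts = parity_split(first_neuron)
        g_parts = parity_split(first_gradient)
        residual = np.zeros(weight_shape)
        for key in n_parts:  # the row/column parity of each pair is consistent (clause A10)
            residual += correlate2d(n_parts[key], g_parts[key], mode="valid")
        return residual

    # Illustrative adjustment of the weight by its residual (clause A11); the SGD-style rule
    # and the learning rate are assumptions, not taken from the text.
    x = np.random.rand(9, 10)     # first neuron
    w = np.random.rand(3, 3)      # weight of the rate-2 hole convolution
    dL_dy = np.random.rand(5, 6)  # first gradient, same shape as the hole convolution output
    w -= 0.01 * weight_residual_via_split(x, dL_dy, w.shape)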

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Physics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Medical Informatics (AREA)
  • Image Processing (AREA)

Abstract

Disclosed are a data processing method and apparatus, a computer device, and a storage medium. The method comprises: splitting first data according to a preset split mode to obtain a plurality of second data (S21); for any of the second data, performing a winograd convolution operation on the second data and a weight to obtain a plurality of first convolution results (S22); and merging the plurality of first convolution results according to a preset merge mode to obtain a hole (dilated) convolution result of the first data and the weight, the preset merge mode being the reverse process of the preset split mode. The method can improve the operating efficiency of related products when running a neural network model.
PCT/CN2020/123832 2019-12-09 2020-10-27 Procédé et appareil de traitement de données, dispositif informatique et support d'enregistrement WO2021114903A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201911251492.0A CN113033813B (zh) 2019-12-09 2019-12-09 数据处理方法、装置、计算机设备和存储介质
CN201911251492.0 2019-12-09

Publications (1)

Publication Number Publication Date
WO2021114903A1 true WO2021114903A1 (fr) 2021-06-17

Family

ID=76328836

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/123832 WO2021114903A1 (fr) 2019-12-09 2020-10-27 Procédé et appareil de traitement de données, dispositif informatique et support d'enregistrement

Country Status (2)

Country Link
CN (1) CN113033813B (fr)
WO (1) WO2021114903A1 (fr)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113610223A (zh) * 2021-08-03 2021-11-05 安谋科技(中国)有限公司 乘法电路、卷积运算方法、介质、片上系统和电子设备
CN115019166A (zh) * 2022-05-24 2022-09-06 深圳大学 基于深度网络模型的沼泽湿地信息提取方法、装置、介质及终端

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107993186A (zh) * 2017-12-14 2018-05-04 中国人民解放军国防科技大学 一种基于Winograd算法的3D CNN加速方法及系统
CN108229654A (zh) * 2016-12-14 2018-06-29 上海寒武纪信息科技有限公司 神经网络卷积运算装置及方法
CN110222760A (zh) * 2019-06-04 2019-09-10 东南大学 一种基于winograd算法的快速图像处理方法
CN110334803A (zh) * 2019-07-18 2019-10-15 南京风兴科技有限公司 基于稀疏化Winograd算法的卷积计算方法和卷积神经网络加速器

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110097172B (zh) * 2019-03-18 2021-10-29 中国科学院计算技术研究所 一种基于winograd卷积运算的卷积神经网络数据处理方法及装置
CN110533164B (zh) * 2019-08-05 2023-04-07 西安交通大学 一种面向卷积神经网络加速器的Winograd卷积拆分方法

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108229654A (zh) * 2016-12-14 2018-06-29 上海寒武纪信息科技有限公司 神经网络卷积运算装置及方法
CN107993186A (zh) * 2017-12-14 2018-05-04 中国人民解放军国防科技大学 一种基于Winograd算法的3D CNN加速方法及系统
CN110222760A (zh) * 2019-06-04 2019-09-10 东南大学 一种基于winograd算法的快速图像处理方法
CN110334803A (zh) * 2019-07-18 2019-10-15 南京风兴科技有限公司 基于稀疏化Winograd算法的卷积计算方法和卷积神经网络加速器

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
KIM MINSIK; PARK CHEONJUN; KIM SUNGJUN; HONG TAEYOUNG; RO WON WOO: "Efficient Dilated-Winograd Convolutional Neural Networks", 2019 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP), IEEE, 22 September 2019 (2019-09-22), pages 2711 - 2715, XP033647231, DOI: 10.1109/ICIP.2019.8803277 *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113610223A (zh) * 2021-08-03 2021-11-05 安谋科技(中国)有限公司 乘法电路、卷积运算方法、介质、片上系统和电子设备
CN113610223B (zh) * 2021-08-03 2023-12-26 安谋科技(中国)有限公司 乘法电路、卷积运算方法、介质、片上系统和电子设备
CN115019166A (zh) * 2022-05-24 2022-09-06 深圳大学 基于深度网络模型的沼泽湿地信息提取方法、装置、介质及终端
CN115019166B (zh) * 2022-05-24 2024-02-09 深圳大学 沼泽湿地信息提取方法、装置、介质及终端

Also Published As

Publication number Publication date
CN113033813B (zh) 2024-04-26
CN113033813A (zh) 2021-06-25

Similar Documents

Publication Publication Date Title
WO2021114903A1 (fr) Procédé et appareil de traitement de données, dispositif informatique et support d'enregistrement
CN110692038A (zh) 多功能矢量处理器电路
WO2021036893A1 (fr) Procédé et appareil de traitement de données, dispositif informatique et support de stockage
CN109670581B (zh) 一种计算装置及板卡
WO2021114904A1 (fr) Procédé et appareil de traitement de données, dispositif informatique et support d'enregistrement
JP2022541721A (ja) 効率的な乗算のための代替数字形式をサポートするシステムおよび方法
CN110059797B (zh) 一种计算装置及相关产品
WO2021082725A1 (fr) Procédé d'opération de convolution winograd et produit associé
WO2021185262A1 (fr) Appareil de calcul et procédé, carte de panneau et support de stockage lisible par ordinateur
US20220405349A1 (en) Data processing method and apparatus, and related product
WO2021082746A1 (fr) Appareil d'exploitation et produit associé
WO2021082747A1 (fr) Appareil d'exploitation et produit associé
WO2021083097A1 (fr) Appareil et procédé de traitement de données, et dispositif informatique et support de stockage associés
WO2021083100A1 (fr) Procédé et dispositif de traitement de données, équipement informatique et support de stockage
WO2021082654A1 (fr) Appareil et procédé de traitement de données, et dispositif informatique et support de stockage
WO2021082723A1 (fr) Appareil d'execution
WO2021082653A1 (fr) Procédé et appareil de traitement de données, dispositif informatique, et support de stockage
CN113762488B (zh) 处理器、数据处理方法、计算机设备和存储介质
CN113297128B (zh) 数据处理方法、装置、计算机设备和存储介质
CN113298223B (zh) 数据处理方法、装置、计算机设备和存储介质
WO2021169914A1 (fr) Procédé et appareil de traitement par quantification de données, dispositif électronique et support de stockage
CN112784207B (zh) 运算方法及相关产品
EP4148561A1 (fr) Procédé et appareil de traitement de données, et produit associé
US20230135306A1 (en) Crossbar circuit for unaligned memory access in neural network processor
CN112306949B (zh) 数据处理方法及装置以及相关产品

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20900018

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20900018

Country of ref document: EP

Kind code of ref document: A1