WO2021082724A1 - Operation method and related product - Google Patents

Operation method and related product

Info

Publication number
WO2021082724A1
Authority
WO
WIPO (PCT)
Prior art keywords
result
transformation
sub
feature
tensor
Prior art date
Application number
PCT/CN2020/113166
Other languages
English (en)
Chinese (zh)
Inventor
张英男
曾洪博
张尧
刘少礼
黄迪
周诗怡
张曦珊
刘畅
郭家明
高钰峰
Original Assignee
中科寒武纪科技股份有限公司
Priority date
Filing date
Publication date
Application filed by 中科寒武纪科技股份有限公司
Publication of WO2021082724A1

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10Complex mathematical operations
    • G06F17/15Correlation function computation including computation of convolution operations
    • G06F17/156Correlation function computation including computation of convolution operations using a domain transform, e.g. Fourier transform, polynomial transform, number theoretic transform
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10Complex mathematical operations
    • G06F17/15Correlation function computation including computation of convolution operations
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10Complex mathematical operations
    • G06F17/16Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/06Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/063Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means

Definitions

  • This application relates to the field of deep learning technology, in particular to a neural network-based calculation method and related products.
  • the neural network model is an operation model in deep learning technology, which uses a multi-layer architecture to process the input data and output the corresponding operation results.
  • training a neural network model is a necessary step for calculations using the neural network model.
  • the neural network to be trained needs to repeatedly perform iterative operations on massive training data to obtain the trained neural network model.
  • this application provides an operation method, including:
  • the result of the operation is output to the lower layer convolutional network.
  • this application provides a computing device, including:
  • An acquisition module used to acquire the feature data output by the upper layer convolutional network and the feature transformation matrix used to perform positive transformation on the feature data;
  • the feature transformation module is used to transform the feature data according to the feature transformation matrix to obtain a feature transformation result; wherein, the transformation operation of the feature data is disassembled into a summation operation, and the feature transformation result is determined according to the summation operation;
  • the bit multiplication module is used to obtain the weight transformation result of the positive transformation of the convolutional network of this layer, and perform bit multiplication on the feature transformation result and the weight transformation result to obtain the multiplication operation result;
  • the inverse transformation module is used to obtain an inverse transformation matrix for inverse transformation of the multiplication operation result, and transform the multiplication operation result according to the inverse transformation matrix to obtain the operation result; wherein, the transformation operation of the multiplication operation result is disassembled into a summation operation, and the operation result is determined according to the summation operation;
  • the transmission module is used to output the calculation result to the lower layer convolutional network.
  • this application provides an artificial intelligence chip, which includes the computing device described in any one of the preceding items.
  • the present application provides an electronic device including the artificial intelligence chip as described above.
  • the present application provides a board card, the board card includes: a storage device, an interface device, a control device, and the aforementioned artificial intelligence chip;
  • the artificial intelligence chip is connected to the storage device, the control device, and the interface device respectively;
  • the storage device is used to store data
  • the interface device is used to implement data transmission between the artificial intelligence chip and external equipment
  • the control device is used to monitor the state of the artificial intelligence chip.
  • The operation method and related products provided by this application include: obtaining the feature data output by the upper-layer convolutional network and the feature transformation matrix used for the positive transformation of the feature data; transforming the feature data according to the feature transformation matrix to obtain the feature transformation result, wherein the transformation operation of the feature data is disassembled into a summation operation and the feature transformation result is determined according to the summation operation; obtaining the weight transformation result of the positive transformation of the convolutional network of this layer, and performing bit multiplication on the feature transformation result and the weight transformation result to obtain the multiplication operation result; obtaining the inverse transformation matrix used for the inverse transformation of the multiplication operation result, and transforming the multiplication operation result according to the inverse transformation matrix to obtain the operation result, wherein the transformation operation of the multiplication operation result is disassembled into a summation operation and the operation result is determined according to the summation operation; and outputting the operation result to the lower-layer convolutional network.
  • When performing convolution processing on feature data, the winograd algorithm is used. This algorithm can convert multiplications into additions; in addition, the data transformation operations in the algorithm are converted into summation operations, which further reduces the number of multiplications, thereby reducing the performance loss of the computer system and increasing the computing speed.
  • Fig. 1 is a structural diagram of a processing system shown in an exemplary embodiment
  • Fig. 2 is a flowchart of an operation method shown in an exemplary embodiment of the present invention
  • Fig. 3 is a flowchart of an operation method shown in an exemplary embodiment of the present invention.
  • FIG. 4 is a schematic diagram of a master-slave processing architecture shown in an exemplary embodiment of the present invention.
  • Fig. 5 is a schematic diagram of an arithmetic device shown in an exemplary embodiment of the present invention.
  • Fig. 6 is a structural block diagram of a board according to an exemplary embodiment of the present invention.
  • the term “if” can be interpreted as “when” or “once” or “in response to determination” or “in response to detection” depending on the context.
  • the phrase "if determined" or "if [the described condition or event] is detected" can be interpreted, depending on the context, as meaning "once determined", "in response to determination", "once [the described condition or event] is detected", or "in response to detection of [the described condition or event]".
  • A convolution operation refers to opening an active window of the same size as the template, starting from the upper left corner of the image.
  • the active window corresponds to a window image, which is the convolution kernel.
  • the window image and the corresponding pixels in the image are multiplied element by element and then summed, and the calculation result is used as the first pixel value of the new image obtained by the convolution operation.
  • the active window then moves one column to the right, the window image and the pixels it now covers in the image are again multiplied and summed, and the calculation result is used as the second pixel value of the new image obtained by the convolution operation; this continues until the whole image has been traversed, as shown in the sketch below.
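  • The following is a minimal, illustrative sketch (not taken from the application) of the sliding-window multiply-and-add described above; the function name and the use of Python/NumPy are assumptions made here for clarity only.

```python
import numpy as np

def direct_conv2d(image: np.ndarray, kernel: np.ndarray, stride: int = 1) -> np.ndarray:
    """Slide the kernel over the image; at each position multiply element by
    element with the covered pixels and sum the products ("valid" mode)."""
    kh, kw = kernel.shape
    oh = (image.shape[0] - kh) // stride + 1
    ow = (image.shape[1] - kw) // stride + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            window = image[i * stride:i * stride + kh, j * stride:j * stride + kw]
            out[i, j] = np.sum(window * kernel)  # multiply and add
    return out
```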
  • Winograd convolution is a convolution acceleration implementation based on a polynomial interpolation algorithm. The two inputs of the convolution operation, a first target matrix and a second target matrix, are each subjected to the Winograd positive transformation; the positively transformed first and second target matrices are then subjected to bit (element-wise) multiplication; finally, the Winograd inverse transformation is performed on the bit multiplication result, and a convolution result equivalent to that of the original convolution operation is obtained.
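  • As an illustration of the three stages just described (positive transformation, bit multiplication, inverse transformation), the sketch below uses the commonly published F(2x2, 3x3) Winograd matrices; the application does not fix particular matrix values at this point, so the values of B^T, G and A^T below are an assumption for demonstration only.

```python
import numpy as np

# Commonly used F(2x2, 3x3) Winograd transformation matrices (assumed here).
B_T = np.array([[1,  0, -1,  0],
                [0,  1,  1,  0],
                [0, -1,  1,  0],
                [0,  1,  0, -1]], dtype=float)
G = np.array([[1.0,  0.0, 0.0],
              [0.5,  0.5, 0.5],
              [0.5, -0.5, 0.5],
              [0.0,  0.0, 1.0]])
A_T = np.array([[1, 1,  1,  0],
                [0, 1, -1, -1]], dtype=float)

def winograd_f2x2_3x3(d: np.ndarray, g: np.ndarray) -> np.ndarray:
    """d: 4x4 feature tile, g: 3x3 weight; returns the 2x2 output tile."""
    U = G @ g @ G.T         # positive transformation of the weight (G g G^T)
    V = B_T @ d @ B_T.T     # positive transformation of the feature tile (B^T d B)
    M = U * V               # bit (element-wise) multiplication
    return A_T @ M @ A_T.T  # inverse transformation (A^T M A)

# For a 4x4 tile d and a 3x3 kernel g, winograd_f2x2_3x3(d, g) matches the 2x2
# result of the sliding-window computation sketched above, e.g. direct_conv2d(d, g).
```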
  • A convolutional neural network model is a type of feedforward neural network model that includes convolution calculations and has a deep structure, and it is one of the representative models of deep learning.
  • In the convolutional layers, fully connected layers and other network layers of a convolutional neural network model, convolution operations need to be performed on neurons and convolution kernels to obtain feature data; such models are widely used in image classification and image recognition.
  • the operation method according to the embodiment of the present disclosure can be applied to any one processor of a processing system (for example, an artificial intelligence chip) including multiple processors (multi-core).
  • the processor may be a general-purpose processor, such as a CPU (Central Processing Unit), or an artificial intelligence processor (IPU) for performing artificial intelligence operations.
  • Artificial intelligence operations can include machine learning operations, brain-like operations, and so on. Among them, machine learning operations include neural network operations, k-means operations, support vector machine operations, and so on.
  • the artificial intelligence processor may, for example, include one of, or a combination of, GPU (Graphics Processing Unit), NPU (Neural-Network Processing Unit), DSP (Digital Signal Processing unit) and FPGA (Field-Programmable Gate Array) chips.
  • the processor mentioned in the present disclosure may include multiple processing units, and each processing unit can independently run the various tasks assigned to it, such as convolution computing tasks, pooling tasks, or fully connected tasks.
  • the present disclosure does not limit the processing unit and the tasks run by the processing unit.
  • Fig. 1 shows a schematic diagram of a processing system of an arithmetic method according to an embodiment of the present disclosure.
  • the processing system 100 includes multiple processors 101 and a memory 102.
  • the multiple processors 101 are used to execute instruction sequences.
  • the memory 102 is used to store data, and may include a random access memory (RAM) and a register file.
  • the multiple processors 101 in the processing system 100 can not only share part of the storage space, for example, share part of the RAM storage space and the register file, but also have their own storage space at the same time.
  • When performing neural network operations, the processing system can perform a convolution operation on the feature data input to a convolutional layer according to the weight data of that layer to obtain a convolution result, and then use the convolution result as the input feature data of the next convolutional layer.
  • the next convolutional layer then continues to use its own weight data to perform convolution calculation on the input feature data.
  • the convolution method can extract the features in the original data, for example, extract the image features, so as to output the required results according to these features.
  • In the method provided by this embodiment, when the convolution operation is performed on the input feature data according to the weight data of a convolutional layer, the winograd algorithm is used, and the transformation operations are disassembled into summation operations to reduce the amount of multiplication processing, thereby reducing the performance loss of the computing system and improving computing efficiency.
  • Fig. 2 is a flowchart of an operation method shown in an exemplary embodiment of the present invention.
  • the calculation method provided by this embodiment includes:
  • Step 201 Obtain feature data output by the upper-layer convolutional network and a feature transformation matrix used to perform positive transformation on the feature data.
  • the computer system used to execute the solution of the embodiment may be connected to a terminal device.
  • the terminal device can send the original data to the computer system.
  • the computer system can use the method provided in this embodiment to process the original data, extract the features in the original data, and then feed back the recognition result to the terminal device based on these features, for example, feed back information corresponding to the original data.
  • the original data may be picture data
  • the terminal device may upload the original picture to the computer system, the computer system extracts the features included in the picture, determines a recognition result based on the characteristics, and then feeds back the recognition result to the terminal device.
  • a layer of convolutional network can output the operation result obtained by convolution to the next layer of convolutional network, so that the convolutional network can obtain the feature data output by the upper layer of convolutional network.
  • This layer of convolutional network can perform convolution calculation on the feature data according to the weight of this layer, so as to obtain the calculation result.
  • the winograd algorithm may be used to transform the characteristic data.
  • a winograd domain refers to a domain that has been transformed by winograd.
  • In this embodiment, the feature transformation matrices B and B^T used to perform the positive transformation on the feature data d can also be obtained.
  • In a direct convolution operation, the number of multiplication operations is relatively large.
  • By using the winograd algorithm for convolution processing, the number of multiplications can be reduced, thereby reducing the performance loss caused by the operation.
  • the method provided in this embodiment can obtain the feature transformation matrix for forward transformation of the feature data.
  • For a given operation, A, A^T, B, B^T, G and G^T are fixed matrices.
  • Specifically, the size of d can be determined according to the size of the required output result Y, the weight data g, and the sliding step length of the convolution process, and the corresponding A, A^T, B, B^T, G and G^T can then be determined.
  • Step 202 Transform the feature data according to the feature transformation matrix to obtain a feature transformation result; wherein, the transformation operation of the feature data is disassembled into a summation operation, and the feature transformation result is determined according to the summation operation.
  • Specifically, the feature data d can be transformed by the feature transformation matrices B and B^T to obtain the feature transformation result, that is, the result of B^T d B is determined.
  • the transformation operation of the characteristic data is disassembled into a summation operation, and the characteristic transformation result is determined according to the summation operation.
  • the transformation operation of the feature data can be disassembled into multiple sub-transformation results, and then the sum result of the sub-transformation results can be determined as the feature transformation result.
  • The replacement matrix corresponding to each element in d can be preset; for example, d_00 corresponds to a matrix D_00, d_01 corresponds to a matrix D_01, ..., and d_33 corresponds to a matrix D_33.
  • the replacement matrix can be a matrix including 0, 1, -1.
  • the replacement matrix corresponding to d can be directly read, and each element in d can be extracted, and the element can be multiplied by the corresponding replacement matrix and then added to obtain the transformation result.
  • Specifically, the replacement matrices can be determined in advance according to the size of the feature data and the feature transformation matrices B and B^T, and the stored replacement matrices can be read directly when the feature data is transformed.
  • Compared with multiplying complete matrices, multiplying a single element by a replacement matrix reduces the number of multiplications; especially when the replacement matrix is composed of 0, 1 and -1, the amount of calculation can be greatly reduced.
  • For example, if the feature data is a 4×4 matrix, it includes 16 elements d_00, d_01, ..., d_33 in total, and correspondingly there may be 16 replacement matrices D_00, D_01, ..., D_33.
  • In specific calculations, the multiplication of an element by such a replacement matrix can be changed into a process of directly writing data: for example, when a 1 in the replacement matrix is multiplied by d_00, the result is d_00 itself and can be written directly without performing an actual multiplication.
  • Therefore, based on the method provided in this embodiment, the transformation process in the winograd algorithm can be converted into addition operations, thereby further reducing the computational complexity of the convolution process. An illustrative sketch of this disassembly is given below.
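  • The disassembly described above can be illustrated as follows: for each element position (i, j), a fixed replacement matrix is obtained by transforming a tensor that is 1 at (i, j) and 0 elsewhere, and B^T d B then equals the sum of every element of d multiplied by its replacement matrix. The sketch assumes the F(2x2, 3x3) B^T shown earlier; the names D and E are illustrative, not from the application.

```python
import numpy as np

def replacement_matrices(B_T: np.ndarray, shape=(4, 4)) -> dict:
    """Precompute D[(i, j)] = B^T E_ij B, where E_ij is 1 at (i, j) and 0 elsewhere."""
    D = {}
    for i in range(shape[0]):
        for j in range(shape[1]):
            E = np.zeros(shape)
            E[i, j] = 1.0
            D[(i, j)] = B_T @ E @ B_T.T
    return D

def feature_transform_by_summation(d: np.ndarray, D: dict) -> np.ndarray:
    # Each term scales a fixed 0/±1 pattern by a single element of d, so the
    # whole transform reduces to writing, negating and adding values.
    return sum(d[i, j] * D[(i, j)] for (i, j) in D)

# feature_transform_by_summation(d, replacement_matrices(B_T)) reproduces B^T d B.
```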
  • Step 203 Obtain the weight transformation result of the positive transformation of the convolutional network of this layer, and perform a bitwise multiplication operation on the feature transformation result and the weight transformation result to obtain the multiplication operation result.
  • Specifically, the weight data g of the convolutional network of the current layer and the weight transformation matrices G and G^T used to transform the weight data can be obtained; the weight transformation matrices are then used to transform the weight data to obtain the weight transformation result.
  • the replacement matrix corresponding to each element in g can be stored in advance, and then the positive transformation operation of the weight can be converted into a summation operation through these replacement matrices.
  • In this embodiment, the weight transformation matrices corresponding to the weight data can be determined in advance, and the weight transformation result can be determined in advance according to the weight data and the corresponding weight transformation matrices.
  • When performing the convolution operation, the predetermined weight transformation result can be directly read.
  • Specifically, the predetermined weight transformation result can be stored in a storage unit and read directly when needed, thereby further reducing the performance loss caused by the positive transformation of the weight data. A small sketch of this pre-computation follows.
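  • A minimal sketch of the pre-computation idea, under the assumption that the weight data of the layer is fixed when the operation is performed (e.g. at inference time); the class and variable names are illustrative, and G is the transformation matrix assumed in the earlier sketch.

```python
import numpy as np

class WinogradWeightCache:
    """Compute the weight transformation result once and reuse it at run time."""
    def __init__(self, g: np.ndarray, G: np.ndarray):
        self.U = G @ g @ G.T   # positive transformation of the weight, done ahead of time

    def bit_multiply(self, V: np.ndarray) -> np.ndarray:
        # element-wise product with a feature transformation result
        return self.U * V
```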
  • In this embodiment, the time sequence for obtaining the weight transformation result and determining the feature transformation result is not limited.
  • After the feature transformation result and the weight transformation result are obtained, the two results can be multiplied bit by bit (element-wise); that is, after obtaining B^T d B and G g G^T, the two matrices are multiplied element by element to determine the result of (G g G^T) ⊙ (B^T d B).
  • the values at the corresponding positions of the two transformation results can be multiplied to obtain a new matrix as the result of the multiplication operation.
  • the result of feature data transformation is:
  • Step 204 Obtain an inverse transformation matrix for inversely transforming the result of the multiplication operation, and transform the result of the multiplication operation according to the inverse transformation matrix to obtain the operation result; wherein, the transformation operation of the multiplication operation result is disassembled into a summation operation, and the operation result is determined according to the summation operation.
  • Specifically, the inverse transformation matrices A and A^T used for the inverse transformation of the multiplication operation result can also be obtained.
  • the inverse transformation matrix can be determined according to the size of the operation result.
  • Specifically, the inverse transformation matrices can be used to inversely transform the result of the multiplication operation, that is, to determine A^T p A, where p denotes the result of the multiplication operation.
  • the replacement matrix corresponding to each element included in the multiplication operation can be determined in advance, so that the inverse transformation operation can be disassembled into a summation operation according to these replacement matrices, and the operation result can be determined according to the summation operation.
  • the specific disassembly method is similar to the disassembly method for the feature transformation operation, and the convolution operation result can be obtained through fewer multiplication methods.
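  • The same dissection can be sketched for the inverse transformation: once the size of the multiplication result p is known, the matrices obtained by transforming single-position tensors are fixed, and A^T p A becomes a sum of single elements of p scaled into fixed patterns. The A^T below is the F(2x2, 3x3) value assumed earlier; the names are illustrative.

```python
import numpy as np

A_T = np.array([[1, 1,  1,  0],
                [0, 1, -1, -1]], dtype=float)

def inverse_transform_by_summation(p: np.ndarray) -> np.ndarray:
    """Disassemble A^T p A into a sum over the elements of p."""
    out = np.zeros((A_T.shape[0], A_T.shape[0]))
    for (i, j), value in np.ndenumerate(p):
        E = np.zeros_like(p, dtype=float)
        E[i, j] = 1.0
        out += value * (A_T @ E @ A_T.T)  # element times its fixed replacement matrix
    return out                             # equals A^T p A for a 4x4 p
```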
  • Step 205 Output the calculation result to the lower layer convolutional network.
  • the convolutional network of this layer can output the determined operation result to the lower-layer convolutional network, so as to use it as the input feature data of the lower-layer convolutional network, and the lower-layer convolutional network can then perform convolution calculation on this input data based on its own weight data.
  • In the lower-layer convolutional network, the above-mentioned calculation method can also be used, that is, the winograd algorithm is adopted and the transformation operations in the algorithm are converted into summation operations.
  • the method provided in this embodiment is used to perform convolution operations; it is executed by a device in which the method provided in this embodiment is deployed, and the device is usually implemented by means of hardware and/or software.
  • The operation method provided by this embodiment includes: obtaining the feature data output by the upper-layer convolutional network and the feature transformation matrix used for the positive transformation of the feature data; transforming the feature data according to the feature transformation matrix to obtain the feature transformation result, wherein the transformation operation of the feature data is disassembled into a summation operation and the feature transformation result is determined according to the summation operation; obtaining the weight transformation result of the positive transformation of the convolutional network of this layer, and performing bit multiplication on the feature transformation result and the weight transformation result to obtain the multiplication operation result; obtaining the inverse transformation matrix used for the inverse transformation of the multiplication operation result, and transforming the multiplication operation result according to the inverse transformation matrix to obtain the operation result, wherein the transformation operation of the multiplication operation result is disassembled into a summation operation and the operation result is determined according to the summation operation; and outputting the operation result to the lower-layer convolutional network.
  • In this way, the computer system uses the winograd algorithm when performing convolution processing on the feature data, which converts multiplications into additions and further converts the transformation processes in the algorithm into summation operations, thereby further reducing the multiplication operations in the data processing process, reducing the performance loss of the computer system and increasing the calculation speed.
  • Fig. 3 is a flowchart of an operation method shown in an exemplary embodiment of the present invention.
  • the calculation method provided by this embodiment includes:
  • Step 301 Obtain feature data output by the upper-layer convolutional network and a feature transformation matrix used for positive transformation of the feature data.
  • step 301 and step 201 are similar, and will not be described again.
  • Step 302 Disassemble the feature data into multiple feature sub-tensors.
  • the positive transformation of the characteristic data can be disassembled into a summation operation, thereby reducing the number of multiplication operations.
  • the feature data can be disassembled into multiple feature sub-tensors.
  • Among them, the sum of the multiple feature sub-tensors is the feature data; the number of feature sub-tensors is the same as the number of non-zero elements in the feature data; each feature sub-tensor has a single non-zero element; and the non-zero element in each feature sub-tensor is the same as the non-zero element at the corresponding position in the feature data.
  • the characteristic data d is:
  • the feature data can then be divided into 16 feature sub-tensors (assuming that the elements in the feature data are all non-zero); the feature sub-tensors are:
  • Step 303 Perform a transformation operation on the multiple feature sub-tensors according to the feature transformation matrix and sum them to obtain a feature transformation result.
  • the feature transformation matrix can be used to transform each feature sub-tensor, and then the transformation results of the feature sub-tensors can be added to obtain the feature transformation result.
  • Since matrix multiplication is distributive over addition, transforming each feature sub-tensor and adding the transformation results gives the same result as transforming the feature data directly.
  • For each feature sub-tensor, the above transformation can therefore be performed, and the transformation results of all feature sub-tensors are then added to obtain the feature transformation result.
  • the feature element sub-tensor is a tensor in which the non-zero elements of the feature sub-tensor are set to 1;
  • the result of the feature transformation is obtained by summing the transformation results of the feature sub-tensors.
  • the non-zero elements in the feature sub-tensor can be identified, and the position corresponding to the non-zero elements can be set to 1, to obtain the feature element sub-tensor, for example, for the feature sub-tensor
  • the corresponding feature element sub-tensor is:
  • the corresponding feature element sub-tensor can be determined.
  • the transformation result of the feature sub-tensor can be determined according to the feature transformation matrix, the feature element sub-tensor and the corresponding non-zero elements.
  • Specifically, the left side of the feature element sub-tensor can be multiplied by the left-multiplication matrix in the feature transformation matrix, and the right side can be multiplied by the right-multiplication matrix in the feature transformation matrix; the result is then multiplied by the non-zero element corresponding to the feature element sub-tensor to obtain the transformation result of the feature sub-tensor. Among them, the left-multiplication matrix and the right-multiplication matrix are both determined by the size of the feature sub-tensor.
  • B^T and B can be determined according to the size of the feature data, and the feature element sub-tensors can also be determined in advance according to the feature data. Therefore, the replacement matrix corresponding to the position of each element in the feature data can also be determined in advance according to B^T, B and the feature element sub-tensors.
  • the replacement matrix is:
  • the corresponding replacement matrix can be determined for each element position in the feature data.
  • the corresponding replacement matrix set can be determined directly according to the data size, and then the feature transformation result can be determined according to the replacement matrix set.
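  • Steps 302 and 303 can be sketched as follows: the feature data is split into one sub-tensor per non-zero element, each sub-tensor equals its element value times the corresponding feature element sub-tensor (the same position set to 1), and the transformed sub-tensors are summed. B_T is assumed to be the F(2x2, 3x3) matrix from the earlier sketch; function names are illustrative only.

```python
import numpy as np

def split_into_feature_subtensors(d: np.ndarray) -> list:
    """One sub-tensor per non-zero element; their sum reconstructs d."""
    subtensors = []
    for (i, j), value in np.ndenumerate(d):
        if value != 0:
            t = np.zeros_like(d)
            t[i, j] = value
            subtensors.append(t)
    return subtensors

def transform_via_subtensors(d: np.ndarray, B_T: np.ndarray) -> np.ndarray:
    result = np.zeros((B_T.shape[0], B_T.shape[0]))
    for t in split_into_feature_subtensors(d):
        i, j = map(int, np.argwhere(t)[0])
        element_subtensor = np.zeros_like(d, dtype=float)
        element_subtensor[i, j] = 1.0                  # feature element sub-tensor
        # left-multiply by B^T, right-multiply by B, then scale by the non-zero element
        result += d[i, j] * (B_T @ element_subtensor @ B_T.T)
    return result                                      # equals B^T d B
```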
  • Step 304 Obtain the weight data of the convolutional network of the current layer and the weight transformation matrix used for positive transformation of the weight data.
  • Step 305 Transform the weight data according to the weight transformation matrix to obtain a weight transformation result; wherein, the transformation operation of the weight data is disassembled into a summation operation, and the weight transformation result is determined according to the summation operation.
  • the weight data can be transformed according to the weight transformation matrix to obtain the weight transformation result.
  • In order to reduce multiplication operations, the transformation operation of the weight data can be disassembled into a summation operation, and the weight transformation result can be determined according to the summation operation.
  • weight data can be transformed based on the following formula:
  • where G and G^T are the weight transformation matrices and g is the weight data.
  • the weight data can be disassembled into multiple weight sub-tensors; then the multiple weight sub-tensors are transformed and summed according to the weight transformation matrix, Get the result of weight transformation.
  • the sum of multiple weight sub-tensors is weight data
  • the number of multiple weight sub-tensors is the same as the number of non-zero elements in the weight data
  • each weight sub-tensor has a single non-zero value.
  • the non-zero element in the weight sub-tensor is the same as the non-zero element in the corresponding position in the weight data.
  • the weight data g is:
  • weight data can also be split into 16 weight sub-tensors, which are:
  • For example, one of the weight sub-tensors can be transformed based on the following formula:
  • For each weight sub-tensor, the above transformation can be performed, and the transformation results of all weight sub-tensors are then added to obtain the weight transformation result.
  • the corresponding weight element sub-tensor is:
  • the corresponding weight element sub-tensor can be determined.
  • the transformation result of the weight sub-tensor can be determined according to the weight transformation matrix, the weight element sub-tensor and the corresponding non-zero elements.
  • Specifically, the left side of the weight element sub-tensor can be multiplied by the left-multiplication matrix in the weight transformation matrix, and the right side can be multiplied by the right-multiplication matrix in the weight transformation matrix; the result is then multiplied by the non-zero element corresponding to the weight element sub-tensor to obtain the transformation result of the weight sub-tensor.
  • G and G^T can be determined according to the size of the weight data, and the weight element sub-tensors can also be determined in advance according to the weight data. Therefore, the replacement matrix corresponding to the position of each element in the weight data can also be determined in advance according to G, G^T and the weight element sub-tensors.
  • the replacement matrix is:
  • the corresponding replacement matrix can be determined for each element position in the weight data.
  • the corresponding replacement matrix set can be determined directly according to the data size, and then the weight transformation result can be determined according to the replacement matrix set.
  • G g G^T = g_00 × D'_00 + g_01 × D'_01 + ... + g_33 × D'_33
  • Step 306 Perform a bitwise multiplication operation on the feature transformation result and the weight transformation result to obtain the multiplication operation result.
  • Step 306 is similar to the implementation principle and method of performing bit multiplication on the feature transformation result and the weight transformation result in step 203, and will not be repeated here.
  • Step 307 Decompose the result of the multiplication operation into multiple resultant sub-tensors.
  • Step 308 Perform a transformation operation on the multiple resultant sub-tensors according to the inverse transformation matrix and sum them to obtain the operation result.
  • the multiplication operation result data can be transformed according to the inverse transformation matrix to obtain the operation result.
  • the transformation operation of the multiplication operation result can be disassembled into a summation operation, and the operation result can be determined according to the summation operation.
  • where A^T and A are the inverse transformation matrices and p is the result of the multiplication operation; the inverse transformation corresponds to determining A^T p A.
  • Among them, the sum of the multiple result sub-tensors is the result of the multiplication operation; the number of result sub-tensors is the same as the number of non-zero elements in the multiplication operation result; each result sub-tensor has a single non-zero element; and the non-zero element in each result sub-tensor is the same as the non-zero element at the corresponding position in the multiplication operation result.
  • the multiplication operation result p is:
  • the result of the multiplication operation can also be split into 16 result sub-tensors, which are:
  • the inverse transformation matrix can be used to transform each resultant sub-tensor, and then the transformation results of the resultant sub-tensors can be added to obtain the result of the operation.
  • For each result sub-tensor, the above transformation can be performed, and the transformation results of all result sub-tensors are then added to obtain the operation result.
  • the non-zero elements in the result sub-tensor can be identified, and the position corresponding to the non-zero elements can be set to 1, and the result element sub-tensor can be obtained, for example, for the result sub-tensor
  • the corresponding result element sub-tensor can be determined.
  • the result of the operation can be determined according to the inverse transformation matrix, the resultant sub-tensor and its corresponding non-zero elements.
  • Specifically, the left side of the result element sub-tensor can be multiplied by the left-multiplication matrix in the inverse transformation matrix, and the right side can be multiplied by the right-multiplication matrix in the inverse transformation matrix; the result is then multiplied by the non-zero element corresponding to the result element sub-tensor to obtain the transformation result of the result sub-tensor. Among them, the left-multiplication matrix and the right-multiplication matrix are both determined by the scale of the result sub-tensor.
  • A^T and A can be determined according to the size of the operation result, and the result element sub-tensors can also be determined in advance according to the size of the operation result. Therefore, the replacement matrix corresponding to the position of each element in the multiplication operation result can also be determined in advance according to A^T, A and the result element sub-tensors.
  • the replacement matrix is:
  • the corresponding replacement matrix can be determined for the position of each element in the multiplication operation result.
  • In this way, the corresponding replacement matrix set can be determined directly according to the size of the multiplication operation result or the final operation result, and the operation result can then be determined according to the replacement matrix set.
  • Step 309 Output the calculation result to the lower layer convolutional network.
  • step 309 and step 205 are similar, and will not be repeated here.
  • Fig. 4 is a schematic diagram showing a master-slave processing architecture according to an exemplary embodiment of the present invention.
  • the solution of this embodiment also provides a master-slave processing architecture, which can be used to implement the calculation method provided in this embodiment.
  • the master-slave processing structure includes a master functional unit 41 and at least one slave functional unit 42.
  • the main functional unit 41 transforms the characteristic data according to the characteristic transformation matrix to obtain the characteristic transformation result; wherein, the transformation operation of the characteristic data is disassembled into a summation operation, and the characteristic transformation result is determined according to the summation operation.
  • a main storage unit (not shown in the figure) can also be provided, and the main storage unit can be connected to the main function unit 41.
  • a main control unit (not shown in the figure) can respectively send instructions to the main storage unit and the main function unit 41, so that the main storage unit can send characteristic data to the main function unit.
  • the slave functional unit 42 performs a bit multiplication operation on the feature data transformation result and the weight transformation result, and obtains the multiplication operation result.
  • the slave functional unit 42 may perform a bitwise multiplication operation on the received feature data transformation result and the weight transformation result, so as to obtain the multiplication operation result.
  • the data processing process is similar to the foregoing embodiment, and will not be repeated.
  • one main functional unit 41 can be connected to multiple slave functional units 42, and allocation rules can be preset for allocating characteristic data transformation results to the slave functional units 42.
  • the operation process of the main function unit 41 and the slave function unit 42 is a parallel operation.
  • Before the main functional unit 41 has calculated the transformation result value of every element position in the feature data, the slave functional unit 42 performs, for the element positions whose feature transformation result values have already been calculated, the bit multiplication of the feature transformation result and the weight transformation result at those positions, until the bit multiplication value of every element position has been calculated and the multiplication operation result is obtained; a simulation sketch of this overlap is given below.
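  • The overlap described above can be illustrated with a purely software simulation (this is not the hardware architecture itself; the thread and queue names are illustrative): the master routine emits the transformation result value of each element position as soon as it is computed, and the slave routine multiplies each emitted value with the cached weight transformation value at the same position without waiting for the full feature transformation to finish. D is a mapping from element positions of d to replacement matrices and U is the weight transformation result, as in the earlier sketches.

```python
import threading
from queue import Queue
import numpy as np

def pipelined_bit_multiplication(d: np.ndarray, U: np.ndarray, D: dict) -> np.ndarray:
    q = Queue()
    result = {}

    def master():
        # compute the feature transformation result value of one element position at a time
        shape = next(iter(D.values())).shape
        for (r, c) in np.ndindex(shape):
            value = sum(d[i, j] * D[(i, j)][r, c] for (i, j) in D)
            q.put(((r, c), value))       # hand the finished position to the slave
        q.put(None)                      # signal that all positions are done

    def slave():
        # multiply every finished position with the weight transformation result
        M = np.zeros_like(U, dtype=float)
        while (item := q.get()) is not None:
            (r, c), value = item
            M[r, c] = value * U[r, c]
        result["M"] = M

    t_master, t_slave = threading.Thread(target=master), threading.Thread(target=slave)
    t_master.start(); t_slave.start()
    t_master.join(); t_slave.join()
    return result["M"]                   # the bit multiplication result
```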
  • Although the steps in the flowcharts are displayed in sequence according to the arrows, these steps are not necessarily executed in the order indicated by the arrows. Unless explicitly stated herein, there is no strict order restriction on the execution of these steps, and they may be executed in other orders. Moreover, at least part of the steps in the flowcharts may include multiple sub-steps or stages; these sub-steps or stages are not necessarily executed at the same moment, but may be executed at different moments, and their execution order is not necessarily sequential, but may be performed in turn or alternately with at least part of the sub-steps or stages of other steps.
  • Fig. 5 is a schematic diagram of an arithmetic device shown in an exemplary embodiment of the present invention.
  • the computing device provided in this embodiment includes:
  • the obtaining module 51 is configured to obtain the feature data output by the upper layer convolutional network and the feature transformation matrix used to perform positive transformation on the feature data;
  • the feature transformation module 52 is configured to transform the feature data according to the feature transformation matrix to obtain a feature transformation result; wherein, the transformation operation of the feature data is disassembled into a summation operation, and determined according to the summation operation The feature transformation result;
  • the bit multiplication module 53 is used to obtain the weight conversion result of the positive transformation of the convolutional network of the current layer, and perform bit multiplication on the feature conversion result and the weight conversion result to obtain the multiplication result;
  • the inverse transformation module 54 is configured to obtain an inverse transformation matrix used to inversely transform the multiplication operation result, and transform the multiplication operation result according to the inverse transformation matrix to obtain an operation result; wherein, the multiplication operation result is The transformation operation of is disassembled into a summation operation, and the operation result is determined according to the summation operation;
  • the transmission module 55 is configured to output the calculation result to the lower layer convolutional network.
  • the feature transformation module 52 is specifically configured to:
  • the inverse transform module 54 is specifically configured to:
  • the sum of a plurality of the feature sub-tensors is the feature data; the sum of a plurality of the result sub-tensors is the result of the multiplication operation;
  • the number of the multiple feature sub-tensors is the same as the number of non-zero elements in the feature data, each feature sub-tensor has a single non-zero element, and the non-zero element in each feature sub-tensor is the same as the non-zero element at the corresponding position in the feature data;
  • the number of the multiple result sub-tensors is the same as the number of non-zero elements in the multiplication operation result, each result sub-tensor has a single non-zero element, and the non-zero element in each result sub-tensor is the same as the non-zero element at the corresponding position in the multiplication operation result.
  • the feature transformation module 52 is specifically configured to:
  • the inverse transform module 54 is specifically configured to:
  • the result element sub-tensor is a tensor in which a non-zero element of the result sub-tensor is set to 1;
  • the feature transformation module 52 is specifically configured to:
  • the left multiplication matrix and the right multiplication matrix in the feature element sub-tensor are both determined by the scale of the feature sub-tensor
  • the inverse transform module 54 is specifically configured to:
  • the left multiplication matrix and the right multiplication matrix in the result element sub-tensor are both determined by the scale of the result sub-tensor.
  • the alignment multiplication module 53 is specifically used for:
  • the weight data is transformed according to the weight transformation matrix to obtain the weight transformation result; wherein, the transformation operation of the weight data is disassembled into a summation operation, and the weight transformation result is determined according to the summation operation.
  • the alignment multiplication module 53 is specifically used for:
  • the alignment multiplication module 53 is specifically used for:
  • the alignment multiplication module 53 is specifically used for:
  • the left multiplication matrix and the right multiplication matrix in the weight sub-tensor are both determined by the scale of the weight sub-tensor.
  • the foregoing device embodiments are only illustrative, and the device of the present disclosure may also be implemented in other ways.
  • the division of units/modules in the above-mentioned embodiments is only a logical function division, and there may be other division methods in actual implementation.
  • multiple units, modules or components may be combined or integrated into another system, or some features may be omitted or not implemented.
  • the functional units/modules in the various embodiments of the present disclosure may be integrated into one unit/module, each unit/module may exist alone physically, or two or more units/modules may be integrated together.
  • the above-mentioned integrated unit/module can be implemented in the form of hardware or software program module.
  • the hardware may be a digital circuit, an analog circuit, and so on.
  • the physical realization of the hardware structure includes but is not limited to transistors, memristors and so on.
  • the artificial intelligence processor may be any appropriate hardware processor, such as CPU, GPU, FPGA, DSP, ASIC, and so on.
  • the storage unit may be any suitable magnetic storage medium or magneto-optical storage medium, such as resistive random access memory RRAM (Resistive Random Access Memory), dynamic random access memory DRAM (Dynamic Random Access Memory), static random access memory SRAM (Static Random-Access Memory), enhanced dynamic random access memory EDRAM (Enhanced Dynamic Random Access Memory), high-bandwidth memory HBM (High-Bandwidth Memory), hybrid memory cube HMC (Hybrid Memory Cube), and so on.
  • the integrated unit/module is implemented in the form of a software program module and sold or used as an independent product, it can be stored in a computer readable memory.
  • the technical solution of the present disclosure essentially or the part that contributes to the prior art or all or part of the technical solution can be embodied in the form of a software product, and the computer software product is stored in a memory, It includes several instructions to make a computer device (which may be a personal computer, a server, or a network device, etc.) execute all or part of the steps of the methods described in the various embodiments of the present disclosure.
  • the aforementioned memory includes: a USB flash drive, a read-only memory (ROM), a random access memory (RAM), a removable hard disk, a magnetic disk, an optical disk, or other media that can store program codes.
  • an artificial intelligence chip is also disclosed, which includes the aforementioned computing device.
  • a board card which includes a storage device, an interface device, a control device, and the aforementioned artificial intelligence chip; wherein, the artificial intelligence chip is connected to the storage device and the control device And the interface devices are respectively connected; the storage device is used to store data; the interface device is used to realize data transmission between the artificial intelligence chip and an external device; the control device is used to The state of the artificial intelligence chip is monitored.
  • Fig. 6 is a structural block diagram of a board card shown in an exemplary embodiment of the present invention.
  • the board card may include other supporting components in addition to the chip 389 described above.
  • the supporting components include, but are not limited to: a storage device 390. Interface device 391 and control device 392;
  • the storage device 390 is connected to the artificial intelligence chip through a bus for storing data.
  • the storage device may include multiple groups of storage units 393. Each group of the storage unit and the artificial intelligence chip are connected through a bus. It can be understood that each group of the storage units may be DDR SDRAM (English: Double Data Rate SDRAM, double-rate synchronous dynamic random access memory).
  • the storage device may include 4 groups of the storage units, and each group of storage units may include a plurality of DDR4 chips.
  • the artificial intelligence chip may include four 72-bit DDR4 controllers; in each 72-bit DDR4 controller, 64 bits are used for data transmission and 8 bits are used for ECC checking. It can be understood that when DDR4-3200 chips are used in each group of storage units, the theoretical bandwidth of data transmission can reach 25600 MB/s.
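  • As a consistency check on the bandwidth figure above (assuming the DDR4-3200 chips mentioned, i.e. 3200 MT/s, and the 64 data bits per controller): 3200 MT/s × 64 bits ÷ 8 bits per byte = 25600 MB/s of theoretical transfer bandwidth per group of storage units.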
  • each group of the storage unit includes a plurality of double-rate synchronous dynamic random access memories arranged in parallel.
  • DDR can transmit data twice in one clock cycle.
  • a controller for controlling the DDR is provided in the chip, which is used to control the data transmission and data storage of each storage unit.
  • the interface device is electrically connected with the artificial intelligence chip.
  • the interface device is used to implement data transmission between the artificial intelligence chip and an external device (such as a server or a computer).
  • the interface device may be a standard PCIE interface.
  • the data to be processed is transferred from the server to the chip through a standard PCIE interface to realize data transfer.
  • the interface device may also be other interfaces. The present disclosure does not limit the specific form of the above-mentioned other interfaces, as long as the interface unit can realize the transfer function.
  • the calculation result of the artificial intelligence chip is still transmitted by the interface device back to an external device (such as a server).
  • the control device is electrically connected with the artificial intelligence chip.
  • the control device is used to monitor the state of the artificial intelligence chip.
  • the artificial intelligence chip and the control device may be electrically connected through an SPI interface.
  • the control device may include a single-chip microcomputer (Micro Controller Unit, MCU).
  • the artificial intelligence chip may include multiple processing chips, multiple processing cores, or multiple processing circuits, and can drive multiple loads. Therefore, the artificial intelligence chip can be in different working states such as multi-load and light-load.
  • the control device can regulate and control the working states of the multiple processing chips, multiple processing cores and/or multiple processing circuits in the artificial intelligence chip.
  • an electronic device which includes the aforementioned artificial intelligence chip.
  • Electronic equipment includes data processing devices, robots, computers, printers, scanners, tablets, smart terminals, mobile phones, driving recorders, navigators, sensors, camera heads, servers, cloud servers, cameras, video cameras, projectors, watches, headsets, mobile storage, wearable devices, vehicles, household appliances, and/or medical equipment.
  • the transportation means include airplanes, ships, and/or vehicles;
  • the household appliances include TVs, air conditioners, microwave ovens, refrigerators, rice cookers, humidifiers, washing machines, electric lights, gas stoves, and range hoods;
  • the medical equipment includes nuclear magnetic resonance apparatus, B-mode ultrasound scanners, and/or electrocardiographs.
  • An operation method comprising:
  • the disassembling the transformation operation of the multiplication operation result into a summation operation and determining the operation result according to the summation operation includes:
  • the number of the multiple feature sub-tensors is the same as the number of non-zero elements in the feature data, each feature sub-tensor has a single non-zero element, and the non-zero element in each feature sub-tensor is the same as the non-zero element at the corresponding position in the feature data;
  • the number of the multiple result sub-tensors is the same as the number of non-zero elements in the multiplication operation result, each result sub-tensor has a single non-zero element, and the non-zero element in each result sub-tensor is the same as the non-zero element at the corresponding position in the multiplication operation result.
  • the performing a transformation operation on the multiple feature sub-tensors according to the feature transformation matrix and summing them to obtain the feature transformation result includes:
  • the performing a transformation operation on the multiple result sub-tensors according to the inverse transformation matrix and summing them to obtain the operation result includes:
  • the result element sub-tensor is a tensor in which a non-zero element of the result sub-tensor is set to 1;
  • the left multiplication matrix and the right multiplication matrix in the feature element sub-tensor are both determined by the scale of the feature sub-tensor
  • the determining the transformation result of the result sub-tensor according to the inverse transformation matrix, the result element sub-tensor and the corresponding non-zero elements includes:
  • the left multiplication matrix and the right multiplication matrix in the result element sub-tensor are both determined by the scale of the result sub-tensor.
  • the obtaining the weight transformation result of the current layer of the convolutional network after the positive transformation includes:
  • the weight data is transformed according to the weight transformation matrix to obtain the weight transformation result; wherein, the transformation operation of the weight data is disassembled into a summation operation, and the weight transformation result is determined according to the summation operation.
  • the determining the transformation result of the weight sub-tensor according to the weight transformation matrix, the weight element sub-tensor and its corresponding non-zero elements includes:
  • the left multiplication matrix and the right multiplication matrix in the weight sub-tensor are both determined by the scale of the weight sub-tensor.
  • the master-slave processing architecture includes: a master functional unit and at least one slave functional unit.
  • the slave functional unit obtains an inverse transformation matrix for inversely transforming the multiplication operation result, and transforms the multiplication operation result according to the inverse transformation matrix to obtain an operation result; wherein, the transformation operation of the multiplication operation result is disassembled into a summation operation, and the operation result is determined according to the summation operation.
  • the operation processes of the main functional unit and the slave functional unit are parallel: before the main functional unit has calculated the transformation result value of every element position in the feature data, the slave functional unit performs, for the element positions whose feature transformation result values have already been calculated, the bit multiplication of the feature transformation result and the weight transformation result at those element positions, until the bit multiplication value of every element position has been calculated, obtaining the multiplication operation result.
  • An arithmetic device including:
  • An acquisition module used to acquire the feature data output by the upper layer convolutional network and the feature transformation matrix used to perform positive transformation on the feature data;
  • the feature transformation module is used to transform the feature data according to the feature transformation matrix to obtain a feature transformation result; wherein, the transformation operation of the feature data is disassembled into a summation operation, and the feature transformation result is determined according to the summation operation;
  • the bit multiplication module is used to obtain the weight conversion result of the positive transformation of the convolutional network of this layer, and perform bit multiplication on the feature conversion result and the weight conversion result to obtain the multiplication result;
  • the inverse transformation module is used to obtain an inverse transformation matrix for inverse transformation of the multiplication operation result, and transform the multiplication operation result according to the inverse transformation matrix to obtain the operation result; wherein The transformation operation is disassembled into a summation operation, and the operation result is determined according to the summation operation;
  • the transmission module is used to output the calculation result to the lower layer convolutional network.
  • the feature transformation module is specifically configured to:
  • the inverse transform module is specifically used for:
  • the number of the multiple feature sub-tensors is the same as the number of non-zero elements in the feature data, each feature sub-tensor has a single non-zero element, and the non-zero element in each feature sub-tensor is the same as the non-zero element at the corresponding position in the feature data;
  • the number of the multiple result sub-tensors is the same as the number of non-zero elements in the multiplication operation result, each result sub-tensor has a single non-zero element, and the non-zero element in each result sub-tensor is the same as the non-zero element at the corresponding position in the multiplication operation result.
  • the feature transformation module is specifically configured to:
  • the inverse transform module is specifically used for:
  • the result element sub-tensor is a tensor in which the non-zero element of the result sub-tensor is set to 1;
  • the feature transformation module is specifically configured to:
  • the left multiplication matrix and the right multiplication matrix corresponding to the feature element sub-tensor are both determined by the scale of the feature sub-tensor;
  • the inverse transform module is specifically used for:
  • the left multiplication matrix and the right multiplication matrix corresponding to the result element sub-tensor are both determined by the scale of the result sub-tensor.
  • the element-wise multiplication module is specifically configured to:
  • the weight data is transformed according to the weight transformation matrix to obtain the weight transformation result; wherein the transformation operation of the weight data is disassembled into a summation operation, and the weight transformation result is determined according to the summation operation;
  • the element-wise multiplication module is specifically configured to:
  • the element-wise multiplication module is specifically configured to:
  • the element-wise multiplication module is specifically configured to:
  • the left multiplication matrix and the right multiplication matrix corresponding to the weight sub-tensor are both determined by the scale of the weight sub-tensor.
  • An artificial intelligence chip comprising the computing device according to any one of clauses A13-A21.
  • a board card comprising: a storage device, an interface device, a control device, and the artificial intelligence chip as described in clause A22;
  • the artificial intelligence chip is connected to the storage device, the control device, and the interface device respectively;
  • the storage device is used to store data
  • the interface device is used to implement data transmission between the artificial intelligence chip and external equipment
  • the control device is used to monitor the state of the artificial intelligence chip.
  • the storage device includes multiple groups of storage units, each group of storage units is connected to the artificial intelligence chip through a bus, and each storage unit is a DDR SDRAM;
  • the chip includes a DDR controller, which is used to control the data transmission and data storage of each storage unit;
  • the interface device is a standard PCIE interface.
  • the efficiency of neural network processing can be effectively improved.
  • the above solutions can be implemented on various architectures, for example a master-slave architecture or a general-purpose architecture.
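
The decomposition referred to in the clauses above can be made concrete with a short sketch. The following Python/NumPy code is illustrative only: the helper names split_into_subtensors and transform_by_summation are assumptions for this sketch and do not appear in the application. It shows how a left/right matrix transform of a tensor can be disassembled into a summation over element sub-tensors whose single non-zero entry is set to 1, so that each term is a transform that depends only on the position and the scale of the tensor (and can therefore be precomputed), scaled by one data element.

    import numpy as np

    def split_into_subtensors(x):
        # One sub-tensor per non-zero element of x; each sub-tensor keeps that single
        # non-zero element at its original position and is zero everywhere else.
        subs = []
        for idx in zip(*np.nonzero(x)):
            s = np.zeros_like(x)
            s[idx] = x[idx]
            subs.append(s)
        return subs

    def transform_by_summation(x, left, right):
        # Computes left @ x @ right, disassembled into a summation operation.
        total = np.zeros((left.shape[0], right.shape[1]))
        for s in split_into_subtensors(x):
            idx = tuple(a[0] for a in np.nonzero(s))  # position of the single non-zero element
            e = np.zeros_like(s)
            e[idx] = 1.0                              # element sub-tensor: non-zero entry set to 1
            total += s[idx] * (left @ e @ right)      # precomputable transform, scaled by the element
        return total

    # Usage: the summation form agrees with the direct matrix form.
    B_T = np.array([[1, 0, -1, 0],
                    [0, 1,  1, 0],
                    [0, -1, 1, 0],
                    [0, 1,  0, -1]], dtype=float)
    d = np.arange(16, dtype=float).reshape(4, 4)
    assert np.allclose(transform_by_summation(d, B_T, B_T.T), B_T @ d @ B_T.T)

Because left @ e @ right for a one-hot e is just the outer product of one column of left with one row of right, these per-position transforms can be tabulated once for a given tile size, which is what allows the transformation to be replaced by scalings and additions.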

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Mathematics (AREA)
  • Pure & Applied Mathematics (AREA)
  • Mathematical Optimization (AREA)
  • Mathematical Analysis (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Algebra (AREA)
  • Databases & Information Systems (AREA)
  • Neurology (AREA)
  • Complex Calculations (AREA)

Abstract

The invention relates to an operation method and a related product, the method comprising: acquiring feature data output by an upper-layer convolutional network and a feature transformation matrix used to perform a forward transformation on the feature data (201); transforming the feature data according to the feature transformation matrix to obtain a feature transformation result, wherein the transformation operation on the feature data is disassembled into a summation operation and the feature transformation result is determined according to the summation operation (202); acquiring the forward-transformed weight transformation result of the current-layer convolutional network and performing an element-wise multiplication on the feature transformation result and the weight transformation result to obtain a multiplication operation result (203); acquiring an inverse transformation matrix used to perform an inverse transformation on the multiplication operation result and transforming the multiplication operation result according to the inverse transformation matrix to obtain an operation result, wherein the transformation operation on the multiplication operation result is disassembled into a summation operation and the operation result is determined according to the summation operation (204); and outputting the operation result to a lower-layer convolutional network (205).
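
For orientation, the flow in the abstract (steps 201-205) has the shape of a standard Winograd convolution. The sketch below is a minimal illustration under the assumption of the classical F(2x2, 3x3) transform matrices; the application itself does not prescribe particular matrices, and the function name winograd_tile is invented for this sketch. It processes one 4x4 feature tile against one 3x3 weight: forward transform of the feature tile, element-wise multiplication with the forward-transformed weight, then inverse transform to a 2x2 output tile.

    import numpy as np

    # Classical Winograd F(2x2, 3x3) transform matrices (an assumed, illustrative choice).
    B_T = np.array([[1, 0, -1,  0],
                    [0, 1,  1,  0],
                    [0, -1, 1,  0],
                    [0, 1,  0, -1]], dtype=float)
    G = np.array([[1.0,  0.0, 0.0],
                  [0.5,  0.5, 0.5],
                  [0.5, -0.5, 0.5],
                  [0.0,  0.0, 1.0]])
    A_T = np.array([[1, 1,  1,  0],
                    [0, 1, -1, -1]], dtype=float)

    def winograd_tile(d, g):
        V = B_T @ d @ B_T.T      # (202) forward transform of the feature tile
        U = G @ g @ G.T          # (203) forward-transformed weight of the current layer
        M = U * V                # (203) element-wise multiplication
        return A_T @ M @ A_T.T   # (204) inverse transform -> 2x2 operation result

    # Usage: the tile result equals the direct 3x3 convolution over the 4x4 tile.
    rng = np.random.default_rng(0)
    d, g = rng.standard_normal((4, 4)), rng.standard_normal((3, 3))
    direct = np.array([[np.sum(d[i:i + 3, j:j + 3] * g) for j in range(2)]
                       for i in range(2)])
    assert np.allclose(winograd_tile(d, g), direct)

Each of the three matrix transforms here can in turn be disassembled into the summation form shown in the earlier sketch, so that the transform of each tensor is obtained by scaling precomputed tensors by individual elements and summing, rather than by full matrix multiplications.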
PCT/CN2020/113166 2019-11-01 2020-09-03 Procédé d'opération et produit associé WO2021082724A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201911061118.4A CN112784207B (zh) 2019-11-01 2019-11-01 运算方法及相关产品
CN201911061118.4 2019-11-01

Publications (1)

Publication Number Publication Date
WO2021082724A1 true WO2021082724A1 (fr) 2021-05-06

Family

ID=75715766

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/113166 WO2021082724A1 (fr) 2019-11-01 2020-09-03 Procédé d'opération et produit associé

Country Status (2)

Country Link
CN (1) CN112784207B (fr)
WO (1) WO2021082724A1 (fr)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160019456A1 (en) * 2014-07-16 2016-01-21 Qualcomm Incorporated Decomposing convolution operation in neural networks
CN108229654A (zh) * 2016-12-14 2018-06-29 上海寒武纪信息科技有限公司 神经网络卷积运算装置及方法
CN108229656A (zh) * 2016-12-14 2018-06-29 上海寒武纪信息科技有限公司 神经网络运算装置及方法
CN108549931A (zh) * 2018-04-25 2018-09-18 济南浪潮高新科技投资发展有限公司 一种卷积神经网络的加速装置及方法
CN109523020A (zh) * 2017-10-30 2019-03-26 上海寒武纪信息科技有限公司 一种运算装置和方法
CN109754064A (zh) * 2017-11-07 2019-05-14 三星电子株式会社 执行解卷积的神经网络的方法和装置

Family Cites Families (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11055063B2 (en) * 2016-05-02 2021-07-06 Marvell Asia Pte, Ltd. Systems and methods for deep learning processor
US20170344876A1 (en) * 2016-05-31 2017-11-30 Samsung Electronics Co., Ltd. Efficient sparse parallel winograd-based convolution scheme
WO2018107383A1 (fr) * 2016-12-14 2018-06-21 上海寒武纪信息科技有限公司 Procédé et dispositif de calcul de convolution d'un réseau de neurones artificiels, et support d'enregistrement lisible par ordinateur
US10482155B2 (en) * 2016-12-30 2019-11-19 Intel Corporation Winograd algorithm on a matrix processing architecture
US10990648B2 (en) * 2017-08-07 2021-04-27 Intel Corporation System and method for an optimized winograd convolution accelerator
US10372787B2 (en) * 2017-12-12 2019-08-06 Facebook, Inc. Hardware accelerator pre-configured with coefficients for matrix-transform operations
CN108388446A (zh) * 2018-02-05 2018-08-10 上海寒武纪信息科技有限公司 运算模块以及方法
CN109325591B (zh) * 2018-09-26 2020-12-29 中国科学院计算技术研究所 面向Winograd卷积的神经网络处理器
CN109685201B (zh) * 2018-12-14 2020-10-30 安徽寒武纪信息科技有限公司 运算方法、装置及相关产品
CN110097172B (zh) * 2019-03-18 2021-10-29 中国科学院计算技术研究所 一种基于winograd卷积运算的卷积神经网络数据处理方法及装置
CN110188869B (zh) * 2019-05-05 2021-08-10 北京中科汇成科技有限公司 一种基于卷积神经网络算法的集成电路加速计算的方法及系统

Also Published As

Publication number Publication date
CN112784207A (zh) 2021-05-11
CN112784207B (zh) 2024-02-02

Similar Documents

Publication Publication Date Title
CN109543832B (zh) 一种计算装置及板卡
CN109522052B (zh) 一种计算装置及板卡
TWI795519B (zh) 計算裝置、機器學習運算裝置、組合處理裝置、神經網絡芯片、電子設備、板卡及執行機器學習計算的方法
US20230026006A1 (en) Convolution computation engine, artificial intelligence chip, and data processing method
WO2021082725A1 (fr) Procédé d'opération de convolution winograd et produit associé
WO2021083101A1 (fr) Procédé et appareil de traitement de données, et produit connexe
WO2021082747A1 (fr) Appareil d'exploitation et produit associé
CN109711540B (zh) 一种计算装置及板卡
CN111124995A (zh) 通过人工智能处理器处理一维复数数组的方法和设备
WO2021185262A1 (fr) Appareil de calcul et procédé, carte de panneau et support de stockage lisible par ordinateur
WO2021082746A1 (fr) Appareil d'exploitation et produit associé
CN109740730B (zh) 运算方法、装置及相关产品
WO2021082723A1 (fr) Appareil d'execution
CN111143766A (zh) 人工智能处理器处理二维复数矩阵的方法和设备
WO2021082724A1 (fr) Procédé d'opération et produit associé
CN111061507A (zh) 运算方法、装置、计算机设备和存储介质
WO2021082721A1 (fr) Procédé, appareil et dispositif de fonctionnement de convolution de winograd, et support de stockage
WO2021223642A1 (fr) Procédé et appareil de traitement de données, et produit associé
WO2021169914A1 (fr) Procédé et appareil de traitement par quantification de données, dispositif électronique et support de stockage
CN111382852B (zh) 数据处理装置、方法、芯片及电子设备
WO2021082722A1 (fr) Dispositif et procédé de calcul, et produit associé
CN111047030A (zh) 运算方法、装置、计算机设备和存储介质
WO2021223644A1 (fr) Procédé et dispositif de traitement de données, et produit associé
WO2021037083A1 (fr) Procédé et appareil de traitement de données, et produit associé
CN111222632B (zh) 计算装置、计算方法及相关产品

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20881646

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20881646

Country of ref document: EP

Kind code of ref document: A1