WO2021082724A1 - Operation method and related product - Google Patents

Operation method and related product

Info

Publication number
WO2021082724A1
Authority
WO
WIPO (PCT)
Prior art keywords
result
transformation
sub
feature
tensor
Application number
PCT/CN2020/113166
Other languages
French (fr)
Chinese (zh)
Inventor
张英男
曾洪博
张尧
刘少礼
黄迪
周诗怡
张曦珊
刘畅
郭家明
高钰峰
Original Assignee
中科寒武纪科技股份有限公司
Application filed by 中科寒武纪科技股份有限公司
Publication of WO2021082724A1

Classifications

    • G06F 17/15 — Complex mathematical operations: correlation function computation including computation of convolution operations
    • G06F 17/156 — Correlation function computation using a domain transform, e.g. Fourier transform, polynomial transform, number theoretic transform
    • G06F 17/16 — Complex mathematical operations: matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
    • G06N 3/04 — Computing arrangements based on biological models: neural networks; architecture, e.g. interconnection topology
    • G06N 3/045 — Neural networks: combinations of networks
    • G06N 3/063 — Neural networks: physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons, using electronic means

Definitions

  • This application relates to the field of deep learning technology, in particular to a neural network-based calculation method and related products.
  • the neural network model is an operation model in deep learning technology, which uses a multi-layer architecture to process input data and output corresponding operation results.
  • training a neural network model is a necessary step before calculations can be performed with it: the network to be trained must repeatedly perform iterative operations on massive amounts of training data to obtain the trained neural network model.
  • this application provides an operation method, including: obtaining the feature data output by the upper-layer convolutional network and a feature transformation matrix used to perform a forward transformation on the feature data; transforming the feature data according to the feature transformation matrix to obtain a feature transformation result, wherein the transformation operation of the feature data is disassembled into a summation operation and the feature transformation result is determined according to the summation operation; obtaining the forward-transformed weight transformation result of the current layer of the convolutional network, and performing element-wise multiplication of the feature transformation result and the weight transformation result to obtain a multiplication operation result; obtaining an inverse transformation matrix used to inversely transform the multiplication operation result, and transforming the multiplication operation result according to the inverse transformation matrix to obtain the operation result, wherein the transformation operation of the multiplication operation result is disassembled into a summation operation and the operation result is determined according to the summation operation; and outputting the operation result to the lower-layer convolutional network.
  • this application provides a computing device, including:
  • an acquisition module, used to acquire the feature data output by the upper-layer convolutional network and the feature transformation matrix used to perform a forward transformation on the feature data;
  • a feature transformation module, used to transform the feature data according to the feature transformation matrix to obtain a feature transformation result; wherein the transformation operation of the feature data is disassembled into a summation operation, and the feature transformation result is determined according to the summation operation;
  • a bit multiplication module, used to obtain the forward-transformed weight transformation result of the current layer of the convolutional network, and to perform element-wise multiplication of the feature transformation result and the weight transformation result to obtain the multiplication operation result;
  • an inverse transformation module, used to obtain an inverse transformation matrix for inversely transforming the multiplication operation result, and to transform the multiplication operation result according to the inverse transformation matrix to obtain the operation result; wherein the transformation operation of the multiplication operation result is disassembled into a summation operation, and the operation result is determined according to the summation operation;
  • a transmission module, used to output the operation result to the lower-layer convolutional network.
  • this application provides an artificial intelligence chip, which includes the computing device described in any one of the preceding items.
  • the present application provides an electronic device including the artificial intelligence chip as described above.
  • the present application provides a board card, the board card includes: a storage device, an interface device, a control device, and the aforementioned artificial intelligence chip;
  • the artificial intelligence chip is connected to the storage device, the control device, and the interface device respectively;
  • the storage device is used to store data
  • the interface device is used to implement data transmission between the artificial intelligence chip and external equipment
  • the control device is used to monitor the state of the artificial intelligence chip.
  • the operation method and related products provided by this application include: obtaining the feature data output by the upper-layer convolutional network and the feature transformation matrix used for the forward transformation of the feature data; transforming the feature data according to the feature transformation matrix to obtain the feature transformation result, where the transformation operation of the feature data is disassembled into a summation operation and the feature transformation result is determined according to that summation operation; obtaining the forward-transformed weight transformation result of the current layer of the convolutional network, and performing element-wise multiplication of the feature transformation result and the weight transformation result to obtain the multiplication operation result; obtaining the inverse transformation matrix used to inversely transform the multiplication operation result, and transforming the multiplication operation result according to the inverse transformation matrix to obtain the operation result, where the transformation operation of the multiplication operation result is disassembled into a summation operation and the operation result is determined according to that summation operation; and outputting the operation result to the lower-layer convolutional network.
  • when performing convolution processing on feature data, the winograd algorithm is used. This algorithm converts multiplications into additions; at the same time, the data transformation operations in the algorithm are converted into summation operations, which further reduces the number of multiplications, thereby reducing the performance loss of the computer system and increasing the computing speed.
  • Fig. 1 is a structural diagram of a processing system shown in an exemplary embodiment
  • Fig. 2 is a flowchart of an operation method shown in an exemplary embodiment of the present invention
  • Fig. 3 is a flowchart of an operation method shown in an exemplary embodiment of the present invention.
  • FIG. 4 is a schematic diagram of a master-slave processing architecture shown in an exemplary embodiment of the present invention.
  • Fig. 5 is a schematic diagram of an arithmetic device shown in an exemplary embodiment of the present invention.
  • Fig. 6 is a structural block diagram of a board according to an exemplary embodiment of the present invention.
  • the term "if" can be interpreted as "when", "once", "in response to determining" or "in response to detecting", depending on the context.
  • similarly, the phrase "if it is determined" or "if [the described condition or event] is detected" can be interpreted as "once it is determined", "in response to determining", "once [the described condition or event] is detected" or "in response to detecting [the described condition or event]", depending on the context.
  • a convolution operation refers to opening an active window of the same size as the template (the convolution kernel) starting from the upper left corner of the image.
  • the active window corresponds to a window image, i.e. the region of the image currently covered by the window.
  • the window image and the convolution kernel are multiplied element by element and the products are added, and the calculation result is used as the first pixel value of the new image after the convolution operation.
  • the active window then moves one column to the right, the new window image and the kernel are again multiplied element-wise and summed, and the result is used as the second pixel value of the new image, and so on (see the sketch below).
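  • as an illustration of the sliding-window description above, the following sketch (Python with NumPy; the function name and data sizes are illustrative assumptions, not part of this application) computes one pixel of the new image per window position:

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Sliding-window convolution as described above: at each position of
    the active window, multiply the window image by the kernel element-wise
    and sum the products, giving one pixel of the new image."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.empty((out_h, out_w))
    for i in range(out_h):        # the window moves down one row at a time
        for j in range(out_w):    # ... and right one column at a time
            window = image[i:i + kh, j:j + kw]
            out[i, j] = np.sum(window * kernel)
    return out
```

  • with a 4×4 image and a 3×3 kernel this yields a 2×2 new image, the configuration used in the Winograd sketch below.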
  • the Winograd convolution operation is a convolution acceleration method based on a polynomial interpolation algorithm. It applies the Winograd forward transformation to the two inputs of the convolution operation, the first target matrix and the second target matrix, then performs element-wise multiplication of the two transformed matrices, and finally applies the Winograd inverse transformation to the element-wise product, obtaining a convolution result equivalent to that of the original convolution operation.
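  • a minimal sketch of the three Winograd stages for the standard F(2×2, 3×3) instance (a 4×4 input tile, a 3×3 kernel and a 2×2 output). This application does not print its transformation matrices, so the well-known B^T, G and A^T for this tile size are assumed here:

```python
import numpy as np

# Standard Winograd F(2x2, 3x3) transformation matrices.
B_T = np.array([[1,  0, -1,  0],
                [0,  1,  1,  0],
                [0, -1,  1,  0],
                [0,  1,  0, -1]], dtype=float)
G = np.array([[1.0,  0.0, 0.0],
              [0.5,  0.5, 0.5],
              [0.5, -0.5, 0.5],
              [0.0,  0.0, 1.0]])
A_T = np.array([[1, 1,  1,  0],
                [0, 1, -1, -1]], dtype=float)

def winograd_f2x2_3x3(d, g):
    """Y = A^T [ (G g G^T) .* (B^T d B) ] A, where .* is element-wise."""
    U = G @ g @ G.T         # forward transformation of the weights
    V = B_T @ d @ B_T.T     # forward transformation of the feature tile
    M = U * V               # element-wise multiplication
    return A_T @ M @ A_T.T  # inverse transformation -> 2x2 result

# For any 4x4 tile d and 3x3 kernel g this matches the sliding-window
# result conv2d_valid(d, g) from the previous sketch.
```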
  • a convolutional neural network model is a type of feedforward neural network model that includes convolution calculations and has a deep structure, and it is one of the representative models of deep learning.
  • in the convolutional layers, fully connected layers and other network layers of a convolutional neural network model, convolution operations must be performed between neurons and convolution kernels to obtain feature data; such models are widely used in image classification and image recognition.
  • the operation method according to the embodiment of the present disclosure can be applied to any one processor of a processing system (for example, an artificial intelligence chip) including multiple processors (multi-core).
  • the processor may be a general-purpose processor, such as a CPU (Central Processing Unit, central processing unit), or an artificial intelligence processor (IPU) for performing artificial intelligence operations.
  • Artificial intelligence operations can include machine learning operations, brain-like operations, and so on. Among them, machine learning operations include neural network operations, k-means operations, support vector machine operations, and so on.
  • the artificial intelligence processor may include, for example, one or a combination of GPU (Graphics Processing Unit), NPU (Neural-network Processing Unit), DSP (Digital Signal Processing unit) and FPGA (Field-Programmable Gate Array) chips.
  • the processor mentioned in the present disclosure may include multiple processing units, and each processing unit can independently run various tasks assigned to it, such as convolution tasks, pooling tasks or fully connected tasks.
  • the present disclosure does not limit the processing unit and the tasks run by the processing unit.
  • Fig. 1 shows a schematic diagram of a processing system of an arithmetic method according to an embodiment of the present disclosure.
  • the processing system 100 includes multiple processors 101 and a memory 102.
  • the multiple processors 101 are used to execute instruction sequences.
  • the memory 102 is used to store data, and may include random access memory (RAM) and a register file.
  • the multiple processors 101 in the processing system 100 can not only share part of the storage space, for example, share part of the RAM storage space and the register file, but also have their own storage space at the same time.
  • the processing system can perform a convolution operation on the feature data input to a convolutional layer according to that layer's weights to obtain the convolution result, and then use the convolution result as the input of the next convolutional layer.
  • the next convolutional layer then uses its own weight data to perform convolution calculations on the input feature data, and so on.
  • the convolution method can extract the features in the original data, for example, extract the image features, so as to output the required results according to these features.
  • in this embodiment, when the convolution operation is performed on the input feature data according to the weight data of a convolutional layer, the winograd algorithm is used and its transformation operations are disassembled into summation operations, so as to reduce the amount of multiplication processing, thereby reducing the performance loss of the computing system and improving computing efficiency.
  • Fig. 2 is a flowchart of an operation method shown in an exemplary embodiment of the present invention.
  • the calculation method provided by this embodiment includes:
  • Step 201 Obtain the feature data output by the upper-layer convolutional network and a feature transformation matrix used to perform a forward transformation on the feature data.
  • the computer system used to execute the solution of the embodiment may be connected to a terminal device.
  • the terminal device can send the original data to the computer system.
  • the computer system can use the method provided in this embodiment to process the original data, extract the features contained in it, and then feed back a recognition result determined from these features to the terminal device, for example information corresponding to the original data.
  • the original data may be picture data
  • the terminal device may upload the original picture to the computer system, the computer system extracts the features included in the picture, determines a recognition result based on the characteristics, and then feeds back the recognition result to the terminal device.
  • a layer of convolutional network can output the operation result obtained by convolution to the next layer of convolutional network, so that the convolutional network can obtain the feature data output by the upper layer of convolutional network.
  • This layer of convolutional network can perform convolution calculation on the feature data according to the weight of this layer, so as to obtain the calculation result.
  • the winograd algorithm may be used to transform the characteristic data.
  • the winograd domain refers to the representation of data after the winograd forward transformation has been applied.
  • the feature transformation matrices B and B^T used to perform the forward transformation on the feature data d can also be obtained.
  • in an ordinary convolution operation, the number of multiplication operations is relatively large.
  • by using the winograd algorithm for convolution processing, the number of multiplications can be reduced, thereby reducing the performance loss caused by the operation. In the winograd algorithm, the convolution result is computed as Y = A^T[(G g G^T) ⊙ (B^T d B)]A, where ⊙ denotes element-wise multiplication.
  • the method provided in this embodiment can obtain the feature transformation matrix for the forward transformation of the feature data.
  • A, A^T, B, B^T, G and G^T are fixed matrices.
  • the size of d can be determined from the size of the required output result Y, the weight data g, and the sliding stride of the convolution process (for an m×m output and an r×r kernel with stride 1, d is an (m+r−1)×(m+r−1) tile), and the corresponding A, A^T, B, B^T, G and G^T can then be determined.
  • Step 202 Transform the feature data according to the feature transformation matrix to obtain a feature transformation result; wherein, the transformation operation of the feature data is disassembled into a summation operation, and the feature transformation result is determined according to the summation operation.
  • specifically, the characteristic data d can be transformed by the characteristic transformation matrices B and B^T to obtain the characteristic transformation result, that is, to determine B^T d B.
  • the transformation operation of the characteristic data is disassembled into a summation operation, and the characteristic transformation result is determined according to the summation operation.
  • the transformation operation of the feature data can be disassembled into multiple sub-transformation results, and then the sum result of the sub-transformation results can be determined as the feature transformation result.
  • the replacement matrix corresponding to each element in d can be preset; for example, d_00 corresponds to a matrix D_00, d_01 corresponds to D_01, and so on up to d_33, which corresponds to D_33.
  • the replacement matrix can be a matrix whose elements are only 0, 1 and -1.
  • during the transformation, the replacement matrices corresponding to d can be read directly, each element of d can be extracted, and each element can be multiplied by its corresponding replacement matrix; the products are then added to obtain the transformation result.
  • specifically, the replacement matrices can be determined in advance from the size of the characteristic data and the transformation matrices B and B^T, and stored, so that when the characteristic data is transformed the stored replacement matrices can be read directly.
  • replacing the matrix multiplication with multiplications of single elements by replacement matrices reduces the number of multiplications; especially when the replacement matrices are composed of 0, 1 and -1, the amount of calculation is greatly reduced.
  • for example, if the feature data is a 4×4 matrix, it includes 16 elements d_00, d_01 ... d_33 in total, and there are correspondingly 16 replacement matrices D_00, D_01 ... D_33.
  • in specific calculations, the multiplication of an element by a replacement-matrix entry of 1 reduces to directly writing the data.
  • for example, multiplying d_00 by a 1 in its replacement matrix simply writes d_00 into the corresponding result position. Therefore, based on the method provided in this embodiment, the transformation process in the winograd algorithm can be converted into additions, further reducing the computational complexity of the convolution process (see the check below).
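  • since d = d_00·E_00 + d_01·E_01 + … + d_33·E_33, where each E_ij holds a single 1, linearity gives B^T d B = Σ d_ij·(B^T E_ij B); each B^T E_ij B is a replacement matrix D_ij as described above. A small NumPy check of this identity (the 4×4 size and the B^T of the F(2×2, 3×3) instance are illustrative assumptions):

```python
import numpy as np

B_T = np.array([[1,  0, -1,  0],
                [0,  1,  1,  0],
                [0, -1,  1,  0],
                [0,  1,  0, -1]], dtype=float)

# Precompute the replacement matrix D_ij for every element position of d.
D = {}
for i in range(4):
    for j in range(4):
        E = np.zeros((4, 4))
        E[i, j] = 1.0                 # element sub-tensor with a single 1
        D[(i, j)] = B_T @ E @ B_T.T   # replacement matrix D_ij = B^T E_ij B

d = np.random.rand(4, 4)
# The transformation as a weighted sum of precomputed replacement matrices...
V_sum = sum(d[i, j] * D[(i, j)] for i in range(4) for j in range(4))
# ...equals the direct matrix transformation B^T d B.
assert np.allclose(V_sum, B_T @ d @ B_T.T)
```

  • because this B^T contains only 0, 1 and -1, every D_ij also contains only 0, 1 and -1, so the weighted sum needs no multiplications beyond scaling by the elements of d.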
  • Step 203 Obtain the forward-transformed weight transformation result of the current layer of the convolutional network, and perform element-wise multiplication of the feature transformation result and the weight transformation result to obtain the multiplication operation result.
  • specifically, the weight data g of the current layer of the convolutional network and the weight transformation matrices G and G^T used to transform the weight data can be obtained; the weight transformation matrices are then used to transform the weight data to obtain the weight transformation result.
  • the replacement matrix corresponding to each element in g can be stored in advance, and the forward transformation operation of the weights can then be converted into a summation operation through these replacement matrices.
  • the weight transformation matrix corresponding to the weight data can be determined in advance, and the weight transformation result can be determined in advance from the weight data and the corresponding weight transformation matrix.
  • when needed, the predetermined weight transformation result can be directly read.
  • the predetermined weight transformation result can be stored in a storage unit and read directly when needed, thereby further reducing the performance loss caused by the forward transformation of the weight data.
  • the order in which the weight transformation result is obtained and the feature transformation result is determined is not limited.
  • after both results are available, they can be multiplied element-wise; that is, after obtaining B^T d B and G g G^T, the two matrices can be multiplied element by element to determine the result of (G g G^T) ⊙ (B^T d B).
  • the values at corresponding positions of the two transformation results are multiplied, yielding a new matrix as the multiplication operation result.
  • here, B^T d B is the result of the feature data transformation.
  • Step 204 Obtain an inverse transformation matrix for inversely transforming the result of the multiplication operation, and transform the result of the multiplication operation according to the inverse transformation matrix to obtain the operation result; wherein the transformation operation of the multiplication operation result is disassembled into a summation operation, and the operation result is determined according to the summation operation.
  • specifically, the inverse transformation matrices A and A^T used for the inverse transformation of the multiplication operation result can also be obtained.
  • the inverse transformation matrix can be determined according to the size of the operation result.
  • the inverse transformation matrices can then be used to inversely transform the multiplication operation result p, that is, to determine A^T p A.
  • the replacement matrix corresponding to each element in the multiplication operation result can be determined in advance, so that the inverse transformation operation can be disassembled into a summation operation according to these replacement matrices, and the operation result can be determined according to the summation operation.
  • the specific disassembly method is similar to that used for the feature transformation operation, and the convolution operation result can thus be obtained with fewer multiplications.
  • Step 205 Output the calculation result to the lower layer convolutional network.
  • the convolutional network of this layer can output the determined operation result to the lower-layer convolutional network to serve as its input feature data, and the lower-layer convolutional network can perform convolution calculations on that input according to its own weight data.
  • when the lower-layer convolutional network performs its convolution, the above calculation method can also be used, that is, the winograd algorithm is adopted and the transformation operations in the algorithm are converted into summation operations.
  • the method provided in this embodiment is used to perform convolution operations, and is executed by a device provided with the method of this embodiment, which is usually implemented by means of hardware and/or software.
  • the operation method provided by this embodiment includes: obtaining the feature data output by the upper-layer convolutional network and the feature transformation matrix used for the forward transformation of the feature data; transforming the feature data according to the feature transformation matrix to obtain the feature transformation result, where the transformation operation of the feature data is disassembled into a summation operation and the feature transformation result is determined according to that summation operation; obtaining the forward-transformed weight transformation result of the current layer of the convolutional network, and performing element-wise multiplication of the feature transformation result and the weight transformation result to obtain the multiplication operation result; obtaining the inverse transformation matrix for the inverse transformation of the multiplication operation result, and transforming the multiplication operation result according to the inverse transformation matrix to obtain the operation result, where the transformation operation of the multiplication operation result is disassembled into a summation operation and the operation result is determined according to that summation operation; and outputting the operation result to the lower-layer convolutional network.
  • the computer system uses the winograd algorithm when performing convolution processing on the feature data, which converts multiplications into additions, and further converts the transformation processes in the algorithm into summation operations, thereby further reducing the multiplications in data processing, reducing the performance loss of the computer system and increasing the calculation speed.
  • Fig. 3 is a flowchart of an operation method shown in an exemplary embodiment of the present invention.
  • the calculation method provided by this embodiment includes:
  • Step 301 Obtain the feature data output by the upper-layer convolutional network and a feature transformation matrix used for the forward transformation of the feature data.
  • step 301 and step 201 are similar, and will not be described again.
  • Step 302 Disassemble the feature data into multiple feature sub-tensors.
  • the forward transformation of the characteristic data can be disassembled into a summation operation, thereby reducing the number of multiplication operations.
  • the feature data can be disassembled into multiple feature sub-tensors.
  • the sum of the multiple feature sub-tensors is the feature data;
  • the number of feature sub-tensors is the same as the number of non-zero elements in the feature data;
  • each feature sub-tensor has a single non-zero element;
  • the non-zero element in each feature sub-tensor is the same as the non-zero element at the corresponding position in the feature data.
  • for example, let the characteristic data d be a 4×4 matrix with elements d_00, d_01 ... d_33.
  • assuming that all elements of the feature data are non-zero, it can be disassembled into 16 feature sub-tensors, each of which retains exactly one element of d at its original position and is zero elsewhere (see the sketch below).
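  • a minimal sketch of this disassembly (Python/NumPy; the helper name is an illustrative assumption):

```python
import numpy as np

def split_into_subtensors(d):
    """Disassemble d into one sub-tensor per non-zero element; each
    sub-tensor keeps that element at its original position and is zero
    everywhere else."""
    subs = []
    for (i, j), v in np.ndenumerate(d):
        if v != 0:
            s = np.zeros_like(d)
            s[i, j] = v
            subs.append(s)
    return subs

d = np.random.rand(4, 4)          # all 16 elements non-zero in practice
subs = split_into_subtensors(d)
assert len(subs) == 16            # one sub-tensor per non-zero element
assert np.allclose(sum(subs), d)  # the sub-tensors sum back to d
```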
  • Step 303 Perform a transformation operation on the multiple feature sub-tensors according to the feature transformation matrix and sum them to obtain a feature transformation result.
  • the feature transformation matrix can be used to transform each feature sub-tensor, and then the transformation results of the feature sub-tensors can be added to obtain the feature transformation result.
  • transforming the feature sub-tensors and adding the transformation results gives the same result as transforming the feature data directly.
  • for each feature sub-tensor, the above transformation can be performed, and the transformation results of all feature sub-tensors are then added to obtain the feature transformation result.
  • the feature element sub-tensor is a tensor in which the non-zero elements of the feature sub-tensor are set to 1;
  • the result of the feature transformation is obtained by summing the transformation results of the feature subtensors.
  • the non-zero element in a feature sub-tensor can be identified and the corresponding position set to 1 to obtain the feature element sub-tensor.
  • for example, for the feature sub-tensor that retains d_00, the corresponding feature element sub-tensor has a 1 at position (0, 0) and 0 elsewhere.
  • in this way, the corresponding feature element sub-tensor can be determined for every feature sub-tensor.
  • the transformation result of the feature sub-tensor can be determined according to the feature transformation matrix, the feature element sub-tensor and the corresponding non-zero elements.
  • the feature element sub-tensor can be multiplied on the left by the left-multiplication matrix of the feature transformation matrix (B^T) and on the right by the right-multiplication matrix (B), and the result can then be multiplied by the non-zero element corresponding to the feature element sub-tensor to obtain the transformation result of the feature sub-tensor; the left-multiplication matrix and the right-multiplication matrix are both determined by the size of the feature sub-tensor.
  • B^T and B can be determined according to the size of the feature data, and the feature element sub-tensors can also be determined in advance. Therefore, the replacement matrix corresponding to each element position in the feature data can be determined in advance from B^T, B and the feature element sub-tensors.
  • specifically, the replacement matrix for element position (i, j) is D_ij = B^T E_ij B, where E_ij is the feature element sub-tensor with a 1 at position (i, j) and 0 elsewhere.
  • in this way, a corresponding replacement matrix can be determined for each element position in the feature data.
  • during actual processing, the corresponding set of replacement matrices can be determined directly according to the data size, and the feature transformation result can then be determined from that set, as in the sketch below.
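  • because the replacement matrices for this transformation contain only 0, 1 and -1, applying them degenerates into signed additions and direct data writes, which is the reduction in multiplications described above. A sketch under the same F(2×2, 3×3) assumptions as the earlier check:

```python
import numpy as np

B_T = np.array([[1,  0, -1,  0],
                [0,  1,  1,  0],
                [0, -1,  1,  0],
                [0,  1,  0, -1]], dtype=float)

# The replacement matrix set is determined once per data size, then reused.
D = {}
for i in range(4):
    for j in range(4):
        E = np.zeros((4, 4))
        E[i, j] = 1.0
        D[(i, j)] = B_T @ E @ B_T.T  # entries are only 0, 1 or -1 here

d = np.random.rand(4, 4)
V = np.zeros((4, 4))
for (i, j), Dij in D.items():
    for r, c in zip(*np.nonzero(Dij)):
        V[r, c] += Dij[r, c] * d[i, j]  # +1/-1 weight: a signed addition
assert np.allclose(V, B_T @ d @ B_T.T)
```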
  • Step 304 Obtain the weight data of the current layer of the convolutional network and the weight transformation matrices used for the forward transformation of the weight data.
  • Step 305 Transform the weight data according to the weight transformation matrix to obtain a weight transformation result; wherein, the transformation operation of the weight data is disassembled into a summation operation, and the weight transformation result is determined according to the summation operation.
  • the weight data can be transformed according to the weight transformation matrix to obtain the weight transformation result.
  • in order to reduce multiplication operations, the transformation operation of the weight data can be disassembled into a summation operation, and the weight transformation result can be determined according to the summation operation.
  • the weight data can be transformed as G g G^T, where G and G^T are the weight transformation matrices and g is the weight data.
  • the weight data can be disassembled into multiple weight sub-tensors; then the multiple weight sub-tensors are transformed and summed according to the weight transformation matrix, Get the result of weight transformation.
  • the sum of multiple weight sub-tensors is weight data
  • the number of multiple weight sub-tensors is the same as the number of non-zero elements in the weight data
  • each weight sub-tensor has a single non-zero value.
  • the non-zero element in the weight sub-tensor is the same as the non-zero element in the corresponding position in the weight data.
  • in this example the weight data g is likewise a 4×4 matrix with elements g_00, g_01 ... g_33.
  • assuming all of its elements are non-zero, the weight data can also be split into 16 weight sub-tensors, each retaining a single element of g.
  • each weight sub-tensor can be transformed by left- and right-multiplying the corresponding weight element sub-tensor with the weight transformation matrices, in the same way as the feature sub-tensors.
  • for each weight sub-tensor, this transformation can be performed, and the transformation results of all weight sub-tensors are then added to obtain the weight transformation result.
  • for each weight sub-tensor, the corresponding weight element sub-tensor, i.e. the tensor in which its non-zero element is set to 1, can be determined.
  • the transformation result of the weight sub-tensor can be determined according to the weight transformation matrix, the weight element sub-tensor and the corresponding non-zero elements.
  • the weight element sub-tensor can be multiplied on the left by the left-multiplication matrix of the weight transformation matrix and on the right by the right-multiplication matrix, and the result can then be multiplied by the non-zero element corresponding to the weight element sub-tensor to obtain the transformation result of the weight sub-tensor.
  • G and G^T can be determined according to the size of the weight data, and the weight element sub-tensors can also be determined in advance. Therefore, the replacement matrix corresponding to each element position in the weight data can be determined in advance from G, G^T and the weight element sub-tensors.
  • specifically, the replacement matrix for weight element position (i, j) is D'_ij = G E'_ij G^T, where E'_ij is the weight element sub-tensor with a 1 at position (i, j) and 0 elsewhere.
  • in this way, a corresponding replacement matrix can be determined for each element position in the weight data.
  • during actual processing, the corresponding set of replacement matrices can be determined directly according to the data size, and the weight transformation result can then be determined from that set:
  • G g G^T = g_00 × D'_00 + g_01 × D'_01 + … + g_33 × D'_33
  • Step 306 Perform element-wise multiplication of the feature transformation result and the weight transformation result to obtain the multiplication operation result.
  • the implementation of step 306 is similar in principle and method to the element-wise multiplication of the feature transformation result and the weight transformation result in step 203, and will not be repeated here.
  • Step 307 Decompose the result of the multiplication operation into multiple resultant sub-tensors.
  • Step 308 Perform a transformation operation on the multiple resultant sub-tensors according to the inverse transformation matrix and sum them to obtain the operation result.
  • the multiplication operation result data can be transformed according to the inverse transformation matrix to obtain the operation result.
  • the transformation operation of the multiplication operation result can be disassembled into a summation operation, and the operation result can be determined according to the summation operation.
  • A^T and A are the inverse transformation matrices;
  • p is the result of the multiplication operation.
  • the sum of multiple resultant subtensors is the result of the multiplication operation
  • the number of multiple resultant subtensors is the same as the number of non-zero elements in the result of the multiplication operation
  • each resultant subtensor has a single non-zero element
  • the non-zero elements in the resulting sub-tensor are the same as the non-zero elements in the corresponding position in the result of the multiplication operation.
  • for example, the multiplication operation result p is likewise a 4×4 matrix.
  • assuming all of its elements are non-zero, the multiplication operation result can also be split into 16 result sub-tensors, each retaining a single element of p.
  • the inverse transformation matrix can be used to transform each resultant sub-tensor, and then the transformation results of the resultant sub-tensors can be added to obtain the result of the operation.
  • the above transformation can be performed, and then the transformation results of each result sub-tensor are added to obtain the operation result.
  • the non-zero element in a result sub-tensor can be identified and the corresponding position set to 1 to obtain the result element sub-tensor; in this way, the corresponding result element sub-tensor can be determined for every result sub-tensor.
  • the result of the operation can be determined according to the inverse transformation matrix, the resultant sub-tensor and its corresponding non-zero elements.
  • the result element sub-tensor can be multiplied on the left by the left-multiplication matrix of the inverse transformation matrix (A^T) and on the right by the right-multiplication matrix (A), and the result can then be multiplied by the non-zero element corresponding to the result element sub-tensor to obtain the transformation result of the result sub-tensor; the left-multiplication matrix and the right-multiplication matrix are both determined by the size of the result sub-tensor.
  • A^T and A can be determined according to the size of the operation result, and the result element sub-tensors can also be determined in advance. Therefore, the replacement matrix corresponding to each element position in the multiplication operation result can be determined in advance from A^T, A and the result element sub-tensors.
  • specifically, the replacement matrix for element position (i, j) of p is D''_ij = A^T E_ij A, where E_ij is the result element sub-tensor with a 1 at position (i, j) and 0 elsewhere.
  • in this way, a corresponding replacement matrix can be determined for each element position in the multiplication operation result.
  • during actual processing, the corresponding set of replacement matrices can be determined directly according to the size of the multiplication operation result or of the final operation result, and the operation result can then be determined from that set.
  • Step 309 Output the calculation result to the lower layer convolutional network.
  • step 309 and step 205 are similar, and will not be repeated here.
  • Fig. 4 is a schematic diagram showing a master-slave processing architecture according to an exemplary embodiment of the present invention.
  • the solution of this embodiment also provides a master-slave processing architecture, which can be used to implement the calculation method provided in this embodiment.
  • the master-slave processing structure includes a master functional unit 41 and at least one slave functional unit 42.
  • the main functional unit 41 transforms the characteristic data according to the characteristic transformation matrix to obtain the characteristic transformation result; wherein, the transformation operation of the characteristic data is disassembled into a summation operation, and the characteristic transformation result is determined according to the summation operation.
  • a main storage unit (not shown in the figure) can also be provided, and the main storage unit can be connected to the main function unit 41.
  • a main control unit (not shown in the figure) can respectively send instructions to the main storage unit and the main function unit 41, so that the main storage unit can send characteristic data to the main function unit.
  • the slave functional unit 42 performs element-wise multiplication of the feature data transformation result and the weight transformation result to obtain the multiplication operation result.
  • specifically, the slave functional unit 42 may perform element-wise multiplication of the received feature data transformation result and the weight transformation result, so as to obtain the multiplication operation result.
  • the data processing process is similar to the foregoing embodiment, and will not be repeated.
  • one main functional unit 41 can be connected to multiple slave functional units 42, and allocation rules can be preset for allocating characteristic data transformation results to the slave functional units 42.
  • the operation of the main functional unit 41 and the slave functional units 42 proceeds in parallel: before the main functional unit 41 has computed the transformation result value for every element position of the feature data, a slave functional unit 42 can already perform, for each element position whose feature transformation value has been computed, the element-wise multiplication of the feature transformation result and the weight transformation result at that position, until the product for every element position has been computed, yielding the multiplication operation result.
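  • a toy sketch of this producer-consumer parallelism using Python threads (the queue, worker count and function names are illustrative scheduling assumptions, not this application's hardware design):

```python
import queue
import threading
import numpy as np

U = np.random.rand(4, 4)   # weight transformation result (precomputed)
V = np.random.rand(4, 4)   # feature transformation result
n_slaves = 2
task_q = queue.Queue()
products = {}              # element-wise products, keyed by position

def master():
    # Master functional unit: emit each transformed feature value as soon
    # as it is available, without waiting for the whole tile to finish.
    for pos, value in np.ndenumerate(V):
        task_q.put((pos, value))
    for _ in range(n_slaves):
        task_q.put(None)   # sentinel: no more element positions

def slave():
    # Slave functional unit: multiply each received feature value by the
    # weight transformation value at the same element position.
    while True:
        item = task_q.get()
        if item is None:
            break
        pos, value = item
        products[pos] = value * U[pos]

slaves = [threading.Thread(target=slave) for _ in range(n_slaves)]
for t in slaves:
    t.start()
master()
for t in slaves:
    t.join()

result = np.array([[products[(i, j)] for j in range(4)] for i in range(4)])
assert np.allclose(result, U * V)  # equals the element-wise multiplication
```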
  • although the steps in the flowcharts are displayed in sequence according to the arrows, these steps are not necessarily executed in the order indicated. Unless explicitly stated herein, there is no strict order restriction on their execution, and they may be executed in other orders. Moreover, at least some of the steps may include multiple sub-steps or stages, which are not necessarily executed at the same moment but may be executed at different moments, and their execution order is not necessarily sequential: they may be executed in turn or alternately with at least part of other steps or of the sub-steps or stages of other steps.
  • Fig. 5 is a schematic diagram of an arithmetic device shown in an exemplary embodiment of the present invention.
  • the computing device provided in this embodiment includes:
  • the obtaining module 51 is configured to obtain the feature data output by the upper-layer convolutional network and the feature transformation matrix used to perform a forward transformation on the feature data;
  • the feature transformation module 52 is configured to transform the feature data according to the feature transformation matrix to obtain a feature transformation result; wherein the transformation operation of the feature data is disassembled into a summation operation, and the feature transformation result is determined according to the summation operation;
  • the bit multiplication module 53 is configured to obtain the forward-transformed weight transformation result of the current layer of the convolutional network, and to perform element-wise multiplication of the feature transformation result and the weight transformation result to obtain the multiplication operation result;
  • the inverse transformation module 54 is configured to obtain an inverse transformation matrix used to inversely transform the multiplication operation result, and to transform the multiplication operation result according to the inverse transformation matrix to obtain an operation result; wherein the transformation operation of the multiplication operation result is disassembled into a summation operation, and the operation result is determined according to the summation operation;
  • the transmission module 55 is configured to output the operation result to the lower-layer convolutional network.
  • the feature transformation module 52 is specifically configured to:
  • the inverse transform module 54 is specifically configured to:
  • the sum of a plurality of the feature sub-tensors is the feature data; the sum of a plurality of the result sub-tensors is the result of the multiplication operation;
  • the number of the multiple feature sub-tensors is the same as the number of non-zero elements in the feature data, each feature sub-tensor has a single non-zero element, and the non-zero element in each feature sub-tensor is the same as the non-zero element at the corresponding position in the feature data;
  • the number of the multiple result sub-tensors is the same as the number of non-zero elements in the result of the multiplication operation, each result sub-tensor has a single non-zero element, and the non-zero element in each result sub-tensor is the same as the non-zero element at the corresponding position in the result of the multiplication operation.
  • the feature transformation module 52 is specifically configured to:
  • the inverse transform module 54 is specifically configured to:
  • the result element sub-tensor is a tensor in which a non-zero element of the result sub-tensor is set to 1;
  • the feature transformation module 52 is specifically configured to:
  • the left multiplication matrix and the right multiplication matrix in the feature element sub-tensor are both determined by the scale of the feature sub-tensor
  • the inverse transform module 54 is specifically configured to:
  • the left multiplication matrix and the right multiplication matrix in the result element sub-tensor are both determined by the scale of the result sub-tensor.
  • the bit multiplication module 53 is specifically used for:
  • the weight data is transformed according to the weight transformation matrix to obtain the weight transformation result; wherein the transformation operation of the weight data is disassembled into a summation operation, and the weight transformation result is determined according to the summation operation;
  • the bit multiplication module 53 is specifically used for:
  • the bit multiplication module 53 is specifically used for:
  • the bit multiplication module 53 is specifically used for:
  • the left multiplication matrix and the right multiplication matrix in the weight sub-tensor are both determined by the scale of the weight sub-tensor.
  • the foregoing device embodiments are only illustrative, and the device of the present disclosure may also be implemented in other ways.
  • the division of units/modules in the above-mentioned embodiments is only a logical function division, and there may be other division methods in actual implementation.
  • multiple units, modules or components may be combined or integrated into another system, or some features may be omitted or not implemented.
  • the functional units/modules in the various embodiments of the present disclosure may be integrated into one unit/module, or each unit/module may exist alone physically, or two or more units/modules may be integrated together.
  • the above-mentioned integrated unit/module can be implemented in the form of hardware or software program module.
  • the hardware may be a digital circuit, an analog circuit, and so on.
  • the physical realization of the hardware structure includes but is not limited to transistors, memristors and so on.
  • the artificial intelligence processor may be any appropriate hardware processor, such as CPU, GPU, FPGA, DSP, ASIC, and so on.
  • the storage unit may be any suitable magnetic storage medium or magneto-optical storage medium, such as resistive random access memory (RRAM), dynamic random access memory (DRAM), static random access memory (SRAM), enhanced dynamic random access memory (EDRAM), high-bandwidth memory (HBM) or a hybrid memory cube (HMC).
  • the integrated unit/module is implemented in the form of a software program module and sold or used as an independent product, it can be stored in a computer readable memory.
  • the technical solution of the present disclosure essentially or the part that contributes to the prior art or all or part of the technical solution can be embodied in the form of a software product, and the computer software product is stored in a memory, It includes several instructions to make a computer device (which may be a personal computer, a server, or a network device, etc.) execute all or part of the steps of the methods described in the various embodiments of the present disclosure.
  • the aforementioned memory includes various media that can store program code, such as a USB flash drive, read-only memory (ROM), random access memory (RAM), a removable hard disk, a magnetic disk or an optical disk.
  • an artificial intelligence chip is also disclosed, which includes the aforementioned computing device.
  • a board card is provided, which includes a storage device, an interface device, a control device and the aforementioned artificial intelligence chip; wherein the artificial intelligence chip is connected to the storage device, the control device and the interface device respectively; the storage device is used to store data; the interface device is used to realize data transmission between the artificial intelligence chip and an external device; and the control device is used to monitor the state of the artificial intelligence chip.
  • Fig. 6 is a structural block diagram of a board card shown in an exemplary embodiment of the present invention.
  • the board card may include other supporting components in addition to the chip 389 described above.
  • the supporting components include, but are not limited to, a storage device 390, an interface device 391 and a control device 392;
  • the storage device 390 is connected to the artificial intelligence chip through a bus for storing data.
  • the storage device may include multiple groups of storage units 393, each group being connected to the artificial intelligence chip through a bus. It can be understood that each group of storage units may be DDR SDRAM (Double Data Rate synchronous dynamic random access memory).
  • the storage device may include 4 groups of storage units, and each group may include a plurality of DDR4 chips (dies).
  • the artificial intelligence chip may include four 72-bit DDR4 controllers; in each 72-bit controller, 64 bits are used for data transmission and 8 bits are used for ECC checking. It can be understood that when DDR4-3200 chips are used in each group of storage units, the theoretical bandwidth of data transmission can reach 25600 MB/s.
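  • the quoted figure follows directly from the bus parameters, assuming DDR4-3200 (3200 mega-transfers per second) and the 64 data bits (8 bytes) moved per transfer: 3200 MT/s × 8 B per transfer = 25600 MB/s per controller.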
  • each group of the storage unit includes a plurality of double-rate synchronous dynamic random access memories arranged in parallel.
  • DDR can transmit data twice in one clock cycle.
  • a controller for controlling the DDR is provided in the chip, which is used to control the data transmission and data storage of each storage unit.
  • the interface device is electrically connected with the artificial intelligence chip.
  • the interface device is used to implement data transmission between the artificial intelligence chip and an external device (such as a server or a computer).
  • the interface device may be a standard PCIE interface.
  • the data to be processed is transferred from the server to the chip through a standard PCIE interface to realize data transfer.
  • the interface device may also be another interface, and the present disclosure does not limit the specific form of the other interface, as long as the interface unit can realize the data transfer function.
  • the calculation result of the artificial intelligence chip is still transmitted by the interface device back to an external device (such as a server).
  • the control device is electrically connected with the artificial intelligence chip.
  • the control device is used to monitor the state of the artificial intelligence chip.
  • the artificial intelligence chip and the control device may be electrically connected through an SPI interface.
  • the control device may include a microcontroller unit (MCU).
  • the artificial intelligence chip may include multiple processing chips, multiple processing cores or multiple processing circuits, and can drive multiple loads; therefore, the artificial intelligence chip can be in different working states such as heavy-load and light-load.
  • the control device can regulate and control the working states of the multiple processing chips, multiple processing cores and/or multiple processing circuits in the artificial intelligence chip.
  • an electronic device which includes the aforementioned artificial intelligence chip.
  • the electronic equipment includes data processing devices, robots, computers, printers, scanners, tablets, smart terminals, mobile phones, driving recorders, navigators, sensors, cameras, video cameras, projectors, watches, headsets, mobile storage, wearable devices, servers, cloud servers, vehicles, household appliances and/or medical equipment.
  • the transportation means include airplanes, ships, and/or vehicles;
  • the household appliances include TVs, air conditioners, microwave ovens, refrigerators, rice cookers, humidifiers, washing machines, electric lights, gas stoves, and range hoods;
  • the medical equipment includes nuclear magnetic resonance scanners, B-mode ultrasound scanners and/or electrocardiographs.
  • An operation method comprising:
  • the disassembling the transformation operation of the multiplication operation result into a summation operation and determining the operation result according to the summation operation includes:
  • the number of the multiple feature sub-tensors is the same as the number of non-zero elements in the feature data, each feature sub-tensor has a single non-zero element, and the non-zero element in each feature sub-tensor is the same as the non-zero element at the corresponding position in the feature data;
  • the number of the multiple result sub-tensors is the same as the number of non-zero elements in the result of the multiplication operation, each result sub-tensor has a single non-zero element, and the non-zero element in each result sub-tensor is the same as the non-zero element at the corresponding position in the result of the multiplication operation.
  • the performing a transformation operation on the multiple feature sub-tensors according to the feature transformation matrix and summing them to obtain the feature transformation result includes:
  • the performing a transformation operation on the multiple result sub-tensors according to the inverse transformation matrix and summing them to obtain the operation result includes:
  • the result element sub-tensor is a tensor in which a non-zero element of the result sub-tensor is set to 1;
  • the left multiplication matrix and the right multiplication matrix in the feature element sub-tensor are both determined by the scale of the feature sub-tensor
  • the determining the transformation result of the result sub-tensor according to the inverse transformation matrix, the result element sub-tensor and the corresponding non-zero elements includes:
  • the left multiplication matrix and the right multiplication matrix in the result element sub-tensor are both determined by the scale of the result sub-tensor.
  • the obtaining the weight transformation result of the current layer of the convolutional network after the positive transformation includes:
  • the weight data is transformed according to the weight transformation matrix to obtain the weight transformation result; wherein the transformation operation of the weight data is disassembled into a summation operation, and the weight transformation result is determined according to the summation operation.
  • the determining the transformation result of the weight sub-tensor according to the weight transformation matrix, the weight element sub-tensor and its corresponding non-zero elements includes:
  • the left multiplication matrix and the right multiplication matrix in the weight sub-tensor are both determined by the scale of the weight sub-tensor.
  • the master-slave processing architecture includes: a master functional unit and at least one slave functional unit.
  • the slave functional unit obtains an inverse transformation matrix for inversely transforming the multiplication operation result, and transforms the multiplication operation result according to the inverse transformation matrix to obtain an operation result; wherein the transformation operation of the multiplication operation result is disassembled into a summation operation, and the operation result is determined according to the summation operation.
  • the operation of the main functional unit and the slave functional units proceeds in parallel: before the main functional unit has computed the transformation result value for every element position of the feature data, a slave functional unit can already perform, for each element position whose feature transformation result value has been computed, the element-wise multiplication of the feature transformation result and the weight transformation result at that position, until the product for every element position has been computed, obtaining the multiplication operation result.
  • An arithmetic device including:
  • An acquisition module used to acquire the feature data output by the upper layer convolutional network and the feature transformation matrix used to perform positive transformation on the feature data;
• the feature transformation module is used to transform the feature data according to the feature transformation matrix to obtain a feature transformation result; wherein the transformation operation of the feature data is disassembled into a summation operation, and the feature transformation result is determined according to the summation operation;
• the bitwise multiplication module is used to obtain the forward-transformed weight transformation result of the current layer of the convolutional network, and to perform a bitwise multiplication operation on the feature transformation result and the weight transformation result to obtain the multiplication operation result;
• the inverse transformation module is used to obtain an inverse transformation matrix for inversely transforming the multiplication operation result, and to transform the multiplication operation result according to the inverse transformation matrix to obtain the operation result; wherein the transformation operation of the multiplication operation result is disassembled into a summation operation, and the operation result is determined according to the summation operation;
  • the transmission module is used to output the calculation result to the lower layer convolutional network.
  • the feature transformation module is specifically configured to:
  • the inverse transform module is specifically used for:
• the number of the multiple feature sub-tensors is the same as the number of non-zero elements in the feature data; each feature sub-tensor has a single non-zero element, and the non-zero element in the feature sub-tensor is the same as the non-zero element at the corresponding position in the feature data;
• the number of the multiple result sub-tensors is the same as the number of non-zero elements in the multiplication operation result; each result sub-tensor has a single non-zero element, and the non-zero element in the result sub-tensor is the same as the non-zero element at the corresponding position in the multiplication operation result.
  • the feature transformation module is specifically configured to:
  • the inverse transform module is specifically used for:
  • the result element sub-tensor is a tensor in which a non-zero element of the result sub-tensor is set to 1;
  • the feature transformation module is specifically configured to:
• the left multiplication matrix and the right multiplication matrix for the feature element sub-tensor are both determined by the scale of the feature sub-tensor;
  • the inverse transform module is specifically used for:
• the left multiplication matrix and the right multiplication matrix for the result element sub-tensor are both determined by the scale of the result sub-tensor.
• the bitwise multiplication module is specifically configured to:
• the weight data is transformed according to the weight transformation matrix to obtain the weight transformation result; wherein the transformation operation of the weight data is disassembled into a summation operation, and the weight transformation result is determined according to the summation operation;
• the bitwise multiplication module is specifically configured to:
• the bitwise multiplication module is specifically configured to:
• the bitwise multiplication module is specifically configured to:
• the left multiplication matrix and the right multiplication matrix for the weight element sub-tensor are both determined by the scale of the weight sub-tensor.
  • An artificial intelligence chip comprising the computing device according to any one of clauses A13-A21.
  • a board card comprising: a storage device, an interface device, a control device, and the artificial intelligence chip as described in clause A22;
  • the artificial intelligence chip is connected to the storage device, the control device, and the interface device respectively;
• the storage device is used to store data;
• the interface device is used to implement data transmission between the artificial intelligence chip and external equipment;
  • the control device is used to monitor the state of the artificial intelligence chip.
• the storage device includes multiple groups of storage units; each group of storage units is connected to the artificial intelligence chip through a bus, and the storage units are DDR SDRAM;
  • the chip includes: a DDR controller, which is used to control the data transmission and data storage of each storage unit;
  • the interface device is: a standard PCIE interface.
  • the efficiency of neural network processing can be effectively improved.
  • the above solutions can be implemented based on various architectures, for example, through a master-slave architecture or a general architecture.
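The following is a minimal sketch of the master-slave pipelining described in the items above: the master streams out finished feature-transform positions while the slave immediately multiplies each one with the weight-transform value at the same position. All names, the queue-based handoff, and the use of Python threads are illustrative assumptions; the application describes hardware functional units, not a software implementation.

```python
import queue
import threading
import numpy as np

def master(V, q):
    """Stream out the feature-transform value of each element position as
    soon as it is finished (V is precomputed here for brevity; in the
    described hardware the positions would become ready one by one)."""
    for i in range(V.shape[0]):
        for j in range(V.shape[1]):
            q.put(((i, j), V[i, j]))
    q.put(None)  # all element positions have been produced

def slave(U, q, M):
    """For every element position the master has already produced,
    multiply the feature-transform value with the weight-transform
    value at the same position (the bitwise multiplication)."""
    while (item := q.get()) is not None:
        (i, j), v = item
        M[i, j] = v * U[i, j]

U = np.random.rand(4, 4)   # weight transformation result (illustrative)
V = np.random.rand(4, 4)   # feature transformation result (illustrative)
q, M = queue.Queue(), np.zeros((4, 4))
worker = threading.Thread(target=slave, args=(U, q, M))
worker.start()
master(V, q)
worker.join()
assert np.allclose(M, U * V)   # equals the elementwise product
```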


Abstract

An operation method and a related product, the method comprising: acquiring feature data outputted by an upper layer convolutional network and a feature transformation matrix used for performing forward transformation on the feature data (201); on the basis of the feature transformation matrix, transforming the feature data to obtain a feature transformation result, wherein the transformation operation of the feature data is disassembled into a summation operation, and the feature transformation result is determined on the basis of the summation operation (202); acquiring the weight transformation result of the present layer convolutional network after forward transformation and performing a bitwise multiplication operation on the feature transformation result and the weight transformation result to obtain a multiplication operation result (203); acquiring an inverse transformation matrix used for performing inverse transformation on the multiplication operation result and, on the basis of the inverse transformation matrix, performing transformation on the multiplication operation result to obtain an operation result, wherein the transformation operation of the multiplication operation result is disassembled into a summation operation, and the operation result is determined on the basis of the summation operation (204); and outputting the operation result to a lower layer convolutional network (205).

Description

Operation method and related product
This application claims priority to the Chinese patent application No. 2019110611184, entitled "Operation Method and Related Products", filed with the Chinese Patent Office on November 1, 2019, the entire contents of which are incorporated herein by reference.
Technical field
This application relates to the field of deep learning technology, and in particular to a neural network-based operation method and related products.
Background
In recent years, deep learning technology has developed rapidly and has been widely applied in fields such as image recognition, speech recognition, natural language analysis, intelligent robotics, and big data analysis, becoming a focus of research.
A neural network model is an operation model in deep learning technology that uses a multi-layer architecture to process input data and output corresponding operation results. In the prior art, training a neural network model is a necessary step before using the model for operations; during training, the neural network to be trained must repeatedly perform iterative operations on massive training data to obtain a trained neural network model.
However, the traditional way of repeatedly iterating over massive training data occupies a large amount of computing resources, and because the data is processed inefficiently, training takes a long time and consumes considerable computing power.
Summary of the invention
Based on this, in view of the above technical problems, it is necessary to provide an operation method and related products that can improve the training efficiency of neural network models and reduce the computing resources consumed by training operations.
In a first aspect, this application provides an operation method, including:
acquiring feature data output by an upper-layer convolutional network and a feature transformation matrix used to perform a forward transformation on the feature data;
transforming the feature data according to the feature transformation matrix to obtain a feature transformation result; wherein the transformation operation of the feature data is disassembled into a summation operation, and the feature transformation result is determined according to the summation operation;
acquiring the forward-transformed weight transformation result of the current layer of the convolutional network, and performing a bitwise multiplication operation on the feature transformation result and the weight transformation result to obtain a multiplication operation result;
acquiring an inverse transformation matrix used to perform an inverse transformation on the multiplication operation result, and transforming the multiplication operation result according to the inverse transformation matrix to obtain an operation result; wherein the transformation operation of the multiplication operation result is disassembled into a summation operation, and the operation result is determined according to the summation operation;
outputting the operation result to a lower-layer convolutional network.
In a second aspect, this application provides an operation device, including:
an acquisition module, used to acquire feature data output by an upper-layer convolutional network and a feature transformation matrix used to perform a forward transformation on the feature data;
a feature transformation module, used to transform the feature data according to the feature transformation matrix to obtain a feature transformation result; wherein the transformation operation of the feature data is disassembled into a summation operation, and the feature transformation result is determined according to the summation operation;
a bitwise multiplication module, used to acquire the forward-transformed weight transformation result of the current layer of the convolutional network, and to perform a bitwise multiplication operation on the feature transformation result and the weight transformation result to obtain a multiplication operation result;
an inverse transformation module, used to acquire an inverse transformation matrix used to perform an inverse transformation on the multiplication operation result, and to transform the multiplication operation result according to the inverse transformation matrix to obtain an operation result; wherein the transformation operation of the multiplication operation result is disassembled into a summation operation, and the operation result is determined according to the summation operation;
a transmission module, used to output the operation result to a lower-layer convolutional network.
In a third aspect, this application provides an artificial intelligence chip, the chip including the operation device according to any one of the preceding items.
In a fourth aspect, this application provides an electronic device including the artificial intelligence chip as described above.
In a fifth aspect, this application provides a board card, the board card including: a storage device, an interface device, a control device, and the artificial intelligence chip as described above;
wherein the artificial intelligence chip is connected to the storage device, the control device, and the interface device respectively;
the storage device is used to store data;
the interface device is used to implement data transmission between the artificial intelligence chip and external equipment;
the control device is used to monitor the state of the artificial intelligence chip.
The operation method and related products provided by this application acquire the feature data output by an upper-layer convolutional network and the feature transformation matrix used to perform a forward transformation on the feature data; transform the feature data according to the feature transformation matrix to obtain a feature transformation result, where the transformation operation of the feature data is disassembled into a summation operation and the feature transformation result is determined according to the summation operation; acquire the forward-transformed weight transformation result of the current layer of the convolutional network, and perform a bitwise multiplication operation on the feature transformation result and the weight transformation result to obtain a multiplication operation result; acquire an inverse transformation matrix used to perform an inverse transformation on the multiplication operation result, and transform the multiplication operation result according to the inverse transformation matrix to obtain an operation result, where the transformation operation of the multiplication operation result is disassembled into a summation operation and the operation result is determined according to the summation operation; and output the operation result to a lower-layer convolutional network. When convolution processing is performed on the feature data, the winograd algorithm is used. This algorithm can convert multiplications into additions; in addition, converting the data transformation operations in the algorithm into summation operations can further reduce the number of multiplications, thereby reducing the performance loss of the computer system and increasing the operation speed.
Description of the drawings
The drawings, which are included in and constitute a part of the specification, illustrate exemplary embodiments, features, and aspects of the present disclosure together with the specification, and are used to explain the principles of the present disclosure.
Fig. 1 is a structural diagram of a processing system shown in an exemplary embodiment;
Fig. 2 is a flowchart of an operation method shown in an exemplary embodiment of the present invention;
Fig. 3 is a flowchart of an operation method shown in an exemplary embodiment of the present invention;
Fig. 4 is a schematic diagram of a master-slave processing architecture shown in an exemplary embodiment of the present invention;
Fig. 5 is a schematic diagram of an operation device shown in an exemplary embodiment of the present invention;
Fig. 6 is a structural block diagram of a board card according to an exemplary embodiment of the present invention.
Detailed description
The technical solutions in the embodiments of the present disclosure will be described clearly and completely below in conjunction with the accompanying drawings. Obviously, the described embodiments are only some of the embodiments of the present disclosure, rather than all of them. Based on the embodiments in the present disclosure, all other embodiments obtained by those skilled in the art without creative work shall fall within the protection scope of the present disclosure.
It should be understood that the terms "first", "second", "third", and "fourth" in the claims, specification, and drawings of the present disclosure are used to distinguish different objects, rather than to describe a specific order. The terms "comprising" and "including" used in the specification and claims of the present disclosure indicate the existence of the described features, wholes, steps, operations, elements, and/or components, but do not exclude the existence or addition of one or more other features, wholes, steps, operations, elements, components, and/or collections thereof.
It should also be understood that the terms used in this specification are only for the purpose of describing specific embodiments and are not intended to limit the present disclosure. As used in the specification and claims of the present disclosure, unless the context clearly indicates otherwise, the singular forms "a", "an", and "the" are intended to include the plural forms. It should be further understood that the term "and/or" used in the specification and claims of this disclosure refers to any combination and all possible combinations of one or more of the associated listed items, and includes these combinations.
As used in this specification and the claims, the term "if" can be interpreted as "when", "once", "in response to determining", or "in response to detecting", depending on the context. Similarly, the phrase "if it is determined" or "if [the described condition or event] is detected" can be interpreted, depending on the context, as "once it is determined", "in response to determining", "once [the described condition or event] is detected", or "in response to detecting [the described condition or event]".
In order to clearly understand the technical solutions of the present application, the technical terms involved in the prior art and in the embodiments of the present application are explained below:
Convolution operation: a convolution operation starts from the upper left corner of the image by opening an active window of the same size as the template (the convolution kernel). The pixels covered by the active window are multiplied element by element with the corresponding elements of the kernel and then summed, and the result is used as the first pixel value of the new image after the convolution operation. The active window then moves one column to the right, the covered pixels are again multiplied element by element with the kernel and summed, and the result is used as the second pixel value of the new image. Proceeding in this way from left to right and from top to bottom yields the complete new image.
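The sliding-window procedure just described can be written as the following minimal sketch (unit stride, no padding; the function name and the use of numpy are illustrative choices, not part of this application):

```python
import numpy as np

def conv2d_naive(image, kernel, stride=1):
    """Slide the kernel over the image; each output pixel is the sum of
    the elementwise products inside the current active window."""
    kh, kw = kernel.shape
    oh = (image.shape[0] - kh) // stride + 1
    ow = (image.shape[1] - kw) // stride + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            window = image[i * stride:i * stride + kh,
                           j * stride:j * stride + kw]
            out[i, j] = np.sum(window * kernel)
    return out
```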
Winograd convolution operation: the Winograd convolution operation is a convolution acceleration implementation based on a polynomial interpolation algorithm. It performs a Winograd forward transformation on the two inputs of the convolution operation, the first target matrix and the second target matrix, respectively; then performs a bitwise multiplication on the forward-transformed first target matrix and second target matrix; and finally performs a Winograd inverse transformation on the bitwise multiplication result to obtain a convolution result equivalent to that of the original convolution operation.
Convolutional neural network model: a convolutional neural network model is a type of feedforward neural network model that includes convolution calculations and has a deep structure, and it is one of the representative models of deep learning. In the convolutional layers, fully connected layers, and other network layers of a convolutional neural network model, convolution operations must be performed on neurons and convolution kernels to obtain feature data; such models are widely used in image classification, image recognition, and the like.
The operation method according to the embodiments of the present disclosure can be applied to any processor of a processing system (for example, an artificial intelligence chip) that includes multiple processors (multiple cores). The processor may be a general-purpose processor, such as a CPU (Central Processing Unit), or an artificial intelligence processor (IPU) for performing artificial intelligence operations. Artificial intelligence operations may include machine learning operations, brain-like operations, and so on, where machine learning operations include neural network operations, k-means operations, support vector machine operations, and the like. The artificial intelligence processor may, for example, include one or a combination of a GPU (Graphics Processing Unit), an NPU (Neural-Network Processing Unit), a DSP (Digital Signal Processing unit), and a Field-Programmable Gate Array (FPGA) chip. The present disclosure does not limit the specific type of processor. In addition, the types of the multiple processors in the processing system may be the same or different, which is not limited by the present disclosure.
In a possible implementation, the processor mentioned in the present disclosure may include multiple processing units, and each processing unit can independently run the various tasks assigned to it, such as convolution operation tasks, pooling tasks, or fully connected tasks. The present disclosure does not limit the processing units or the tasks they run.
Fig. 1 shows a schematic diagram of a processing system for an operation method according to an embodiment of the present disclosure. As shown in Fig. 1, the processing system 100 includes multiple processors 101 and a memory 102. The multiple processors 101 are used to execute instruction sequences, and the memory 102 is used to store data and may include random access memory (RAM) and a register file. The multiple processors 101 in the processing system 100 can share part of the storage space, for example part of the RAM storage space and the register file, and can also have their own storage spaces at the same time.
When the processing system implements artificial intelligence functions based on a neural network, it can perform a convolution operation on the feature data input to a convolutional layer according to the weights of that layer to obtain a convolution result, and then use the convolution result as the input data of the next convolutional layer, which continues to perform convolution calculations on the input feature data using its own weight data. The convolution method can extract features from the original data, for example image features, so that the required results can be output according to these features.
In order to solve the aforementioned technical problems, in the operation method provided by this embodiment, when the convolution operation is performed on the input feature data according to the weight data of the convolutional layer, the winograd algorithm is used and the transformation operations are disassembled into summation operations, so as to reduce the amount of multiplication processing, thereby reducing the performance loss of the operation system and improving the operation efficiency.
Fig. 2 is a flowchart of an operation method shown in an exemplary embodiment of the present invention.
As shown in Fig. 2, the operation method provided by this embodiment includes:
Step 201: acquire the feature data output by the upper-layer convolutional network and the feature transformation matrix used to perform a forward transformation on the feature data.
The computer system used to execute the solution of this embodiment may be connected to a terminal device. The terminal device can send original data to the computer system; the computer system can use the method provided by this embodiment to process the original data and extract the features in it, and can then feed back a recognition result to the terminal device based on these features, for example, information corresponding to the original data.
Specifically, the original data may be, for example, picture data: the terminal device uploads an original picture to the computer system, the computer system extracts the features included in the picture, determines a recognition result based on those features, and then feeds the recognition result back to the terminal device.
A convolutional layer can output the operation result obtained by convolution to the next convolutional layer, so that each layer receives the feature data output by the layer above it.
This layer of the convolutional network can perform a convolution calculation on the feature data according to the weights of this layer to obtain the operation result. In the method provided by this embodiment, the winograd algorithm may be used to transform the feature data.
When the convolution operation is performed with the winograd algorithm, the following formula can be used:
Y = A^T[(GgG^T) ⊙ (B^T dB)]A
where Y represents the convolution matrix, that is, the result matrix obtained by performing the convolution operation on the feature data and the weight data; d represents the input feature data; g represents the weight data in the neural network; B represents the feature transformation matrix that transforms the feature data from the original domain to the winograd domain; B^T represents the feature inverse transformation matrix that transforms the feature data from the winograd domain back to the original domain; G represents the weight transformation matrix that transforms the weight data from the original domain to the winograd domain; G^T represents the weight inverse transformation matrix that transforms the weight data from the winograd domain back to the original domain; A represents the transformation matrix of the inverse transformation operation that converts the result of the bitwise multiplication from the original domain to the winograd domain; and A^T represents the inverse transformation matrix of the inverse transformation operation that converts the result of the bitwise multiplication from the winograd domain back to the original domain.
It should be noted that the original domain refers to a domain that has not undergone the winograd transformation, while the winograd domain refers to a domain that has undergone the winograd transformation.
Specifically, the feature transformation matrices B and B^T used to perform the forward transformation on the feature data d can also be acquired.
Further, a traditional convolution operation involves a large number of multiplications. By using the winograd algorithm for convolution processing, the number of multiplications can be reduced, thereby reducing the performance loss caused by the operation.
In practical applications, the winograd algorithm requires a forward transformation of the feature data; therefore, the method provided by this embodiment can acquire the feature transformation matrix used to perform the forward transformation on the feature data.
In the winograd algorithm, if the sizes of d and g are fixed, the matrices A, A^T, B, B^T, G, and G^T are also fixed. Specifically, the size of d can be determined according to the size of the required output result Y, the weight data g, and the sliding stride of the convolution process, and the corresponding A, A^T, B, B^T, G, and G^T can then be determined according to the sizes of these data.
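As a concrete instance of the formula above, the sketch below uses the standard F(2×2, 3×3) matrices from the winograd minimal-filtering literature (a 4×4 input tile d, a 3×3 kernel g, and a 2×2 output Y). These particular matrix values are an assumption taken from that literature, not values fixed by this application; conv2d_naive is the sliding-window sketch shown earlier.

```python
import numpy as np

# Standard F(2x2, 3x3) winograd matrices (from the minimal-filtering
# literature; this application does not fix particular values).
B_T = np.array([[1,  0, -1,  0],
                [0,  1,  1,  0],
                [0, -1,  1,  0],
                [0,  1,  0, -1]], dtype=float)
G = np.array([[1.0,  0.0, 0.0],
              [0.5,  0.5, 0.5],
              [0.5, -0.5, 0.5],
              [0.0,  0.0, 1.0]])
A_T = np.array([[1, 1,  1,  0],
                [0, 1, -1, -1]], dtype=float)

d = np.random.rand(4, 4)        # feature data tile
g = np.random.rand(3, 3)        # weight data (3x3 kernel)

U = G @ g @ G.T                 # weight transform   G g G^T
V = B_T @ d @ B_T.T             # feature transform  B^T d B
Y = A_T @ (U * V) @ A_T.T       # inverse transform of the bitwise product

# Matches the direct convolution of the 4x4 tile (2x2 valid output).
assert np.allclose(Y, conv2d_naive(d, g))
```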
Step 202: transform the feature data according to the feature transformation matrix to obtain a feature transformation result; the transformation operation of the feature data is disassembled into a summation operation, and the feature transformation result is determined according to the summation operation.
The feature transformation matrices B and B^T can be used to transform the feature data d to obtain the feature transformation result, that is, to determine the result of B^T dB.
Further, in order to further reduce the number of multiplications and the resulting performance loss, in the method provided by this embodiment the transformation operation of the feature data is disassembled into a summation operation, and the feature transformation result is determined according to the summation operation.
In practical applications, the transformation operation of the feature data can be disassembled into multiple sub-transformation results, and the sum of the sub-transformation results is determined as the feature transformation result.
Taking B^T dB as an example, assuming that the feature data is a 4×4 matrix, a replacement matrix can be preset for each element of d; for example, d_00 corresponds to a matrix D_00, d_01 corresponds to a matrix D_01, ..., and d_33 corresponds to a matrix D_33. A replacement matrix may be a matrix whose entries are 0, 1, or -1.
When transforming d, the replacement matrices corresponding to d can be read directly; each element of d is extracted and multiplied by its replacement matrix, and the products are added to obtain the transformation result. Specifically, the replacement matrices can be determined according to the size of the feature data, the feature transformation matrix B, and the feature inverse transformation matrix B^T; when the feature data is transformed, the pre-stored replacement matrices can be read directly.
The operation of multiplying a single element by a replacement matrix reduces the number of multiplications, especially when the replacement matrix consists of 0, 1, and -1 entries, in which case the amount of calculation can be greatly reduced. For example, if the feature data is a 4×4 matrix, it contains the 16 elements d_00, d_01, ..., d_33, and there are 16 corresponding replacement matrices D_00, D_01, ..., D_33. The specific calculation is:
B^T dB = d_00 × D_00 + d_01 × D_01 + ... + d_33 × D_33
When a replacement matrix consists of 0, 1, and -1 entries, the multiplication of an element with the replacement matrix becomes a process of directly writing data; for example, multiplying a 1 in the replacement matrix by d_00 actually amounts to directly writing d_00. Therefore, based on the method provided by this embodiment, the transformation process in the winograd algorithm can be converted into an addition algorithm, which further reduces the amount of computation of the convolution process.
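Continuing the F(2×2, 3×3) sketch above, the replacement matrices can be precomputed as D_ij = B^T E_ij B, where E_ij holds a single 1 at position (i, j); with the B^T given there, every entry of every D_ij is 0, 1, or -1, so the weighted sum needs only signed additions. The symbol E_ij and the dictionary layout are our illustrative choices, not notation fixed by this application.

```python
# One replacement matrix per element position: D_ij = B^T E_ij B.
D = {}
for i in range(4):
    for j in range(4):
        E = np.zeros((4, 4))
        E[i, j] = 1.0
        D[(i, j)] = B_T @ E @ B_T.T

# For the F(2x2, 3x3) matrices every D_ij entry is 0, 1 or -1, so the
# transform reduces to signed additions of the d_ij values.
assert all(np.isin(m, (-1.0, 0.0, 1.0)).all() for m in D.values())

# The transform as a summation: B^T d B == sum_ij d_ij * D_ij.
V_sum = sum(d[i, j] * D[(i, j)] for i in range(4) for j in range(4))
assert np.allclose(V_sum, B_T @ d @ B_T.T)
```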
Step 203: acquire the forward-transformed weight transformation result of the current layer of the convolutional network, and perform a bitwise multiplication operation on the feature transformation result and the weight transformation result to obtain the multiplication operation result.
In one implementation, similarly to steps 201 and 202, the weight data g of the current layer of the convolutional network and the weight transformation matrices G^T and G used to transform the weight data can be acquired; the weight data is then transformed with the weight transformation matrices to obtain the weight transformation result,
that is, the result of G^T gG is determined. In the determination process, the transformation can likewise be disassembled into a summation operation in the manner described above, thereby reducing the performance loss of the operation process. For example, a replacement matrix corresponding to each element of g can be stored in advance, and the forward transformation operation of the weights can then be converted into a summation operation through these replacement matrices.
In another implementation, since the weight data in each convolutional layer is fixed when the neural network is used for data processing, the weight transformation matrices corresponding to the weight data can be determined in advance, and the weight transformation result can be predetermined according to the weight data and the corresponding weight transformation matrices. When a convolution calculation needs to be performed on the feature data, the predetermined weight transformation result can be read directly; for example, it can be stored in a storage unit and read whenever needed. This further reduces the performance loss caused by the forward transformation of the weight data.
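A minimal sketch of this second implementation, reusing G, B_T, and A_T from the F(2×2, 3×3) example above; the class and its method names are illustrative assumptions, not the fixed implementation of this application.

```python
class WinogradLayer:
    """Caches the forward-transformed weights of one convolutional layer,
    so that only the feature transform, the bitwise multiplication, and
    the inverse transform remain to be done per input tile."""

    def __init__(self, g):
        self.U = G @ g @ G.T          # weight transform, computed once

    def tile_forward(self, d):
        V = B_T @ d @ B_T.T           # feature transform, per tile
        return A_T @ (self.U * V) @ A_T.T

layer = WinogradLayer(np.random.rand(3, 3))
y = layer.tile_forward(np.random.rand(4, 4))   # 2x2 output tile
```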
Optionally, the order in which the weight transformation result is acquired and the feature transformation result is determined is not limited.
After the feature transformation result has been determined and the weight transformation result has been acquired, a bitwise multiplication operation can be performed on the two results. That is, after B^T dB and G^T gG have been obtained, the two matrices can be multiplied bitwise, i.e., the result of (GgG^T) ⊙ (B^T dB) is determined.
In practical applications, the values at the corresponding positions of the two transformation results are multiplied to obtain a new matrix as the multiplication operation result. For example, given a 4×4 feature transformation result and a 4×4 weight transformation result (the worked numeric matrices are shown as images in the original publication and are not reproduced here), the multiplication operation result is the matrix formed by their element-by-element products.
Step 204: acquire the inverse transformation matrix used to perform an inverse transformation on the multiplication operation result, and transform the multiplication operation result according to the inverse transformation matrix to obtain the operation result; the transformation operation of the multiplication operation result is disassembled into a summation operation, and the operation result is determined according to the summation operation.
Specifically, the inverse transformation matrices A and A^T used to perform the inverse transformation on the multiplication operation result can also be acquired. As described above, the inverse transformation matrices can be determined according to the size of the operation result.
Further, the inverse transformation matrices can be used to inversely transform the multiplication operation result, that is, to determine
Y = A^T[(GgG^T) ⊙ (B^T dB)]A.
In practical applications, a replacement matrix corresponding to each element included in the multiplication operation result can be determined in advance, so that the inverse transformation operation can be disassembled into a summation operation according to these replacement matrices, and the operation result can be determined according to the summation operation.
The specific disassembly method is similar to that of the feature transformation operation, so the convolution operation result can be obtained with fewer multiplications.
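The same disassembly can be sketched for the inverse transform: one 2×2 replacement matrix per position of the 4×4 product, built from the A^T of the F(2×2, 3×3) example above. Again this is an illustration under those assumed matrix values, with R_ij as our shorthand.

```python
# Replacement matrices for the inverse transform: R_ij = A^T E_ij A.
R = {}
for i in range(4):
    for j in range(4):
        E = np.zeros((4, 4))
        E[i, j] = 1.0
        R[(i, j)] = A_T @ E @ A_T.T

M = U * V   # the multiplication operation result from the earlier sketch
Y_sum = sum(M[i, j] * R[(i, j)] for i in range(4) for j in range(4))
assert np.allclose(Y_sum, A_T @ M @ A_T.T)   # equals A^T M A
```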
Step 205: output the operation result to the lower-layer convolutional network.
Specifically, in the method provided by this embodiment, the current layer of the convolutional network can output the determined operation result to the lower-layer convolutional network as its input feature data, and the lower-layer convolutional network performs a convolution calculation on the input data according to the weight data of that layer.
Further, when the lower-layer convolutional network performs its convolution operation, it can also adopt the calculation method described above, that is, use the winograd algorithm and convert the transformation operations in the algorithm into summation operations.
The method provided by this embodiment is used to perform convolution operations; it is executed by a device provided with the method of this embodiment, and the device is usually implemented in hardware and/or software.
The operation method provided by this embodiment includes: acquiring the feature data output by the upper-layer convolutional network and the feature transformation matrix used to perform a forward transformation on the feature data; transforming the feature data according to the feature transformation matrix to obtain a feature transformation result, where the transformation operation of the feature data is disassembled into a summation operation and the feature transformation result is determined according to the summation operation; acquiring the forward-transformed weight transformation result of the current layer of the convolutional network, and performing a bitwise multiplication operation on the feature transformation result and the weight transformation result to obtain a multiplication operation result; acquiring an inverse transformation matrix used to perform an inverse transformation on the multiplication operation result, and transforming the multiplication operation result according to the inverse transformation matrix to obtain an operation result, where the transformation operation of the multiplication operation result is disassembled into a summation operation and the operation result is determined according to the summation operation; and outputting the operation result to the lower-layer convolutional network. In the method provided by this embodiment, when the computer system performs convolution processing on the feature data, the winograd algorithm is used; this algorithm converts multiplications into additions, and the transformation processes within the algorithm are converted into summation operations, further reducing the multiplication operations in data processing, which reduces the performance loss of the computer system and increases the operation speed.
Fig. 3 is a flowchart of an operation method shown in an exemplary embodiment of the present invention.
As shown in Fig. 3, the operation method provided by this embodiment includes:
Step 301: acquire the feature data output by the upper-layer convolutional network and the feature transformation matrix used to perform a forward transformation on the feature data.
The implementation principle and manner of step 301 are similar to those of step 201 and are not repeated here.
Step 302: disassemble the feature data into multiple feature sub-tensors.
In the method provided by this embodiment, the forward transformation of the feature data can be disassembled into a summation operation, thereby reducing the number of multiplication operations. In the disassembly process, the feature data is disassembled into multiple feature sub-tensors.
Specifically, the sum of the multiple feature sub-tensors equals the feature data; the number of feature sub-tensors is the same as the number of non-zero elements in the feature data; each feature sub-tensor has a single non-zero element; and the non-zero element in a feature sub-tensor is the same as the non-zero element at the corresponding position in the feature data.
Further, suppose for example that the feature data d is the 4×4 matrix
d = [[d_00, d_01, d_02, d_03],
     [d_10, d_11, d_12, d_13],
     [d_20, d_21, d_22, d_23],
     [d_30, d_31, d_32, d_33]].
According to the above rule, the feature data can be split into 16 feature sub-tensors (assuming all elements of the feature data are non-zero), each of which keeps exactly one element d_ij at position (i, j) and is zero everywhere else; for example, the first and last feature sub-tensors are
[[d_00, 0, 0, 0],          [[0, 0, 0, 0],
 [0,    0, 0, 0],           [0, 0, 0, 0],
 [0,    0, 0, 0],   ...     [0, 0, 0, 0],
 [0,    0, 0, 0]]           [0, 0, 0, d_33]].
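A short sketch of this splitting rule follows (the helper name is illustrative): each sub-tensor keeps one non-zero element in place, and the sub-tensors sum back to the original data.

```python
import numpy as np

def split_into_subtensors(t):
    """One sub-tensor per non-zero element of t: each sub-tensor keeps
    that single element at its original position and is zero elsewhere."""
    subs = []
    for idx in zip(*np.nonzero(t)):
        s = np.zeros_like(t)
        s[idx] = t[idx]
        subs.append(s)
    return subs

d = np.random.rand(4, 4)              # 4x4 feature data tile
subs = split_into_subtensors(d)       # 16 sub-tensors if all d_ij != 0
assert len(subs) == 16
assert np.allclose(sum(subs), d)      # their sum reproduces d
```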
Step 303: perform the transformation operation on the multiple feature sub-tensors according to the feature transformation matrix and sum the results to obtain the feature transformation result.
In practical applications, after the feature data has been split into feature sub-tensors, the feature transformation matrices can be used to transform each feature sub-tensor, and the transformation results of the feature sub-tensors are then added to obtain the feature transformation result.
Since the sum of the feature sub-tensors is equal to the feature data, transforming the sub-tensors and adding the transformation results gives the same result as transforming the feature data itself.
For example, the feature sub-tensor containing d_00 is transformed by computing
B^T [[d_00, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]] B.
The same transformation can be performed for every feature sub-tensor, and the transformation results of all feature sub-tensors are then added to obtain the feature transformation result.
In order to further reduce the multiplication operations in the calculation process, when the feature sub-tensors are transformed and summed to obtain the feature transformation result, it is also possible to:
determine the corresponding feature element sub-tensor according to each feature sub-tensor, where the feature element sub-tensor is a tensor in which the non-zero element of the feature sub-tensor is set to 1;
determine the transformation result of each feature sub-tensor according to the feature transformation matrix, the feature element sub-tensor, and its corresponding non-zero element;
sum the transformation results of the feature sub-tensors to obtain the feature transformation result.
The non-zero element in a feature sub-tensor is identified, and the position corresponding to that non-zero element is set to 1 to obtain the feature element sub-tensor. For example, for the feature sub-tensor
[[d_00, 0, 0, 0],
 [0,    0, 0, 0],
 [0,    0, 0, 0],
 [0,    0, 0, 0]],
the corresponding feature element sub-tensor is
[[1, 0, 0, 0],
 [0, 0, 0, 0],
 [0, 0, 0, 0],
 [0, 0, 0, 0]].
For every feature sub-tensor, its corresponding feature element sub-tensor can be determined in this way.
When a feature sub-tensor is transformed, its transformation result can be determined according to the feature transformation matrix, the feature element sub-tensor, and its corresponding non-zero element.
Specifically, the feature element sub-tensor is left-multiplied by the left multiplication matrix of the feature transformation and right-multiplied by the right multiplication matrix of the feature transformation, and the result is multiplied by the non-zero element corresponding to the feature element sub-tensor to obtain the transformation result of the feature sub-tensor; the left multiplication matrix and the right multiplication matrix are both determined by the scale of the feature sub-tensor.
For example, the transformation of the feature sub-tensor
[[d_00, 0, 0, 0],
 [0,    0, 0, 0],
 [0,    0, 0, 0],
 [0,    0, 0, 0]]
can be converted into
d_00 × (B^T [[1, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]] B).
Since every element of the feature element sub-tensor is either 0 or 1, the performance loss caused by evaluating the above expression is small.
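In symbols, the scalar factors out of the left and right multiplications by linearity, which is exactly why the replacement matrix of each position can be precomputed. Here E_ij is our shorthand for the feature element sub-tensor with a single 1 at position (i, j), not notation used in the original text:

```latex
B^{T}\,(d_{ij}E_{ij})\,B \;=\; d_{ij}\,\bigl(B^{T}E_{ij}B\bigr) \;=\; d_{ij}\,D_{ij},
\qquad\text{and hence}\qquad
B^{T}dB \;=\; \sum_{i,j} d_{ij}\,D_{ij}.
```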
Further, since B^T and B can be determined according to the size of the feature data, and the feature element sub-tensors can also be determined in advance from the feature data, the replacement matrix corresponding to each element position in the feature data can be determined in advance from B^T, B, and the feature element sub-tensors.
For example, for the element position in the first row and first column, the replacement matrix is
D_00 = B^T [[1, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]] B.
Based on the above, the transformation result of the feature sub-tensor
[[d_00, 0, 0, 0],
 [0,    0, 0, 0],
 [0,    0, 0, 0],
 [0,    0, 0, 0]]
becomes
d_00 × D_00.
A corresponding replacement matrix can be determined for each element position in the feature data; when the feature data is transformed, the corresponding set of replacement matrices can be determined directly according to the data size, and the feature transformation result is then determined from the set of replacement matrices.
Based on the above, the feature transformation result is:
B^T dB = d_00 × D_00 + d_01 × D_01 + ... + d_33 × D_33
Step 304: acquire the weight data of the current layer of the convolutional network and the weight transformation matrices used to perform a forward transformation on the weight data.
Step 305: transform the weight data according to the weight transformation matrices to obtain the weight transformation result; the transformation operation of the weight data is disassembled into a summation operation, and the weight transformation result is determined according to the summation operation.
In the method provided by this embodiment, the weight data can be transformed according to the weight transformation matrices to obtain the weight transformation result.
Specifically, in the transformation process, in order to reduce the multiplication operations, the transformation operation of the weight data can be disassembled into a summation operation, and the weight transformation result can be determined according to the summation operation.
Further, the weight data can be transformed based on the following formula:
G^T gG
where G^T and G are the weight transformation matrices and g is the weight data. When this transformation process is disassembled into a summation operation, the weight data can be disassembled into multiple weight sub-tensors; the multiple weight sub-tensors are then transformed according to the weight transformation matrices and summed to obtain the weight transformation result.
Specifically, the sum of the multiple weight sub-tensors equals the weight data; the number of weight sub-tensors is the same as the number of non-zero elements in the weight data; each weight sub-tensor has a single non-zero element; and the non-zero element in a weight sub-tensor is the same as the non-zero element at the corresponding position in the weight data.
Further, suppose for example that the weight data g is the 4×4 matrix
g = [[g_00, g_01, g_02, g_03],
     [g_10, g_11, g_12, g_13],
     [g_20, g_21, g_22, g_23],
     [g_30, g_31, g_32, g_33]].
Similarly to the splitting of the feature data, the weight data can be split into 16 weight sub-tensors, each of which keeps exactly one element g_ij at position (i, j) and is zero everywhere else.
In practical applications, after the weight data has been split into weight sub-tensors, the weight transformation matrices can be used to transform each weight sub-tensor, and the transformation results of the weight sub-tensors are then added to obtain the weight transformation result.
For example, the weight sub-tensor containing g_00 is transformed by computing
G^T [[g_00, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]] G.
The same transformation can be performed for every weight sub-tensor, and the transformation results of all weight sub-tensors are then added to obtain the weight transformation result.
To further reduce the number of multiplications, when the weight sub-tensors are transformed and summed to obtain the weight transformation result, it is also possible to:
determine, for each weight sub-tensor, a corresponding weight element sub-tensor, where the weight element sub-tensor is the tensor obtained by setting the non-zero element of the weight sub-tensor to 1;
determine the transformation result of each weight sub-tensor according to the weight transformation matrix, the weight element sub-tensor, and the corresponding non-zero element;
sum the transformation results of the weight sub-tensors to obtain the weight transformation result.
Similar to the transformation of the feature data, the non-zero element in each weight sub-tensor can be identified and the corresponding position set to 1, yielding the weight element sub-tensor. For example, for the weight sub-tensor
[Formula image PCTCN2020113166-appb-000016: a weight sub-tensor with a single non-zero element]
the corresponding weight element sub-tensor is:
[Formula image PCTCN2020113166-appb-000017: the same tensor with the non-zero element replaced by 1]
A corresponding weight element sub-tensor can be determined for each weight sub-tensor.
When transforming a weight sub-tensor, its transformation result can be determined according to the weight transformation matrix, the weight element sub-tensor, and the corresponding non-zero element.
Specifically, the weight element sub-tensor can be multiplied on the left by the left-multiplication matrix of the weight transformation matrix and on the right by the right-multiplication matrix of the weight transformation matrix, and the result multiplied by the non-zero element corresponding to the weight element sub-tensor, giving the transformation result of the weight sub-tensor; the left-multiplication matrix and the right-multiplication matrix used for a weight element sub-tensor are both determined by the size of the weight sub-tensor.
For example, for
[Formula image PCTCN2020113166-appb-000018: a weight sub-tensor with a single non-zero element]
the transformation can be rewritten as
[Formula image PCTCN2020113166-appb-000019: the non-zero element multiplied by the transform of the corresponding weight element sub-tensor]
Further, since G^T and G can be determined from the size of the weight data, and the weight element sub-tensors can likewise be determined in advance from the weight data, a replacement matrix corresponding to each element position in the weight data can be determined in advance from G^T, G, and the weight element sub-tensors.
For example, for the element position in the first row and first column, the replacement matrix is:
[Formula image PCTCN2020113166-appb-000020: the replacement matrix D′_00, obtained by multiplying the element sub-tensor for position (0, 0) on the left by G^T and on the right by G]
Based on the above, the transformation result of the weight sub-tensor
[Formula image PCTCN2020113166-appb-000021: the sub-tensor retaining only the element g_00]
becomes:
g_00 × D′_00
A corresponding replacement matrix can be determined for each element position in the weight data. When the weight data is transformed, the corresponding set of replacement matrices can be determined directly from the data size, and the weight transformation result then determined from that set of replacement matrices.
Based on the above, the weight transformation result is obtained as:
G^T g G = g_00 × D′_00 + g_01 × D′_01 + … + g_33 × D′_33
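As a sketch of this precomputation (again with the stand-in matrices from the previous sketch; D[i][j] plays the role of the replacement matrix D′_ij):

```python
import numpy as np

rng = np.random.default_rng(0)
g = rng.integers(-3, 4, size=(4, 4)).astype(float)  # stand-in weight tile
G = rng.integers(-2, 3, size=(4, 4)).astype(float)  # stand-in transform

# Precompute one replacement matrix per element position: D[i][j] is the
# transform of the element sub-tensor that has a 1 at position (i, j).
n = g.shape[0]
D = [[None] * n for _ in range(n)]
for i in range(n):
    for j in range(n):
        e = np.zeros_like(g)
        e[i, j] = 1.0                 # weight element sub-tensor
        D[i][j] = G.T @ e @ G         # replacement matrix D'_ij

# At run time the transform reduces to scaling and summing the
# precomputed replacement matrices -- no matrix products remain.
result = sum(g[i, j] * D[i][j] for i in range(n) for j in range(n))
assert np.allclose(result, G.T @ g @ G)
```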
Step 306: Perform an element-wise multiplication on the feature transformation result and the weight transformation result to obtain the multiplication result.
The implementation principle and manner of the element-wise multiplication of the feature transformation result and the weight transformation result in step 306 are similar to those in step 203 and are not repeated here.
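For clarity, the element-wise (position-wise) product of the two transformed tiles is an ordinary Hadamard product; a minimal NumPy illustration, with U and V as stand-ins for the two transformation results:

```python
import numpy as np

U = np.arange(16.0).reshape(4, 4)   # stand-in weight transformation result
V = 2.0 * np.ones((4, 4))           # stand-in feature transformation result
p = U * V                           # multiplication result: per-position
                                    # products, not a matrix product
```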
Step 307: Decompose the multiplication result into multiple result sub-tensors.
Step 308: Transform the multiple result sub-tensors according to the inverse transformation matrix and sum them to obtain the operation result.
In the method provided by this embodiment, the multiplication result can be transformed according to the inverse transformation matrix to obtain the operation result.
Specifically, to reduce the number of multiplications during the transformation, the transformation operation on the multiplication result can be decomposed into summation operations, and the operation result determined from those summations.
Further, the multiplication result can be transformed based on the following formula:
A p A^T
where A^T and A are the inverse transformation matrices and p is the multiplication result. When this transformation is decomposed into summations, the multiplication result can be split into multiple result sub-tensors; the result sub-tensors are then transformed according to the inverse transformation matrix and summed to obtain the operation result.
Specifically, the sum of the multiple result sub-tensors equals the multiplication result; the number of result sub-tensors equals the number of non-zero elements in the multiplication result; each result sub-tensor contains a single non-zero element; and the non-zero element in each result sub-tensor equals the non-zero element at the corresponding position in the multiplication result.
Further, suppose for example that the multiplication result p is a 4×4 matrix:
[Formula image PCTCN2020113166-appb-000022: the 4×4 multiplication result p, with elements p_00 through p_33]
Similar to the splitting of the feature data, the multiplication result can likewise be split into 16 result sub-tensors:
[Formula images PCTCN2020113166-appb-000023 and PCTCN2020113166-appb-000024: the 16 result sub-tensors, each retaining a single element p_ij of p]
In practical applications, after the multiplication result has been split into result sub-tensors, each result sub-tensor can be transformed with the inverse transformation matrix, and the transformation results of all result sub-tensors added together to obtain the operation result.
For example, one of the result sub-tensors can be transformed based on the following formula:
[Formula image PCTCN2020113166-appb-000025: the inverse transformation applied to one of the result sub-tensors]
The above transformation can be performed for each result sub-tensor, and the transformation results of all result sub-tensors added together to obtain the operation result.
To further reduce the number of multiplications, when the result sub-tensors are transformed and summed to obtain the operation result, it is also possible to:
determine, for each result sub-tensor, a corresponding result element sub-tensor, where the result element sub-tensor is the tensor obtained by setting the non-zero element of the result sub-tensor to 1;
determine the transformation result of each result sub-tensor according to the inverse transformation matrix, the result element sub-tensor, and the corresponding non-zero element;
sum the transformation results of the result sub-tensors to obtain the operation result.
Similar to the transformation of the feature data, the non-zero element in each result sub-tensor can be identified and the corresponding position set to 1, yielding the result element sub-tensor. For example, for the result sub-tensor
[Formula image PCTCN2020113166-appb-000026: a result sub-tensor with a single non-zero element]
the corresponding result element sub-tensor is:
[Formula image PCTCN2020113166-appb-000027: the same tensor with the non-zero element replaced by 1]
A corresponding result element sub-tensor can be determined for each result sub-tensor.
When transforming a result sub-tensor, the operation result can be determined according to the inverse transformation matrix, the result element sub-tensor, and the corresponding non-zero element.
Specifically, the result element sub-tensor can be multiplied on the left by the left-multiplication matrix of the inverse transformation matrix and on the right by the right-multiplication matrix of the inverse transformation matrix, and the result multiplied by the non-zero element corresponding to the result element sub-tensor, giving the transformation result of the result sub-tensor; the left-multiplication matrix and the right-multiplication matrix used for a result element sub-tensor are both determined by the size of the result sub-tensor.
For example, for
[Formula image PCTCN2020113166-appb-000028: a result sub-tensor with a single non-zero element]
the transformation can be rewritten as
[Formula image PCTCN2020113166-appb-000029: the non-zero element multiplied by the transform of the corresponding result element sub-tensor]
Further, since A^T and A can be determined from the size of the operation result, and the result element sub-tensors can likewise be determined in advance from the size of the operation result, a replacement matrix corresponding to each element position in the multiplication result can be determined in advance from A^T, A, and the result element sub-tensors.
For example, for the element position in the first row and first column, the replacement matrix is:
[Formula image PCTCN2020113166-appb-000030: the replacement matrix D″_00, obtained by applying the inverse transformation matrices to the element sub-tensor for position (0, 0)]
Based on the above, the transformation result of the result sub-tensor
[Formula image PCTCN2020113166-appb-000031: the sub-tensor retaining only the element p_00]
becomes:
p_00 × D″_00
A corresponding replacement matrix can be determined for each element position in the multiplication result. When the multiplication result is transformed, the corresponding set of replacement matrices can be determined directly from the size of that result (or of the final operation result), and the operation result then determined from that set of replacement matrices.
Based on the above, the operation result is obtained as:
A p A^T = p_00 × D″_00 + p_01 × D″_01 + … + p_33 × D″_33
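The same precomputation applies on the inverse side. The sketch below again uses stand-in matrices, with a non-square A to reflect that an inverse transform typically reduces the tile size (the exact A of the method is not specified here):

```python
import numpy as np

rng = np.random.default_rng(1)
p = rng.standard_normal((4, 4))                       # stand-in multiplication result
A = rng.integers(-1, 2, size=(2, 4)).astype(float)    # stand-in inverse transform

# Precompute the replacement matrices D''_ij = A E_ij A^T, one per
# element position of the multiplication result.
n = p.shape[0]
D2 = [[None] * n for _ in range(n)]
for i in range(n):
    for j in range(n):
        e = np.zeros_like(p)
        e[i, j] = 1.0               # result element sub-tensor
        D2[i][j] = A @ e @ A.T      # replacement matrix D''_ij

# The inverse transform reduces to scaling and summing the
# precomputed replacement matrices.
out = sum(p[i, j] * D2[i][j] for i in range(n) for j in range(n))
assert np.allclose(out, A @ p @ A.T)
```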
Step 309: Output the operation result to the lower-layer convolutional network.
The implementation principle and manner of step 309 are similar to those of step 205 and are not repeated here.
Fig. 4 is a schematic diagram of a master-slave processing architecture according to an exemplary embodiment of the present invention.
As shown in Fig. 4, the solution of this embodiment further provides a master-slave processing architecture that can be used to implement the operation method provided by this embodiment.
The master-slave processing architecture includes a master functional unit 41 and at least one slave functional unit 42.
The master functional unit 41 transforms the feature data according to the feature transformation matrix to obtain the feature transformation result; the transformation operation on the feature data is decomposed into summation operations, and the feature transformation result is determined from those summations.
Optionally, a master storage unit (not shown) may also be provided and connected to the master functional unit 41. A master control unit (not shown) may send instructions to the master storage unit and the master functional unit 41 respectively, so that the master storage unit can send the feature data to the master functional unit.
The slave functional unit obtains the forward-transformed weight transformation result of the current-layer convolutional network, and performs an element-wise multiplication on the feature transformation result and the weight transformation result to obtain the multiplication result.
The slave functional unit obtains the inverse transformation matrix used to inversely transform the multiplication result, and transforms the multiplication result according to the inverse transformation matrix to obtain the operation result; the transformation operation on the multiplication result is decomposed into summation operations, and the operation result is determined from those summations.
That is, the slave functional unit 42 performs an element-wise multiplication on the received feature transformation result and weight transformation result, thereby obtaining the multiplication result.
In the method provided by this embodiment, the data processing is similar to that of the foregoing embodiments and is not repeated here.
Specifically, one master functional unit 41 can be connected to multiple slave functional units 42, and allocation rules can be preset for distributing feature data transformation results to the slave functional units 42.
Further, the master functional unit 41 and the slave functional units 42 operate in parallel: before the master functional unit 41 has finished computing the transformation result values for all element positions of the feature data, a slave functional unit 42 already performs, for each element position whose feature transformation result value has been computed, the element-wise multiplication of the feature transformation result and the weight transformation result at that position, until the element-wise product has been computed for every element position, thereby obtaining the multiplication result.
In practical applications, sending the feature transformation results determined by the master functional unit 41 to the slave functional units 42, which execute the element-wise multiplications in parallel, can improve the operation efficiency of the system.
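A minimal software analogue of this pipelining, assuming a single slave unit and using a thread plus a queue as stand-ins for the hardware units (all names here are illustrative, not part of the disclosed architecture):

```python
import queue
import threading
import numpy as np

rng = np.random.default_rng(2)
U = rng.standard_normal((4, 4))          # weight transformation result
q = queue.Queue()
products = {}

def slave_worker():
    # Consume per-position feature-transform values as soon as the
    # master produces them; multiply element-wise with the weights.
    while True:
        item = q.get()
        if item is None:                  # sentinel: master is done
            break
        (i, j), v = item
        products[(i, j)] = v * U[i, j]

t = threading.Thread(target=slave_worker)
t.start()

# "Master": computes one transformed value per element position and
# streams it to the slave without waiting for the full tile.
V = rng.standard_normal((4, 4))          # feature transformation result
for i in range(4):
    for j in range(4):
        q.put(((i, j), V[i, j]))
q.put(None)
t.join()

p = np.array([[products[(i, j)] for j in range(4)] for i in range(4)])
assert np.allclose(p, V * U)              # matches the element-wise product
```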
It should be noted that, for the sake of brevity, the foregoing method embodiments are all expressed as a series of action combinations. Those skilled in the art should understand, however, that the present disclosure is not limited by the described order of actions, since according to the present disclosure some steps may be performed in other orders or simultaneously. Those skilled in the art should also understand that the embodiments described in the specification are all optional embodiments, and that the actions and modules involved are not necessarily required by the present disclosure.
It should further be noted that although the steps in the flowcharts are displayed in sequence as indicated by the arrows, these steps are not necessarily executed in that order. Unless explicitly stated herein, there is no strict ordering constraint on their execution, and they may be executed in other orders. Moreover, at least some of the steps in the flowcharts may include multiple sub-steps or stages, which are not necessarily completed at the same time but may be executed at different times; their execution order is not necessarily sequential, and they may be executed in turn or alternately with other steps or with at least part of the sub-steps or stages of other steps.
Fig. 5 is a schematic diagram of a computing device according to an exemplary embodiment of the present invention.
As shown in Fig. 5, the computing device provided by this embodiment includes:
an acquisition module 51, configured to acquire the feature data output by the upper-layer convolutional network and the feature transformation matrix used to forward-transform the feature data;
a feature transformation module 52, configured to transform the feature data according to the feature transformation matrix to obtain a feature transformation result, where the transformation operation on the feature data is decomposed into summation operations and the feature transformation result is determined from those summations;
an element-wise multiplication module 53, configured to obtain the forward-transformed weight transformation result of the current-layer convolutional network, and to perform an element-wise multiplication on the feature transformation result and the weight transformation result to obtain a multiplication result;
an inverse transformation module 54, configured to obtain the inverse transformation matrix used to inversely transform the multiplication result, and to transform the multiplication result according to the inverse transformation matrix to obtain an operation result, where the transformation operation on the multiplication result is decomposed into summation operations and the operation result is determined from those summations;
a transmission module 55, configured to output the operation result to the lower-layer convolutional network.
The specific principles, implementation, and effects of the computing device provided by this embodiment are similar to those of the embodiment shown in Fig. 2 and are not repeated here.
On the basis of the computing device shown in Fig. 5, in the computing device provided by this embodiment, the feature transformation module 52 is specifically configured to:
decompose the feature data into multiple feature sub-tensors;
perform a transformation operation on the multiple feature sub-tensors according to the feature transformation matrix and sum them to obtain the feature transformation result.
The inverse transformation module 54 is specifically configured to:
decompose the multiplication result into multiple result sub-tensors;
perform a transformation operation on the multiple result sub-tensors according to the inverse transformation matrix and sum them to obtain the operation result.
The sum of the multiple feature sub-tensors is the feature data; the sum of the multiple result sub-tensors is the multiplication result.
The number of feature sub-tensors is the same as the number of non-zero elements in the feature data; each feature sub-tensor contains a single non-zero element, and the non-zero element in each feature sub-tensor is the same as the non-zero element at the corresponding position in the feature data.
The number of result sub-tensors is the same as the number of non-zero elements in the multiplication result; each result sub-tensor contains a single non-zero element, and the non-zero element in each result sub-tensor is the same as the non-zero element at the corresponding position in the multiplication result.
The feature transformation module 52 is specifically configured to:
determine, for each feature sub-tensor, a corresponding feature element sub-tensor, where the feature element sub-tensor is the tensor obtained by setting the non-zero element of the feature sub-tensor to 1;
determine the transformation result of each feature sub-tensor according to the feature transformation matrix, the feature element sub-tensor, and the corresponding non-zero element;
sum the transformation results of the feature sub-tensors to obtain the feature transformation result.
The inverse transformation module 54 is specifically configured to:
determine, for each result sub-tensor, a corresponding result element sub-tensor, where the result element sub-tensor is the tensor obtained by setting the non-zero element of the result sub-tensor to 1;
determine the transformation result of each result sub-tensor according to the inverse transformation matrix, the result element sub-tensor, and the corresponding non-zero element;
sum the transformation results of the result sub-tensors to obtain the operation result.
The feature transformation module 52 is specifically configured to:
multiply the feature element sub-tensor on the left by the left-multiplication matrix of the feature transformation matrix and on the right by the right-multiplication matrix of the feature transformation matrix, and multiply the result by the non-zero element corresponding to the feature element sub-tensor, to obtain the transformation result of the feature sub-tensor;
where the left-multiplication matrix and the right-multiplication matrix used for a feature element sub-tensor are both determined by the size of the feature sub-tensor.
The inverse transformation module 54 is specifically configured to:
multiply the result element sub-tensor on the left by the left-multiplication matrix of the inverse transformation matrix and on the right by the right-multiplication matrix of the inverse transformation matrix, and multiply the result by the non-zero element corresponding to the result element sub-tensor, to obtain the transformation result of the result sub-tensor;
where the left-multiplication matrix and the right-multiplication matrix used for a result element sub-tensor are both determined by the size of the result sub-tensor.
The element-wise multiplication module 53 is specifically configured to:
obtain the weight data of the current-layer convolutional network and the weight transformation matrix used to forward-transform the weight data;
transform the weight data according to the weight transformation matrix to obtain the weight transformation result, where the transformation operation on the weight data is decomposed into summation operations and the weight transformation result is determined from those summations.
The element-wise multiplication module 53 is specifically configured to:
decompose the weight data into multiple weight sub-tensors;
perform a transformation operation on the multiple weight sub-tensors according to the weight transformation matrix and sum them to obtain the weight transformation result.
The element-wise multiplication module 53 is specifically configured to:
determine, for each weight sub-tensor, a corresponding weight element sub-tensor, where the weight element sub-tensor is the tensor obtained by setting the non-zero element of the weight sub-tensor to 1;
determine the transformation result of each weight sub-tensor according to the weight transformation matrix, the weight element sub-tensor, and the corresponding non-zero element;
sum the transformation results of the weight sub-tensors to obtain the weight transformation result.
The element-wise multiplication module 53 is specifically configured to:
multiply the weight element sub-tensor on the left by the left-multiplication matrix of the weight transformation matrix and on the right by the right-multiplication matrix of the weight transformation matrix, and multiply the result by the non-zero element corresponding to the weight element sub-tensor, to obtain the transformation result of the weight sub-tensor;
where the left-multiplication matrix and the right-multiplication matrix used for a weight element sub-tensor are both determined by the size of the weight sub-tensor.
The specific principles, implementation, and effects of the computing device provided by this embodiment are similar to those of the embodiments shown in Figs. 3 and 4 and are not repeated here.
It should be understood that the foregoing device embodiments are merely illustrative, and the device of the present disclosure may also be implemented in other ways. For example, the division of units/modules in the above embodiments is only a division by logical function, and other divisions are possible in actual implementation; for example, multiple units, modules, or components may be combined or integrated into another system, or some features may be omitted or not implemented.
In addition, unless otherwise specified, the functional units/modules in the various embodiments of the present disclosure may be integrated into one unit/module, or each unit/module may exist physically on its own, or two or more units/modules may be integrated together. The integrated unit/module may be implemented in the form of hardware or in the form of a software program module.
If the integrated unit/module is implemented in the form of hardware, the hardware may be a digital circuit, an analog circuit, and so on. Physical implementations of the hardware structure include but are not limited to transistors, memristors, and so on. Unless otherwise specified, the artificial intelligence processor may be any appropriate hardware processor, such as a CPU, GPU, FPGA, DSP, or ASIC. Unless otherwise specified, the storage unit may be any appropriate magnetic or magneto-optical storage medium, such as resistive random access memory (RRAM), dynamic random access memory (DRAM), static random access memory (SRAM), enhanced dynamic random access memory (EDRAM), high-bandwidth memory (HBM), or hybrid memory cube (HMC).
If the integrated unit/module is implemented in the form of a software program module and sold or used as an independent product, it may be stored in a computer-readable memory. Based on this understanding, the technical solution of the present disclosure, in essence, or the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a memory and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to execute all or part of the steps of the methods described in the various embodiments of the present disclosure. The aforementioned memory includes various media that can store program code, such as a USB flash drive, a read-only memory (ROM), a random access memory (RAM), a removable hard disk, a magnetic disk, or an optical disc.
In a possible implementation, an artificial intelligence chip is also disclosed, which includes the above computing device.
In a possible implementation, a board card is also disclosed, which includes a storage device, an interface device, a control device, and the above artificial intelligence chip, where the artificial intelligence chip is connected to the storage device, the control device, and the interface device respectively; the storage device is used to store data; the interface device is used to implement data transmission between the artificial intelligence chip and external equipment; and the control device is used to monitor the state of the artificial intelligence chip.
Fig. 6 is a structural block diagram of a board card according to an exemplary embodiment of the present invention. Referring to Fig. 6, in addition to the above chip 389, the board card may include other supporting components, including but not limited to: a storage device 390, an interface device 391, and a control device 392.
The storage device 390 is connected to the artificial intelligence chip through a bus and is used to store data. The storage device may include multiple groups of storage units 393, each group connected to the artificial intelligence chip through a bus. It can be understood that each group of storage units may be DDR SDRAM (Double Data Rate synchronous dynamic random access memory).
DDR can double the speed of SDRAM without increasing the clock frequency, as it allows data to be read on both the rising and falling edges of the clock pulse; DDR is thus twice as fast as standard SDRAM. In one embodiment, the storage device may include four groups of storage units, and each group may include multiple DDR4 chips. In one embodiment, the artificial intelligence chip may internally include four 72-bit DDR4 controllers, of which 64 bits are used for data transmission and 8 bits for ECC checking. It can be understood that when DDR4-3200 chips are used in each group of storage units, the theoretical bandwidth of data transmission can reach 25600 MB/s.
In one embodiment, each group of storage units includes multiple double data rate synchronous dynamic random access memories arranged in parallel. DDR can transfer data twice in one clock cycle. A controller for controlling the DDR is provided in the chip to control the data transmission and data storage of each storage unit.
The interface device is electrically connected to the artificial intelligence chip and is used to implement data transmission between the artificial intelligence chip and an external device (such as a server or a computer). For example, in one embodiment, the interface device may be a standard PCIe interface: the data to be processed is transferred from the server to the chip through the standard PCIe interface, realizing data transfer. Preferably, when a PCIe 3.0 x16 interface is used for transmission, the theoretical bandwidth can reach 16000 MB/s. In another embodiment, the interface device may also be another interface; the present disclosure does not limit the specific form of such other interfaces, as long as the interface unit can realize the transfer function. In addition, the calculation results of the artificial intelligence chip are still transmitted back to the external device (such as a server) by the interface device.
The control device is electrically connected to the artificial intelligence chip and is used to monitor the state of the artificial intelligence chip. Specifically, the artificial intelligence chip and the control device may be electrically connected through an SPI interface. The control device may include a micro controller unit (MCU). Since the artificial intelligence chip may include multiple processing chips, multiple processing cores, or multiple processing circuits and can drive multiple loads, it can be in different working states such as multi-load and light load. The control device can regulate the working states of the multiple processing chips, processing cores, and/or processing circuits in the artificial intelligence chip.
In a possible implementation, an electronic device is disclosed, which includes the above artificial intelligence chip. The electronic device includes a data processing apparatus, robot, computer, printer, scanner, tablet, smart terminal, mobile phone, driving recorder, navigator, sensor, webcam, server, cloud server, camera, video camera, projector, watch, headset, mobile storage, wearable device, vehicle, household appliance, and/or medical device.
The vehicle includes an airplane, a ship, and/or a car; the household appliance includes a television, an air conditioner, a microwave oven, a refrigerator, a rice cooker, a humidifier, a washing machine, an electric lamp, a gas stove, and a range hood; the medical device includes a nuclear magnetic resonance instrument, a B-mode ultrasound scanner, and/or an electrocardiograph.
In the above embodiments, the description of each embodiment has its own emphasis; for parts not described in detail in a given embodiment, reference may be made to the related descriptions of other embodiments. The technical features of the above embodiments can be combined arbitrarily. For brevity, not all possible combinations of the technical features in the above embodiments have been described; however, as long as there is no contradiction in a combination of these technical features, it should be considered within the scope of this specification.
The foregoing may be better understood in light of the following clauses:
A1. An operation method, the method comprising:
acquiring feature data output by an upper-layer convolutional network and a feature transformation matrix used to forward-transform the feature data;
transforming the feature data according to the feature transformation matrix to obtain a feature transformation result, wherein the transformation operation on the feature data is decomposed into summation operations and the feature transformation result is determined from the summation operations;
acquiring a forward-transformed weight transformation result of a current-layer convolutional network, and performing an element-wise multiplication on the feature transformation result and the weight transformation result to obtain a multiplication result;
acquiring an inverse transformation matrix used to inversely transform the multiplication result, and transforming the multiplication result according to the inverse transformation matrix to obtain an operation result, wherein the transformation operation on the multiplication result is decomposed into summation operations and the operation result is determined from the summation operations;
outputting the operation result to a lower-layer convolutional network.
A2. The method according to clause A1, wherein decomposing the transformation operation on the feature data into summation operations and determining the feature transformation result from the summation operations comprises:
decomposing the feature data into multiple feature sub-tensors;
performing a transformation operation on the multiple feature sub-tensors according to the feature transformation matrix and summing them to obtain the feature transformation result;
and wherein decomposing the transformation operation on the multiplication result into summation operations and determining the operation result from the summation operations comprises:
decomposing the multiplication result into multiple result sub-tensors;
performing a transformation operation on the multiple result sub-tensors according to the inverse transformation matrix and summing them to obtain the operation result.
A3. The method according to clause A2, wherein the sum of the multiple feature sub-tensors is the feature data, and the sum of the multiple result sub-tensors is the multiplication result;
the number of feature sub-tensors is the same as the number of non-zero elements in the feature data, each feature sub-tensor contains a single non-zero element, and the non-zero element in each feature sub-tensor is the same as the non-zero element at the corresponding position in the feature data;
the number of result sub-tensors is the same as the number of non-zero elements in the multiplication result, each result sub-tensor contains a single non-zero element, and the non-zero element in each result sub-tensor is the same as the non-zero element at the corresponding position in the multiplication result.
A4. The method according to clause A2, wherein performing a transformation operation on the multiple feature sub-tensors according to the feature transformation matrix and summing them to obtain the feature transformation result comprises:
determining, for each feature sub-tensor, a corresponding feature element sub-tensor, wherein the feature element sub-tensor is the tensor obtained by setting the non-zero element of the feature sub-tensor to 1;
determining the transformation result of each feature sub-tensor according to the feature transformation matrix, the feature element sub-tensor, and the corresponding non-zero element;
summing the transformation results of the feature sub-tensors to obtain the feature transformation result;
and wherein performing a transformation operation on the multiple result sub-tensors according to the inverse transformation matrix and summing them to obtain the operation result comprises:
determining, for each result sub-tensor, a corresponding result element sub-tensor, wherein the result element sub-tensor is the tensor obtained by setting the non-zero element of the result sub-tensor to 1;
determining the transformation result of each result sub-tensor according to the inverse transformation matrix, the result element sub-tensor, and the corresponding non-zero element;
summing the transformation results of the result sub-tensors to obtain the operation result.
A5. The method according to clause A4, wherein determining the transformation result of each feature sub-tensor according to the feature transformation matrix, the feature element sub-tensor, and the corresponding non-zero element comprises:
multiplying the feature element sub-tensor on the left by the left-multiplication matrix of the feature transformation matrix and on the right by the right-multiplication matrix of the feature transformation matrix, and multiplying the result by the non-zero element corresponding to the feature element sub-tensor, to obtain the transformation result of the feature sub-tensor;
wherein the left-multiplication matrix and the right-multiplication matrix used for a feature element sub-tensor are both determined by the size of the feature sub-tensor;
and wherein determining the transformation result of each result sub-tensor according to the inverse transformation matrix, the result element sub-tensor, and the corresponding non-zero element comprises:
multiplying the result element sub-tensor on the left by the left-multiplication matrix of the inverse transformation matrix and on the right by the right-multiplication matrix of the inverse transformation matrix, and multiplying the result by the non-zero element corresponding to the result element sub-tensor, to obtain the transformation result of the result sub-tensor;
wherein the left-multiplication matrix and the right-multiplication matrix used for a result element sub-tensor are both determined by the size of the result sub-tensor.
A6. The method according to clause A1, wherein acquiring the forward-transformed weight transformation result of the current-layer convolutional network comprises:
acquiring weight data of the current-layer convolutional network and a weight transformation matrix used to forward-transform the weight data;
transforming the weight data according to the weight transformation matrix to obtain the weight transformation result, wherein the transformation operation on the weight data is decomposed into summation operations and the weight transformation result is determined from the summation operations.
A7. The method according to clause A6, wherein decomposing the transformation operation on the weight data into summation operations and determining the weight transformation result from the summation operations comprises:
decomposing the weight data into multiple weight sub-tensors;
performing a transformation operation on the multiple weight sub-tensors according to the weight transformation matrix and summing them to obtain the weight transformation result.
A8. The method according to clause A7, wherein performing a transformation operation on the multiple weight sub-tensors according to the weight transformation matrix and summing them to obtain the weight transformation result comprises:
determining, for each weight sub-tensor, a corresponding weight element sub-tensor, wherein the weight element sub-tensor is the tensor obtained by setting the non-zero element of the weight sub-tensor to 1;
determining the transformation result of each weight sub-tensor according to the weight transformation matrix, the weight element sub-tensor, and the corresponding non-zero element;
summing the transformation results of the weight sub-tensors to obtain the weight transformation result.
A9. The method according to clause A8, wherein determining the transformation result of each weight sub-tensor according to the weight transformation matrix, the weight element sub-tensor, and the corresponding non-zero element comprises:
multiplying the weight element sub-tensor on the left by the left-multiplication matrix of the weight transformation matrix and on the right by the right-multiplication matrix of the weight transformation matrix, and multiplying the result by the non-zero element corresponding to the weight element sub-tensor, to obtain the transformation result of the weight sub-tensor;
wherein the left-multiplication matrix and the right-multiplication matrix used for a weight element sub-tensor are both determined by the size of the weight sub-tensor.
A10. The method according to any one of clauses A1-A9, wherein the method is applied to a master-slave processing architecture, the master-slave processing architecture comprising a master functional unit and at least one slave functional unit.
A11. The method according to clause A10, wherein the master functional unit transforms the feature data according to the feature transformation matrix to obtain the feature transformation result, wherein the transformation operation on the feature data is decomposed into summation operations and the feature transformation result is determined from the summation operations;
the slave functional unit acquires the forward-transformed weight transformation result of the current-layer convolutional network, and performs an element-wise multiplication on the feature transformation result and the weight transformation result to obtain the multiplication result;
the slave functional unit acquires the inverse transformation matrix used to inversely transform the multiplication result, and transforms the multiplication result according to the inverse transformation matrix to obtain the operation result, wherein the transformation operation on the multiplication result is decomposed into summation operations and the operation result is determined from the summation operations.
A12. The method according to clause A11, wherein the master functional unit and the slave functional unit operate in parallel; before the master functional unit has finished computing the transformation result values for all element positions of the feature data, the slave functional unit performs, for each element position whose feature transformation result value has been computed, the element-wise multiplication of the feature transformation result and the weight transformation result at that element position, until the element-wise product has been computed for every element position, thereby obtaining the multiplication result.
A13. A computing device, comprising:
an acquisition module, configured to acquire feature data output by an upper-layer convolutional network and a feature transformation matrix used to forward-transform the feature data;
a feature transformation module, configured to transform the feature data according to the feature transformation matrix to obtain a feature transformation result, wherein the transformation operation on the feature data is decomposed into summation operations and the feature transformation result is determined from the summation operations;
an element-wise multiplication module, configured to acquire a forward-transformed weight transformation result of a current-layer convolutional network, and to perform an element-wise multiplication on the feature transformation result and the weight transformation result to obtain a multiplication result;
an inverse transformation module, configured to acquire an inverse transformation matrix used to inversely transform the multiplication result, and to transform the multiplication result according to the inverse transformation matrix to obtain an operation result, wherein the transformation operation on the multiplication result is decomposed into summation operations and the operation result is determined from the summation operations;
a transmission module, configured to output the operation result to a lower-layer convolutional network.
A14. The device according to clause A13, wherein the feature transformation module is specifically configured to:
decompose the feature data into multiple feature sub-tensors;
perform a transformation operation on the multiple feature sub-tensors according to the feature transformation matrix and sum them to obtain the feature transformation result;
and the inverse transformation module is specifically configured to:
decompose the multiplication result into multiple result sub-tensors;
perform a transformation operation on the multiple result sub-tensors according to the inverse transformation matrix and sum them to obtain the operation result.
A15. The device according to clause A14, wherein the sum of the multiple feature sub-tensors is the feature data, and the sum of the multiple result sub-tensors is the multiplication result;
the number of feature sub-tensors is the same as the number of non-zero elements in the feature data, each feature sub-tensor contains a single non-zero element, and the non-zero element in each feature sub-tensor is the same as the non-zero element at the corresponding position in the feature data;
the number of result sub-tensors is the same as the number of non-zero elements in the multiplication result, each result sub-tensor contains a single non-zero element, and the non-zero element in each result sub-tensor is the same as the non-zero element at the corresponding position in the multiplication result.
A16. The device of clause A14, wherein the feature transformation module is specifically configured to:
determine, for each feature sub-tensor, a corresponding feature element sub-tensor, the feature element sub-tensor being the tensor obtained by setting the non-zero element of the feature sub-tensor to 1;
determine the transformation result of the feature sub-tensor according to the feature transformation matrix, the feature element sub-tensor, and its corresponding non-zero element;
sum the transformation results of the feature sub-tensors to obtain the feature transformation result;
and wherein the inverse transformation module is specifically configured to:
determine, for each result sub-tensor, a corresponding result element sub-tensor, the result element sub-tensor being the tensor obtained by setting the non-zero element of the result sub-tensor to 1;
determine the transformation result of the result sub-tensor according to the inverse transformation matrix, the result element sub-tensor, and its corresponding non-zero element;
sum the transformation results of the result sub-tensors to obtain the operation result.
A17. The device of clause A16, wherein the feature transformation module is specifically configured to:
left-multiply the feature element sub-tensor by the left-multiplication matrix of the feature transformation matrix, right-multiply it by the right-multiplication matrix of the feature transformation matrix, and multiply the result by the non-zero element corresponding to the feature element sub-tensor to obtain the transformation result of the feature sub-tensor;
wherein both the left-multiplication matrix and the right-multiplication matrix of the feature transformation matrix are determined by the size of the feature sub-tensor;
and wherein the inverse transformation module is specifically configured to:
left-multiply the result element sub-tensor by the left-multiplication matrix of the inverse transformation matrix, right-multiply it by the right-multiplication matrix of the inverse transformation matrix, and multiply the result by the non-zero element corresponding to the result element sub-tensor to obtain the transformation result of the result sub-tensor;
wherein both the left-multiplication matrix and the right-multiplication matrix of the inverse transformation matrix are determined by the size of the result sub-tensor.
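Clause A17 exploits linearity: transforming a tensor equals summing the scaled transforms of its element sub-tensors, and because each element sub-tensor's transform depends only on the position of its 1 and the tile size, those transforms can be tabulated once, reducing the per-tile transform to scalar multiplications and additions. A minimal sketch of the equivalence, with hypothetical names:

import numpy as np

def transform_by_element_subtensors(t, L, R):
    """Compute L @ t @ R by summing the transforms of t's element sub-tensors,
    each scaled by its non-zero value; equivalent to the direct product."""
    out = np.zeros((L.shape[0], R.shape[1]), dtype=t.dtype)
    for idx in zip(*np.nonzero(t)):
        e = np.zeros_like(t)
        e[idx] = 1.0                   # element sub-tensor (non-zero element set to 1)
        out += t[idx] * (L @ e @ R)    # scale its transform by the original value
    return out

# Sanity check against the direct matrix product:
t = np.array([[0.0, 2.0], [3.0, 0.0]])
L = np.array([[1.0, 1.0], [0.0, 1.0]])
R = np.array([[1.0, 0.0], [1.0, 1.0]])
assert np.allclose(transform_by_element_subtensors(t, L, R), L @ t @ R)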
A18. The device of clause A13, wherein the element-wise multiplication module is specifically configured to:
obtain the weight data of the current-layer convolutional network and the weight transformation matrix for forward-transforming the weight data;
transform the weight data according to the weight transformation matrix to obtain the weight transformation result, wherein the transformation operation on the weight data is decomposed into summation operations and the weight transformation result is determined according to the summation operations.
A19. The device of clause A18, wherein the element-wise multiplication module is specifically configured to:
decompose the weight data into a plurality of weight sub-tensors;
perform transformation operations on the plurality of weight sub-tensors according to the weight transformation matrix and sum the results to obtain the weight transformation result.
A20. The device of clause A19, wherein the element-wise multiplication module is specifically configured to:
determine, for each weight sub-tensor, a corresponding weight element sub-tensor, the weight element sub-tensor being the tensor obtained by setting the non-zero element of the weight sub-tensor to 1;
determine the transformation result of the weight sub-tensor according to the weight transformation matrix, the weight element sub-tensor, and its corresponding non-zero element;
sum the transformation results of the weight sub-tensors to obtain the weight transformation result.
A21. The device of clause A20, wherein the element-wise multiplication module is specifically configured to:
left-multiply the weight element sub-tensor by the left-multiplication matrix of the weight transformation matrix, right-multiply it by the right-multiplication matrix of the weight transformation matrix, and multiply the result by the non-zero element corresponding to the weight element sub-tensor to obtain the transformation result of the weight sub-tensor;
wherein both the left-multiplication matrix and the right-multiplication matrix of the weight transformation matrix are determined by the size of the weight sub-tensor.
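Because a trained layer's weights are fixed, the weight transformation result of clauses A18-A21 can be computed once offline and reused for every input tile. Continuing the earlier F(2x2, 3x3) sketch (hypothetical names; G, B_T, and A_T as defined there):

# Transform each 3x3 kernel once, then reuse the cached result for all tiles.
def precompute_weight_transforms(kernels):
    return [G @ g @ G.T for g in kernels]           # weight transformation results

def run_layer(tiles, transformed_weights):
    outputs = []
    for d in tiles:                                 # d: one 4x4 input tile
        V = B_T @ d @ B_T.T                         # feature transformation result
        outputs.append([A_T @ (U * V) @ A_T.T for U in transformed_weights])
    return outputs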
A22. An artificial intelligence chip, comprising the computing device of any one of clauses A13 to A21.
A23. An electronic device, comprising the artificial intelligence chip of clause A22.
A24. A board card, comprising: a storage device, an interface device, a control device, and the artificial intelligence chip of clause A22;
wherein the artificial intelligence chip is connected to the storage device, the control device, and the interface device, respectively;
the storage device is configured to store data;
the interface device is configured to implement data transmission between the artificial intelligence chip and external equipment;
and the control device is configured to monitor the state of the artificial intelligence chip.
A25. The board card of clause A24, wherein the storage device comprises multiple groups of storage units, each group of storage units is connected to the artificial intelligence chip through a bus, and each storage unit is a DDR SDRAM;
the chip comprises a DDR controller configured to control data transmission to and data storage in each storage unit;
and the interface device is a standard PCIe interface.
The embodiments of the present disclosure have been described in detail above, and specific examples have been used herein to illustrate its principles and implementations; the description of these embodiments is intended only to aid understanding of the disclosed method and its core ideas. Changes or modifications made by those skilled in the art based on the ideas, specific embodiments, and scope of application of the present disclosure all fall within its protection scope. In summary, the content of this specification should not be construed as limiting the present disclosure.
The solutions provided by the foregoing embodiments can effectively improve the efficiency of neural network processing. In practical applications, these solutions can be implemented on various architectures, for example a master-slave architecture or a general-purpose architecture.

Claims (25)

1. An operation method, comprising:
obtaining feature data output by an upper-layer convolutional network and a feature transformation matrix for forward-transforming the feature data;
transforming the feature data according to the feature transformation matrix to obtain a feature transformation result, wherein the transformation operation on the feature data is decomposed into summation operations and the feature transformation result is determined according to the summation operations;
obtaining the forward-transformed weight transformation result of the current-layer convolutional network, and performing an element-wise multiplication on the feature transformation result and the weight transformation result to obtain a multiplication result;
obtaining an inverse transformation matrix for inversely transforming the multiplication result, and transforming the multiplication result according to the inverse transformation matrix to obtain an operation result, wherein the transformation operation on the multiplication result is decomposed into summation operations and the operation result is determined according to the summation operations;
outputting the operation result to a lower-layer convolutional network.
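Claim 1 recites the standard Winograd convolution pipeline. As one concrete, widely used instance (not mandated by the claim), the F(2x2, 3x3) algorithm computes each 2x2 output tile $Y$ from a 4x4 input tile $d$ and a 3x3 kernel $g$ as

$$ Y = A^{\top}\!\left[\left(G\,g\,G^{\top}\right) \odot \left(B^{\top} d\, B\right)\right] A $$

with the fixed matrices

$$ B^{\top} = \begin{pmatrix} 1 & 0 & -1 & 0 \\ 0 & 1 & 1 & 0 \\ 0 & -1 & 1 & 0 \\ 0 & 1 & 0 & -1 \end{pmatrix}, \qquad G = \begin{pmatrix} 1 & 0 & 0 \\ \tfrac{1}{2} & \tfrac{1}{2} & \tfrac{1}{2} \\ \tfrac{1}{2} & -\tfrac{1}{2} & \tfrac{1}{2} \\ 0 & 0 & 1 \end{pmatrix}, \qquad A^{\top} = \begin{pmatrix} 1 & 1 & 1 & 0 \\ 0 & 1 & -1 & -1 \end{pmatrix}, $$

where $\odot$ denotes element-wise multiplication. Since the entries of $B$ and $A$ are 0 and ±1 (and those of $G$ are 0, ±1, ±1/2), the forward and inverse transforms reduce to the additions that the claimed summation decomposition exploits, replacing most of the multiplications of direct convolution.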
2. The method of claim 1, wherein decomposing the transformation operation on the feature data into summation operations and determining the feature transformation result according to the summation operations comprises:
decomposing the feature data into a plurality of feature sub-tensors;
performing transformation operations on the plurality of feature sub-tensors according to the feature transformation matrix and summing the results to obtain the feature transformation result;
and wherein decomposing the transformation operation on the multiplication result into summation operations and determining the operation result according to the summation operations comprises:
decomposing the multiplication result into a plurality of result sub-tensors;
performing transformation operations on the plurality of result sub-tensors according to the inverse transformation matrix and summing the results to obtain the operation result.
3. The method of claim 2, wherein the sum of the plurality of feature sub-tensors equals the feature data, and the sum of the plurality of result sub-tensors equals the multiplication result;
the number of feature sub-tensors equals the number of non-zero elements in the feature data, each feature sub-tensor contains a single non-zero element, and that non-zero element is identical to the non-zero element at the corresponding position in the feature data;
the number of result sub-tensors equals the number of non-zero elements in the multiplication result, each result sub-tensor contains a single non-zero element, and that non-zero element is identical to the non-zero element at the corresponding position in the multiplication result.
4. The method of claim 2, wherein performing transformation operations on the plurality of feature sub-tensors according to the feature transformation matrix and summing the results to obtain the feature transformation result comprises:
determining, for each feature sub-tensor, a corresponding feature element sub-tensor, the feature element sub-tensor being the tensor obtained by setting the non-zero element of the feature sub-tensor to 1;
determining the transformation result of the feature sub-tensor according to the feature transformation matrix, the feature element sub-tensor, and its corresponding non-zero element;
summing the transformation results of the feature sub-tensors to obtain the feature transformation result;
and wherein performing transformation operations on the plurality of result sub-tensors according to the inverse transformation matrix and summing the results to obtain the operation result comprises:
determining, for each result sub-tensor, a corresponding result element sub-tensor, the result element sub-tensor being the tensor obtained by setting the non-zero element of the result sub-tensor to 1;
determining the transformation result of the result sub-tensor according to the inverse transformation matrix, the result element sub-tensor, and its corresponding non-zero element;
summing the transformation results of the result sub-tensors to obtain the operation result.
5. The method of claim 4, wherein determining the transformation result of the feature sub-tensor according to the feature transformation matrix, the feature element sub-tensor, and its corresponding non-zero element comprises:
left-multiplying the feature element sub-tensor by the left-multiplication matrix of the feature transformation matrix, right-multiplying it by the right-multiplication matrix of the feature transformation matrix, and multiplying the result by the non-zero element corresponding to the feature element sub-tensor to obtain the transformation result of the feature sub-tensor;
wherein both the left-multiplication matrix and the right-multiplication matrix of the feature transformation matrix are determined by the size of the feature sub-tensor;
and wherein determining the transformation result of the result sub-tensor according to the inverse transformation matrix, the result element sub-tensor, and its corresponding non-zero element comprises:
left-multiplying the result element sub-tensor by the left-multiplication matrix of the inverse transformation matrix, right-multiplying it by the right-multiplication matrix of the inverse transformation matrix, and multiplying the result by the non-zero element corresponding to the result element sub-tensor to obtain the transformation result of the result sub-tensor;
wherein both the left-multiplication matrix and the right-multiplication matrix of the inverse transformation matrix are determined by the size of the result sub-tensor.
6. The method of claim 1, wherein obtaining the forward-transformed weight transformation result of the current-layer convolutional network comprises:
obtaining the weight data of the current-layer convolutional network and the weight transformation matrix for forward-transforming the weight data;
transforming the weight data according to the weight transformation matrix to obtain the weight transformation result, wherein the transformation operation on the weight data is decomposed into summation operations and the weight transformation result is determined according to the summation operations.
7. The method of claim 6, wherein decomposing the transformation operation on the weight data into summation operations and determining the weight transformation result according to the summation operations comprises:
decomposing the weight data into a plurality of weight sub-tensors;
performing transformation operations on the plurality of weight sub-tensors according to the weight transformation matrix and summing the results to obtain the weight transformation result.
8. The method of claim 7, wherein performing transformation operations on the plurality of weight sub-tensors according to the weight transformation matrix and summing the results to obtain the weight transformation result comprises:
determining, for each weight sub-tensor, a corresponding weight element sub-tensor, the weight element sub-tensor being the tensor obtained by setting the non-zero element of the weight sub-tensor to 1;
determining the transformation result of the weight sub-tensor according to the weight transformation matrix, the weight element sub-tensor, and its corresponding non-zero element;
summing the transformation results of the weight sub-tensors to obtain the weight transformation result.
9. The method of claim 8, wherein determining the transformation result of the weight sub-tensor according to the weight transformation matrix, the weight element sub-tensor, and its corresponding non-zero element comprises:
left-multiplying the weight element sub-tensor by the left-multiplication matrix of the weight transformation matrix, right-multiplying it by the right-multiplication matrix of the weight transformation matrix, and multiplying the result by the non-zero element corresponding to the weight element sub-tensor to obtain the transformation result of the weight sub-tensor;
wherein both the left-multiplication matrix and the right-multiplication matrix of the weight transformation matrix are determined by the size of the weight sub-tensor.
10. The method of any one of claims 1 to 9, wherein the method is applied to a master-slave processing architecture, the master-slave processing architecture comprising a master functional unit and at least one slave functional unit.
11. The method of claim 10, wherein the master functional unit transforms the feature data according to the feature transformation matrix to obtain the feature transformation result, wherein the transformation operation on the feature data is decomposed into summation operations and the feature transformation result is determined according to the summation operations;
the slave functional unit obtains the forward-transformed weight transformation result of the current-layer convolutional network and performs the element-wise multiplication on the feature transformation result and the weight transformation result to obtain the multiplication result;
and the slave functional unit obtains the inverse transformation matrix for inversely transforming the multiplication result and transforms the multiplication result according to the inverse transformation matrix to obtain the operation result, wherein the transformation operation on the multiplication result is decomposed into summation operations and the operation result is determined according to the summation operations.
12. The method of claim 11, wherein the master functional unit and the slave functional unit operate in parallel: before the master functional unit has finished computing the transformed values for all element positions of the feature data, the slave functional unit performs, for each element position whose feature transformation value has already been computed, the element-wise multiplication of the feature transformation result and the weight transformation result at that position, until the element-wise products for all element positions have been computed and the multiplication result is obtained.
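Claim 12 describes producer-consumer overlap: the slave unit starts multiplying at each element position as soon as the master has produced it. Below is a minimal software sketch of that pipelining, reusing B_T from the earlier F(2x2, 3x3) sketch; the names are hypothetical, and real hardware would stream each value as the summation at its position completes rather than after the full transform.

import queue, threading
import numpy as np

def master_unit(d, q):                  # master functional unit: forward transform
    V = B_T @ d @ B_T.T                 # feature transformation result
    for idx in np.ndindex(*V.shape):    # stream each element position to the slave
        q.put((idx, V[idx]))
    q.put(None)                         # end-of-stream marker

def slave_unit(U, q, M):                # slave functional unit: element-wise multiply
    while (item := q.get()) is not None:
        idx, v = item
        M[idx] = U[idx] * v             # multiply as soon as the position is ready

# Usage sketch:
# q = queue.Queue(); M = np.empty_like(U)
# threading.Thread(target=master_unit, args=(d, q)).start()
# slave_unit(U, q, M)                   # M now equals U * V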
13. A computing device, comprising:
an acquisition module configured to obtain feature data output by an upper-layer convolutional network and a feature transformation matrix for forward-transforming the feature data;
a feature transformation module configured to transform the feature data according to the feature transformation matrix to obtain a feature transformation result, wherein the transformation operation on the feature data is decomposed into summation operations and the feature transformation result is determined according to the summation operations;
an element-wise multiplication module configured to obtain the forward-transformed weight transformation result of the current-layer convolutional network, and to perform an element-wise multiplication on the feature transformation result and the weight transformation result to obtain a multiplication result;
an inverse transformation module configured to obtain an inverse transformation matrix for inversely transforming the multiplication result, and to transform the multiplication result according to the inverse transformation matrix to obtain an operation result, wherein the transformation operation on the multiplication result is decomposed into summation operations and the operation result is determined according to the summation operations;
a transmission module configured to output the operation result to the lower-layer convolutional network.
14. The device of claim 13, wherein
the feature transformation module is specifically configured to:
decompose the feature data into a plurality of feature sub-tensors;
perform transformation operations on the plurality of feature sub-tensors according to the feature transformation matrix and sum the results to obtain the feature transformation result;
and the inverse transformation module is specifically configured to:
decompose the multiplication result into a plurality of result sub-tensors;
perform transformation operations on the plurality of result sub-tensors according to the inverse transformation matrix and sum the results to obtain the operation result.
15. The device of claim 14, wherein the sum of the plurality of feature sub-tensors equals the feature data, and the sum of the plurality of result sub-tensors equals the multiplication result;
the number of feature sub-tensors equals the number of non-zero elements in the feature data, each feature sub-tensor contains a single non-zero element, and that non-zero element is identical to the non-zero element at the corresponding position in the feature data;
the number of result sub-tensors equals the number of non-zero elements in the multiplication result, each result sub-tensor contains a single non-zero element, and that non-zero element is identical to the non-zero element at the corresponding position in the multiplication result.
16. The device of claim 14, wherein the feature transformation module is specifically configured to:
determine, for each feature sub-tensor, a corresponding feature element sub-tensor, the feature element sub-tensor being the tensor obtained by setting the non-zero element of the feature sub-tensor to 1;
determine the transformation result of the feature sub-tensor according to the feature transformation matrix, the feature element sub-tensor, and its corresponding non-zero element;
sum the transformation results of the feature sub-tensors to obtain the feature transformation result;
and the inverse transformation module is specifically configured to:
determine, for each result sub-tensor, a corresponding result element sub-tensor, the result element sub-tensor being the tensor obtained by setting the non-zero element of the result sub-tensor to 1;
determine the transformation result of the result sub-tensor according to the inverse transformation matrix, the result element sub-tensor, and its corresponding non-zero element;
sum the transformation results of the result sub-tensors to obtain the operation result.
17. The device of claim 16, wherein the feature transformation module is specifically configured to:
left-multiply the feature element sub-tensor by the left-multiplication matrix of the feature transformation matrix, right-multiply it by the right-multiplication matrix of the feature transformation matrix, and multiply the result by the non-zero element corresponding to the feature element sub-tensor to obtain the transformation result of the feature sub-tensor;
wherein both the left-multiplication matrix and the right-multiplication matrix of the feature transformation matrix are determined by the size of the feature sub-tensor;
and the inverse transformation module is specifically configured to:
left-multiply the result element sub-tensor by the left-multiplication matrix of the inverse transformation matrix, right-multiply it by the right-multiplication matrix of the inverse transformation matrix, and multiply the result by the non-zero element corresponding to the result element sub-tensor to obtain the transformation result of the result sub-tensor;
wherein both the left-multiplication matrix and the right-multiplication matrix of the inverse transformation matrix are determined by the size of the result sub-tensor.
18. The device of claim 13, wherein the element-wise multiplication module is specifically configured to:
obtain the weight data of the current-layer convolutional network and the weight transformation matrix for forward-transforming the weight data;
transform the weight data according to the weight transformation matrix to obtain the weight transformation result, wherein the transformation operation on the weight data is decomposed into summation operations and the weight transformation result is determined according to the summation operations.
19. The device of claim 18, wherein the element-wise multiplication module is specifically configured to:
decompose the weight data into a plurality of weight sub-tensors;
perform transformation operations on the plurality of weight sub-tensors according to the weight transformation matrix and sum the results to obtain the weight transformation result.
20. The device of claim 19, wherein the element-wise multiplication module is specifically configured to:
determine, for each weight sub-tensor, a corresponding weight element sub-tensor, the weight element sub-tensor being the tensor obtained by setting the non-zero element of the weight sub-tensor to 1;
determine the transformation result of the weight sub-tensor according to the weight transformation matrix, the weight element sub-tensor, and its corresponding non-zero element;
sum the transformation results of the weight sub-tensors to obtain the weight transformation result.
21. The device of claim 20, wherein the element-wise multiplication module is specifically configured to:
left-multiply the weight element sub-tensor by the left-multiplication matrix of the weight transformation matrix, right-multiply it by the right-multiplication matrix of the weight transformation matrix, and multiply the result by the non-zero element corresponding to the weight element sub-tensor to obtain the transformation result of the weight sub-tensor;
wherein both the left-multiplication matrix and the right-multiplication matrix of the weight transformation matrix are determined by the size of the weight sub-tensor.
22. An artificial intelligence chip, comprising the computing device of any one of claims 13 to 21.
23. An electronic device, comprising the artificial intelligence chip of claim 22.
24. A board card, comprising: a storage device, an interface device, a control device, and the artificial intelligence chip of claim 22;
wherein the artificial intelligence chip is connected to the storage device, the control device, and the interface device, respectively;
the storage device is configured to store data;
the interface device is configured to implement data transmission between the artificial intelligence chip and external equipment;
and the control device is configured to monitor the state of the artificial intelligence chip.
25. The board card of claim 24, wherein
the storage device comprises multiple groups of storage units, each group of storage units is connected to the artificial intelligence chip through a bus, and each storage unit is a DDR SDRAM;
the chip comprises a DDR controller configured to control data transmission to and data storage in each storage unit;
and the interface device is a standard PCIe interface.
PCT/CN2020/113166 2019-11-01 2020-09-03 Operation method and related product WO2021082724A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201911061118.4 2019-11-01
CN201911061118.4A CN112784207B (en) 2019-11-01 2019-11-01 Operation method and related product

Publications (1)

Publication Number Publication Date
WO2021082724A1 true WO2021082724A1 (en) 2021-05-06

Family

ID=75715766

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/113166 WO2021082724A1 (en) 2019-11-01 2020-09-03 Operation method and related product

Country Status (2)

Country Link
CN (1) CN112784207B (en)
WO (1) WO2021082724A1 (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160019456A1 (en) * 2014-07-16 2016-01-21 Qualcomm Incorporated Decomposing convolution operation in neural networks
CN108229654A (en) * 2016-12-14 2018-06-29 上海寒武纪信息科技有限公司 Neural network convolution algorithm device and method
CN108229656A (en) * 2016-12-14 2018-06-29 上海寒武纪信息科技有限公司 Neural network computing device and method
CN108549931A (en) * 2018-04-25 2018-09-18 济南浪潮高新科技投资发展有限公司 A kind of accelerator and method of convolutional neural networks
CN109523020A (en) * 2017-10-30 2019-03-26 上海寒武纪信息科技有限公司 A kind of arithmetic unit and method
CN109754064A (en) * 2017-11-07 2019-05-14 三星电子株式会社 The method and apparatus for executing the neural network of deconvolution

Family Cites Families (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11055063B2 (en) * 2016-05-02 2021-07-06 Marvell Asia Pte, Ltd. Systems and methods for deep learning processor
US20170344876A1 (en) * 2016-05-31 2017-11-30 Samsung Electronics Co., Ltd. Efficient sparse parallel winograd-based convolution scheme
WO2018107383A1 (en) * 2016-12-14 2018-06-21 上海寒武纪信息科技有限公司 Neural network convolution computation method and device, and computer-readable storage medium
US10482155B2 (en) * 2016-12-30 2019-11-19 Intel Corporation Winograd algorithm on a matrix processing architecture
US10990648B2 (en) * 2017-08-07 2021-04-27 Intel Corporation System and method for an optimized winograd convolution accelerator
US10372787B2 (en) * 2017-12-12 2019-08-06 Facebook, Inc. Hardware accelerator pre-configured with coefficients for matrix-transform operations
CN108388446A (en) * 2018-02-05 2018-08-10 上海寒武纪信息科技有限公司 Computing module and method
CN109325591B (en) * 2018-09-26 2020-12-29 中国科学院计算技术研究所 Winograd convolution-oriented neural network processor
CN109685201B (en) * 2018-12-14 2020-10-30 安徽寒武纪信息科技有限公司 Operation method, device and related product
CN110097172B (en) * 2019-03-18 2021-10-29 中国科学院计算技术研究所 Convolutional neural network data processing method and device based on Winograd convolutional operation
CN110188869B (en) * 2019-05-05 2021-08-10 北京中科汇成科技有限公司 Method and system for integrated circuit accelerated calculation based on convolutional neural network algorithm


Also Published As

Publication number Publication date
CN112784207B (en) 2024-02-02
CN112784207A (en) 2021-05-11

Similar Documents

Publication Publication Date Title
CN109543832B (en) Computing device and board card
CN109522052B (en) Computing device and board card
US20230026006A1 (en) Convolution computation engine, artificial intelligence chip, and data processing method
CN109670581B (en) Computing device and board card
CN110059797B (en) Computing device and related product
WO2021082725A1 (en) Winograd convolution operation method and related product
WO2021083101A1 (en) Data processing method and apparatus, and related product
CN109711540B (en) Computing device and board card
WO2021185262A1 (en) Computing apparatus and method, board card, and computer readable storage medium
WO2021082747A1 (en) Operational apparatus and related product
WO2021082746A1 (en) Operation apparatus and related product
CN109740730B (en) Operation method, device and related product
WO2021082723A1 (en) Operation apparatus
CN112084023A (en) Data parallel processing method, electronic equipment and computer readable storage medium
CN111143766A (en) Method and apparatus for processing two-dimensional complex matrix by artificial intelligence processor
CN111124995A (en) Method and apparatus for processing a one-dimensional complex array by an artificial intelligence processor
WO2021082724A1 (en) Operation method and related product
WO2021082721A1 (en) Winograd convolution operation method, apparatus, and device, and storage medium
CN111047005A (en) Operation method, operation device, computer equipment and storage medium
WO2021223642A1 (en) Data processing method and apparatus, and related product
WO2021082722A1 (en) Computing device and method, and related product
CN111061507A (en) Operation method, operation device, computer equipment and storage medium
WO2021223644A1 (en) Data processing method and device, and related product
WO2021223638A1 (en) Data processing method and device, and related product
WO2021169914A1 (en) Data quantification processing method and apparatus, electronic device and storage medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20881646

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20881646

Country of ref document: EP

Kind code of ref document: A1