WO2021082747A1 - Computing device and related product - Google Patents

Computing device and related product

Info

Publication number
WO2021082747A1
Authority: WO (WIPO, PCT)
Prior art keywords: sub, winograd, tensor, result, input data
Application number: PCT/CN2020/114057
Other languages: English (en), Chinese (zh)
Inventors: 张英男 (Zhang Yingnan), 江广 (Jiang Guang), 刘少礼 (Liu Shaoli), 高钰峰 (Gao Yufeng), 于涌 (Yu Yong), 周徐达 (Zhou Xuda)
Original assignee: 中科寒武纪科技股份有限公司 (Cambricon Technologies Corporation Limited)
Application filed by 中科寒武纪科技股份有限公司
Publication of WO2021082747A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/06 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N 3/063 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00 Arrangements for program control, e.g. control units
    • G06F 9/06 Arrangements for program control using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F 9/30003 Arrangements for executing specific machine instructions
    • G06F 9/30007 Arrangements for executing specific machine instructions to perform operations on data operands
    • G06F 9/30036 Instructions to perform operations on packed data, e.g. vector, tile or matrix operations
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 1/00 General purpose image data processing
    • G06T 1/20 Processor architectures; Processor configuration, e.g. pipelining

Definitions

  • The present disclosure relates to the field of data processing technology, and in particular to a computing device and related products.
  • Neural network algorithms are a very popular class of machine learning algorithms and have achieved very good results in various fields, such as image recognition, speech recognition, and natural language processing.
  • As the complexity of these algorithms grows, the scale of the models increases as well. Using GPUs and CPUs to process these large-scale models requires a great deal of computing time and consumes a great deal of power.
  • According to one aspect of the present disclosure, a computing device is provided, including: a master instruction processing unit, a master functional unit, a slave instruction processing unit, and a slave functional unit,
  • wherein the master instruction processing unit is configured to, after receiving an input instruction, send a first control signal to the master functional unit according to the input instruction;
  • the master functional unit is configured to decompose the Winograd forward transform of the input data into a summation operation according to the first control signal, perform the calculation to obtain the Winograd forward transform result of the input data, and send that result to the slave functional unit, where the Winograd forward transform result of the input data includes the Winograd forward transform result of the input neurons;
  • the master instruction processing unit is further configured to send a second control signal to the slave instruction processing unit, and the slave instruction processing unit is configured to send the second control signal to the slave functional unit;
  • the slave functional unit is configured to, according to the second control signal, perform element-wise multiplication on the Winograd forward transform result of the input neurons and the Winograd forward transform result of the weights to obtain an element-wise multiplication result, decompose the Winograd inverse transform of the element-wise multiplication result into a summation operation, and perform the calculation to obtain the Winograd convolution result of the input data.
  • According to another aspect of the present disclosure, an artificial intelligence chip is provided, including the computing device described above.
  • According to another aspect of the present disclosure, an electronic device is provided, wherein the electronic device includes the artificial intelligence chip as described above.
  • In this way, the Winograd forward transform of the input data is decomposed into a summation operation and calculated to obtain the Winograd forward transform result of the input data, which includes the Winograd forward transform result of the input neurons; the Winograd forward transform result of the input neurons is then multiplied element-wise with the Winograd forward transform result of the weights to obtain the element-wise multiplication result; finally, the Winograd inverse transform of the element-wise multiplication result is decomposed into a summation operation and calculated to obtain the Winograd convolution result of the input data. According to the computing device of the present disclosure, decomposing the multiplication operations into summation operations can save calculation time and reduce energy consumption.
  • Fig. 1 shows a block diagram of a computing device according to an embodiment of the present disclosure.
  • Fig. 2 shows a flowchart of computing the Winograd forward transform result of input data from the first sub-tensors according to an embodiment of the present disclosure.
  • Fig. 3 shows a block diagram of a computing device according to an embodiment of the present disclosure.
  • Fig. 4 shows a flowchart of performing the Winograd inverse transform on the element-wise multiplication result according to an embodiment of the present disclosure.
  • Fig. 5 shows a schematic diagram of a processor according to an embodiment of the present disclosure.
  • Fig. 6 shows a structural block diagram of a board card according to an embodiment of the present disclosure.
  • Depending on the context, the term "if" can be interpreted as "when", "once", "in response to determining", or "in response to detecting".
  • Similarly, depending on the context, the phrase "if it is determined" or "if [the described condition or event] is detected" can be interpreted as "once it is determined", "in response to determining", "once [the described condition or event] is detected", or "in response to detecting [the described condition or event]".
  • Winograd convolution is a convolution acceleration method based on polynomial interpolation. It splits the two inputs of the convolution operation, the neurons and the weights, into tiles of a certain scale, applies a linear transform to each (the Winograd forward transform), multiplies the transformed neurons and weights element-wise, and finally applies another linear transform to the element-wise multiplication result (the Winograd inverse transform) to obtain a convolution result equivalent to the original convolution operation, i.e.

    S = A^T [ (G g G^T) ⊙ (B^T d B) ] A

    where:
  • g represents the weight;
  • G represents the left-multiplication forward transform matrix corresponding to the weight;
  • G^T represents the right-multiplication forward transform matrix corresponding to the weight;
  • d represents the input neurons;
  • B represents the right-multiplication forward transform matrix corresponding to the input neurons;
  • B^T represents the left-multiplication forward transform matrix corresponding to the input neurons;
  • A represents the right-multiplication inverse transform matrix;
  • A^T represents the left-multiplication inverse transform matrix.
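To make the formula concrete, here is an illustrative sketch (not part of the patent text) for the widely used F(2×2, 3×3) case. The matrices B_T, G, and A_T below are the standard Winograd F(2×2, 3×3) transform matrices, an assumption of this example; the patent does not prescribe particular matrices.

```python
import numpy as np

# Standard Winograd F(2x2, 3x3) transform matrices (assumed; not fixed by the patent).
B_T = np.array([[1,  0, -1,  0],
                [0,  1,  1,  0],
                [0, -1,  1,  0],
                [0,  1,  0, -1]], dtype=float)
G = np.array([[1.0,  0.0, 0.0],
              [0.5,  0.5, 0.5],
              [0.5, -0.5, 0.5],
              [0.0,  0.0, 1.0]])
A_T = np.array([[1, 1,  1,  0],
                [0, 1, -1, -1]], dtype=float)

def winograd_conv_tile(d, g):
    """S = A^T [ (G g G^T) . (B^T d B) ] A for one 4x4 input tile d and a 3x3 kernel g."""
    U = G @ g @ G.T        # Winograd forward transform of the weight
    V = B_T @ d @ B_T.T    # Winograd forward transform of the input neurons
    M = U * V              # element-wise multiplication
    return A_T @ M @ A_T.T # Winograd inverse transform: 2x2 output tile

# Sanity check against a direct sliding-window convolution of the tile.
rng = np.random.default_rng(0)
d, g = rng.standard_normal((4, 4)), rng.standard_normal((3, 3))
direct = np.array([[np.sum(d[i:i+3, j:j+3] * g) for j in range(2)] for i in range(2)])
assert np.allclose(winograd_conv_tile(d, g), direct)
```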
  • To this end, the present disclosure provides a computing device that can decompose the multiplication operations in the Winograd convolution process into addition operations, thereby saving calculation time and reducing energy consumption.
  • Fig. 1 shows a block diagram of a computing device according to an embodiment of the present disclosure.
  • As shown in Fig. 1, the computing device provided by the present disclosure may include a master instruction processing unit, a master functional unit, and a master memory unit, as well as a slave instruction processing unit, a slave functional unit, and a slave memory unit.
  • The master instruction processing unit is configured to, after receiving an input instruction, send a first control signal to the master functional unit according to the input instruction;
  • the master functional unit is configured to decompose the Winograd forward transform of the input data into a summation operation according to the first control signal, perform the calculation to obtain the Winograd forward transform result of the input data, and send that result to the slave functional unit, where the Winograd forward transform result of the input data includes the Winograd forward transform result of the input neurons;
  • the master instruction processing unit is further configured to send a second control signal to the slave instruction processing unit, and the slave instruction processing unit is configured to send the second control signal to the slave functional unit;
  • the slave functional unit is configured to, according to the second control signal, perform element-wise multiplication on the Winograd forward transform result of the input neurons and the Winograd forward transform result of the weights to obtain an element-wise multiplication result, decompose the Winograd inverse transform of the element-wise multiplication result into a summation operation, and perform the calculation to obtain the Winograd convolution result of the input data.
  • In this way, the Winograd forward transform of the input data is decomposed into a summation operation and calculated to obtain the Winograd forward transform result of the input data, which includes the Winograd forward transform result of the input neurons; the Winograd forward transform result of the input neurons is then multiplied element-wise with the Winograd forward transform result of the weights to obtain the element-wise multiplication result; finally, the Winograd inverse transform of the element-wise multiplication result is decomposed into a summation operation and calculated to obtain the Winograd convolution result of the input data. According to the computing device of the present disclosure, decomposing the multiplication operations into summation operations can save calculation time and reduce energy consumption.
  • The above input instruction may be a "WINO_CONV" instruction, which covers the Winograd forward transform, element-wise multiplication, and Winograd inverse transform processes.
  • The input instruction may carry information about the operation and the operands corresponding to the instruction; the operand information may include the address information of the operands, the size of the operands, and so on.
  • For example, the operand information may include the address information of the input neurons, the address information of the weights, the address information at which the resulting output operand (the input operand of the next layer, i.e., the input neurons of the next layer) is to be stored, and so on.
  • The input data of a layer may include the input neurons; the input neurons of a layer may be the output results of the previous layer, and for the first layer the input neurons may be the initial input data.
  • The initial input data may be image data, sound data, or video data.
  • The processes of the Winograd forward transform, element-wise multiplication, and Winograd inverse transform are described below.
  • In a possible implementation, the master instruction processing unit is further configured to, after receiving an input instruction, send the first control signal to the master memory unit according to the input instruction; the master memory unit is configured to send the input data to the master functional unit according to the first control signal.
  • The master instruction processing unit can decode and parse the input instruction to obtain the operation and the address information of the operands, and can then send the first control signal to the master memory unit and the master functional unit according to the parsed information.
  • The master memory unit may obtain the input data according to the first control signal.
  • The input data may include input neurons, and may be data represented in the form of a tensor.
  • For example, the input data can be expressed in NHWC (batch, height, width, channels) form, where N represents the number of images, H and W respectively represent the number of pixels in the height and width directions of an image, and C represents the number of channels; for example, C can represent the three RGB (Red, Green, Blue) channels. It should be noted that this representation is only an example of the present disclosure, and the present disclosure is not limited to it.
  • After obtaining the input data, the master memory unit can send it to the master functional unit.
  • After receiving the first control signal and the input data, the master functional unit can perform calculations according to the operation specified in the first control signal: it decomposes the Winograd forward transform of the input data into a summation operation and performs the calculation to obtain the Winograd forward transform result of the input data.
  • Specifically, the master functional unit is configured to decompose the input data into a plurality of first sub-tensors according to the first control signal, perform the Winograd forward transform on the plurality of first sub-tensors, and sum the results to obtain the Winograd forward transform result of the input data.
  • In a possible implementation, the number of first sub-tensors is the same as the number of elements of the input data: one element in each first sub-tensor is the same as the element at the corresponding position in the input data, and all other elements are 0.
  • In another possible implementation, the number of first sub-tensors is the same as the number of non-zero elements of the input data: one element in each first sub-tensor is the same as the element at the corresponding position in the input data, and all other elements are 0.
  • For example, suppose the input neuron is a 4×4 matrix containing the 16 elements d00, d01, ..., d33. The input data can then be decomposed into 16 first sub-tensors, where the first sub-tensor corresponding to dij keeps dij at position (i, j) and has 0 everywhere else.
  • That is, each first sub-tensor has one element that is the same as the element at the corresponding position in the input data, while all other elements are 0.
  • It should be noted that the above decomposition methods are only some examples of the present disclosure and do not limit the present disclosure in any way.
  • For example, if the input data contains elements whose value is 0, the number of first sub-tensors obtained by the decomposition can be less than the number of elements of the input data; for instance, the number of first sub-tensors can equal the number of non-zero elements of the input data.
  • Fig. 2 shows a flowchart of computing the Winograd forward transform result of input data from the first sub-tensors according to an embodiment of the present disclosure.
  • As shown in Fig. 2, performing the Winograd forward transform on the plurality of first sub-tensors and summing the results to obtain the Winograd forward transform result of the input data may include the following steps (a sketch follows the list):
  • Step S21: obtain the Winograd forward transform result of the meta sub-tensor corresponding to each first sub-tensor, where the meta sub-tensor corresponding to a first sub-tensor is a tensor whose element at the first position is 1, the first position in the meta sub-tensor being the same as the position of the non-zero element in the first sub-tensor;
  • Step S22: multiply the non-zero element value of the first sub-tensor, as a coefficient, by the Winograd forward transform result of the corresponding meta sub-tensor to obtain the Winograd forward transform result of the first sub-tensor;
  • Step S23: add the Winograd forward transform results of the plurality of first sub-tensors to obtain the Winograd forward transform result of the input data.
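As an illustration of steps S21 to S23 only (again assuming the standard F(2×2, 3×3) matrix B_T, which the patent does not fix), the sketch below precomputes the transform of every meta sub-tensor once and then reduces the runtime forward transform of a 4×4 tile to scaled accumulations:

```python
import numpy as np

B_T = np.array([[1,  0, -1,  0],
                [0,  1,  1,  0],
                [0, -1,  1,  0],
                [0,  1,  0, -1]], dtype=float)

# Step S21, done offline: Winograd forward transform of every meta sub-tensor
# E_ij (a 4x4 tensor with a 1 at position (i, j) and 0 elsewhere). Since B_T
# contains only 0 and +-1, every precomputed result contains only 0 and +-1.
meta_fwd = {}
for i in range(4):
    for j in range(4):
        E = np.zeros((4, 4))
        E[i, j] = 1.0
        meta_fwd[(i, j)] = B_T @ E @ B_T.T

def forward_transform_by_summation(d):
    """Steps S22-S23: scale each precomputed 0/+-1 pattern by the matching
    element of d and accumulate, so the transform reduces to additions and
    subtractions of input elements."""
    result = np.zeros((4, 4))
    for (i, j), pattern in meta_fwd.items():
        result += d[i, j] * pattern
    return result

d = np.arange(16, dtype=float).reshape(4, 4)
assert np.allclose(forward_transform_by_summation(d), B_T @ d @ B_T.T)
```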
  • For example, the meta sub-tensor corresponding to d00 is the tensor whose element at position (0, 0) is 1 and whose other 15 elements are all 0.
  • The meta sub-tensor is obtained by extracting the value of the non-zero element from the first sub-tensor; the value of that non-zero element then serves as the coefficient of the meta sub-tensor.
  • In a possible implementation, the Winograd forward transform result of the meta sub-tensor corresponding to each first sub-tensor can be obtained in advance through the following process: for each first sub-tensor, multiply the corresponding meta sub-tensor on the left by the forward transform left-multiplication matrix and on the right by the forward transform right-multiplication matrix to obtain the Winograd forward transform result of the meta sub-tensor.
  • For a matrix of a given size, the form of the corresponding meta sub-tensors is fixed, and the corresponding forward transform left-multiplication and right-multiplication matrices are also fixed.
  • Therefore, the Winograd forward transform results of the meta sub-tensors can be calculated in advance, with the specific process as described above.
  • For example, for the meta sub-tensor corresponding to d00, the corresponding Winograd forward transform result is obtained by multiplying that meta sub-tensor on the left by B^T and on the right by B; since the meta sub-tensor contains a single element equal to 1, this product reduces to the outer product of one column of the left-multiplication matrix and one row of the right-multiplication matrix.
  • Because the elements of the forward transform matrices consist of 0 and ±1, the elements of the Winograd forward transform result of each meta sub-tensor also consist of 0 and ±1, and the matrix multiplication can be broken down into addition operations.
  • The process of calculating the Winograd forward transform results of the meta sub-tensors does involve multiplication operations; however, the pre-calculated Winograd forward transform results of meta sub-tensors of various scales can be stored in the computing device, so that during the actual calculation they can be fetched directly without repeated computation, thereby shortening calculation time and saving calculation resources.
  • Then, the value of the non-zero element in each first sub-tensor is multiplied by the Winograd forward transform result of the corresponding meta sub-tensor to obtain the Winograd forward transform result of that first sub-tensor.
  • The Winograd forward transform results of all the first sub-tensors are calculated through the above process, and the Winograd forward transform results of the multiple first sub-tensors are added to obtain the Winograd forward transform result of the input data.
  • In other words, after the input data is decomposed into multiple first sub-tensors, the pre-computed Winograd forward transform results of the corresponding meta sub-tensors, scaled by the non-zero element values of the first sub-tensors, can be summed to obtain the Winograd forward transform result of the input data.
  • Decomposing the multiplication operations into summation operations in this way can save calculation time and reduce energy consumption.
  • In a possible implementation, the master functional unit includes a cache module, and the master functional unit stores the Winograd forward transform result of the input data in the cache module.
  • Fig. 3 shows a block diagram of a computing device according to an embodiment of the present disclosure.
  • As shown in Fig. 3, the master functional unit may include a master processing module and a cache module. The master processing module may be used to carry out the process described in the above embodiments: decomposing the Winograd forward transform of the input data into a summation operation according to the first control signal and performing the calculation to obtain the Winograd forward transform result of the input data.
  • The master functional unit can store the Winograd forward transform result of the input data in the cache module, and the cache module sends that result to the slave functional unit, so that the slave functional unit of the computing device can perform the element-wise multiplication and Winograd inverse transform processes on the forward transform result.
  • The master instruction processing unit is also used to send a second control signal to the slave instruction processing unit.
  • For example, the master instruction processing unit can send the second control signal to the slave instruction processing unit according to a "WINO_MIT" instruction, and the master functional unit (specifically, the cache module in the master functional unit) is also used to send the Winograd forward transform result of the input data to the slave functional unit.
  • The second control signal may be used to instruct the slave instruction processing unit to control the slave functional unit to further process the Winograd forward transform result of the input data.
  • The slave instruction processing unit is configured to receive the second control signal sent by the master instruction processing unit, and to send the second control signal to the slave functional unit and the slave memory unit.
  • The slave memory unit is configured to send the Winograd forward transform result of the weights to the slave functional unit according to the second control signal.
  • The slave functional unit is used to receive the Winograd forward transform result of the input data sent by the master functional unit, which includes the Winograd forward transform result of the input neurons.
  • The Winograd forward transform result of the weights can be pre-calculated, either with a traditional matrix multiplication or by the decomposition-into-summation method described above.
  • That is, the weights are decomposed into a plurality of first sub-tensors, and the Winograd forward transform is performed on the plurality of first sub-tensors and summed to obtain the Winograd forward transform result of the weights.
  • For example, a 3×3 weight matrix contains 9 elements, so the weight can be decomposed into 9 first sub-tensors.
  • One element in each first sub-tensor is the same as the element at the corresponding position in the weight, while all other elements are 0.
  • Following steps S21 to S23, the Winograd forward transform result of the weights can be calculated; this will not be repeated here.
  • The slave functional unit is used to perform element-wise multiplication on the Winograd forward transform result of the input neurons and the Winograd forward transform result of the weights according to the second control signal to obtain the element-wise multiplication result, to decompose the Winograd inverse transform of the element-wise multiplication result into a summation operation, and to perform the calculation to obtain the Winograd convolution result of the input data.
  • Element-wise multiplication means multiplying the data at corresponding positions of the two tensors, the product being the value at the corresponding position in the element-wise multiplication result.
  • The Winograd forward transform result of the weights can be expressed as GgG^T, the element-wise multiplication result can be expressed as (GgG^T) ⊙ (B^T d B), and the Winograd convolution result of the input data can be expressed as A^T [ (GgG^T) ⊙ (B^T d B) ] A.
  • The slave functional unit of the present disclosure can decompose this inverse transform into a summation operation and perform the calculation to obtain the Winograd convolution result of the input data, which can further save calculation time and reduce energy consumption.
  • Specifically, the slave functional unit is used to decompose the element-wise multiplication result into a plurality of second sub-tensors, perform the Winograd inverse transform on the plurality of second sub-tensors, and sum the results to obtain the Winograd convolution result of the input data.
  • In a possible implementation, the number of second sub-tensors is the same as the number of elements of the element-wise multiplication result: one element in each second sub-tensor is the same as the element at the corresponding position in the element-wise multiplication result, and all other elements are 0.
  • In another possible implementation, the number of second sub-tensors is the same as the number of non-zero elements of the element-wise multiplication result: one element in each second sub-tensor is the same as the element at the corresponding position in the element-wise multiplication result, and all other elements are 0.
  • For example, a 4×4 element-wise multiplication result can be decomposed into 16 second sub-tensors, each of which keeps one element of the result at its original position and has 0 everywhere else.
  • The Winograd inverse transform can then be performed on the plurality of second sub-tensors and the results summed to obtain the Winograd convolution result of the input data.
  • Fig. 4 shows a flowchart of performing the Winograd inverse transform on the element-wise multiplication result according to an embodiment of the present disclosure.
  • As shown in Fig. 4, performing the Winograd inverse transform on the plurality of second sub-tensors and summing the results to obtain the Winograd convolution result of the input data may include the following steps (a sketch follows the list):
  • Step S41: obtain the Winograd inverse transform result of the meta sub-tensor corresponding to each second sub-tensor, where the meta sub-tensor corresponding to a second sub-tensor is a tensor whose element at the second position is 1, the second position in the meta sub-tensor being the same as the position of the non-zero element in the second sub-tensor;
  • Step S42: multiply the non-zero element value of the second sub-tensor, as a coefficient, by the Winograd inverse transform result of the corresponding meta sub-tensor to obtain the Winograd inverse transform result of the second sub-tensor;
  • Step S43: add the Winograd inverse transform results of the plurality of second sub-tensors to obtain the Winograd convolution result of the input data.
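The inverse side can be sketched the same way; the matrix A_T below is the standard F(2×2, 3×3) inverse transform matrix, again an assumption rather than a matrix fixed by the patent:

```python
import numpy as np

A_T = np.array([[1, 1,  1,  0],
                [0, 1, -1, -1]], dtype=float)

# Step S41, done offline: Winograd inverse transform of every meta sub-tensor
# E_ij of the 4x4 element-wise multiplication result. With this particular A_T
# the 2x2 patterns contain only 0 and +-1; other transform choices also yield
# +-1/2 entries, which the text below handles with shift operations.
meta_inv = {}
for i in range(4):
    for j in range(4):
        E = np.zeros((4, 4))
        E[i, j] = 1.0
        meta_inv[(i, j)] = A_T @ E @ A_T.T

def inverse_transform_by_summation(m):
    """Steps S42-S43: accumulate the precomputed patterns scaled by the
    elements of the element-wise multiplication result m (4x4 -> 2x2)."""
    result = np.zeros((2, 2))
    for (i, j), pattern in meta_inv.items():
        result += m[i, j] * pattern
    return result

m = np.arange(16, dtype=float).reshape(4, 4)
assert np.allclose(inverse_transform_by_summation(m), A_T @ m @ A_T.T)
```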
  • The method for determining the meta sub-tensor corresponding to a second sub-tensor is the same as the method for determining the meta sub-tensor corresponding to a first sub-tensor above, and will not be repeated here.
  • In a possible implementation, the Winograd inverse transform result of each meta sub-tensor is obtained in advance through the following process: for each second sub-tensor, multiply the corresponding meta sub-tensor on the left by the inverse transform left-multiplication matrix and on the right by the inverse transform right-multiplication matrix to obtain the Winograd inverse transform result of the meta sub-tensor.
  • For a matrix of a given size, the form of the corresponding meta sub-tensors is fixed, and the corresponding inverse transform left-multiplication and right-multiplication matrices are also fixed. Therefore, the Winograd inverse transform results of the meta sub-tensors can be calculated in advance, with the specific process as described above.
  • For the example above, the inverse transform left-multiplication matrix is a 2×4 matrix and the inverse transform right-multiplication matrix is a 4×2 matrix.
  • The dimensions of the inverse transform matrices can be determined by the dimension of the input neurons, the dimension of the weights, and the convolution stride.
  • The above is only an example and does not limit the present disclosure in any way.
  • The elements of the inverse transform matrices consist of 0, ±1, and fractions such as ±1/2, so the matrix multiplication of the inverse transform can be broken down into addition and shift operations. Multiplying the inverse transform matrices by a meta sub-tensor gives the Winograd inverse transform result of that meta sub-tensor.
  • Accordingly, the element values in the Winograd inverse transform result of a meta sub-tensor consist of 0, ±1, and fractions such as ±1/2; the fractional values can be computed by simple shift operations, which still saves calculation time compared with multiplication operations.
  • For steps S42 and S43, refer to steps S22 and S23 above. The difference is that the Winograd inverse transform results of the meta sub-tensors do not consist purely of 0 and ±1; however, the fractional values can be computed by a simple shift operation (illustrated below), so compared with multiplication operations the present disclosure can still save calculation time and reduce energy consumption after decomposing the ordinary inverse transform process.
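As a minimal sketch of the shift remark, assuming a two's-complement fixed-point representation (the patent does not specify one), scaling by 1/2 reduces to an arithmetic right shift:

```python
def halve_fixed_point(x: int) -> int:
    """Scale a two's-complement fixed-point value by 1/2 with an arithmetic
    right shift instead of a multiplication."""
    return x >> 1

assert halve_fixed_point(12) == 6
assert halve_fixed_point(-12) == -6
```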
  • That is, the element-wise multiplication result is decomposed into multiple second sub-tensors, and the pre-computed Winograd inverse transform results of the corresponding meta sub-tensors, scaled by the non-zero element values of the second sub-tensors, are summed to obtain the Winograd convolution result of the input data.
  • Decomposing the multiplication operations into summation operations can save calculation time and reduce energy consumption.
  • In a possible implementation, the slave functional unit may include an element-wise multiplication module and an inverse transform module. The element-wise multiplication module can be used to perform the element-wise multiplication described above, and the element-wise multiplication result can be sent to the inverse transform module, which decomposes the Winograd inverse transform of the element-wise multiplication result into a summation operation and performs the calculation to obtain the Winograd convolution result of the input data.
  • In a possible implementation, the slave functional unit is further configured to perform post-processing on the Winograd convolution result of the input data; the post-processing includes a rounding operation and a data rearrangement operation.
  • The rounding operation may round the Winograd convolution result of the input data according to a set number of rounding bits.
  • The data rearrangement operation may refer to changing the arrangement of the Winograd convolution result of the input data.
  • For example, the arrangement of the Winograd convolution result of the input data can be changed according to storage requirements; post-processing the Winograd convolution result makes subsequent operations and calculations easier (a sketch follows).
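A purely illustrative sketch of such post-processing follows; the rounding width and the NHWC-to-NCHW rearrangement are assumptions of this example, not choices made by the patent:

```python
import numpy as np

def post_process(y: np.ndarray, frac_bits: int = 8) -> np.ndarray:
    """Round an NHWC convolution result to frac_bits fractional bits and
    rearrange it to NCHW for the next layer (both choices assumed here)."""
    scale = float(1 << frac_bits)
    rounded = np.round(y * scale) / scale       # keep frac_bits fractional bits
    return np.transpose(rounded, (0, 3, 1, 2))  # NHWC -> NCHW

y = np.ones((1, 2, 2, 3)) / 3.0
assert post_process(y).shape == (1, 3, 2, 2)
```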
  • In a possible implementation, the slave functional unit is further configured to send the Winograd convolution result of the input data to the master memory unit as the input neurons of the next layer of the convolution operation.
  • The computing device can be applied in a processor, and the processor can be a general-purpose processor, such as a CPU (Central Processing Unit), or an artificial intelligence processor (IPU) for performing artificial intelligence operations.
  • Artificial intelligence operations can include machine learning operations, brain-like computing operations, and so on; machine learning operations include neural network operations, k-means operations, support vector machine operations, and so on.
  • The artificial intelligence processor may, for example, include one or a combination of a GPU (Graphics Processing Unit), an NPU (Neural-Network Processing Unit), a DSP (Digital Signal Processing) unit, and an FPGA (Field-Programmable Gate Array) chip.
  • The processor mentioned in the present disclosure may include multiple processing units, and each processing unit can independently run the various tasks assigned to it, such as convolution computing tasks, pooling tasks, or fully connected tasks.
  • The present disclosure does not limit the processing units or the tasks they run.
  • Fig. 5 shows a schematic diagram of a processor according to an embodiment of the present disclosure.
  • In a possible implementation, the processor is used to perform machine learning calculations.
  • The processor includes a controller unit 141 and an arithmetic unit 142.
  • The controller unit 141 is connected to the arithmetic unit 142.
  • The arithmetic unit 142 includes a main processing circuit and multiple slave processing circuits.
  • The controller unit 141 is configured to obtain input data and a calculation instruction; the calculation instruction obtained by the controller unit 141 may be one or more operators in a first fusion set obtained after the first processor fuses the operators.
  • The main processing circuit and the multiple slave processing circuits may be arranged in a tree structure, an H-shaped structure, or a systolic array structure;
  • the present disclosure does not limit the connection between the main processing circuit and the slave processing circuits.
  • In a possible implementation, the input data and the calculation instruction may be obtained through a data input/output unit, which may specifically be one or more data I/O interfaces or I/O pins.
  • The above calculation instructions include, but are not limited to, forward operation instructions, backward training instructions, and other neural network operation instructions such as convolution operation instructions, and may also be the "WINO_CONV" instruction mentioned above. The specific implementation of this application does not limit the specific form of expression of the above calculation instruction.
  • The controller unit 141 is further configured to parse the calculation instruction to obtain multiple operation instructions, and to send the multiple operation instructions and the input data to the main processing circuit;
  • the main processing circuit 101 is configured to perform pre-processing on the input data, and to transmit data and operation instructions between itself and the multiple slave processing circuits;
  • the multiple slave processing circuits 102 are configured to perform intermediate operations in parallel according to the data and operation instructions transmitted from the main processing circuit to obtain multiple intermediate results, and to transmit the multiple intermediate results to the main processing circuit;
  • the main processing circuit 101 is configured to perform subsequent processing on the multiple intermediate results to obtain the calculation result of the calculation instruction.
  • The technical solution provided by this application sets the arithmetic unit in a single-master, multiple-slave structure. For the calculation instruction of a forward operation, the data can be split according to the calculation instruction, so that the portion of the calculation with the larger workload can be performed in parallel by the multiple slave processing circuits, thereby increasing the calculation speed, saving calculation time, and in turn reducing power consumption.
  • The above machine learning calculation may specifically include artificial neural network operations;
  • the above input data may specifically include input neuron data and weight data;
  • and the above calculation result may specifically be the result of the artificial neural network operation, i.e., the output neuron data.
  • The operation in a neural network can be the operation of one layer of the neural network.
  • For a multi-layer neural network, the implementation process is as follows. In the forward operation, when the operation of the previous layer of the artificial neural network is completed, the operation instruction of the next layer takes the output neurons calculated in the arithmetic unit as the input neurons of the next layer (or performs some operations on those output neurons and then uses them as the input neurons of the next layer), and at the same time replaces the weights with the weights of the next layer. In the backward operation, when the backward operation of the previous layer of the artificial neural network is completed, the operation instruction of the next layer takes the input neuron gradients calculated in the arithmetic unit as the output neuron gradients of the next layer (or performs some operations on those input neuron gradients and then uses them as the output neuron gradients of the next layer), and likewise replaces the weights with the weights of the next layer.
  • The above machine learning calculations may also include support vector machine calculations, k-nearest neighbor (k-nn) calculations, k-means calculations, principal component analysis calculations, and so on.
  • For a multi-layer neural network operation, the input neurons and output neurons do not refer to the neurons in the input layer and the output layer of the entire neural network; rather, for any two adjacent layers, the neurons in the lower layer of the forward operation are the input neurons, and the neurons in the upper layer of the forward operation are the output neurons.
  • In a possible implementation, the above processor may further include a storage unit 140 and a direct memory access unit 50.
  • The storage unit 140 may include one or any combination of a register and a cache; specifically, the cache is used to store the calculation instruction, the register is used to store the input data and scalars, and the cache is a high-speed scratchpad cache.
  • The direct memory access unit 50 is used to read data from or store data to the storage unit 140.
  • In a possible implementation, the controller unit includes an instruction storage unit 410, an instruction processing unit 411, and a storage queue unit 413;
  • the instruction storage unit 410 is configured to store the calculation instructions associated with the artificial neural network operation;
  • the instruction processing unit 411 is configured to parse the calculation instructions to obtain multiple operation instructions;
  • the storage queue unit 413 is configured to store an instruction queue, which includes multiple operation instructions or calculation instructions to be executed in queue order.
  • The main processing circuit may also include a controller unit, and that controller unit may include a master instruction processing unit specifically configured to decode instructions into micro-instructions.
  • The slave processing circuit may also include another controller unit, which includes a slave instruction processing unit specifically configured to receive and process micro-instructions.
  • The above micro-instructions may be the next level of instructions below the instructions;
  • they can be obtained by splitting or decoding the instructions, and can be further decoded into control signals for the various components, units, or processing circuits.
  • In a possible implementation, the structure of the calculation instruction may be as shown in the following table:

    | operation code | register or immediate | register or immediate | ... |

  • The ellipsis in the above table indicates that multiple registers or immediates can be included.
  • In another possible implementation, the calculation instruction may include one or more operation fields and an operation code.
  • The calculation instruction may include a neural network operation instruction. Taking a neural network operation instruction as an example, register number 0, register number 1, register number 2, register number 3, and register number 4 can be the operation fields, and each of register number 0 through register number 4 may be the number of one or more registers.
  • The above registers can be off-chip memory; of course, in practical applications they can also be on-chip memory for storing data. A hypothetical encoding is sketched below.
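Since the patent describes only an operation code followed by operation fields, and not a concrete layout, the following encoding is purely hypothetical:

```python
from dataclasses import dataclass

@dataclass
class CalcInstruction:
    """Hypothetical calculation-instruction layout: an operation code followed
    by operation fields holding register numbers and/or immediates."""
    opcode: str        # e.g. "WINO_CONV"
    fields: tuple      # e.g. the numbers of registers 0 through 4

instr = CalcInstruction(opcode="WINO_CONV", fields=(0, 1, 2, 3, 4))
```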
  • In a possible implementation, the controller unit may further include:
  • a dependency processing unit 412, configured to, when there are multiple operation instructions, determine whether a first operation instruction is associated with a zeroth operation instruction that precedes it; if the first operation instruction is associated with the zeroth operation instruction, the first operation instruction is cached in the instruction storage unit, and after the zeroth operation instruction has finished executing, the first operation instruction is fetched from the instruction storage unit and transmitted to the arithmetic unit.
  • Determining whether the first operation instruction is associated with the preceding zeroth operation instruction includes:
  • extracting, according to the first operation instruction, the first storage address interval of the data (such as a matrix) required by the first operation instruction, and extracting, according to the zeroth operation instruction, the zeroth storage address interval of the matrix required by the zeroth operation instruction; if the first storage address interval overlaps the zeroth storage address interval, it is determined that the first operation instruction and the zeroth operation instruction are associated; if they do not overlap, it is determined that the first operation instruction and the zeroth operation instruction are not associated. A sketch of this overlap check follows.
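Here is a minimal sketch of that overlap test; representing a storage address interval as a half-open [start, end) pair is an assumption of this example:

```python
def intervals_overlap(first: tuple, zeroth: tuple) -> bool:
    """True if the two storage address intervals share any address, i.e. the
    first instruction depends on the zeroth and must wait for it to finish."""
    f_start, f_end = first
    z_start, z_end = zeroth
    return f_start < z_end and z_start < f_end

assert intervals_overlap((0x100, 0x200), (0x180, 0x280))      # overlap: dependent
assert not intervals_overlap((0x100, 0x200), (0x200, 0x300))  # disjoint: independent
```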
  • It should be understood that although the steps in the flowcharts are displayed in sequence as indicated by the arrows, these steps are not necessarily executed in that order. Unless explicitly stated herein, there is no strict ordering constraint on the execution of these steps, and they can be executed in other orders. Moreover, at least some of the steps in the flowcharts may include multiple sub-steps or multiple stages; these sub-steps or stages are not necessarily executed at the same time but can be executed at different times, and their execution order is not necessarily sequential: they may be executed in turn or alternately with at least part of the other steps, or with the sub-steps or stages of other steps.
  • It should be noted that the foregoing device embodiments are only illustrative, and the device of the present disclosure may also be implemented in other ways.
  • The division of units/modules in the above embodiments is only a logical function division, and there may be other division methods in actual implementation.
  • For example, multiple units, modules, or components may be combined or integrated into another system, or some features may be omitted or not implemented.
  • The functional units/modules in the various embodiments of the present disclosure may be integrated into one unit/module, each unit/module may exist alone physically, or two or more units/modules may be integrated together.
  • The above integrated unit/module can be implemented in the form of hardware or in the form of a software program module.
  • If implemented in hardware, the hardware may be a digital circuit, an analog circuit, and so on.
  • Physical realizations of the hardware structure include, but are not limited to, transistors, memristors, and so on.
  • Unless otherwise specified, the artificial intelligence processor may be any appropriate hardware processor, such as a CPU, GPU, FPGA, DSP, ASIC, and so on.
  • Unless otherwise specified, the storage unit may be any suitable magnetic storage medium or magneto-optical storage medium, such as RRAM (Resistive Random Access Memory), DRAM (Dynamic Random Access Memory), SRAM (Static Random-Access Memory), EDRAM (Enhanced Dynamic Random Access Memory), HBM (High-Bandwidth Memory), HMC (Hybrid Memory Cube), and so on.
  • If the integrated unit/module is implemented in the form of a software program module and sold or used as an independent product, it can be stored in a computer-readable memory.
  • Based on this understanding, the technical solution of the present disclosure, in essence, or the part that contributes to the prior art, or all or part of the technical solution, can be embodied in the form of a software product; the computer software product is stored in a memory and includes several instructions for making a computer device (which may be a personal computer, a server, or a network device, etc.) execute all or part of the steps of the methods described in the various embodiments of the present disclosure.
  • The aforementioned memory includes various media that can store program code, such as a USB flash drive, read-only memory (ROM), random access memory (RAM), a removable hard disk, a magnetic disk, or an optical disc.
  • In a possible implementation, an artificial intelligence chip is also disclosed, which includes the above computing device.
  • In a possible implementation, a board card is also disclosed, which includes a storage device, an interface device, a control device, and the above artificial intelligence chip; the artificial intelligence chip is connected to the storage device, the control device, and the interface device respectively; the storage device is used to store data; the interface device is used to implement data transmission between the artificial intelligence chip and external equipment; and the control device is used to monitor the state of the artificial intelligence chip.
  • Fig. 6 shows a structural block diagram of a board card according to an embodiment of the present disclosure.
  • As shown in Fig. 6, in addition to the chip 389 described above, the board card may include other supporting components, including but not limited to: a storage device 390, an interface device 391, and a control device 392.
  • The storage device 390 is connected to the artificial intelligence chip through a bus and is used to store data.
  • The storage device may include multiple groups of storage units 393; each group of storage units is connected to the artificial intelligence chip through a bus. It can be understood that each group of storage units may be DDR SDRAM (Double Data Rate Synchronous Dynamic Random Access Memory).
  • In a possible implementation, the storage device may include 4 groups of storage units, and each group of storage units may include a plurality of DDR4 chips.
  • In a possible implementation, the artificial intelligence chip may internally include four 72-bit DDR4 controllers; in each 72-bit DDR4 controller, 64 bits are used for data transmission and 8 bits are used for ECC checking. It can be understood that when DDR4-3200 chips are used in each group of storage units, the theoretical data-transmission bandwidth can reach 25600 MB/s (3200 MT/s × 64 bits / 8 bits per byte = 25600 MB/s).
  • In a possible implementation, each group of storage units includes a plurality of double data rate synchronous dynamic random access memories arranged in parallel. DDR can transmit data twice in one clock cycle.
  • A controller for controlling the DDR is provided in the chip and is used to control the data transmission and data storage of each storage unit.
  • The interface device is electrically connected to the artificial intelligence chip.
  • The interface device is used to implement data transmission between the artificial intelligence chip and external equipment (such as a server or a computer).
  • For example, in a possible implementation, the interface device may be a standard PCIe interface, and the data to be processed is transferred from the server to the chip through the standard PCIe interface to realize the data transfer.
  • In another possible implementation, the interface device may also be another interface; the present disclosure does not limit the specific form of the other interface, as long as the interface unit can accomplish the transfer.
  • The calculation results of the artificial intelligence chip are likewise transmitted by the interface device back to the external equipment (such as a server).
  • The control device is electrically connected to the artificial intelligence chip.
  • The control device is used to monitor the state of the artificial intelligence chip.
  • Specifically, the artificial intelligence chip and the control device may be electrically connected through an SPI interface.
  • The control device may include a micro controller unit (MCU).
  • The artificial intelligence chip may include multiple processing chips, multiple processing cores, or multiple processing circuits, and can drive multiple loads; therefore, the artificial intelligence chip can be in different working states such as multi-load and light-load.
  • The control device can regulate the working states of the multiple processing chips, multiple processing cores, and/or multiple processing circuits in the artificial intelligence chip.
  • In a possible implementation, an electronic device is disclosed, which includes the above artificial intelligence chip.
  • The electronic device includes a data processing device, robot, computer, printer, scanner, tablet, smart terminal, mobile phone, driving recorder, navigator, sensor, webcam, server, cloud server, camera, video camera, projector, watch, earphone, mobile storage, wearable device, vehicle, household appliance, and/or medical device.
  • The vehicle includes an airplane, a ship, and/or a car;
  • the household appliance includes a TV, an air conditioner, a microwave oven, a refrigerator, a rice cooker, a humidifier, a washing machine, an electric lamp, a gas stove, and a range hood;
  • the medical device includes a nuclear magnetic resonance instrument, a B-mode ultrasound scanner, and/or an electrocardiograph.
  • The embodiments of the present disclosure also provide a computer-readable storage medium on which computer program instructions are stored; the computer program instructions implement the above method when executed by a processor.
  • The computer-readable storage medium may be a non-volatile computer-readable storage medium.
  • An embodiment of the present disclosure also provides an electronic device, including: a processor; and a memory for storing instructions executable by the processor, wherein the processor is configured to call the instructions stored in the memory to execute the above method.
  • Clause A1 A computing device, comprising: a master instruction processing unit, a master functional unit, a slave instruction processing unit, and a slave functional unit,
  • wherein the master instruction processing unit is configured to, after receiving an input instruction, send a first control signal to the master functional unit according to the input instruction;
  • the master functional unit is configured to decompose the Winograd forward transform of the input data into a summation operation according to the first control signal, perform the calculation to obtain the Winograd forward transform result of the input data, and send that result to the slave functional unit, where the Winograd forward transform result of the input data includes the Winograd forward transform result of the input neurons;
  • the master instruction processing unit is further configured to send a second control signal to the slave instruction processing unit, and the slave instruction processing unit is configured to send the second control signal to the slave functional unit;
  • the slave functional unit is configured to, according to the second control signal, perform element-wise multiplication on the Winograd forward transform result of the input neurons and the Winograd forward transform result of the weights to obtain an element-wise multiplication result, decompose the Winograd inverse transform of the element-wise multiplication result into a summation operation, and perform the calculation to obtain the Winograd convolution result of the input data.
  • Clause A2 The computing device according to clause A1, wherein the master functional unit is configured to decompose the input data into a plurality of first sub-tensors according to the first control signal, perform the Winograd forward transform on the plurality of first sub-tensors, and sum the results to obtain the Winograd forward transform result of the input data.
  • Clause A3 The computing device according to clause A2, wherein the input data is expressed in the form of a tensor, the number of first sub-tensors is the same as the number of non-zero elements of the input data, and one element in each first sub-tensor of the plurality of first sub-tensors is the same as the element at the corresponding position in the input data, while all other elements are 0.
  • Clause A4 The computing device according to clause A3, wherein performing the Winograd forward transform on the plurality of first sub-tensors and summing the results to obtain the Winograd forward transform result of the input data includes:
  • obtaining the Winograd forward transform result of the meta sub-tensor corresponding to each first sub-tensor, where the meta sub-tensor corresponding to a first sub-tensor is a tensor whose element at the first position is 1, the first position in the meta sub-tensor being the same as the position of the non-zero element in the first sub-tensor;
  • multiplying the non-zero element value of the first sub-tensor, as a coefficient, by the Winograd forward transform result of the corresponding meta sub-tensor to obtain the Winograd forward transform result of the first sub-tensor; and
  • adding the Winograd forward transform results of the plurality of first sub-tensors to obtain the Winograd forward transform result of the input data.
  • Clause A6 The computing device according to clause A2, wherein the main functional unit includes a cache module, and the main functional unit stores the winograd conversion result of the input data in the cache module, and the cache module also uses To send the winograd positive transformation result of the input data to the slave functional unit.
  • Clause A7 The computing device according to clause A1, the computing device comprising a main memory unit, wherein
  • the main instruction processing unit is configured to, upon receiving the input instruction, send the first control signal to the main memory unit according to the input instruction;
  • the main memory unit is configured to send the input data to the main functional unit according to the first control signal.
  • Clause A9 The computing device according to clause A8, wherein the number of the plurality of second sub-tensors is the same as the number of non-zero elements of the element-wise multiplication result, one element in each second sub-tensor of the plurality of second sub-tensors is the same as the element at the corresponding position in the element-wise multiplication result, and all other elements are 0.
  • Clause A10 The computing device according to clause A9, wherein performing winograd inverse transformation on the plurality of second sub-tensors and summing the results to obtain the winograd convolution result of the input data includes:
  • obtaining the winograd inverse transformation result of the second meta sub-tensor corresponding to each second sub-tensor, the second meta sub-tensor being a tensor in which the element at a second position has the value 1 and all other elements are 0, where the second position in the second meta sub-tensor is the same as the position of the non-zero element in the second sub-tensor; multiplying the value of that non-zero element, as a coefficient, by the winograd inverse transformation result of the corresponding second meta sub-tensor to obtain the winograd inverse transformation result of the second sub-tensor;
  • adding the winograd inverse transformation results of the plurality of second sub-tensors to obtain the winograd convolution result of the input data (a sketch mirroring the forward case follows below).
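
Mirroring the forward case, a minimal sketch of Clauses A9-A10 under the same assumptions: the inverse transform A^T M A of the element-wise multiplication result M is disassembled into a sum over second sub-tensors, each contributing its non-zero value times the precomputed inverse transform of the corresponding meta sub-tensor. Names are again illustrative.

```python
# A minimal sketch of Clauses A9-A10, mirroring the forward case; names
# (`meta_inv`, `inverse_transform_by_summation`) are illustrative.
import numpy as np

A_T = np.array([[1, 1,  1,  0],
                [0, 1, -1, -1]], dtype=float)

# Precompute the inverse transform of every second meta sub-tensor E_ij;
# entries of A_T @ E @ A_T.T are again only 0, 1 or -1.
meta_inv = np.empty((4, 4, 2, 2))
for i in range(4):
    for j in range(4):
        E = np.zeros((4, 4))
        E[i, j] = 1.0
        meta_inv[i, j] = A_T @ E @ A_T.T

def inverse_transform_by_summation(M):
    """A_T @ M @ A_T.T computed as a sum of precomputed constant tensors."""
    out = np.zeros((2, 2))
    for i in range(4):
        for j in range(4):
            if M[i, j] != 0:  # one second sub-tensor per non-zero element of M
                out += M[i, j] * meta_inv[i, j]
    return out

M = np.random.default_rng(2).standard_normal((4, 4))
assert np.allclose(inverse_transform_by_summation(M), A_T @ M @ A_T.T)
```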
  • the slave instruction processing unit is further configured to send the second control signal to the slave memory unit;
  • the slave memory unit is configured to send the winograd forward transformation result of the weight to the slave functional unit according to the second control signal.
  • Clause A13 The computing device according to any one of clauses A7-A12, wherein the computing device further includes a main memory unit, and the slave functional unit is further configured to send the winograd convolution result of the input data to the main memory unit.
  • Clause A14 The computing device according to any one of clauses A7-A12, wherein the slave functional unit is further configured to perform post-processing on the winograd convolution result of the input data, the post-processing including rounding operations and number-format conversion operations.
  • Clause A15 An electronic device including the artificial intelligence chip as described in Clause A14.

Abstract

The present invention relates to a computing apparatus and a related product. The product includes a control module, the control module comprising an instruction cache unit, an instruction processing unit, and a queue storage unit. The instruction cache unit is used to store a calculation instruction associated with an artificial neural network operation; the instruction processing unit is used to parse the calculation instruction to obtain multiple operation instructions; and the queue storage unit is used to store an instruction queue, the instruction queue comprising multiple operation instructions or calculation instructions to be executed in the order of the queue. The present invention can improve the operating efficiency of a related product when performing an operation of a neural network model.
PCT/CN2020/114057 2019-11-01 2020-09-08 Computing apparatus and related product WO2021082747A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201911060683.9A CN112766473B (zh) 2019-11-01 2019-11-01 Computing device and related product
CN201911060683.9 2019-11-01

Publications (1)

Publication Number Publication Date
WO2021082747A1 true WO2021082747A1 (fr) 2021-05-06

Family

ID=75692128

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/114057 WO2021082747A1 (fr) 2019-11-01 2020-09-08 Computing apparatus and related product

Country Status (2)

Country Link
CN (1) CN112766473B (fr)
WO (1) WO2021082747A1 (fr)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117852600A (zh) * 2024-03-06 2024-04-09 Beijing Biren Technology Development Co., Ltd. Artificial intelligence chip, operation method thereof, and machine-readable storage medium

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107168928A (zh) * 2017-05-03 2017-09-15 Rongcheng Dingtong Electronic Information Technology Co., Ltd. Eight-point Winograd Fourier transformer requiring no reordering
CN108229654A (zh) * 2016-12-14 2018-06-29 Shanghai Cambricon Information Technology Co., Ltd. Neural network convolution operation device and method
CN108229656A (zh) * 2016-12-14 2018-06-29 Shanghai Cambricon Information Technology Co., Ltd. Neural network operation device and method
CN109117187A (zh) * 2018-08-27 2019-01-01 Zhengzhou Yunhai Information Technology Co., Ltd. Convolutional neural network acceleration method and related device
US20190042923A1 (en) * 2017-08-07 2019-02-07 Intel Corporation System and method for an optimized winograd convolution accelerator
CN110096309A (zh) * 2018-11-14 2019-08-06 Shanghai Cambricon Information Technology Co., Ltd. Operation method and apparatus, computer device, and storage medium
CN110147249A (zh) * 2018-02-12 2019-08-20 Shanghai Cambricon Information Technology Co., Ltd. Computing method and apparatus for a network model
CN110163349A (zh) * 2018-02-12 2019-08-23 Shanghai Cambricon Information Technology Co., Ltd. Computing method and apparatus for a network model
CN110288086A (zh) * 2019-06-13 2019-09-27 Tianjin University Configurable convolution array accelerator structure based on Winograd

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109325591B (zh) * 2018-09-26 2020-12-29 Institute of Computing Technology, Chinese Academy of Sciences Neural network processor for Winograd convolution

Also Published As

Publication number Publication date
CN112766473B (zh) 2023-12-05
CN112766473A (zh) 2021-05-07

Similar Documents

Publication Publication Date Title
CN109543832B (zh) Computing device and board card
TWI795519B (zh) Computing device, machine learning operation device, combined processing device, neural network chip, electronic device, board card, and method for performing machine learning computation
US20200097792A1 Processing apparatus and processing method
WO2019148781A1 (fr) Operation module and related method
CN109685201B (zh) Operation method and apparatus, and related product
CN110163357B (zh) Computing device and method
CN111047022A (zh) Computing device and related product
CN110059797B (zh) Computing device and related product
WO2021036362A1 (fr) Data processing method and apparatus, and related product
WO2021083101A1 (fr) Data processing method and apparatus, and related product
WO2021082725A1 (fr) Winograd convolution operation method and related product
WO2021082747A1 (fr) Computing apparatus and related product
WO2021082746A1 (fr) Computing apparatus and related product
WO2021114903A1 (fr) Data processing method and apparatus, computer device, and storage medium
CN109740730B (zh) Operation method and apparatus, and related product
CN109711538B (zh) Operation method and apparatus, and related product
WO2021082723A1 (fr) Execution apparatus
WO2022001500A1 (fr) Computing apparatus, integrated circuit chip, circuit board, electronic device, and computing method
CN111382852B (zh) Data processing device, method, chip, and electronic equipment
WO2021082724A1 (fr) Operation method and related product
CN112784206A (zh) Winograd convolution operation method, apparatus, device, and storage medium
WO2021082722A1 (fr) Computing device and method, and related product
WO2022134688A1 (fr) Data processing circuit, data processing method, and related products
WO2021223644A1 (fr) Data processing method and device, and related product
JP7368512B2 (ja) Computing device, integrated circuit chip, board card, electronic device, and computing method

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20880916

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20880916

Country of ref document: EP

Kind code of ref document: A1