CN109754062A - Execution method of a convolution extended instruction and related product - Google Patents
- Publication number: CN109754062A (application CN201711086019.2A)
- Authority: CN (China)
- Prior art keywords: activation, convolution, instruction, subdomain, address
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Abstract
The present disclosure provides an execution method of a convolution extended instruction and a related product. A computing device reads the convolution extended instruction from memory and obtains the instruction's input data, convolution kernel, and auxiliary operation. The convolution extended instruction includes an opcode and an operation domain; the operation domain includes a register field and an auxiliary field, where the register field determines the address of the input data and the address of the convolution kernel, and the auxiliary field identifies the auxiliary operation. The computing device then executes the convolution operation and the auxiliary operation on the data at those addresses. The technical solution provided by the present disclosure has the advantages of reducing the amount of computation and reducing power consumption.
Description
Technical field
The present disclosure relates to the field of neural network technology, and in particular to an execution method of a convolution extended instruction and a related product.
Background
Convolutional neural networks (CNNs) are an efficient recognition algorithm that has been widely applied in recent years to fields such as pattern recognition and image processing. They have a simple structure, few training parameters, and are robust to translation, rotation, and scaling. Because the feature detection layers of a CNN/DNN are learned from training data, explicit feature extraction is avoided when using a CNN/DNN: features are learned implicitly from the training data. Furthermore, since the neurons on the same feature map share identical weights, the network can learn in parallel, which is a major advantage of convolutional networks over networks in which neurons are fully interconnected.
In existing computing applications, uses of the convolution operation are very widespread. The present disclosure focuses on convolutional neural networks; the mainstream devices currently capable of executing such operations are as follows.
In the prior art, one known scheme for performing convolutional neural network operations uses a general-purpose processor, which executes general-purpose instructions through general-purpose registers and general-purpose functional units. A disadvantage of this method is that a single general-purpose processor is designed chiefly for scalar computation, so its performance on convolutional neural network operations is low. When multiple general-purpose processors execute in parallel, the communication between them may become a performance bottleneck.
Another prior-art scheme uses a graphics processing unit (GPU) for vector computation, performing convolutional neural network operations by executing general-purpose SIMD instructions through a general-purpose register file and general-purpose stream processing units. However, in this scheme the on-chip cache of the GPU is too small, so large-scale convolutional neural network operations require data to be moved on and off chip constantly, and off-chip bandwidth becomes the main performance bottleneck.
Summary
Embodiments of the present disclosure provide an execution method of a convolution extended instruction and a related product, which can alleviate the performance bottleneck and reduce power consumption.
In a first aspect, an embodiment of the present disclosure provides an execution method of a convolution extended instruction, the method comprising the following steps:
a computing device reads the convolution extended instruction from memory and obtains the instruction's input data, convolution kernel, and activation operation;
the convolution extended instruction includes an opcode and an operation domain, the opcode including the identifier of the convolution extended instruction; the operation domain includes a convolution subdomain and an activation subdomain, where the convolution subdomain holds the address of the input data and the address of the convolution kernel, and the activation subdomain holds either the identification code of the activation operation or the address of the activation operation's interpolation table;
the computing device performs the convolution operation on the input data and the convolution kernel to obtain an intermediate result, and applies the activation operation to the intermediate result through the activation subdomain to obtain the final result of the instruction.
Optionally, the activation operation includes: a convolutional neural network Maxout operation, a convolutional neural network PReLU operation, a convolutional neural network RReLU operation, a convolutional neural network Leaky ReLU operation, a nonlinear activation operation, or a linear activation operation.
Optionally, when the activation subdomain includes the address of the activation operation's interpolation table, applying the activation operation to the intermediate result through the activation subdomain to obtain the final result of the instruction comprises:
the computing device extracts the interpolation table at the interpolation table address of the activation operation, and applies the activation operation to the intermediate result using the interpolation table to obtain the final result of the instruction.
Optionally, when the activation subdomain includes the identification code of the activation operation, applying the activation operation to the intermediate result through the activation subdomain to obtain the final result of the instruction comprises:
the computing device identifies the activation operation from its identification code, reads the interpolation table of the activation operation, and applies the activation operation to the intermediate result using the interpolation table to obtain the final result of the instruction.
Optionally, the computing device performing the convolution operation on the input data and the convolution kernel to obtain the intermediate result comprises:
the primary computation module of the computing device splits the input data into multiple parts to obtain multiple input sub-data, distributes the input sub-data to multiple secondary computation modules, and sends the convolution kernel to the secondary computation modules; the secondary computation modules execute the multiplication of the input sub-data with the convolution kernel in parallel to obtain multiple partial results; and the primary computation module splices the partial results to obtain the intermediate result.
In a second aspect, a computing device is provided, the computing device comprising: a memory, an arithmetic unit, an interconnection module, a controller unit, and a data access unit;
wherein the arithmetic unit comprises an adder and a multiplier;
the controller unit is configured to read the convolution extended instruction from memory and obtain the instruction's input data, convolution kernel, and activation operation;
the convolution extended instruction includes an opcode and an operation domain, the opcode including the identifier of the convolution extended instruction; the operation domain includes a convolution subdomain and an activation subdomain, where the convolution subdomain holds the address of the input data and the address of the convolution kernel, and the activation subdomain holds either the identification code of the activation operation or the address of the activation operation's interpolation table;
the data access unit is configured to fetch the input data and the convolution kernel corresponding to those addresses;
the arithmetic unit is configured to perform the convolution operation on the input data and the convolution kernel to obtain an intermediate result, and to apply the activation operation to the intermediate result through the activation subdomain to obtain the final result of the instruction.
Optionally, the activation operation includes: a convolutional neural network Maxout operation, a convolutional neural network PReLU operation, a convolutional neural network RReLU operation, a convolutional neural network Leaky ReLU operation, a nonlinear activation operation, or a linear activation operation.
Optionally, if the activation subdomain includes the interpolation table address of the activation operation:
the data access unit is configured to extract the interpolation table at the interpolation table address of the activation operation;
the arithmetic unit is configured to apply the activation operation to the intermediate result using the interpolation table to obtain the final result of the instruction.
Optionally, if the activation subdomain includes the identification code of the activation operation, the arithmetic unit further includes an activation operator;
the controller unit is configured to identify the activation operation from its identification code;
the activation operator is configured to fetch the interpolation table of the activation operation and apply the activation operation to the intermediate result using the interpolation table to obtain the final result of the instruction.
Optionally, the arithmetic unit further includes a primary computation module and multiple secondary computation modules, where the primary computation module includes an adder and a multiplier, and each secondary computation module includes an adder and a multiplier;
the primary computation module is configured to split the input data into multiple parts to obtain multiple input sub-data, distribute the input sub-data to the secondary computation modules, and send the convolution kernel to the secondary computation modules; the secondary computation modules are configured to execute the multiplication of the input sub-data with the convolution kernel in parallel to obtain multiple partial results; and the primary computation module is configured to splice the partial results to obtain the intermediate result.
In a third aspect, a computer-readable storage medium is provided that stores a computer program for electronic data interchange, wherein the computer program causes a computer to execute the method provided in the first aspect.
In a fourth aspect, a computer program product is provided, the computer program product comprising a non-transitory computer-readable storage medium storing a computer program, the computer program being operable to cause a computer to execute the method of the first aspect.
In a fifth aspect, a chip is provided, the chip comprising the computing device provided in the second aspect.
In a sixth aspect, a chip packaging structure is provided, the chip packaging structure comprising the chip provided in the fifth aspect.
In a seventh aspect, a board is provided, the board comprising the chip packaging structure provided in the sixth aspect.
In an eighth aspect, an electronic device is provided, the electronic device comprising the board provided in the seventh aspect.
As can be seen that realizing convolution algorithm by present disclosure embodiment with single instruction and activating the excellent of operation
Point, so it, which has, reduces the advantages of calculating the time, saving power consumption.
Brief Description of the Drawings
To describe the technical solutions in the embodiments of the present disclosure more clearly, the accompanying drawings required in the description of the embodiments are briefly introduced below. Obviously, the drawings described below show some embodiments of the present disclosure; a person of ordinary skill in the art may derive other drawings from them without creative effort.
Fig. 1 is a schematic structural diagram of a computing device provided by the present disclosure.
Fig. 2 is a schematic block diagram of the interconnection module provided by an embodiment of the present disclosure.
Fig. 2a is a schematic block diagram of the primary computation module in the device for executing a convolutional neural network forward operation provided by an embodiment of the present disclosure.
Fig. 2b is a schematic block diagram of a secondary computation module in the device for executing a convolutional neural network forward operation provided by an embodiment of the present disclosure.
Fig. 3 is a flowchart of a convolutional neural network computing device executing a convolution transform instruction, provided by an embodiment of the present disclosure.
Fig. 3a is a schematic diagram of a convolution kernel provided by an embodiment of the present disclosure.
Fig. 3b is a schematic diagram of input data provided by an embodiment of the present disclosure.
Fig. 3c is a schematic diagram of the movement of a convolution kernel provided by an embodiment of the present disclosure.
Fig. 3d is a schematic diagram of the movement of another convolution kernel provided by an embodiment of the present disclosure.
Fig. 3e is a schematic diagram of the movement of another convolution kernel provided by an embodiment of the present disclosure.
Detailed Description
The technical solutions in the embodiments of the present disclosure are described below clearly and completely with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present disclosure. All other embodiments obtained by a person of ordinary skill in the art based on these embodiments without creative effort fall within the scope of protection of the present disclosure.
The terms "first", "second", "third", "fourth", and the like in the specification, claims, and drawings of the present disclosure are used to distinguish different objects, not to describe a particular order. Furthermore, the terms "include" and "have" and any variations thereof are intended to cover non-exclusive inclusion: a process, method, system, product, or device that comprises a series of steps or units is not limited to the listed steps or units, but optionally also includes steps or units that are not listed, or optionally also includes other steps or units inherent to the process, method, product, or device.
Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment may be included in at least one embodiment of the present disclosure. The appearances of this phrase in various places in the specification do not necessarily all refer to the same embodiment, nor are they separate or alternative embodiments mutually exclusive of other embodiments. Those skilled in the art understand, explicitly and implicitly, that the embodiments described herein may be combined with other embodiments.
Illustrate the method for convolution ordering calculation by taking convolution algorithm instructs as an example below, convolution instruction can be applied in mind
Through certainly in practical applications, also can be applied in other calculating scenes, present disclosure is not intended to limit above-mentioned convolution in network
Concrete implementation scene is instructed, convolution algorithm instruction is referred to as convolutional neural networks.For convolution instruction, in fact
The formula that border needs to be implemented can be s=s (∑ wxi+ b) wherein, i.e., by convolution kernel w (may include multiple data) multiplied by input
Data xi,
It sums, then primary Calculation result h can be obtained plus biasing b according to actual calculate, then preliminary meter
Activation operation s (h) can also be done by calculating result, to obtain final output result S.Calculating topology can be obtained according to the formula
Structure is multiplicative operator-adder calculator-activation arithmetic unit.
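One output element of the fused computation above can be modeled as follows (a sketch of the formula S = s(Σ w·xᵢ + b); the activation `s` here is a caller-supplied stand-in for whatever the activation subdomain selects):

```python
import math

def conv_activate(window, kernel, bias, activation=math.tanh):
    """One fused step of S = s(sum(w * x_i) + b): multiply-accumulate the
    kernel over the input window, add the bias, then apply the activation.
    tanh is only a placeholder default; the patent's instruction would
    select the activation via its activation subdomain."""
    h = sum(w * x for w, x in zip(kernel, window)) + bias  # multiplier + adder
    return activation(h)                                   # activation operator
```

This mirrors the derived topology directly: the generator expression is the multiplier stage, `sum(...) + bias` the adder stage, and the final call the activation stage.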
With existing convolution instructions, executing an activation operation requires multiple instructions. Taking the above formula as an example, a convolution operation instruction is first needed to obtain the preliminary result h, and then an activation instruction is needed to apply the activation operation to h; that is, at least two instructions are required to obtain the result S of the above formula. This approach first multiplies the number of convolution instructions; in addition, because the chip or computing device must repeatedly fetch the data it invokes, it incurs more computation overhead and higher power consumption.
The present disclosure provides a computing device, shown in Fig. 1, comprising: a storage medium 111, a register unit 112, an interconnection module 113, an arithmetic unit 114, a controller unit 115, and a data access unit 116.
The arithmetic unit 114 may include a multiplier and an adder; of course, the arithmetic unit may further include at least one of a comparator, an activation operator, and an OP converter.
The interconnection module 113 is configured to control the connection relationships of the operators in the arithmetic unit 114, so that at least two kinds of operators form different computing topologies.
The register unit 112 is configured to store the operation instruction, the addresses of the input data and the convolution kernel in the storage medium, and the computing topology corresponding to the convolution instruction.
The storage medium 111 may be an off-chip memory; in practical applications it may also be an on-chip memory, configured to store the input data and the convolution kernel, which may specifically be vectors, matrices, or multidimensional data.
The controller unit 115 is configured to extract from the register unit 112 the operation instruction (which may specifically be a convolution instruction), the operation domain corresponding to the operation instruction, and the first computing topology corresponding to the operation instruction; to decode the operation instruction into an execution instruction that controls the arithmetic unit to perform the arithmetic operation; to transmit the operation domain to the data access unit 116; and to transmit the computing topology to the interconnection module 113.
The data access unit 116 is configured to extract the input data and the convolution kernel corresponding to the operation domain from the storage medium 111 and transmit them to the arithmetic unit 114.
The interconnection module 113 is configured to form the first computing topology by controlling the connection relationships of the operators in the arithmetic unit 114.
The arithmetic unit 114 is configured to invoke the operators according to the first computing topology and the execution instruction, perform the arithmetic operation on the data block to obtain the operation result, and transmit the operation result to the data access unit for storage in the storage medium.
The operation instruction may be as shown in Fig. 1, comprising an operation domain and an opcode. Taking a convolution operation instruction as an example, the operation domain may include a convolution subdomain and an activation subdomain, as shown in Table 1, where register number 0, register number 1, register number 2, and register number 3 (each optionally a register file) may constitute the convolution subdomain, and register number 4 may specifically be the activation subdomain.
Table 1:
When the activation subdomain holds an activation function interpolation table address, the computing device can dispense with a dedicated activation operator, and setting the activation function interpolation table address also saves decoder parsing overhead, reduces the amount of computation, and saves chip power and area. A concrete implementation is described in detail below. If a CONV_ACTIVATE instruction includes an activation function interpolation table address, then after completing the convolution operation the CONV_ACTIVATE instruction takes the result of the convolution operation (the intermediate result), extracts the interpolation table at the activation function interpolation table address, and applies the activation operation to the convolution result to obtain the final result directly. This mode requires reading only a single CONV_ACTIVATE instruction and executes without a separate activation operator, so it has the advantages of small instruction-parsing overhead, reduced computation, and saved hardware. If, instead, CONV_ACTIVATE includes an activation function opcode, then after completing the convolution operation the CONV_ACTIVATE instruction obtains the convolution result, parses the activation function opcode to obtain the corresponding activation function, and sends the activation function to the activation operator; the activation operator extracts the interpolation table according to the activation function and applies the activation operation to the convolution result. This requires parsing the instruction multiple times and additionally requires a separate activation operator to execute the activation operation.
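The interpolation-table path can be illustrated with a small floating-point model (hypothetical; real hardware would index a fixed-point table located at the address carried in the instruction):

```python
def build_interp_table(fn, lo, hi, n):
    """Hypothetical interpolation table: n + 1 uniform samples of fn over [lo, hi]."""
    step = (hi - lo) / n
    return {"lo": lo, "step": step,
            "vals": [fn(lo + i * step) for i in range(n + 1)]}

def activate_via_table(x, table):
    """Apply an activation by linear interpolation into the table, as a
    CONV_ACTIVATE instruction carrying an interpolation-table address
    might do after the convolution produces the intermediate result."""
    pos = (x - table["lo"]) / table["step"]
    i = max(0, min(len(table["vals"]) - 2, int(pos)))  # clamp to table range
    frac = pos - i
    return table["vals"][i] * (1 - frac) + table["vals"][i + 1] * frac
```

Any activation expressible as a sampled curve can be served this way, which is why a table address can replace a dedicated activation operator.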
The operation instruction may also be as shown in Table 2, comprising: an opcode CONV_AC_OP, register number 0 through register number 4 (each optionally a register file), and an auxiliary opcode. Register number 0, register number 1, register number 2, and register number 3 may constitute the convolution subdomain, register number 4 may be the activation subdomain, and the OP opcode may be the OP subdomain, specifically as shown in Table 2.
Table 2:
The above operation instruction may belong to a convolution instruction set, which includes convolutional neural network instructions of different functions: the CONV instruction, the CONV_ACTIVATE instruction, the CONV_OP instruction, and the CONFIG, I/O, NOP, JUMP, and MOVE instructions.
The auxiliary opcode shown in Table 1 and Table 2 may specifically include a computing operation and an operator connection relationship. Taking the OP operation as an example, there are many kinds of OP operations; assume that 1 denotes transposition and 0 denotes conjugation, and assume the auxiliary opcode is 4 bits (in practical applications it may also be another number of bits, such as 6 bits or 8 bits). For the auxiliary opcode of CONV_OP, if it is 1111, it may denote a transposition operation; the objects on which the transposition operation needs to be executed may include the input data, the convolution kernel, and the preliminary result. Assume here that the 2nd bit of 1111 indicates whether the input data executes the OP operation, the 3rd bit indicates whether the convolution kernel executes the OP operation, and the 4th bit indicates whether the preliminary result executes the OP operation, where 1 may mean the OP operation is executed and 0 may mean it is not. Of course, other operations are also possible in practical applications.
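The assumed bit layout can be sketched as a decoder (the field positions follow the example above and are otherwise hypothetical):

```python
def decode_aux_op(bits):
    """Decode the assumed 4-bit auxiliary opcode: bit 1 selects the OP type
    (1 = transposition, 0 = conjugation), and bits 2-4 indicate whether the
    OP applies to the input data, the convolution kernel, and the
    preliminary result respectively (1 = apply, 0 = skip)."""
    assert len(bits) == 4 and set(bits) <= {"0", "1"}
    return {
        "op": "transpose" if bits[0] == "1" else "conjugate",
        "apply_to_input": bits[1] == "1",
        "apply_to_kernel": bits[2] == "1",
        "apply_to_result": bits[3] == "1",
    }
```

Under this layout, 1111 decodes to "transpose everything", matching the example in the text.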
In one embodiment, the CONV_ACTIVATE instruction includes:
a convolution activation instruction, according to which the device fetches input data of a set size and a convolution kernel from specified addresses of the memory (preferably a scratchpad memory), performs the convolution operation in the convolution computation component, and then applies the activation function operation to the output result; the above set size may be defined by the manufacturer or the user.
The convolution activation instruction may specifically include:
a convolutional neural network Maxout instruction, which may specifically include: the device fetches input data of a set size and a convolution kernel from specified addresses of the memory (preferably a scratchpad memory), performs the convolution operation in the convolution computation component, and then applies Maxout activation to the output result; the set size may be defined by the manufacturer or the user. A concrete form of the convolutional neural network Maxout instruction may be to add the interpolation table of Maxout or a Maxout opcode in register number 4 of the operation domain of the CONV_ACTIVATE instruction.
For Maxout, the mathematical expression may be:

hᵢ = max_{j∈[1,k]} z_{ij}, where z_{ij} = xᵀ·W_{ij} + b_{ij}

where hᵢ denotes the Maxout output result, W_{ij} denotes the convolution kernel, b_{ij} denotes the bias, and xᵀ denotes the transpose of the input data.
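A minimal software rendering of the Maxout formula for one output unit (a sketch; in the instruction itself this would be driven through the interpolation table or opcode placed in register number 4):

```python
def maxout(x, weights, biases):
    """Maxout activation h_i = max_j (x^T W_ij + b_ij) for one output unit.
    weights is a list of k weight vectors and biases a list of k scalars;
    the maximum is taken over the k affine pieces."""
    z = [sum(w * v for w, v in zip(W, x)) + b
         for W, b in zip(weights, biases)]  # the k values z_ij
    return max(z)                           # h_i
```

Because the maximum of affine pieces is piecewise linear, Maxout needs no fixed nonlinearity of its own, which fits the table-driven scheme described earlier.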
A convolutional neural network PReLU instruction, for applying PReLU activation to the output result of the computing device according to the instruction: the device fetches input data of a set size and a convolution kernel from specified addresses of the scratchpad memory, performs the convolution operation in the convolution computation component, and then applies PReLU activation to the output result. A concrete form of the convolutional neural network PReLU instruction may be to add the interpolation table of PReLU or a PReLU opcode in register number 4 of the operation domain of the CONV_ACTIVATE instruction.
A convolutional neural network RReLU instruction, for applying RReLU activation to the output result of the computing device according to the instruction: the device fetches input data of a set size and a convolution kernel from specified addresses of the scratchpad memory, performs the convolution operation in the convolution computation component, and then applies RReLU activation to the output result. A concrete form of the convolutional neural network RReLU instruction may be to add the interpolation table of RReLU or an RReLU opcode in register number 4 of the operation domain of the CONV_ACTIVATE instruction.
A convolutional neural network Leaky ReLU instruction, for applying Leaky ReLU activation to the output result of the computing device according to the instruction: the device fetches input data of a set size and a convolution kernel from specified addresses of the scratchpad memory, performs the convolution operation in the convolution computation component, and then applies Leaky ReLU activation to the output result. A concrete form of the convolutional neural network Leaky ReLU instruction may be to add the interpolation table of Leaky ReLU or a Leaky ReLU opcode in register number 4 of the operation domain of the CONV_ACTIVATE instruction.
For ReLU, the mathematical expression is: f(x) = max(0, x).

For Leaky ReLU, RReLU, and PReLU, the mathematical expression may be:

f(x) = αx (x < 0), f(x) = x (x ≥ 0)

In the above expression, different treatments of α correspond to Leaky ReLU, PReLU, or RReLU: when α is a fixed small constant it is Leaky ReLU, when α is a learnable parameter it is PReLU, and when α is a random number drawn from a given distribution it is RReLU.
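The whole family can be modeled by one function parameterized by α (a direct sketch of the expression above):

```python
def relu_family(x, alpha=0.0):
    """f(x) = x for x >= 0, alpha * x for x < 0.  With alpha = 0 this is
    plain ReLU; a fixed small alpha gives Leaky ReLU; a learned alpha
    gives PReLU; an alpha sampled at random gives RReLU."""
    return x if x >= 0 else alpha * x
```

Since all four variants share this single expression, one interpolation table layout (or one opcode family) can cover them, differing only in how α is supplied.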
The CONV_ACTIVATE instruction may also include other operation instructions, for carrying out nonlinear or linear activation operations.
In one embodiment, the CONV_OP instruction includes:
Convolution transform instruction: according to the instruction, the device takes out input data of a set size and a convolution kernel from the specified addresses of the memory (preferably, a scratchpad memory) respectively, applies a transform operation to the input data and/or the convolution kernel in the OP (conjugation or transposition) arithmetic unit, then does the convolution operation in the convolution arithmetic component, and finally transforms the output result. The above set size and OP type can be defined by the producer or the user.
The convolution transform instruction specifically includes:
Convolutional neural network Reshape instruction, for applying a Reshape operation to the output result of the computing device according to the instruction. The device takes out input data of a set size and a convolution kernel from the specified addresses of the memory (preferably, a scratchpad memory) respectively, does a reshape (dimension reordering, e.g. nchw -> chwn) operation in the OP arithmetic unit, then does the convolution operation in the convolution arithmetic component, and finally applies the reshape operation to the output result. The above set size can be defined by the producer or the user.
So-called dimension reordering refers to reordering the four dimensions of the input data and the convolution kernel of the convolution operation.
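The nchw -> chwn reordering mentioned above can be sketched in plain Python on nested lists (a real device would of course do this as an address remapping, not element copying; this is only a functional illustration):

```python
def nchw_to_chwn(x):
    # Reorder a 4-D nested list from [N][C][H][W] layout to [C][H][W][N]:
    # output[c][h][w][n] == input[n][c][h][w].
    n, c, h, w = len(x), len(x[0]), len(x[0][0]), len(x[0][0][0])
    return [[[[x[ni][ci][hi][wi] for ni in range(n)]
              for wi in range(w)]
             for hi in range(h)]
            for ci in range(c)]
```

Any other permutation of the four dimensions follows the same pattern, with the loop order changed accordingly.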
Fig. 3a shows M convolution kernels; each convolution kernel is a three-dimensional data block of 5*3*3, so its operation window is also a three-dimensional data block of 5*3*3. In the M convolution kernels shown in Fig. 3a, KH and KW denote two of the dimensions: the dimension corresponding to KH is the H dimension of the input data, and the dimension corresponding to KW is the W dimension of the input data. The grey squares in Figs. 3c, 3d and 3e are the data used by the sliding operation window at each step; the sliding direction may be to slide along H first and then, after that direction is completed, along W, or to slide along W first and then along H. Specifically, for convolution, the operation at each sliding-window position is the inner product of the data block indicated by the grey squares with each of the M convolution-kernel data blocks indicated in "Fig. 3a convolution 1 - convolution kernel"; thus at each sliding-window position convolution outputs one value per convolution kernel, i.e. M output values per sliding window. In "Fig. 3a - Fig. 3e" one square denotes one value, also called a weight. The numbers used in the schematic diagrams are only examples; in practice a dimension may be any number (including the case where some dimension is 1, in which case the four-dimensional data block automatically becomes a three-dimensional data block; for example, when the number of samples computed at the same time is 1, the input data is a three-dimensional data block; and when the number of convolution kernels is 1, the convolution-kernel data is a three-dimensional data block). The convolution operation between input data B and convolution kernel A is carried out using the chip device;
For a convolutional layer, the weights (all the convolution kernels) are as shown in "Fig. 3a convolution 1 - convolution kernel". Denote the number of convolution kernels as M; each convolution kernel consists of C matrices of KH rows and KW columns, so the weights of the convolutional layer can be expressed as a four-dimensional data block whose four dimensions are M, C, KH and KW respectively. The input data of the convolutional layer is a four-dimensional data block composed of N three-dimensional data blocks, each of which consists of C feature matrices of H rows and W columns (i.e. a data block whose four dimensions are N, C, H and W respectively);
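The dimension conventions above (input [C][H][W] per sample, kernels [M][C][KH][KW], M outputs per sliding-window position) can be checked against a naive reference convolution. This is a functional sketch with stride 1 and no padding, not the device's actual parallel implementation:

```python
def conv2d_single(input_chw, kernels_mckk):
    # Naive convolution for one sample: input [C][H][W], kernels [M][C][KH][KW],
    # stride 1, no padding -> output [M][H-KH+1][W-KW+1].
    # Each sliding-window position yields M output values, one per kernel.
    C, H, W = len(input_chw), len(input_chw[0]), len(input_chw[0][0])
    M = len(kernels_mckk)
    KH, KW = len(kernels_mckk[0][0]), len(kernels_mckk[0][0][0])
    out = []
    for m in range(M):
        plane = []
        for i in range(H - KH + 1):
            row = []
            for j in range(W - KW + 1):
                # Inner product of the window with kernel m over C, KH, KW.
                s = sum(input_chw[c][i + di][j + dj] * kernels_mckk[m][c][di][dj]
                        for c in range(C) for di in range(KH) for dj in range(KW))
                row.append(s)
            plane.append(row)
        out.append(plane)
    return out
```

With N samples the same routine is simply applied per sample, giving the N, M, H-KH+1, W-KW+1 output block.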
Convolutional neural network Pad instruction, for applying a Pad operation to the output result of the computing device according to the instruction. The device takes out input data of a set size and a convolution kernel from the specified addresses of the memory (preferably, a scratchpad memory) respectively, does a pad (periphery expansion) operation on the convolution kernel in the OP arithmetic unit, and then does the convolution operation in the convolution arithmetic component. The above set size can be defined by the producer or the user. A specific manifestation form of the convolutional neural network Pad instruction may be to add a Pad operation code in the auxiliary operation code of the operation domain of the CONV_OP or CONV_AC_OP instruction.
Periphery expansion means adding N rings of data around the periphery of the convolution kernel, where N is a positive integer. When N is 1, the instruction format can stay unchanged. One ring means that the original two-dimensional data block of size kh*kw is expanded to (kh+2N)*(kw+2N) by filling in its periphery.
If N is greater than 1, either the instruction format adds an operation field (register 5) to store the value of N, i.e. the operation domain of CONV_OP gains a register 5 that is used to store the value of N; or, if the instruction format stays unchanged, the method of executing the instruction changes: before executing the CONV instruction, the value of N is set using a config instruction, and the pad operation is executed before the CONV instruction is executed.
The added data may all be 0, which is the most basic pad operation. Optionally, the data may be a random distribution of 0s and 1s; in this case the operation code must be changed to conv-pad-random, and the method adds a step: use a random number generator to produce the (kh+2N)*(kw+2N) - kh*kw values that the pad needs to fill.
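The periphery expansion described above can be sketched in Python. The basic pad fills with 0; the `fill` parameter is an illustrative generalization covering the random-fill variant (where the fill values would come from a random number generator):

```python
def pad_kernel(k, n, fill=0):
    # Expand a kh*kw 2-D kernel to (kh+2n)*(kw+2n) by adding n rings of
    # `fill` around its periphery (the basic Pad operation uses fill=0).
    kh, kw = len(k), len(k[0])
    wide = kw + 2 * n
    out = [[fill] * wide for _ in range(n)]        # n top rings
    for row in k:
        out.append([fill] * n + list(row) + [fill] * n)
    out += [[fill] * wide for _ in range(n)]       # n bottom rings
    return out
```

The number of added values is (kh+2n)*(kw+2n) - kh*kw, matching the count given above for the conv-pad-random variant.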
Convolutional neural network Crop instruction, for applying a Crop operation to the output result of the computing device according to the instruction. The device takes out input data of a set size and a convolution kernel from the specified addresses of the memory (preferably, a scratchpad memory) respectively, does a crop (size cutting) operation on the input in the OP arithmetic unit, and then does the convolution operation in the convolution arithmetic component. The above set size can be defined by the producer or the user.
Size cutting is defined as intercepting a two-dimensional data block of size H1*W1 from the original two-dimensional data block of size H*W, where H1 and W1 are less than or equal to H and W respectively.
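A minimal sketch of the size cutting just defined. The text does not specify where in the H*W block the H1*W1 block is taken from, so for simplicity this assumes the top-left corner; an offset could equally be a parameter of the instruction:

```python
def crop_block(x, h1, w1):
    # Cut an h1*w1 block out of an H*W 2-D block (h1 <= H, w1 <= W),
    # taken from the top-left corner (an assumed convention).
    return [row[:w1] for row in x[:h1]]
```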
Convolutional neural network Dilate instruction, for applying a Dilate operation to the output result of the computing device according to the instruction. The device takes out input data of a set size and a convolution kernel from the specified addresses of the memory (preferably, a scratchpad memory) respectively, does a dilate (internal 0-insertion) operation on the convolution kernel in the OP arithmetic unit, and then does the convolution operation in the convolution arithmetic component. The above set size can be defined by the producer or the user.
The definition of dilate (internal 0-insertion) is: for a convolution kernel of size kh*kw, evenly or randomly insert 0s or random numbers inside it (whereas the above pad works on its periphery), playing a "dilution" role on the convolution kernel. Doing so can enhance the feature-extraction effect of the convolution kernel.
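The even-insertion case of dilate can be sketched as follows; the `d` values-between-elements parameter and zero fill are an assumed concrete form of the "evenly insert 0s" description (the random variant would choose positions and fill values at random instead):

```python
def dilate_kernel(k, d=1, fill=0):
    # Insert `d` fill values between adjacent kernel elements (interior only),
    # turning a kh*kw kernel into (kh+(kh-1)*d) * (kw+(kw-1)*d).
    kh, kw = len(k), len(k[0])
    H = kh + (kh - 1) * d
    W = kw + (kw - 1) * d
    out = [[fill] * W for _ in range(H)]
    for i in range(kh):
        for j in range(kw):
            out[i * (d + 1)][j * (d + 1)] = k[i][j]
    return out
```

Unlike pad, the kernel's original values keep their relative order but are spread apart, which is what widens the kernel's receptive field.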
The CONV_OP instruction may also include other transform instructions, such as applying BLAS transforms to the input or the weights.
The above instruction set includes convolutional neural network CONV_AC_OP instructions of different functions, as well as the CONFIG instruction, I/O instruction, NOP instruction, JUMP instruction and MOVE instruction.
In one embodiment, CONV_AC_OP can realize any combination of the CONV, ACTIVATE and OP operations through the setting of the auxiliary operation code.
Fig. 2 diagrammatically illustrates one embodiment of the interconnect module 113: an H-tree module. The interconnect module 113 constitutes the data path between the main computing module 5 and the multiple slave computing modules 6, and is a binary-tree path composed of multiple nodes; each node sends the upstream data identically to its two downstream nodes, merges the data returned by the two downstream nodes, and returns the merged data to the upstream node. For example, in the phase where the convolutional neural network starts computing, the neuron data in the main computing module 5 is sent to each slave computing module 6 through the interconnect module 113; after the computing process of the slave computing modules 6 is completed, the neuron values output by each slave computing module are combined step by step in the interconnect module into a complete vector composed of neurons. To illustrate, suppose there are N slave computing modules in the device; then the input data Xi is sent to the N slave computing modules, each slave computing module does the convolution operation between the input data Xi and the convolution kernel corresponding to that slave computing module and obtains a scalar, and the scalars of the slave computing modules are merged by the interconnect module into an intermediate vector containing N elements. Suppose the convolution window traverses A*B pieces of input data Xi in total (A in the X direction, B in the Y direction; X and Y are the coordinate axes of a three-dimensional orthogonal coordinate system); then the above convolution operation is executed for the A*B Xi, and all the obtained vectors are merged in the main computing module to obtain a three-dimensional intermediate result of A*B*N.
Fig. 2a shows an example block diagram of the structure of the main computing module 5 in the device for executing a convolutional neural network forward operation according to an embodiment of the present disclosure. As shown in Fig. 2a, the main computing module 5 includes a first arithmetic unit 51, a first data dependence judging unit 52 and a first storage unit 53.
The first arithmetic unit 51 includes a vector addition unit 511 and an activation unit 512. The first arithmetic unit 51 receives the control signal from the controller unit and completes the various computing functions of the main computing module 5. The vector addition unit 511 is used to implement the add-bias operation in the forward computation of the convolutional neural network; this unit adds the bias data to the intermediate result element-wise to obtain the biased result, and the activation unit 512 executes the activation-function operation on the biased result. The bias data may be read in from the external address space, or may be stored locally.
The first data dependence judging unit 52 is the port through which the first arithmetic unit 51 reads and writes the first storage unit 53, and guarantees the read-write consistency of data in the first storage unit 53. Meanwhile, the first data dependence judging unit 52 is also responsible for sending the data read from the first storage unit 53 to the slave computing modules through the interconnect module 4, while the output data of the slave computing modules 6 is transmitted directly to the first arithmetic unit 51 through the interconnect module 4. The instructions output by the controller unit 2 are sent to the arithmetic unit 51 and the first data dependence judging unit 52 to control their behavior.
The first storage unit 53 is used to cache the input data and output data used by the main computing module 5 in the computing process.
Fig. 2b shows an example block diagram of the structure of the slave computing module 6 in the device for executing a convolutional neural network forward operation according to an embodiment of the present disclosure. As shown in Fig. 2b, each slave computing module 6 includes a second arithmetic unit 61, a second data dependence judging unit 62, a second storage unit 63 and a third storage unit 64.
Second arithmetic element 61 receives the control signal that controller unit 2 issues and carries out convolution algorithm.Second operation list
Member includes OP converter unit 808, and vector multiplies unit 611 and summing elements 612, and the vector being each responsible in convolution algorithm multiplies fortune
Calculation, accumulating operation and OP map function.
The second data dependence judging unit 62 is responsible for the read and write operations on the second storage unit 63 in the computing process. Before executing a read or write, the second data dependence judging unit 62 first guarantees that there is no read-write consistency conflict for the data used between instructions. For example, all control signals sent to the data dependence unit 62 are stored in an instruction queue inside the data dependence unit 62; in this queue, if the range of data read by a read instruction conflicts with the range of data written by a write instruction positioned earlier in the queue, then that read instruction can only execute after the write instruction it depends on has been performed.
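The conflict check just described amounts to an address-range overlap test against earlier unfinished writes. A minimal sketch, assuming half-open (start, end) address ranges as a representation the text does not itself specify:

```python
def ranges_overlap(a, b):
    # a and b are half-open address ranges (start, end).
    return a[0] < b[1] and b[0] < a[1]

def can_issue(read_range, pending_writes):
    # A read may issue only if its address range does not overlap the
    # write range of any earlier, still-unfinished write in the queue.
    return not any(ranges_overlap(read_range, w) for w in pending_writes)
```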
The second storage unit 63 caches the input data and the output scalar data of the slave computing module 6.
The third storage unit 64 caches the convolution kernel data needed by the slave computing module 6 in the computing process.
Fig. 3 is the flow chart of the convolutional neural network computing device provided by an embodiment of the present disclosure executing a convolution transform instruction. As shown in Fig. 3, the process of executing a convolutional neural network instruction includes the following steps; the convolutional neural network instruction here takes CONV_AC_OP as the example, and in practical applications it could also be another extended instruction, such as a CONV_ACTIVATE or CONV_OP instruction. When the extended instruction is a CONV_OP instruction, only the OP operation needs to be executed, and the activation operation on the biased data in S9 is not executed; i.e., when the extended instruction is a CONV_OP instruction, the biased result is the final output result. When the extended instruction is a CONV_ACTIVATE instruction, the instruction does not need the OP module, and the OP transform in step S7 is not executed.
In step S1, an I/O instruction is pre-stored at the first address of the register unit 112.
In step S2, the operation starts: the controller unit 115 reads this I/O instruction from the first address of the register unit 112, and according to the decoded control signal, the data access unit 116 reads all the corresponding convolutional neural network operation instructions from the storage medium 111 and buffers them in the register unit 112.
In step S3, the controller unit 115 reads in the next I/O instruction from the register unit 112; according to the decoded control signal, the data access unit 116 reads from the storage medium 111 all the data needed by the main computing module 5 (e.g. including the input data, the interpolation table used for fast activation-function operation, the constant table used for configuring the parameters of the arithmetic unit, the bias data, etc.) into the first storage unit 53 of the main computing module 5.
In step S4, the controller unit 115 reads in the next I/O instruction from the register unit 112; according to the decoded control signal, the data access unit 116 reads the convolution kernel data needed by the slave computing modules 6 from the storage medium 111.
In step S5, the controller unit 115 reads in the next CONFIG instruction from the register unit 112; according to the decoded control signal, the device configures the various constants needed by this layer of the neural network computation. For example, the first arithmetic unit 51 and the second arithmetic unit 61 configure the values of their internal registers according to the parameters in the control signal; the parameters include, for example, the data needed by the activation function, as well as the constants needed by the OP operation, such as the N of pad, the H1 and W1 of crop, and the dimension order of reshape.
In step S6, the controller unit 115 then reads in the next CONV_AC_OP instruction from the register unit 112; according to the decoded control signal, the main computing module 5 first sends the input data in the convolution window to each slave computing module 6 through the interconnect module 113 and saves it to the second storage unit 63 of the slave computing module 6, and then moves the convolution window according to the instruction.
In step S7, according to the control signal decoded from the CONV_AC_OP instruction, the arithmetic unit 61 of the slave computing module 6 reads the convolution kernel from the third storage unit 64 and reads the input data from the second storage unit 63; the OP module applies the OP transform to the input data and the convolution kernel, then the arithmetic unit 61 of the slave computing module 6 executes the convolution operation of the OP-transformed input data and the OP-transformed convolution kernel, and the intermediate result is returned through the interconnect module 113.
In step S8, in the interconnect module 113, the intermediate results returned by each slave computing module 6 are combined step by step into a complete intermediate vector.
In step S9, the main computing module 5 obtains the intermediate vector returned by the interconnect module; the convolution window traverses all the input data, and the main computing module splices all the returned vectors into the intermediate result. (Optionally) according to the control signal decoded from the CONV_AC_OP instruction, the bias data is read from the first storage unit 53 and added to the intermediate result by the vector addition unit 511 to obtain the biased result; then the main computing module 5 reads the interpolation table corresponding to the activation-function interpolation-table address in register number 4 of CONV_AC_OP, performs the activation operation on the biased result with the interpolation table to obtain the final output data, and writes the final output data back into the first storage unit 53.
In step S10, the controller unit 115 then reads in the next I/O instruction from the instruction storage unit; according to the decoded control signal, the data access unit 116 stores the output data in the first storage unit 53 to the specified address in the external address space, and the operation ends.
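The compute phase of the flow (steps S6 through S9) can be condensed into a functional sketch. The Leaky-ReLU-style activation and zero bias here are illustrative stand-ins for the interpolation-table lookup and bias data of step S9; the real device performs the per-window vectors in parallel across the slave modules:

```python
def execute_conv_ac_op(windows, kernels, bias, alpha=0.01):
    # S7/S8: each slave module computes one inner product per kernel; the
    # interconnect merges them into an N-element vector per window position.
    # S9: add bias, then activate (a piecewise-linear function stands in
    # for the interpolation-table lookup).
    def act(v):
        return v if v >= 0 else alpha * v
    out = []
    for win in windows:  # S6: the convolution window moves per the instruction
        vec = [sum(x * w for x, w in zip(win, k)) for k in kernels]
        out.append([act(v + b) for v, b in zip(vec, bias)])
    return out
```

For a CONV_OP instruction, step S9's activation would be skipped and the biased vectors would be the final output, as described above.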
An embodiment of the present disclosure also provides a computer storage medium, wherein the computer storage medium stores a computer program for electronic data exchange, and the computer program makes a computer execute some or all of the steps of the execution method of any convolution extended instruction recorded in the above method embodiments.
An embodiment of the present disclosure also provides a computer program product; the computer program product includes a non-transitory computer-readable storage medium storing a computer program, and the computer program is operable to make a computer execute some or all of the steps of the execution method of any convolution extended instruction recorded in the above method embodiments.
Another embodiment of the present disclosure also discloses a chip, which includes the neural network computing device of the above embodiment (as shown in Fig. 1).
Another embodiment of the present disclosure also discloses a kind of chip-packaging structure comprising said chip.
Another embodiment of the present disclosure also discloses a kind of board comprising said chip encapsulating structure.
Another embodiment of the present disclosure also discloses a kind of electronic device comprising above-mentioned board.
The electronic device includes a data processing device, robot, computer, printer, scanner, tablet computer, intelligent terminal, mobile phone, driving recorder, navigator, sensor, webcam, cloud server, camera, video camera, projector, watch, earphone, mobile storage, wearable device, vehicle, household appliance, and/or medical device.
The vehicle includes an aircraft, a ship and/or a car; the household appliance includes a television, air conditioner, microwave oven, refrigerator, rice cooker, humidifier, washing machine, electric lamp, gas stove and range hood; the medical device includes a nuclear magnetic resonance instrument, a B-ultrasound instrument and/or an electrocardiograph.
It should be noted that, for each of the above method embodiments, for simplicity of description they are all expressed as a series of action combinations; but those skilled in the art should understand that the present disclosure is not limited by the described order of actions, because according to the present disclosure some steps may be performed in other orders or simultaneously. Secondly, those skilled in the art should also know that the embodiments described in this specification are alternative embodiments, and the actions and modules involved are not necessarily required by the present disclosure.
In the above embodiments, the description of each embodiment has its own emphasis; for parts not described in detail in a certain embodiment, reference can be made to the related descriptions of other embodiments.
In the several embodiments provided in this application, it should be understood that the disclosed device can be implemented in other ways. For example, the device embodiments described above are merely illustrative; for instance, the division of the units is only a logical functional division, and in actual implementation there may be other ways of division; for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. As another point, the mutual coupling, direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, devices or units, and may be electrical or in other forms.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; they may be located in one place, or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the embodiments of the present disclosure may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit.
The embodiments of the present disclosure have been described in detail above; specific examples are used herein to expound the principle and implementation of the present disclosure, and the description of the above embodiments is only used to help understand the method of the present disclosure and its core idea. Meanwhile, for those skilled in the art, there will be changes in the specific implementation and application scope according to the idea of the present disclosure. In summary, the content of this specification should not be construed as limiting the present disclosure.
Claims (13)
1. An execution method of a convolution extended instruction, characterized in that the method includes the following steps:
a computing device reads the convolution extended instruction from a memory, and obtains the input data, convolution kernel and activation operation of the convolution extended instruction;
the convolution extended instruction includes an operation code and an operation domain; the operation code includes the identifier of the convolution extended instruction; the operation domain includes a convolution subdomain and an activation subdomain; the convolution subdomain includes the address storing the input data and the address of the convolution kernel, and the activation subdomain includes the identification code of the activation operation or the interpolation table address of the activation operation;
the computing device executes a convolution operation on the input data and the convolution kernel to obtain an intermediate result, and executes the activation operation on the intermediate result through the activation subdomain to obtain the final result of the instruction.
2. The method according to claim 1, wherein the activation operation includes: a convolutional neural network Maxout operation, a convolutional neural network PReLU operation, a convolutional neural network RReLU operation, a convolutional neural network Leaky ReLU operation, a nonlinear activation operation, or a linear activation operation.
3. The method according to claim 1, wherein, if the activation subdomain includes the interpolation table address of the activation operation, the executing the activation operation on the intermediate result through the activation subdomain to obtain the final result of the instruction includes:
the computing device extracts the interpolation table corresponding to the interpolation table address of the activation operation, and executes the activation operation on the intermediate result with the interpolation table to obtain the final result of the instruction.
4. The method according to claim 1, wherein, if the activation subdomain includes the identification code of the activation operation, the executing the activation operation on the intermediate result through the activation subdomain to obtain the final result of the instruction includes:
the computing device identifies the identification code of the activation operation to determine the activation operation, reads the interpolation table of the activation operation, and executes the activation operation on the intermediate result with the interpolation table to obtain the final result of the instruction.
5. The method according to claim 1, wherein the computing device executing the convolution operation on the input data and the convolution kernel to obtain the intermediate result includes:
the main computing module of the computing device splits the input data into multiple portions to obtain multiple input subdata, distributes the multiple input subdata to multiple slave computing modules, and sends the convolution kernel to the multiple slave computing modules; the multiple slave computing modules execute the multiplication of the input subdata and the convolution kernel in parallel to obtain multiple sub-results, and the main computing module of the computing device splices the multiple sub-results to obtain the intermediate result.
6. A computing device, characterized in that the computing device includes: a memory, an arithmetic unit, an interconnect module, a controller unit and a data access unit;
wherein the arithmetic unit includes: an adder and a multiplier;
the controller unit is configured to read the convolution extended instruction from the memory and obtain the input data, convolution kernel and activation operation of the convolution extended instruction;
the convolution extended instruction includes an operation code and an operation domain; the operation code includes the identifier of the convolution extended instruction; the operation domain includes a convolution subdomain and an activation subdomain; the convolution subdomain includes the address storing the input data and the address of the convolution kernel, and the activation subdomain includes the identification code of the activation operation or the interpolation table address of the activation operation;
the data access unit is configured to obtain the input data and convolution kernel corresponding to the address of the input data and the address of the convolution kernel;
the arithmetic unit is configured to execute a convolution operation on the input data and the convolution kernel to obtain an intermediate result, and to execute the activation operation on the intermediate result through the activation subdomain to obtain the final result of the instruction.
7. The computing device according to claim 6, wherein the activation operation includes: a convolutional neural network Maxout operation, a convolutional neural network PReLU operation, a convolutional neural network RReLU operation, a convolutional neural network Leaky ReLU operation, a nonlinear activation operation, or a linear activation operation.
8. The computing device according to claim 6, wherein, if the activation subdomain includes the interpolation table address of the activation operation:
the data access unit is configured to extract the interpolation table corresponding to the interpolation table address of the activation operation;
the arithmetic unit is configured to execute the activation operation on the intermediate result with the interpolation table to obtain the final result of the instruction.
9. The computing device according to claim 6, wherein, if the activation subdomain includes the identification code of the activation operation, the arithmetic unit further includes an activation operator;
the controller unit is configured to identify the identification code of the activation operation to determine the activation operation;
the activation operator is configured to take the interpolation table of the activation operation and execute the activation operation on the intermediate result with the interpolation table to obtain the final result of the instruction.
10. The computing device according to claim 8, wherein the arithmetic unit further includes: a main computing module and multiple slave computing modules; the main computing module includes an adder and a multiplier, and the slave computing module includes an adder and a multiplier;
the main computing module is configured to split the input data into multiple portions to obtain multiple input subdata, distribute the multiple input subdata to the multiple slave computing modules, and send the convolution kernel to the multiple slave computing modules; the multiple slave computing modules are configured to execute the multiplication of the input subdata and the convolution kernel in parallel to obtain multiple sub-results; and the main computing module is configured to splice the multiple sub-results to obtain the intermediate result.
11. A computer-readable storage medium, characterized in that it stores a computer program for electronic data exchange, wherein the computer program makes a computer execute the method according to any one of claims 1 to 5.
12. A computer program product, characterized in that the computer program product includes a non-transitory computer-readable storage medium storing a computer program, and the computer program is operable to make a computer execute the method according to any one of claims 1 to 5.
13. An electronic device, characterized in that the electronic device includes a processor, and the processor includes the computing device according to any one of claims 6 to 10.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711086019.2A CN109754062A (en) | 2017-11-07 | 2017-11-07 | The execution method and Related product of convolution extended instruction |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109754062A true CN109754062A (en) | 2019-05-14 |
Family
ID=66400175
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201711086019.2A Pending CN109754062A (en) | 2017-11-07 | 2017-11-07 | The execution method and Related product of convolution extended instruction |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109754062A (en) |
Citations (26)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102637157A (en) * | 2011-02-15 | 2012-08-15 | Zheng Lei | DTSOC (digital template system on chip) |
CN102947818A (en) * | 2010-05-19 | 2013-02-27 | The Regents of the University of California | Neural processing unit |
CN104756129A (en) * | 2012-10-01 | 2015-07-01 | ARM Limited | A secure mechanism to switch between different domains of operation in a data processor |
CN105468335A (en) * | 2015-11-24 | 2016-04-06 | Institute of Computing Technology, Chinese Academy of Sciences | Pipeline-level operation device, data processing method and network-on-chip chip |
EP3035204A1 (en) * | 2014-12-19 | 2016-06-22 | Intel Corporation | Storage device and method for performing convolution operations |
CN106228240A (en) * | 2016-07-30 | 2016-12-14 | Fudan University | FPGA-based deep convolutional neural network implementation method |
CN106355244A (en) * | 2016-08-30 | 2017-01-25 | 深圳市诺比邻科技有限公司 | CNN (convolutional neural network) construction method and system |
US9600763B1 (en) * | 2015-10-20 | 2017-03-21 | Fujitsu Limited | Information processing method, information processing device, and non-transitory recording medium for storing program |
CN106845631A (en) * | 2016-12-26 | 2017-06-13 | Shanghai Cambricon Information Technology Co., Ltd. | A stream execution method and device |
CN106990940A (en) * | 2016-01-20 | 2017-07-28 | Nanjing Aixi Information Technology Co., Ltd. | A vector computation device |
CN106991077A (en) * | 2016-01-20 | 2017-07-28 | Nanjing Aixi Information Technology Co., Ltd. | A matrix computation device |
CN106991477A (en) * | 2016-01-20 | 2017-07-28 | Nanjing Aixi Information Technology Co., Ltd. | An artificial neural network compression encoding device and method |
CN107239824A (en) * | 2016-12-05 | 2017-10-10 | Beijing DeePhi Intelligent Technology Co., Ltd. | Apparatus and method for implementing a sparse convolutional neural network accelerator |
CN107292458A (en) * | 2017-08-07 | 2017-10-24 | Vimicro Corporation | A prediction method and prediction device applied to a neural network chip |
CN107305486A (en) * | 2016-04-19 | 2017-10-31 | Beijing Zhongke Cambricon Technology Co., Ltd. | A neural network maxout layer computing device |
CN107305538A (en) * | 2016-04-22 | 2017-10-31 | Beijing Zhongke Cambricon Technology Co., Ltd. | A matrix operation device and method |
WO2017185335A1 (en) * | 2016-04-29 | 2017-11-02 | Beijing Zhongke Cambricon Technology Co., Ltd. | Apparatus and method for executing batch normalization operation |
WO2017185418A1 (en) * | 2016-04-29 | 2017-11-02 | Beijing Zhongke Cambricon Technology Co., Ltd. | Device and method for performing neural network computation and matrix/vector computation |
CN107315717A (en) * | 2016-04-26 | 2017-11-03 | Beijing Zhongke Cambricon Technology Co., Ltd. | An apparatus and method for performing vector arithmetic operations |
CN107315566A (en) * | 2016-04-26 | 2017-11-03 | Beijing Zhongke Cambricon Technology Co., Ltd. | An apparatus and method for performing vector circular shift operations |
CN107316078A (en) * | 2016-04-27 | 2017-11-03 | Beijing Zhongke Cambricon Technology Co., Ltd. | Apparatus and method for performing artificial neural network self-learning operations |
CN107315568A (en) * | 2016-04-26 | 2017-11-03 | Beijing Zhongke Cambricon Technology Co., Ltd. | A device for performing vector logic operations |
CN107315715A (en) * | 2016-04-26 | 2017-11-03 | Beijing Zhongke Cambricon Technology Co., Ltd. | An apparatus and method for performing matrix addition/subtraction operations |
CN107315574A (en) * | 2016-04-26 | 2017-11-03 | Beijing Zhongke Cambricon Technology Co., Ltd. | An apparatus and method for performing matrix multiplication |
CN107315716A (en) * | 2016-04-26 | 2017-11-03 | Beijing Zhongke Cambricon Technology Co., Ltd. | An apparatus and method for performing vector outer product operations |
CN107315718A (en) * | 2016-04-26 | 2017-11-03 | Beijing Zhongke Cambricon Technology Co., Ltd. | An apparatus and method for performing vector inner product operations |
History
2017-11-07: Application CN201711086019.2A filed in China (CN); published as CN109754062A; status: active, Pending
Non-Patent Citations (3)
Title |
---|
SHIJIN ZHANG et al.: "Cambricon-X: An Accelerator for Sparse Neural Networks", 2016 IEEE, 31 December 2016 (2016-12-31), pages 1 - 12 *
NIU Yuhu: "Convolutional sparse autoencoder neural networks", no. 02, pages 22 - 29 *
CHEN Xu et al.: "Deep learning algorithms and examples for convolutional networks", no. 06, pages 24 - 30 *
Cited By (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110363168A (en) * | 2019-07-19 | 2019-10-22 | Shandong Inspur Artificial Intelligence Research Institute Co., Ltd. | A three-dimensional drawing recognition system based on convolutional neural networks |
CN111047036A (en) * | 2019-12-09 | 2020-04-21 | Guangdong Oppo Mobile Telecommunications Corp., Ltd. | Neural network processor, chip and electronic equipment |
CN111047036B (en) * | 2019-12-09 | 2023-11-14 | Guangdong Oppo Mobile Telecommunications Corp., Ltd. | Neural network processor, chip and electronic equipment |
CN111090524A (en) * | 2019-12-18 | 2020-05-01 | Shandong Inspur Artificial Intelligence Research Institute Co., Ltd. | Instruction data structure suitable for edge AI (Artificial Intelligence) computation and design method thereof |
CN111199273A (en) * | 2019-12-31 | 2020-05-26 | Shenzhen Intellifusion Technologies Co., Ltd. | Convolution computation method, device, equipment and storage medium |
CN111199273B (en) * | 2019-12-31 | 2024-03-26 | Shenzhen Intellifusion Technologies Co., Ltd. | Convolution computation method, device, equipment and storage medium |
CN112257843A (en) * | 2020-09-23 | 2021-01-22 | Zhejiang University | System for extending an instruction set based on the MobileNetV1 network inference task |
CN112257843B (en) * | 2020-09-23 | 2022-06-28 | Zhejiang University | System for extending an instruction set based on the MobileNetV1 network inference task |
CN112650974A (en) * | 2020-12-30 | 2021-04-13 | Nanjing University | Efficient transposed convolution computation method |
CN112650974B (en) * | 2020-12-30 | 2023-10-13 | Nanjing University | Efficient transposed convolution computation method |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109754062A (en) | The execution method and Related product of convolution extended instruction | |
CN109101273B (en) | Neural network processing device and method for executing vector maximum value instruction | |
CN107329734B (en) | Apparatus and method for performing convolutional neural network forward operation | |
KR102443546B1 (en) | Matrix multiplier | |
CN109117948B (en) | Method for converting picture style and related product | |
CN108229654B (en) | Neural network convolution operation device and method | |
CN112612521A (en) | Apparatus and method for performing matrix multiplication operation | |
CN107341547A (en) | An apparatus and method for performing convolutional neural network training | |
CN107632965B (en) | Reconfigurable S-type operation device and operation method | |
CN109032670A (en) | Neural network processing device and method for executing vector duplicate instructions | |
CN111047022B (en) | Computing device and related product | |
CN110147249A (en) | A computation method and device for a network model | |
EP3561732A1 (en) | Operation apparatus and method for artificial neural network | |
CN109726353A (en) | Convolution operation device and method | |
CN109993273A (en) | Convolution implementation method of a convolutional neural network and related product | |
CN107305486B (en) | Neural network maxout layer computing device | |
CN107957977A (en) | A computation method and related product | |
CN110059797A (en) | A computing device and related product | |
CN109389213A (en) | Storage device and method, data processing device and method, and electronic device | |
CN109754061A (en) | The execution method and Related product of convolution extended instruction | |
CN112801276B (en) | Data processing method, processor and electronic equipment | |
CN113469326A (en) | Integrated circuit device and board card for executing pruning optimization in neural network model | |
CN110472734A (en) | A computing device and related product | |
CN113469365B (en) | Inference and compilation method based on a neural network model and related products | |
CN114692846A (en) | Data processing device, data processing method and related product |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||