Background
Data processing is a step or stage that most algorithms must pass through. Since computers were introduced into the data processing field, more and more data processing has been carried out by computers. In existing algorithms, however, computing devices are slow and inefficient when performing computations on matrix data.
Summary
The embodiments of the present application provide a computing method and related products that can increase the processing speed of a computing device and improve its efficiency.
In a first aspect, a computing method is provided, applied in a computing device. The computing device includes a storage medium, a register unit and a matrix operation unit, and the method includes:
The computing device controls the matrix operation unit to obtain a first operation instruction. The first operation instruction is used to implement an operation between a vector and a matrix, and includes a vector read indication for the vector required to execute the instruction. The required vector is at least one vector, and the at least one vector may be of the same length or of different lengths;
The computing device controls the matrix operation unit to send a read command to the storage medium according to the vector read indication;
The computing device controls the matrix operation unit to read, in a batch read mode, the vector corresponding to the vector read indication from the storage medium, and to execute the first operation instruction on the vector.
In some possible embodiments, executing the first operation instruction on the vector includes:
The computing device controls the matrix operation unit to execute the first operation instruction on the vector using a multi-stage pipeline computation.
In some possible embodiments, each pipeline stage of the multi-stage pipeline includes at least one operator,
and the computing device controlling the matrix operation unit to execute the first operation instruction on the vector using the multi-stage pipeline computation includes:
The computing device controls the matrix operation unit, according to the selection of the multiplexers, to compute a first result from the vector using the first selected operator in the first pipeline stage, to input the first result to the second selected operator in the second pipeline stage to compute a second result, and so on, until the (i-1)-th result is input to the i-th selected operator in the i-th pipeline stage to compute the i-th result;
The i-th result is input to the storage medium for storage;
Here, the number i of pipeline stages is determined according to the computation topology of the first operation instruction, and i is a positive integer.
In some possible embodiments, each pipeline stage of the multi-stage pipeline is configured with a corresponding multiplexer. The multiplexer provides an empty option, which indicates that the k-th pipeline stage connected to the multiplexer and the subsequent (k+1)-th to i-th pipeline stages perform no computation, where k is a positive integer less than or equal to i.
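The multiplexer-selected pipeline and its empty option can be illustrated with a minimal Python sketch. It is only a software analogy under stated assumptions: the names run_pipeline and EMPTY and the three toy operators are hypothetical and do not come from the application, and each stage is modelled as a list of candidate operators from which one index is selected.

```python
from typing import Callable, List, Optional, Sequence

Vector = List[float]
Operator = Callable[[Vector], Vector]

EMPTY = None  # hypothetical marker for a multiplexer's empty option


def run_pipeline(stages: Sequence[Sequence[Operator]],
                 selections: Sequence[Optional[int]],
                 vector: Vector) -> Vector:
    """Pass `vector` through the operator selected in each pipeline stage.

    stages[k] lists the operators available in stage k+1. selections[k] is
    the index chosen by that stage's multiplexer, or EMPTY, in which case
    that stage and all later stages perform no computation.
    """
    result = vector
    for operators, choice in zip(stages, selections):
        if choice is EMPTY:
            break  # stage k and stages k+1..i are skipped
        result = operators[choice](result)
    return result


# Toy operators standing in for real matrix/vector operators (assumptions).
def add_one(v: Vector) -> Vector:
    return [x + 1.0 for x in v]


def double(v: Vector) -> Vector:
    return [2.0 * x for x in v]


def relu(v: Vector) -> Vector:
    return [max(0.0, x) for x in v]


stages = [[add_one, double], [double, relu], [relu, add_one]]

# All three stages active: add_one, then double, then add_one.
full = run_pipeline(stages, [0, 0, 1], [1.0, -2.0])

# The third multiplexer selects the empty option: only two stages run.
two_stage = run_pipeline(stages, [0, 0, EMPTY], [1.0, -2.0])
```

In this sketch the output of one stage is handed directly to the next without being written back to storage; only the final result would be stored.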
In some possible embodiments, the operators included in each pipeline stage of the multi-stage pipeline, and the number of those operators, are custom-set by the user side or the computing device side.
In some possible embodiments, each pipeline stage of the multi-stage pipeline includes a preset fixed operator, and the fixed operators in the respective pipeline stages differ from one another,
and the computing device controlling the matrix operation unit to execute the first operation instruction on the vector using the multi-stage pipeline computation includes:
The computing device controls the matrix operation unit to compute a first result from the vector using the fixed operator in the first pipeline stage, to input the first result to the fixed operator in the second pipeline stage to compute a second result, and so on, until the (i-1)-th result is input to the fixed operator in the i-th pipeline stage to compute the i-th result;
The i-th result is input to the storage medium for storage;
Here, the number i of pipeline stages is determined according to the computation topology of the first operation instruction, and i is a positive integer.
In some possible embodiments, the operators in each pipeline stage of the multi-stage pipeline include any one or a combination of the following: a matrix addition operator, a matrix-vector multiplication operator, a nonlinear operator and a matrix comparison operator.
In some possible embodiments, the first operation instruction includes any one of the following: a two-dimensional vector rotation instruction SVRO, a three-dimensional vector rotation instruction TVRO, a vector translation instruction VTRAN, a vector scaling instruction VZOOM and a vector shearing instruction VSHEAR.
In some possible embodiments, the first operation instruction is the two-dimensional vector rotation instruction SVRO or the three-dimensional vector rotation instruction TVRO,
and the computing device controlling the matrix operation unit to execute the first operation instruction on the vector using the multi-stage pipeline computation includes:
The computing device controls the matrix operation unit, according to the selection of the multiplexers, to perform a vector pad-with-1 operation on the vector using the nonlinear operator in the first pipeline stage to obtain a first result; to input the first result to the nonlinear operator in the second pipeline stage, which performs the matrix element operation according to the obtained rotation axis and rotation angle to obtain a second result; and to input the second result to the matrix-vector multiplication operator in the third pipeline stage, which performs a matrix-times-vector computation to obtain a third result. The third result is input to the storage medium for storage.
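The three stages described for the rotation instructions can be sketched in Python for the two-dimensional case. This is one possible reading, not the application's implementation: it assumes that padding with 1 means appending a homogeneous coordinate, that the rotation is about the origin, and that stage 2 builds a 3x3 rotation matrix from the angle. The function names are hypothetical.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def pad_one(v: Vector) -> Vector:
    """Stage 1 (assumed): append a 1 to obtain homogeneous coordinates."""
    return list(v) + [1.0]


def rotation_matrix_2d(angle: float) -> Matrix:
    """Stage 2 (assumed): compute the matrix elements from the rotation angle."""
    c, s = math.cos(angle), math.sin(angle)
    return [[c, -s, 0.0],
            [s, c, 0.0],
            [0.0, 0.0, 1.0]]


def mat_vec(m: Matrix, v: Vector) -> Vector:
    """Stage 3: matrix-times-vector computation."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def svro(v: Vector, angle: float) -> Vector:
    """Chain the three stages; the result stays in homogeneous form."""
    first = pad_one(v)
    second = rotation_matrix_2d(angle)
    return mat_vec(second, first)


rotated = svro([1.0, 0.0], math.pi / 2)  # quarter turn of the x unit vector
```

A three-dimensional variant would additionally take a rotation axis and build a 4x4 matrix in stage 2; it is omitted here to keep the sketch short.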
In some possible embodiments, the first operation instruction is any one of the following instructions: the vector translation instruction VTRAN, the vector scaling instruction VZOOM and the vector shearing instruction VSHEAR,
and the computing device controlling the matrix operation unit to execute the first operation instruction on the vector using the multi-stage pipeline computation includes:
The computing device controls the matrix operation unit, according to the selection of the multiplexers, to perform a vector pad-with-1 operation on the vector using the nonlinear operator in the first pipeline stage to obtain a first result; to input the first result to the nonlinear operator in the second pipeline stage, which performs the corresponding one of the following operations to obtain a second result: a translation matrix construction operation according to the obtained translation factor, a scaling matrix construction operation according to the obtained scaling factor, or a shearing matrix construction operation according to the obtained shearing factor; and to input the second result to the matrix-vector multiplication operator in the third pipeline stage, which performs a matrix-times-vector computation to obtain a third result. The third result is input to the storage medium for storage.
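As an illustration of the matrix construction step for VTRAN, VZOOM and VSHEAR, the following Python sketch builds the usual 3x3 homogeneous matrices for two-dimensional translation, scaling and shearing. The layout of the factors (two numbers per instruction) and all function names are assumptions made for the example; the application does not specify them.

```python
from typing import Callable, Dict, List, Sequence

Vector = List[float]
Matrix = List[List[float]]


def translation_matrix(tx: float, ty: float) -> Matrix:
    return [[1.0, 0.0, tx], [0.0, 1.0, ty], [0.0, 0.0, 1.0]]


def scaling_matrix(sx: float, sy: float) -> Matrix:
    return [[sx, 0.0, 0.0], [0.0, sy, 0.0], [0.0, 0.0, 1.0]]


def shearing_matrix(kx: float, ky: float) -> Matrix:
    return [[1.0, kx, 0.0], [ky, 1.0, 0.0], [0.0, 0.0, 1.0]]


# Stage 2 picks the construction that matches the opcode.
BUILDERS: Dict[str, Callable[[float, float], Matrix]] = {
    "VTRAN": translation_matrix,
    "VZOOM": scaling_matrix,
    "VSHEAR": shearing_matrix,
}


def transform(opcode: str, v: Vector, factors: Sequence[float]) -> Vector:
    padded = list(v) + [1.0]                      # stage 1: pad with 1
    matrix = BUILDERS[opcode](*factors)           # stage 2: build the matrix
    return [sum(a * b for a, b in zip(row, padded))
            for row in matrix]                    # stage 3: matrix times vector


moved = transform("VTRAN", [1.0, 2.0], (3.0, 4.0))
scaled = transform("VZOOM", [1.0, 2.0], (2.0, 3.0))
sheared = transform("VSHEAR", [1.0, 2.0], (1.0, 0.0))
```

Padding with 1 is what lets translation, which is not a linear map on the original coordinates, be expressed as a single matrix-times-vector step like the other two operations.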
In some possible embodiments, the instruction format of the first operation instruction includes an opcode and at least one operand field. The opcode indicates the function of the operation instruction, and by recognizing the opcode the operation unit can perform different matrix operations. The operand field indicates the data information of the operation instruction, where the data information may be an immediate value or a register number. For example, to obtain a vector, the vector start address and vector length can be read from the corresponding register according to the register number, and the vector stored at that address is then fetched from the storage medium according to the start address and the length. Optionally, any one or a combination of the following information may be obtained from the corresponding register: the number of rows, number of columns, data type, identifier, storage address (first address) and dimension length of the vector required by the instruction, where the dimension length refers to the length of a vector row and/or the length of a vector column.
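The opcode and operand-field format can be pictured with a small Python sketch. The classes Operand and Instruction, the resolve helper, and the particular operand layout given to VZOOM are hypothetical; the sketch only shows how an operand field holding a register number leads to a start address and length, while an immediate is used directly.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple, Union


@dataclass(frozen=True)
class Operand:
    kind: str   # "reg" for a register number, "imm" for an immediate value
    value: int


@dataclass(frozen=True)
class Instruction:
    opcode: str
    operands: Tuple[Operand, ...]


def resolve(operand: Operand,
            registers: Dict[int, Tuple[int, int]],
            storage: List[float]) -> Union[int, List[float]]:
    """Return the data an operand field refers to.

    An immediate is returned as is. A register number is looked up to get
    (start address, length), and the vector is then fetched from storage.
    """
    if operand.kind == "imm":
        return operand.value
    start, length = registers[operand.value]
    return storage[start:start + length]


# Register 3 holds the start address 2 and the length 4 of a vector.
registers = {3: (2, 4)}
storage = [0.0, 0.0, 1.5, 2.5, 3.5, 4.5, 0.0]

# Hypothetical encoding: one register operand and one immediate operand.
inst = Instruction("VZOOM", (Operand("reg", 3), Operand("imm", 2)))
resolved = [resolve(op, registers, storage) for op in inst.operands]
```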
In some possible embodiments, the vector read indication includes: the storage address of the vector required by the instruction, or the identifier of the vector required by the instruction.
In some possible embodiments, when the vector read indication is the identifier of the vector required by the instruction,
the computing device controlling the matrix operation unit to send a read command to the storage medium according to the vector read indication includes:
The computing device controls the matrix operation unit to read, in a unit read mode, the storage address corresponding to the identifier from the register unit according to the identifier;
The computing device controls the matrix operation unit to send to the storage medium a read command for reading the storage address, and to obtain the vector in the batch read mode.
In some possible embodiments, the computing device further includes a cache unit, and the method further includes:
The computing device caches operation instructions to be executed in the cache unit.
In some possible embodiments, before the computing device controls the matrix operation unit to obtain the first operation instruction, the method further includes:
The computing device determines whether the first operation instruction has an association relationship with a second operation instruction that precedes it. If the first operation instruction and the second operation instruction have an association relationship, the first operation instruction is cached in the cache unit and, after the second operation instruction has finished executing, the first operation instruction is extracted from the cache unit and transmitted to the operation unit;
Determining whether the first operation instruction has an association relationship with the second operation instruction that precedes it includes:
A first storage address interval of the vector required by the first operation instruction is extracted according to the first operation instruction, and a second storage address interval of the vector required by the second operation instruction is extracted according to the second operation instruction. If the first storage address interval and the second storage address interval have an overlapping region, it is determined that the first operation instruction and the second operation instruction have an association relationship; if the first storage address interval and the second storage address interval have no overlapping region, it is determined that the first operation instruction and the second operation instruction have no association relationship.
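The overlap test on storage address intervals amounts to a standard interval-intersection check. The Python sketch below assumes that intervals are inclusive (first address, last address) pairs; the function names and the list used as a stand-in for the cache unit are hypothetical.

```python
from typing import List, Tuple

AddressRange = Tuple[int, int]  # inclusive (first address, last address)


def has_association(first: AddressRange, second: AddressRange) -> bool:
    """True if the two storage address intervals share at least one address."""
    return first[0] <= second[1] and second[0] <= first[1]


def issue_or_cache(name: str,
                   first: AddressRange,
                   second: AddressRange,
                   cache: List[str]) -> str:
    """Hold the first instruction back while the earlier one is still running."""
    if has_association(first, second):
        cache.append(name)   # wait until the second instruction has finished
        return "cached"
    return "issued"          # independent: can go to the operation unit now


cache: List[str] = []
independent = issue_or_cache("first", (0x1000, 0x1FFF), (0x0000, 0x0FFF), cache)
dependent = issue_or_cache("first", (0x0800, 0x17FF), (0x0000, 0x0FFF), cache)
```

The check is conservative: any shared address counts as an association, regardless of whether the earlier instruction reads or writes it.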
In a second aspect, a computing device is provided, which includes functional units for performing the method of the first aspect.
In a third aspect, a computer-readable storage medium is provided, which stores a computer program for electronic data interchange, where the computer program causes a computer to perform the method provided in the first aspect.
In a fourth aspect, a computer program product is provided, which includes a non-transitory computer-readable storage medium storing a computer program, the computer program being operable to cause a computer to perform the method provided in the first aspect.
In a fifth aspect, a chip is provided, which includes the computing device provided in the second aspect.
In a sixth aspect, a chip package structure is provided, which includes the chip provided in the fifth aspect.
In a seventh aspect, a board card is provided, which includes the chip package structure provided in the sixth aspect.
In an eighth aspect, an electronic device is provided, which includes the board card provided in the seventh aspect.
In some embodiments, the electronic device includes a data processing device, robot, computer, printer, scanner, tablet computer, intelligent terminal, mobile phone, dashboard camera, navigator, sensor, webcam, server, cloud server, camera, video camera, projector, watch, earphone, mobile storage, wearable device, vehicle, household appliance and/or medical device.
In some embodiments, the vehicle includes an aircraft, a ship and/or a car; the household appliance includes a television, air conditioner, microwave oven, refrigerator, rice cooker, humidifier, washing machine, electric lamp, gas stove and range hood; the medical device includes a nuclear magnetic resonance instrument, a B-mode ultrasound instrument and/or an electrocardiograph.
Implementing the embodiments of the present application has the following beneficial effects:
It can be seen that, in the embodiments of the present application, the computing device is provided with a register unit and a storage medium, which store scalar data and vector data respectively, and the application assigns a unit read mode and a batch read mode to these two memories. By assigning to vector data a read mode that matches its characteristics, bandwidth can be used well and the impact of a bandwidth bottleneck on vector computation speed is avoided. In addition, because the register unit stores scalar data, a scalar read mode is provided for it, which improves bandwidth utilization. The technical solution provided by the application therefore makes good use of bandwidth and avoids the impact of bandwidth on computation speed, and so has the advantages of fast computation and high efficiency.
Detailed description
The technical solutions in the embodiments of the present application are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are some, but not all, of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present application without creative effort shall fall within the protection scope of the present application.
The terms "first", "second", "third", "fourth" and the like in the description, claims and drawings of the present application are used to distinguish different objects, not to describe a particular order. In addition, the terms "include" and "have" and any variations of them are intended to cover non-exclusive inclusion. For example, a process, method, system, product or device that contains a series of steps or units is not limited to the listed steps or units, but optionally also includes steps or units that are not listed, or optionally also includes other steps or units inherent to the process, method, product or device.
Reference herein to an "embodiment" means that a particular feature, structure or characteristic described in connection with the embodiment may be included in at least one embodiment of the present application. The appearances of this phrase in various places in the description do not necessarily all refer to the same embodiment, nor to separate or alternative embodiments that are mutually exclusive with other embodiments. Those skilled in the art understand, explicitly and implicitly, that the embodiments described herein may be combined with other embodiments.
It should be noted that a matrix in the present application may specifically be an m*n matrix, where m and n are integers greater than or equal to 1. When m or n is 1, it can be expressed as a 1*n matrix or an m*1 matrix and may also be called a vector; when m and n are both 1, it can be regarded as a special 1*1 matrix. The matrices mentioned below may be any of these three types, which is not repeated below.
The embodiments of the present application provide a computing method that can be applied in a computing device. Fig. 1 is a schematic structural diagram of a possible computing device according to an embodiment of the present invention. The computing device shown in Fig. 1 includes:
A storage medium 201, for storing matrices (or vectors). Preferably, the storage medium may be a scratchpad memory capable of supporting matrix data (or vector data) of different lengths. The application temporarily stores the necessary computation data in the scratchpad memory (Scratchpad Memory), so that the computing device can support data of different lengths more flexibly and effectively during matrix operations. The storage medium may also be an off-chip database, a database or another medium capable of storage.
A register unit 202, for storing scalar data, where the scalar data includes but is not limited to: the storage address of matrix data or vector data (also referred to in the application as matrix/vector) in the storage medium 201, and the scalars used when a vector is operated on with a matrix. In one embodiment, the register unit may be a scalar register file that provides the scalar registers needed during computation; the scalar registers store not only matrix addresses but also scalar data. It should be understood that a matrix address (i.e. the storage address of a matrix, such as its first address) is also a scalar. When an operation between a matrix and a vector is involved, the operation unit obtains from the register unit not only the matrix address but also the corresponding scalars, such as the number of rows of the matrix, the number of columns, the matrix data type (also called the data type) and the matrix dimension length (specifically, the length of a matrix row, the length of a matrix column, and so on).
An operation unit 203 (also referred to in the application as the matrix operation unit 203), for obtaining and executing the first operation instruction. As shown in Fig. 2, the operation unit includes multiple operators, which include but are not limited to: a matrix addition operator 2031, a matrix multiplication operator 2032, a magnitude comparison operator 2033 (or matrix comparison operator), a nonlinear operator 2034 and a matrix-vector multiplication operator 2035.
The method, as shown in Fig. 3, includes the following steps:
Step S301: the operation unit 203 obtains the first operation instruction. The first operation instruction is used to implement an operation between a vector and a matrix, and includes the vector read indication required to execute the instruction.
In step S301, the vector read indication required to execute the instruction may take several forms. For example, in one optional technical solution of the application, the vector read indication may be the storage address of the required vector. In another optional technical solution of the application, the vector read indication may be the identifier of the required vector. The identifier may itself take several forms, for example the name of the vector, an identification number of the vector, or the register number or storage address of the vector in the register unit.
A practical example is used below to illustrate the vector read indication included in the first operation instruction. Suppose the vector operation formula is f(x)=A+B, where A and B are vectors. Then, in addition to carrying the vector operation formula, the first operation instruction may also carry the storage addresses of the vectors required by the formula; specifically, for example, the storage address of A is 0000-0FFF and the storage address of B is 1000-1FFF. Alternatively, it may carry the identifiers of A and B; for example, the identifier of A is 0101 and the identifier of B is 1010.
Step S302: the operation unit 203 sends a read command to the storage medium 201 according to the vector read indication.
Step S302 may specifically be implemented as follows:
If the vector read indication is the storage address of the required vector, the operation unit 203 sends to the storage medium 201 a read command for reading that storage address and obtains the corresponding vector in the batch read mode.
If the vector read indication is the identifier of the required vector, the operation unit 203 reads the storage address corresponding to the identifier from the register unit in the unit read mode according to the identifier, then sends to the storage medium 201 a read command for reading that storage address and obtains the corresponding vector in the batch read mode.
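The two forms of the vector read indication can be sketched in Python using the example values from the text (addresses 0000-0FFF and 1000-1FFF, identifiers 0101 and 1010). Treating the addresses as inclusive hexadecimal element indices, storing identifiers as strings, and the helper name read_vector are all assumptions made for the illustration.

```python
from typing import Dict, List, Tuple, Union

AddressRange = Tuple[int, int]  # inclusive (first address, last address)

# Toy storage medium: element i simply holds the value i.
storage: List[int] = list(range(0x2000))

# Toy register unit: maps a vector identifier to its storage address range.
register_unit: Dict[str, AddressRange] = {
    "0101": (0x0000, 0x0FFF),  # vector A
    "1010": (0x1000, 0x1FFF),  # vector B
}


def read_vector(indication: Union[str, AddressRange]) -> List[int]:
    """Fetch a vector given either its identifier or its storage address."""
    if isinstance(indication, str):
        # Identifier: first look up the address in the register unit.
        start, end = register_unit[indication]
    else:
        # Storage address carried directly by the instruction.
        start, end = indication
    return storage[start:end + 1]  # stands in for the batch read


a = read_vector((0x0000, 0x0FFF))   # read indication given as an address
b = read_vector("1010")             # read indication given as an identifier
f = [x + y for x, y in zip(a, b)]   # f(x) = A + B
```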
The unit read mode may specifically be: each read fetches one unit of data, i.e. 1 bit of data. The reason for providing the unit read mode, i.e. the 1-bit read mode, is that scalar data occupies very little capacity. If a batch read mode were used, the amount of data read would easily exceed the amount of data required, which would waste bandwidth. Scalar data is therefore read in the unit read mode to reduce wasted bandwidth.
Step S303: the operation unit 203 reads the vector corresponding to the indication in the batch read mode and executes the first operation instruction on the vector.
The batch read mode in step S303 may specifically be: each read fetches multiple bits of data, for example 16 bits, 32 bits or 64 bits per read. That is, regardless of how much data is required, each read fetches a fixed number of bits. This batch read mode is well suited to reading large data. A vector occupies a large capacity, so reading it in the unit read mode would be very slow; the batch read mode is therefore used to fetch multiple bits at a time so that vector data is read quickly, avoiding the problem of vector computation speed being limited by slow reading of vector data.
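The trade-off between the unit read mode and the batch read mode can be made concrete with a little arithmetic. The Python sketch below counts how many reads a transfer needs and how many fetched bits go unused; the 16-bit scalar and the 1024-element vector of 32-bit values are illustrative sizes chosen for the example, not figures from the application.

```python
def reads_needed(total_bits: int, width_bits: int) -> int:
    """Number of fixed-width reads needed to fetch total_bits (ceiling division)."""
    return -(-total_bits // width_bits)


def wasted_bits(total_bits: int, width_bits: int) -> int:
    """Bits fetched beyond what was required."""
    return reads_needed(total_bits, width_bits) * width_bits - total_bits


# A small scalar (assumed 16 bits): a 64-bit batch read fetches unused bits,
# while 1-bit unit reads fetch exactly what is needed.
scalar_waste_batch = wasted_bits(16, 64)
scalar_waste_unit = wasted_bits(16, 1)

# A large vector (assumed 1024 elements of 32 bits each): unit reads need far
# more accesses than 64-bit batch reads.
vector_bits = 1024 * 32
vector_reads_unit = reads_needed(vector_bits, 1)
vector_reads_batch = reads_needed(vector_bits, 64)
```

With these sizes the batch mode wastes 48 bits on the scalar, while the unit mode needs 32768 accesses for the vector against 512 for the batch mode, which is the asymmetry the two read modes are meant to exploit.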
The computing device of the technical solution provided by the application is provided with a register unit and a storage medium, which store scalar data and vector data respectively, and the application assigns a unit read mode and a batch read mode to these two memories. By assigning to vector data a read mode that matches its characteristics, bandwidth can be used well and the impact of a bandwidth bottleneck on vector computation speed is avoided. In addition, because the register unit stores scalar data, a scalar read mode is provided for it, which improves bandwidth utilization. The technical solution provided by the application therefore makes good use of bandwidth and avoids the impact of bandwidth on computation speed, and so has the advantages of fast computation and high efficiency.
Optionally, executing the first operation instruction on the vector may specifically be:
The operation unit 203 may use a multi-stage pipeline computation; in the embodiments of the present application, the first operation instruction may be executed on the vector using an i-stage pipeline computation. This specifically includes the following embodiments.
In a first embodiment, the computing device may be designed with at least one multiplexer MMUX to implement the multi-stage pipeline computation. Figs. 4A and 4B respectively show two implementation architectures of a multi-stage pipeline. In Fig. 4A, the computing device provides one multiplexer MMUX for each pipeline stage; in Fig. 4B, one multiplexer MMUX is provided for multiple pipeline stages. A multiplexer is connected to the pipeline stage or stages it controls/selects for, and is used to select the operator in that pipeline stage that carries out the relevant computation. It should be understood that the operator selected by the multiplexer in a pipeline stage is determined according to the computation network topology corresponding to the first operation instruction, as described in detail later.
In a specific implementation, the operation unit 203 may, according to the selection of the multiplexers, compute a first result from the vector using the first selected operator of the first pipeline stage, then input the first result to the second pipeline stage, whose second selected operator computes a second result, and so on, until the (i-1)-th result is input to the i-th pipeline stage, whose i-th selected operator computes the i-th result. The i-th result is the output result (specifically, an output vector). Further, the operation unit 203 may store the output result in the storage medium 201.
The number i of pipeline stages is determined according to the computation topology of the first operation instruction, and i is a positive integer. In general, i=3. Each pipeline stage may be provided with corresponding operators, which include but are not limited to any one or a combination of the following: a matrix addition operator, a matrix-vector multiplication operator, a nonlinear operator, a matrix comparison operator and other matrix operators. That is, the operators included in each pipeline stage, and their number, may be custom-set by the user side or the computing device side, without limitation.
Taking i=3, i.e. a three-stage pipeline, as an example, the operation unit may use three multiplexers to select the operator (also called an operation component) required in each of the first to third pipeline stages. The vector is then processed by the first pipeline stage to compute a first result; (optionally) the first result is input to the second pipeline stage, which computes a second result; (optionally) the second result is input to the third pipeline stage, which computes a third result; and (optionally) the third result is stored in the storage medium 201. Fig. 5 shows a schematic diagram of the operation flow of such a pipeline.
The first pipeline stage includes but is not limited to: a matrix multiplication operator, and so on.
The second pipeline stage includes but is not limited to: a matrix addition operator, a magnitude comparison operator, and so on.
The third pipeline stage includes but is not limited to: a nonlinear operator, a matrix-vector multiplication operator, a matrix-scalar multiplication operator, and so on.
The vector computation is divided into three pipeline stages mainly to increase the speed of the operation. Consider, for comparison, a vector computation on a general-purpose processor. The steps may be as follows: the processor computes a first result from the vector and stores the first result in memory; the processor reads the first result from memory, performs the second computation to obtain a second result, and stores the second result in memory; the processor reads the second result from memory, performs the third computation to obtain a third result, and stores the third result in memory. These steps show that a general-purpose processor does not split the vector computation into pipeline stages, so the computed data must be saved after every computation and read again before the next one; this scheme therefore stores and reads data repeatedly. In the technical solution of the present application, by contrast, the first result computed by the first pipeline stage goes directly into the second pipeline stage, and the second result computed by the second pipeline stage goes directly into the third pipeline stage. The first and second results need not be stored. This firstly reduces the memory space occupied, and secondly avoids repeated storing and reading of results, which improves bandwidth utilization and further improves computation efficiency.
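The difference in memory traffic between the two schemes can be demonstrated with a small Python model that counts loads and stores. CountingMemory, the two run functions and the three toy stage functions are hypothetical names introduced for the illustration; the model says nothing about actual hardware timing.

```python
from typing import Callable, Dict, List, Sequence

Vector = List[float]
Stage = Callable[[Vector], Vector]


class CountingMemory:
    """Toy memory that counts how often it is read and written."""

    def __init__(self) -> None:
        self.cells: Dict[str, Vector] = {}
        self.loads = 0
        self.stores = 0

    def load(self, key: str) -> Vector:
        self.loads += 1
        return self.cells[key]

    def store(self, key: str, value: Vector) -> None:
        self.stores += 1
        self.cells[key] = value


def run_store_and_reload(mem: CountingMemory, stages: Sequence[Stage]) -> None:
    """General-processor style: every intermediate result goes through memory."""
    key = "in"
    for n, stage in enumerate(stages):
        value = stage(mem.load(key))
        key = "out" if n == len(stages) - 1 else f"tmp{n}"
        mem.store(key, value)


def run_pipelined(mem: CountingMemory, stages: Sequence[Stage]) -> None:
    """Pipelined style: intermediate results pass directly to the next stage."""
    value = mem.load("in")
    for stage in stages:
        value = stage(value)
    mem.store("out", value)


stages = [lambda v: [2.0 * x for x in v],       # stage 1
          lambda v: [x + 1.0 for x in v],       # stage 2
          lambda v: [max(0.0, x) for x in v]]   # stage 3

slow, fast = CountingMemory(), CountingMemory()
slow.cells["in"] = [1.0, 2.0]
fast.cells["in"] = [1.0, 2.0]
run_store_and_reload(slow, stages)
run_pipelined(fast, stages)
```

Both runs produce the same output vector, but the store-and-reload run performs three loads and three stores while the pipelined run performs one of each.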
In another embodiment of the application, the pipeline components may be freely combined, or a single pipeline stage may be used. For example, the second pipeline stage and the third pipeline stage may be merged, or the first, second and third pipeline stages may all be merged, or the pipeline stages responsible for different operations may be permuted and combined. For example, the first pipeline stage may be responsible for comparison operations and part of the multiplication operations, and the second pipeline stage for a combination of nonlinear operations and matrix-vector multiplication. That is, the i pipeline stages designed in the application support connecting any number of pipeline stages in parallel, connecting them in series and merging them, to form different permutations and combinations, which the application does not limit.
It should be noted that each multiplexer is also provided with an empty option, meaning that the pipeline stage connected to the multiplexer and the subsequent pipeline stages do not take part in the operation. That is, the empty option described in the application indicates that the k-th pipeline stage connected to the multiplexer and the subsequent (k+1)-th to i-th pipeline stages perform no computation, where k is a positive integer less than or equal to i.
Taking i=3, i.e. a three-stage pipeline, as an example, if the multiplexer connected to the third pipeline stage selects the empty option, the third pipeline stage does not take part in the operation, and fewer than three pipeline stages currently perform operations. For example, if an operation instruction involves only two stages of computation, the multiplexer corresponding to the third pipeline stage selects the empty option.
Using the above computing device (in which multiplexers are designed to select the operator/operation component to be used in each pipeline stage) has the following beneficial effects: in addition to improving bandwidth utilization, the logic is clear, no operator output jumps from a later pipeline stage back to an earlier pipeline stage, the input interface and the output interface are each single, and operability is good.
In the second embodiment, the computing device may provide, for each operation instruction, a correspondingly designed fixed pipeline implementation architecture. Fig. 5 above shows a three-stage pipeline implementation architecture corresponding to one operation instruction. That is, for a given operation instruction, the operators contained in each pipeline stage are fixed in advance by the user side or the computing device side; this application also refers to them as fixed operators. The fixed operators in different pipeline stages may be the same or different, and are typically different. For example, the first pipeline stage is a matrix addition operator, the second is a matrix multiplication operator, and the third is a nonlinear operator; as another example, the first pipeline stage is a matrix addition operator, the second is a matrix addition operator, and the third is a matrix multiplication operator, and so on. In other words, different operation instructions involve different pipeline stage arrangements (implementation architectures). It should be understood that, depending on the implementation requirements of different operation instructions, the number i of pipeline stages involved may differ and may be increased or decreased accordingly, which this application does not limit.
In a specific implementation, the operation unit 203 computes on the vector with the fixed operator in the first pipeline stage to obtain a first result, inputs the first result to the second pipeline stage, where the fixed operator computes a second result, and so on, until the (i-1)-th result is input to the i-th pipeline stage, where the fixed operator computes the i-th result. The i-th result is the output result (specifically, an output vector). Further, the operation unit 203 may store the output result in the storage medium 201. For the number i of pipeline stages and the fixed operator designed in each stage, see the related description in the preceding embodiment, which is not repeated here.
Taking i = 3 (a three-stage pipeline) as an example, refer to Fig. 5 above, which shows a fixed pipeline implementation architecture corresponding to one operation instruction. Specifically, the operation unit performs the multiplication of the first pipeline stage on the vector to obtain a first result, inputs the first result to the second pipeline stage and performs its addition to obtain a second result, inputs the second result to the third pipeline stage and performs its nonlinear calculation to obtain a third result, and stores the third result (the output result) in the storage medium 201.
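The fixed three-stage flow just described (multiplication, then addition, then a nonlinear calculation) can be sketched in software as follows. This is an illustration only: the text does not specify the operands of each stage, so the matrix-times-vector product in stage 1, the bias vector in stage 2 and the ReLU function in stage 3 are assumptions, and the dictionary `storage` merely stands in for the storage medium 201.

```python
from typing import Dict, List

Vector = List[float]
Matrix = List[List[float]]


def stage1_multiply(matrix: Matrix, vector: Vector) -> Vector:
    """Fixed operator of stage 1 (assumed to be a matrix-times-vector product)."""
    return [sum(m * x for m, x in zip(row, vector)) for row in matrix]


def stage2_add(vector: Vector, bias: Vector) -> Vector:
    """Fixed operator of stage 2 (assumed to be an element-wise addition)."""
    return [x + b for x, b in zip(vector, bias)]


def stage3_nonlinear(vector: Vector) -> Vector:
    """Fixed operator of stage 3 (ReLU, assumed as the nonlinear calculation)."""
    return [max(0.0, x) for x in vector]


def run_fixed_pipeline(
    vector: Vector, matrix: Matrix, bias: Vector, storage: Dict[str, Vector]
) -> Vector:
    first = stage1_multiply(matrix, vector)   # first result
    second = stage2_add(first, bias)          # second result
    third = stage3_nonlinear(second)          # third result = output result
    storage["output"] = third                 # store the output result
    return third


storage: Dict[str, Vector] = {}
output = run_fixed_pipeline(
    [1.0, 2.0], [[1.0, 0.0], [0.0, -1.0]], [0.5, 0.5], storage
)
```

Because the operators are fixed, the function contains no selection logic at all, which mirrors the absence of multiplexers in this embodiment.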
It should be noted that the operator in each pipeline stage of the above computing device is custom-configured in advance and cannot be changed once determined. That is, the i pipeline stages may be designed as any permutation and combination of operators, but once set they no longer change; different operation instructions may be given different i-stage pipeline arrangements. The computing device may increase or decrease the number of pipeline stages as the specific instruction requires. Finally, the pipeline stage arrangements designed for the different instructions can be combined to form the computing device.
The above computing device (in which the operator in each pipeline stage is fixed by design) has the following beneficial effects: in addition to improving bandwidth, it is highly specialized and needs no redundant logic judgment, which further improves operation performance and operation speed.
Optionally, the above computing device may further include a buffer unit 204 for caching the first operation instruction. While an instruction is being executed it is also held in the instruction cache unit. After an instruction has finished executing, if it is also the earliest of the uncommitted instructions in the instruction cache unit, the instruction is committed; once it is committed, the changes that its operations made to the device state cannot be undone. In one embodiment, the instruction cache unit may be a reorder buffer.
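The commit rule above can be illustrated with a small software model of a reorder buffer. This is a sketch under assumptions, not the device's implementation: the class `ReorderBuffer` and its methods are hypothetical names, and the model also commits any already-finished instructions that become the earliest entry, which is the usual reorder-buffer behaviour rather than something the text states explicitly.

```python
from collections import deque
from typing import Deque, Dict, List


class ReorderBuffer:
    """Holds issued instructions in program order and commits them in order."""

    def __init__(self) -> None:
        self._entries: Deque[Dict[str, object]] = deque()
        self.committed: List[str] = []

    def issue(self, name: str) -> None:
        # The instruction stays buffered while it executes.
        self._entries.append({"name": name, "done": False})

    def finish(self, name: str) -> None:
        # Mark the instruction as executed, then commit whatever is ready.
        for entry in self._entries:
            if entry["name"] == name:
                entry["done"] = True
        self._commit_ready()

    def _commit_ready(self) -> None:
        # Only the earliest uncommitted instruction may commit.
        while self._entries and self._entries[0]["done"]:
            self.committed.append(str(self._entries.popleft()["name"]))


rob = ReorderBuffer()
for name in ("I1", "I2", "I3"):
    rob.issue(name)

rob.finish("I2")                 # I2 is done, but I1 is still the earliest entry
after_i2 = list(rob.committed)
rob.finish("I1")                 # I1 commits, and I2 follows right behind it
after_i1 = list(rob.committed)
```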
Optionally, before step S301, the above method may further include:
Determining whether the first operation instruction has an association relationship with a second operation instruction that precedes it. If it does, then after the second operation instruction has finished executing, the first operation instruction is extracted from the buffer unit and transmitted to the operation unit 203. If the first operation instruction has no association relationship with the instructions preceding it, the first operation instruction is transmitted directly to the operation unit.
A specific way to determine whether the first operation instruction has an association relationship with the second operation instruction preceding it may be:
Extract, according to the first operation instruction, the first storage address interval of the vector it requires, and extract, according to the second operation instruction, the second storage address interval of the vector it requires. If the first storage address interval and the second storage address interval have an overlapping region, the first operation instruction is determined to have an association relationship with the second operation instruction. If the two intervals have no overlapping region, the two instructions are determined to have no association relationship.
An overlapping region in the storage address intervals indicates that the first operation instruction and the second operation instruction access the same vector. Because a vector occupies a relatively large storage space, using "identical storage region" as the condition for judging an association relationship can fail: the storage region accessed by the second operation instruction may contain the storage region accessed by the first. For example, suppose the second operation instruction accesses the storage regions of vectors A, B and C. If the A and B regions are adjacent, or the A and C regions are adjacent, then the storage regions accessed by the second operation instruction are the combined A, B region plus the C region, or the combined A, C region plus the B region. In this case, if the first operation instruction accesses the storage regions of vectors A and D, the vector storage regions accessed by the first operation instruction cannot be identical to those accessed by the second. Under the "identical region" condition the two instructions would be judged unassociated, yet in practice they are associated. This application therefore uses the existence of an overlapping region as the condition for an association relationship, which avoids this misjudgment.
A practical example illustrates below which situations constitute an association relationship and which do not. Assume that the vectors required by the first operation instruction are vector A and vector D, where the storage region of vector A is [0001, 0FFF] and that of vector D is [A000, AFFF], and that the vectors required by the second operation instruction are vectors A, B and C, whose storage regions are [0001, 0FFF], [1000, 1FFF] and [B000, BFFF] respectively. For the first operation instruction, the corresponding storage regions are [0001, 0FFF] and [A000, AFFF]; for the second operation instruction, they are [0001, 1FFF] and [B000, BFFF]. The storage regions of the two instructions therefore overlap in [0001, 0FFF], so the first operation instruction has an association relationship with the second operation instruction.
Now assume that the vectors required by the first operation instruction are vector E and vector D, where the storage region of vector E is [C000, CFFF] and that of vector D is [A000, AFFF], and that the vectors required by the second operation instruction are vectors A, B and C, whose storage regions are [0001, 0FFF], [1000, 1FFF] and [B000, BFFF] respectively. For the first operation instruction, the corresponding storage regions are [C000, CFFF] and [A000, AFFF]; for the second operation instruction, they are [0001, 1FFF] and [B000, BFFF]. The storage regions of the two instructions have no overlapping region, so the first operation instruction has no association relationship with the second operation instruction.
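The overlap test and the two examples above can be reproduced with a short sketch. The helper names `merge_adjacent` and `has_association` are hypothetical; addresses are treated as inclusive hexadecimal intervals, and adjacent regions are merged the way the text merges the regions of vectors A and B into [0001, 1FFF].

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # inclusive [start, end] storage addresses


def merge_adjacent(intervals: List[Interval]) -> List[Interval]:
    """Merge regions that touch or overlap into single storage regions."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1] + 1:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def has_association(first: List[Interval], second: List[Interval]) -> bool:
    """True if any region of the first instruction overlaps one of the second."""
    return any(
        a_start <= b_end and b_start <= a_end
        for a_start, a_end in first
        for b_start, b_end in second
    )


# Second operation instruction: vectors A, B and C (A and B are adjacent).
second = merge_adjacent([(0x0001, 0x0FFF), (0x1000, 0x1FFF), (0xB000, 0xBFFF)])

# First example: vectors A and D. Second example: vectors E and D.
first_a_d = [(0x0001, 0x0FFF), (0xA000, 0xAFFF)]
first_e_d = [(0xC000, 0xCFFF), (0xA000, 0xAFFF)]

associated = has_association(first_a_d, second)
not_associated = has_association(first_e_d, second)

# An "identical region" test would miss the first case, because the region of
# vector A no longer appears on its own in the merged list.
identical_region_found = (0x0001, 0x0FFF) in second
```

With these inputs `associated` is true and `not_associated` is false, while the identical-region comparison finds nothing, which is the misjudgment the overlap condition is meant to avoid.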
In this application, Fig. 6A is a schematic diagram of the instruction set format of an instruction (specifically, the first operation instruction, also called an operation instruction) provided by this application. As shown in Fig. 6A, the operation instruction includes one opcode and at least one operand field. The opcode indicates the function of the operation instruction, and the operation unit can perform different vector operations by identifying the opcode. The operand field indicates the data information of the operation instruction, which may be an immediate value or a register number. For example, to obtain a vector, the vector start address and vector length can be read from the corresponding register according to the register number, and the vector stored at that address is then fetched from the storage medium according to the vector start address and vector length.
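The lookup chain described above (register number, then start address and length, then the vector itself) can be sketched as follows. The names `Instruction` and `fetch_vector`, the register contents and the flat list used as storage are all assumptions made for illustration; immediate operands are left out for brevity.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Instruction:
    opcode: str                 # function of the operation instruction
    operands: Tuple[int, ...]   # operand fields, here register numbers only


def fetch_vector(
    register_number: int,
    registers: Dict[int, Tuple[int, int]],
    storage: List[float],
) -> List[float]:
    """Read (start address, length) from the register, then fetch the vector."""
    start, length = registers[register_number]
    # The slice models reading the whole vector in one batch.
    return storage[start:start + length]


# Hypothetical storage medium and scalar registers.
storage = [float(i) for i in range(16)]
registers = {0: (4, 3), 1: (10, 2)}

instruction = Instruction(opcode="VTRAN", operands=(0, 1))
vectors = [fetch_vector(r, registers, storage) for r in instruction.operands]
```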
That is, the first operation instruction may include an operand field and at least one opcode. Taking a vector operation instruction as an example, as shown in Table 1, register 0, register 1, register 2, register 3 and register 4 may be operand fields. Each of register 0, register 1, register 2, register 3 and register 4 identifies a register number and may refer to one or more registers. It should be understood that the number of registers is not limited, and each register is used to store data information related to the operation instruction.
Fig. 6B is a schematic diagram of the instruction set format of another instruction (which may be the first operation instruction, also called an operation instruction) provided by this application. As shown in Fig. 6B, the instruction includes at least two opcodes and at least one operand field, where the at least two opcodes include a first opcode and a second opcode (shown as opcode 1 and opcode 2 respectively). Opcode 1 indicates the type of the instruction (i.e., a major class of instruction), for example an I/O instruction, a logic instruction or an operation instruction. Opcode 2 indicates the function of the instruction (i.e., the specific instruction within the major class), for example, among operation instructions, matrix operation instructions (such as the matrix-times-vector instruction MMUL and the matrix inversion instruction MINV) and vector operation instructions (such as the vector derivation instruction VDIER), which this application does not limit.
It should be understood that the instruction format may be custom-defined by the user side or the computing device side. The opcode of an instruction may be designed with a fixed length, such as 8 bits or 16 bits. The instruction format shown in Fig. 6A has the following advantages: the opcode occupies few bits and the decoding system is simple to design. The instruction format shown in Fig. 6B has the following advantages: the length is variable and the average decoding efficiency is higher; when a major class contains few specific instructions and they are called frequently, designing its second opcode (i.e., opcode 2) to be short improves decoding efficiency. In addition, this format enhances the readability and extensibility of instructions and optimizes their encoding structure.
In the embodiments of this application, the instruction set includes operation instructions with different functions, specifically:
Two-dimensional vector rotation instruction (SVRO): according to this instruction, the device fetches vector data of a specified size from a specified address of the memory (preferably a scratchpad memory or a scalar register file) and converts it to augmented form, fetches the specified rotation-center coordinate data and rotation-angle data from the scalar register file, generates a rotation matrix, performs the matrix-times-vector multiplication in the operation unit, and writes the result back to a specified address of the memory (preferably a scratchpad memory or a scalar register file). It is worth noting that a vector can be stored in the scratchpad memory as a special form of matrix (a matrix with only one row of elements). The memory here may include, but is not limited to, a scratchpad memory; the same applies below.
Three-dimensional vector rotation instruction (TVRO): according to this instruction, the device fetches vector data of a specified size from a specified address of the memory (preferably a scratchpad memory or a scalar register file) and converts it to augmented form, fetches the specified rotation-axis data and rotation-angle data from the scalar register file, generates a rotation matrix, performs the matrix-times-vector multiplication in the matrix operation unit, and writes the result back to a specified address of the memory. It is worth noting that a vector can be stored in the scratchpad memory as a special form of matrix (a matrix with only one row of elements).
Vector translation instruction (VTRAN): according to this instruction, the device fetches vector data of a specified size from a specified address of the memory and converts it to augmented form, fetches the specified data for each translation direction from the scalar register file, generates a translation matrix, performs the matrix-times-vector multiplication in the matrix operation unit, restores the result from augmented form to its original dimension, and writes it back to a specified address of the memory. It is worth noting that a vector can be stored in the scratchpad memory as a special form of matrix (a matrix with only one row of elements).
Vector scaling instruction (VZOOM): according to this instruction, the device fetches vector data of a specified size from a specified address of the memory and converts it to augmented form, fetches the specified scaling-amplitude data from the scalar register file, generates a scaling matrix, performs the matrix-times-vector multiplication in the matrix operation unit, and writes the result back to a specified address of the memory. It is worth noting that a vector can be stored in the scratchpad memory as a special form of matrix (a matrix with only one row of elements).
Vector shearing instruction (VSHEAR): according to this instruction, the device fetches vector data of a specified size from a specified address of the memory and converts it to augmented form, fetches the specified amplitude data for each direction from the scalar register file, generates a shearing matrix, performs the matrix-times-vector multiplication in the matrix operation unit, and writes the result back to a specified address of the memory. It is worth noting that a vector can be stored in the scratchpad memory as a special form of matrix (a matrix with only one row of elements).
It should be understood that the operations/operation instructions proposed in this application are mainly used for operations whose input and output are in vector form; the form of intermediate data produced during calculation is not limited. The operators designed in each pipeline stage in this application include, but are not limited to, any one or a combination of the following: matrix addition operators, matrix multiplication operators, matrix-vector multiplication operators, nonlinear operators and matrix comparison operators.
The calculation of the operation instructions (i.e., the first operation instruction) involved in this application is illustrated below.
Taking the case where the first operation instruction is the two-dimensional vector rotation instruction SVRO as an example, the rotation of a two-dimensional vector about a given rotation center is calculated. In a specific implementation, given a two-dimensional vector X (x, y), a rotation center (a, b) and a rotation angle c, the rotated vector of the given vector X is calculated according to the following formula.
Correspondingly, the instruction format of the two-dimensional vector rotation instruction SVRO is specifically:
With reference to the preceding embodiments, the operation unit may obtain the two-dimensional vector rotation instruction SVRO and, after decoding it, read the vector X, the rotation center and the rotation angle from the register unit. The multiplexer of the first pipeline stage selects a nonlinear operator to perform an append-1 operation on the vector X and obtain a first result (specifically, the vector to be rotated that is formed by the append-1 operation). At the same time, the multiplexer of the second pipeline stage selects a nonlinear operator to perform the rotation-matrix construction operation according to the rotation center and rotation angle that were read (optionally also involving move operations on matrix elements), obtaining a second result (specifically, the rotation matrix). The first result and the second result are input to the third pipeline stage, whose multiplexer selects a matrix-vector multiplication operator to perform the matrix-times-vector operation (specifically, on the vector to be rotated and the rotation matrix), obtaining a third result (the output result). Optionally, the third result is stored in the storage medium. Optionally, the first result may also be input to the second pipeline stage without being processed there, waiting until the second pipeline stage has built the rotation matrix so that the two together form the second result. In that case the second result includes both the rotation matrix and the vector to be rotated, so that only the second result needs to be input to the third pipeline stage to obtain the output result.
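The three stages of the SVRO flow can be modelled in a few lines. The rotation matrix below is the standard homogeneous-coordinate matrix for rotating about a point (a, b) by an angle c; it is assumed here for illustration and is not taken from the application. The function names are likewise hypothetical.

```python
import math
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def append_one(x: Vector) -> Vector:
    """Stage 1: append-1 operation, producing the augmented vector."""
    return list(x) + [1.0]


def build_rotation_matrix(a: float, b: float, c: float) -> Matrix:
    """Stage 2: 3x3 matrix rotating by angle c about the center (a, b)."""
    cos_c, sin_c = math.cos(c), math.sin(c)
    return [
        [cos_c, -sin_c, a - a * cos_c + b * sin_c],
        [sin_c, cos_c, b - a * sin_c - b * cos_c],
        [0.0, 0.0, 1.0],
    ]


def matrix_times_vector(m: Matrix, v: Vector) -> Vector:
    """Stage 3: matrix-times-vector operation."""
    return [sum(e * x for e, x in zip(row, v)) for row in m]


def svro(x: Vector, center: Tuple[float, float], angle: float) -> Vector:
    first = append_one(x)                                          # first result
    second = build_rotation_matrix(center[0], center[1], angle)    # second result
    return matrix_times_vector(second, first)                      # third result


# Rotate (2, 1) by 90 degrees about (1, 1); the result stays in augmented form.
rotated = svro([2.0, 1.0], (1.0, 1.0), math.pi / 2)
```

The first two components of `rotated` are the rotated coordinates, approximately (1, 2) up to floating-point rounding, and the third is the appended 1.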
Taking the case where the first operation instruction is the three-dimensional vector rotation instruction TVRO as an example, a rotation in three-dimensional space is performed. In a specific implementation, given a three-dimensional vector X (x, y, z), a rotation center (u, v, w) and a rotation angle c, the rotated three-dimensional vector is calculated according to the following formula.
Correspondingly, the instruction format of the three-dimensional vector rotation instruction TVRO is specifically:
With reference to the preceding embodiments, the operation unit may obtain the three-dimensional vector rotation instruction TVRO and, after decoding it, read the vector X, the rotation center and the rotation angle from the register unit. The multiplexer of the first pipeline stage selects a nonlinear operator to perform an append-1 operation on the vector X and obtain a first result (specifically, the vector to be rotated that is formed by the append-1 operation). At the same time, the multiplexer of the second pipeline stage selects a nonlinear operator to perform the rotation-matrix construction operation according to the rotation center and rotation angle that were read (optionally also involving move operations on matrix elements), obtaining a second result (specifically, the rotation matrix). The first result and the second result are input to the third pipeline stage, whose multiplexer selects a matrix-vector multiplication operator to perform the matrix-times-vector operation (specifically, on the vector to be rotated and the rotation matrix), obtaining a third result (the output result). Optionally, the third result is stored in the storage medium. Optionally, the first result may also be input to the second pipeline stage without being processed there, waiting until the second pipeline stage has built the rotation matrix so that the two together form the second result. In that case the second result includes both the rotation matrix and the vector to be rotated, so that only the second result needs to be input to the third pipeline stage to obtain the output result.
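A comparable sketch for TVRO follows. It assumes that (u, v, w) gives the direction of a rotation axis through the origin, in line with the "rotation-axis data" mentioned in the instruction description, and builds the matrix with the standard Rodrigues rotation formula; neither choice is confirmed by the text, and the function names are hypothetical.

```python
import math
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def append_one(x: Vector) -> Vector:
    """Stage 1: append-1 operation, producing the augmented vector."""
    return list(x) + [1.0]


def build_rotation_matrix(axis: Tuple[float, float, float], c: float) -> Matrix:
    """Stage 2: 4x4 rotation about an axis through the origin (Rodrigues)."""
    u, v, w = axis
    norm = math.sqrt(u * u + v * v + w * w)
    kx, ky, kz = u / norm, v / norm, w / norm   # unit axis direction
    cos_c, sin_c = math.cos(c), math.sin(c)
    t = 1.0 - cos_c
    return [
        [cos_c + kx * kx * t, kx * ky * t - kz * sin_c, kx * kz * t + ky * sin_c, 0.0],
        [ky * kx * t + kz * sin_c, cos_c + ky * ky * t, ky * kz * t - kx * sin_c, 0.0],
        [kz * kx * t - ky * sin_c, kz * ky * t + kx * sin_c, cos_c + kz * kz * t, 0.0],
        [0.0, 0.0, 0.0, 1.0],
    ]


def matrix_times_vector(m: Matrix, v: Vector) -> Vector:
    """Stage 3: matrix-times-vector operation."""
    return [sum(e * x for e, x in zip(row, v)) for row in m]


def tvro(x: Vector, axis: Tuple[float, float, float], angle: float) -> Vector:
    first = append_one(x)                          # first result
    second = build_rotation_matrix(axis, angle)    # second result
    return matrix_times_vector(second, first)      # third result


# Rotate the x unit vector by 90 degrees about the z axis.
rotated = tvro([1.0, 0.0, 0.0], (0.0, 0.0, 2.0), math.pi / 2)
```

The axis is normalized inside the sketch, so (0, 0, 2) and (0, 0, 1) describe the same rotation; the result is approximately (0, 1, 0) followed by the appended 1.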
Taking the case where the first operation instruction is the vector translation instruction VTRAN as an example, the translated vector of a given vector is calculated. In a specific implementation, given a vector X (x1, x2, x3, ..., xn) and a translation vector Y (dx1, dx2, ..., dxn) (also called the translation factor), the translated vector of the given vector X is calculated according to the following formula.
Correspondingly, the instruction format of the vector translation instruction VTRAN is specifically:
With reference to the preceding embodiments, the operation unit may obtain the vector translation instruction VTRAN and, after decoding it, read the vector X and the translation vector Y from the register unit. The multiplexer of the first pipeline stage selects a nonlinear operator to perform an append-1 operation on the vector X and obtain a first result (specifically, the vector to be translated that is formed by the append-1 operation). At the same time, the multiplexer of the second pipeline stage selects a nonlinear operator to perform the translation-matrix construction operation according to the translation vector Y that was read (optionally also involving move operations on matrix elements), obtaining a second result (specifically, the translation matrix). The first result and the second result are input to the third pipeline stage, whose multiplexer selects a matrix-vector multiplication operator to perform the matrix-times-vector operation (specifically, on the vector to be translated and the translation matrix), obtaining a third result (the output result). Optionally, the third result is stored in the storage medium. Optionally, the first result may also be input to the second pipeline stage without being processed there, waiting until the second pipeline stage has built the translation matrix so that the two together form the second result. In that case the second result includes both the translation matrix and the vector to be translated, so that only the second result needs to be input to the third pipeline stage to obtain the output result.
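The VTRAN flow can be sketched the same way. The translation matrix used here is the usual (n+1) x (n+1) homogeneous form, an identity matrix whose last column holds the translation factors; this form is an assumption, as are the function names. Following the instruction description, the sketch restores the result to its original dimension before returning it.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def append_one(x: Vector) -> Vector:
    """Stage 1: append-1 operation, producing the augmented vector."""
    return list(x) + [1.0]


def build_translation_matrix(y: Vector) -> Matrix:
    """Stage 2: identity matrix with the translation factors in the last column."""
    n = len(y)
    matrix = [[1.0 if r == c else 0.0 for c in range(n + 1)] for r in range(n + 1)]
    for r in range(n):
        matrix[r][n] = y[r]
    return matrix


def matrix_times_vector(m: Matrix, v: Vector) -> Vector:
    """Stage 3: matrix-times-vector operation."""
    return [sum(e * x for e, x in zip(row, v)) for row in m]


def vtran(x: Vector, y: Vector) -> Vector:
    first = append_one(x)                        # first result
    second = build_translation_matrix(y)         # second result
    third = matrix_times_vector(second, first)   # third result, augmented
    return third[:-1]                            # restore the original dimension


translated = vtran([1.0, 2.0, 3.0], [10.0, 20.0, 30.0])
```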
Taking the case where the first operation instruction is the vector scaling instruction VZOOM as an example, the scaled vector of a given vector is calculated. In a specific implementation, given a vector X (x1, x2, x3, ..., xn) and a scaling factor a, the scaled vector of the given vector X is calculated according to the following formula.
Correspondingly, the instruction format of the vector scaling instruction VZOOM is specifically:
With reference to the preceding embodiments, the operation unit may obtain the vector scaling instruction VZOOM and, after decoding it, read the vector X and the scaling factor a from the register unit. The multiplexer of the first pipeline stage selects a nonlinear operator to perform an append-1 operation on the vector X and obtain a first result (specifically, the vector to be scaled that is formed by the append-1 operation). At the same time, the multiplexer of the second pipeline stage selects a nonlinear operator to perform the scaling-matrix construction operation according to the scaling factor a that was read (optionally also involving move operations on matrix elements), obtaining a second result (specifically, the scaling matrix). The first result and the second result are input to the third pipeline stage, whose multiplexer selects a matrix-vector multiplication operator to perform the matrix-times-vector operation (specifically, on the vector to be scaled and the scaling matrix), obtaining a third result (the output result). Optionally, the third result is stored in the storage medium. Optionally, the first result may also be input to the second pipeline stage without being processed there, waiting until the second pipeline stage has built the scaling matrix so that the two together form the second result. In that case the second result includes both the scaling matrix and the vector to be scaled, so that only the second result needs to be input to the third pipeline stage to obtain the output result.
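For VZOOM, a minimal sketch can assume uniform scaling, with a diagonal matrix diag(a, ..., a, 1) acting on the augmented vector. The matrix form and the function names are assumptions introduced for illustration; the result is left in augmented form, since the instruction description does not mention restoring the dimension.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def append_one(x: Vector) -> Vector:
    """Stage 1: append-1 operation, producing the augmented vector."""
    return list(x) + [1.0]


def build_scaling_matrix(a: float, n: int) -> Matrix:
    """Stage 2: diag(a, ..., a, 1) for an n-dimensional vector."""
    matrix = [[0.0] * (n + 1) for _ in range(n + 1)]
    for r in range(n):
        matrix[r][r] = a
    matrix[n][n] = 1.0   # keeps the appended 1 unchanged
    return matrix


def matrix_times_vector(m: Matrix, v: Vector) -> Vector:
    """Stage 3: matrix-times-vector operation."""
    return [sum(e * x for e, x in zip(row, v)) for row in m]


def vzoom(x: Vector, a: float) -> Vector:
    first = append_one(x)                        # first result
    second = build_scaling_matrix(a, len(x))     # second result
    return matrix_times_vector(second, first)    # third result, augmented


scaled = vzoom([1.0, 2.0, 3.0], 2.0)
```

Only the scalar a has to be stored to describe the transformation; the matrix is rebuilt on demand in the second stage.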
Taking the case where the first operation instruction is the vector shearing instruction VSHEAR as an example, the sheared vector of a given vector is calculated. In a specific implementation, given a vector X (x1, x2, x3, ..., xn) and a shear vector Y (d1, d2, ..., dn) (also called the shear factor), the sheared vector of the given vector X is calculated according to the following formula.
Correspondingly, the instruction format of the vector shearing instruction VSHEAR is specifically:
With reference to the preceding embodiments, the operation unit may obtain the vector shearing instruction VSHEAR and, after decoding it, read the vector X and the shear vector Y from the register unit. The multiplexer of the first pipeline stage selects a nonlinear operator to perform an append-1 operation on the vector X and obtain a first result (specifically, the vector to be sheared that is formed by the append-1 operation). At the same time, the multiplexer of the second pipeline stage selects a nonlinear operator to perform the shearing-matrix construction operation according to the shear vector Y that was read (optionally also involving move operations on matrix elements), obtaining a second result (specifically, the shearing matrix). The first result and the second result are input to the third pipeline stage, whose multiplexer selects a matrix-vector multiplication operator to perform the matrix-times-vector operation (specifically, on the vector to be sheared and the shearing matrix), obtaining a third result (the output result). Optionally, the third result is stored in the storage medium. Optionally, the first result may also be input to the second pipeline stage without being processed there, waiting until the second pipeline stage has built the shearing matrix so that the two together form the second result. In that case the second result includes both the shearing matrix and the vector to be sheared, so that only the second result needs to be input to the third pipeline stage to obtain the output result.
It should be noted that the acquisition and decoding of the above operation instructions will be described in detail later. It should be understood that using the structure of the above computing device to carry out the calculation of each operation instruction (such as the two-dimensional vector rotation instruction SVRO and the three-dimensional vector rotation instruction TVRO) provides the following beneficial effects: vector formats stored at fixed intervals are supported, which avoids the execution overhead of converting vector formats and the space occupied by storing intermediate results; and storing the transformation information as scalar values is supported, which avoids the storage overhead of a rotation matrix.
The set length in the above operation instructions (i.e., vector operation instructions / first operation instructions) can be set by the user. In an optional embodiment, the user may set the length to a single value; in practical applications, the user may also set it to multiple values. The embodiments of this application do not limit the specific value or the number of set lengths. To make the purpose, technical solutions and advantages of this application clearer, the application is further described below with reference to specific embodiments and the accompanying drawings.
Referring to Fig. 7, Fig. 7 shows another computing device 50 provided by an embodiment of this application. As shown in Fig. 7, the computing device 50 includes: a storage medium 501, a register unit 502 (preferably a scalar data storage unit or scalar register unit), an operation unit 503 (also called the matrix operation unit 503) and a control unit 504;
Storage medium 501, for storing a matrix (or vector);
Scalar data storage unit 502, for storing scalar data, the scalar data including at least the storage address of the matrix (or vector) in the storage medium;
Control unit 504, for controlling the operation unit to obtain a first operation instruction, where the first operation instruction is used to implement an operation between a vector and a matrix and includes a vector read indication needed to execute the instruction;
Operation unit 503, for sending a read command to the storage medium according to the vector read indication, reading the vector corresponding to the vector read indication in a batch reading manner, and executing the first operation instruction on the vector.
Optionally, the above vector read indication includes: the storage address of the vector required by the instruction, or the identifier of the vector required by the instruction.
Optionally, when the vector read indication is the identifier of the vector required by the instruction,
Control unit 504 is configured to control the operation unit to read, according to the identifier, the storage address corresponding to the identifier from the register unit in a unit reading manner, and to control the operation unit to send the storage medium a read command for that storage address and obtain the vector in a batch reading manner.
Optionally, the operation unit 503 is specifically configured to execute the first operation instruction on the vector using a multistage pipeline calculation.
Optionally, each pipeline stage of the multistage pipeline includes at least one operator,
Operation unit 503 is specifically configured to, according to the selection of the multiplexers, compute on the vector with the first selected operator in the first pipeline stage to obtain a first result, input the first result to the second selected operator in the second pipeline stage to compute a second result, and so on, until the (i-1)-th result is input to the i-th selected operator in the i-th pipeline stage to compute the i-th result; and to input the i-th result to the storage medium for storage; where the number i of pipeline stages is determined according to the computation topology of the first operation instruction, and i is a positive integer.
Optionally, each pipeline stage in the multi-stage pipeline is configured with a corresponding
multiplexer, and the multiplexer provides a null option. The null option indicates that the k-th
pipeline stage connected to that multiplexer, and the subsequent (k+1)-th to i-th pipeline stages,
perform no calculation, where k is a positive integer less than or equal to i.
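The multiplexer-driven pipeline with a null option can be sketched in software as below. This is only an illustration under stated assumptions: the operator names (`add1`, `neg`, `double`, `relu`), the use of `None` as the null option, and the function `run_pipeline` are hypothetical and do not come from the application.

```python
NULL = None  # assumed encoding of the multiplexer's null option


def run_pipeline(stages, selections, data):
    """Run `data` through the pipeline.

    stages:     one dict per stage, mapping operator name -> callable
    selections: one multiplexer choice per stage (an operator name or NULL)
    """
    result = data
    for operators, choice in zip(stages, selections):
        if choice is NULL:
            # Stage k and every later stage perform no calculation.
            break
        result = operators[choice](result)
    return result


stages = [
    {"add1": lambda v: [x + 1 for x in v], "neg": lambda v: [-x for x in v]},
    {"double": lambda v: [2 * x for x in v]},
    {"relu": lambda v: [max(0, x) for x in v]},
]

full = run_pipeline(stages, ["neg", "double", "relu"], [1, -2, 3])
short = run_pipeline(stages, ["add1", "double", NULL], [1, -2, 3])
```

In `short`, the third multiplexer selects the null option, so the result of the second stage is passed through unchanged as the output.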
Optionally, the operators included in each pipeline stage of the multi-stage pipeline, and the number
of those operators, are custom-set on the user side or on the computing device side.
Optionally, each pipeline stage in the multi-stage pipeline includes a preset fixed operator, and the
fixed operators in different pipeline stages differ from one another.
The operation unit 503 is configured to use the fixed operator in the first pipeline stage to compute a
first result from the vector, input the first result to the fixed operator in the second pipeline stage
to compute a second result, and so on, until the (i-1)-th result is input to the fixed operator in the
i-th pipeline stage to compute the i-th result; the i-th result is then input to the storage medium for
storage. The number i of pipeline stages is determined according to the computation topology of the
first operation instruction, and i is a positive integer.
Optionally, the operators in each pipeline stage of the multi-stage pipeline include any one or a
combination of the following: a matrix addition operator, a matrix multiplication operator, a
matrix-vector multiplication operator, a nonlinear operator, and a matrix comparison operator.
Optionally, the first operation instruction is a two-dimensional vector rotation instruction SVRO or a
three-dimensional vector rotation instruction TVRO.
The operation unit 503 is configured to, according to the selection made by the multiplexers, use the
nonlinear operator in the first pipeline stage to perform a pad-1 operation on the vector to obtain a
first result; at the same time, use the nonlinear operator in the second pipeline stage to perform a
matrix element move calculation according to the obtained pivot and rotation angle to obtain a second
result; input the first result and the second result to the matrix-vector multiplication operator in
the third pipeline stage to perform a matrix-vector multiplication and obtain a third result; and input
the third result to the storage medium for storage.
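A minimal sketch of the three stages for the two-dimensional case follows. It rests on assumptions that the text does not state: the pad-1 operation is taken to mean appending a 1 to form homogeneous coordinates, the pivot is treated as a 2D point about which the vector is rotated, and the angle is in radians. The function names are hypothetical.

```python
import math


def pad_one(vec):
    # Stage 1 (assumed meaning of "pad-1"): append 1 for homogeneous coordinates.
    return list(vec) + [1.0]


def build_rotation_2d(pivot, angle):
    # Stage 2: 3x3 homogeneous matrix for x' = R (x - p) + p,
    # i.e. rotation by `angle` radians about the point `pivot`.
    c, s = math.cos(angle), math.sin(angle)
    px, py = pivot
    return [
        [c, -s, px - c * px + s * py],
        [s, c, py - s * px - c * py],
        [0.0, 0.0, 1.0],
    ]


def mat_vec(matrix, vec):
    # Stage 3: matrix-vector multiplication.
    return [sum(m * x for m, x in zip(row, vec)) for row in matrix]


def rotate_2d(vec, pivot, angle):
    first = pad_one(vec)                      # first result
    second = build_rotation_2d(pivot, angle)  # second result
    third = mat_vec(second, first)            # third result
    return third[:2]                          # drop the homogeneous coordinate


rotated = rotate_2d([2.0, 1.0], pivot=(1.0, 1.0), angle=math.pi / 2)
```

Because the first and second results do not depend on each other, stages 1 and 2 can run at the same time, which matches the "at the same time" wording above; only stage 3 needs both.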
Optionally, the first operation instruction is any one of the following instructions: a vector
translation instruction VTRAN, a vector scaling instruction VZOOM, or a vector shearing instruction
VSHEAR.
The operation unit 503 is configured to, according to the selection made by the multiplexers, use the
nonlinear operator in the first pipeline stage to perform a pad-1 operation on the vector to obtain a
first result; at the same time, use the nonlinear operator in the second pipeline stage to perform the
corresponding one of the following operations to obtain a second result: constructing a translation
matrix according to the obtained translation factor, constructing a scaling matrix according to the
obtained scaling factor, or constructing a shearing matrix according to the obtained shearing factor;
input the first result and the second result to the matrix-vector multiplication operator in the third
pipeline stage to perform a matrix-vector multiplication and obtain a third result; and input the third
result to the storage medium for storage.
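The three instructions share the same pipeline and differ only in the matrix built in the second stage. The sketch below shows this for 2D vectors; the homogeneous 3x3 matrix layouts, the per-axis factor tuples, and the `BUILDERS` dispatch table are assumptions made for illustration, not details given in the text.

```python
def pad_one(vec):
    # Stage 1 (assumed meaning of "pad-1"): append 1 for homogeneous coordinates.
    return list(vec) + [1.0]


def mat_vec(matrix, vec):
    # Stage 3: matrix-vector multiplication.
    return [sum(m * x for m, x in zip(row, vec)) for row in matrix]


def translation_matrix(tx, ty):
    return [[1.0, 0.0, tx], [0.0, 1.0, ty], [0.0, 0.0, 1.0]]


def scaling_matrix(sx, sy):
    return [[sx, 0.0, 0.0], [0.0, sy, 0.0], [0.0, 0.0, 1.0]]


def shear_matrix(shx, shy):
    return [[1.0, shx, 0.0], [shy, 1.0, 0.0], [0.0, 0.0, 1.0]]


# Stage 2 builder chosen by opcode (hypothetical dispatch table).
BUILDERS = {
    "VTRAN": translation_matrix,
    "VZOOM": scaling_matrix,
    "VSHEAR": shear_matrix,
}


def execute(opcode, vec, factors):
    first = pad_one(vec)                 # first result
    second = BUILDERS[opcode](*factors)  # second result
    return mat_vec(second, first)        # third result (still homogeneous)


moved = execute("VTRAN", [1.0, 2.0], (3.0, 4.0))
scaled = execute("VZOOM", [1.0, 2.0], (2.0, 3.0))
sheared = execute("VSHEAR", [1.0, 2.0], (1.0, 0.0))
```

Padding with 1 is what lets translation, which is not a linear map on the original vector, be expressed as the same matrix-vector product used for scaling and shearing.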
Optionally, the computing device further includes:
a buffer unit 505, configured to cache operation instructions to be executed;
the control unit 504, configured to cache the operation instructions to be executed in the buffer
unit 505.
Optionally, the control unit 504 is configured to determine whether the first operation instruction is
associated with a second operation instruction that precedes it. If the first operation instruction is
associated with the second operation instruction, the first operation instruction is cached in the
buffer unit, and after the second operation instruction has finished executing, the first operation
instruction is fetched from the buffer unit and transmitted to the operation unit.
Determining whether the first operation instruction is associated with the second operation instruction
that precedes it includes:
extracting, according to the first operation instruction, a first storage address interval of the
vector required by the first operation instruction, and extracting, according to the second operation
instruction, a second storage address interval of the vector required by the second operation
instruction; if the first storage address interval and the second storage address interval have an
overlapping region, determining that the first operation instruction and the second operation
instruction are associated; if the first storage address interval and the second storage address
interval have no overlapping region, determining that the first operation instruction and the second
operation instruction are not associated.
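The association check above reduces to an interval-overlap test on storage addresses. A minimal sketch, assuming each instruction is described by a start address and a length and that intervals are half-open (both are illustrative choices, as are the names `Instr` and `has_association`):

```python
from collections import namedtuple

# Hypothetical instruction record: name, start address, vector length.
Instr = namedtuple("Instr", "name start length")


def address_interval(instr):
    # Half-open interval [start, end) of addresses the instruction touches.
    return instr.start, instr.start + instr.length


def has_association(first, second):
    # Two half-open intervals overlap iff each starts before the other ends.
    a0, a1 = address_interval(first)
    b0, b1 = address_interval(second)
    return a0 < b1 and b0 < a1


earlier = Instr("second_op", start=0, length=8)
overlapping = Instr("first_op", start=4, length=8)
disjoint = Instr("first_op", start=8, length=4)

must_wait = has_association(overlapping, earlier)
can_issue = not has_association(disjoint, earlier)
```

An instruction for which `has_association` is true would be held in the buffer unit until the earlier instruction completes; otherwise it can go straight to the operation unit.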
Optionally, the control unit 503 may be configured to obtain an operation instruction from the
instruction cache unit, process the operation instruction, and provide it to the operation unit.
The control unit 503 may be divided into three modules: a fetch module 5031, a decoding module 5032 and
an instruction queue module 5033.
The fetch module 5031 is configured to obtain an operation instruction from the instruction cache unit.
The decoding module 5032 is configured to decode the obtained operation instruction.
The instruction queue module 5033 is configured to store decoded operation instructions in order.
Because the registers used by different instructions may have dependencies, the instruction queue
caches decoded instructions and issues each instruction once its dependencies are satisfied.
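The decode and in-order issue behaviour can be sketched as follows. This is a simplified software model under assumptions: the textual instruction format `"OPCODE dst src..."`, the register names, and the rule that an instruction waits while any register it reads or writes has a write in flight are all hypothetical.

```python
from collections import deque


def decode(raw):
    # Hypothetical text format: "OPCODE dst src1 src2 ..."
    opcode, dst, *srcs = raw.split()
    return {"opcode": opcode, "writes": dst, "reads": srcs}


class InstructionQueue:
    """Stores decoded instructions in order and issues them when safe."""

    def __init__(self):
        self._queue = deque()
        self._busy = set()  # registers with a write still in flight

    def push(self, decoded):
        self._queue.append(decoded)

    def issue(self):
        # Return the oldest instruction if its registers are free, else None.
        if not self._queue:
            return None
        head = self._queue[0]
        if self._busy & (set(head["reads"]) | {head["writes"]}):
            return None  # dependency not yet satisfied
        self._queue.popleft()
        self._busy.add(head["writes"])
        return head

    def complete(self, instr):
        # Called when the operation unit finishes the instruction.
        self._busy.discard(instr["writes"])


queue = InstructionQueue()
queue.push(decode("SVRO r2 r0 r1"))
queue.push(decode("VTRAN r3 r2 r4"))

first = queue.issue()    # SVRO issues at once
blocked = queue.issue()  # VTRAN reads r2, which SVRO is still writing
queue.complete(first)
second = queue.issue()   # VTRAN issues once r2 is available
```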
Referring to Fig. 8, Fig. 8 is a flowchart of the computing device provided by the embodiments of the
present application executing an operation instruction. As shown in Fig. 8, the hardware structure of
the computing device is as shown in Fig. 7. Taking a scratchpad memory as an example of the storage
medium shown in Fig. 7, the process of executing a two-dimensional/three-dimensional vector rotation
instruction includes:
Step S601: the computing device controls the fetch module to fetch the two-dimensional/three-dimensional
vector rotation instruction and send it to the decoding module.
Step S602: the decoding module decodes the two-dimensional/three-dimensional vector rotation instruction
and sends it to the instruction queue.
Step S603: in the instruction queue, the two-dimensional/three-dimensional vector rotation instruction
obtains, from the scalar register file, the data in the scalar registers corresponding to the six
operation fields of the instruction. The data include the input vector address, input vector length,
input pivot vector address, input pivot scalar, output vector address, and output vector length.
Step S604: the control unit determines whether the two-dimensional/three-dimensional vector rotation
instruction is associated with an operation instruction that precedes it. If an association exists, the
two-dimensional/three-dimensional vector rotation instruction is stored in the buffer unit; if no
association exists, the instruction is transmitted to the operation unit.
Step S605: according to the data in the scalar registers corresponding to the six operation fields, the
operation unit fetches the required vector data (optionally also scalar data) from the memory
(preferably the scratchpad memory or the scalar register file), and then completes the vector rotation
operation in the operation unit.
Step S606: after the operation unit completes the operation, the result is written to the specified
address of the memory (preferably the scratchpad memory or the scalar register file), and the
two-dimensional/three-dimensional vector rotation instruction in the reorder buffer is committed.
Optionally, when the operation unit performs the vector rotation operation in step S605, the computing
device may use the nonlinear operators to perform the pad-1 operation and the rotation matrix
construction, and then use the matrix-vector multiplication operator to perform a matrix-vector
multiplication to obtain the corresponding rotated vector.
In a specific implementation, after the decoding module decodes the two-dimensional/three-dimensional
vector rotation instruction, and according to the control signals produced by decoding, the vector
obtained in S603 is input to the nonlinear operator selected by the multiplexer of the first pipeline
stage, which performs the pad-1 operation to obtain the first result (the vector to be rotated). At the
same time, the pivot and rotation angle obtained in S603 are input to the nonlinear operator selected
by the multiplexer of the second pipeline stage, which constructs the rotation matrix to obtain the
second result (the rotation matrix). The first result and the second result are then input to the
matrix-vector multiplication operator selected by the multiplexer of the third pipeline stage, which
performs the matrix-vector multiplication to obtain the third result. Further, the multiplexer of the
fourth pipeline stage learns from the control signals that this stage is set to the null option.
Accordingly, the third result is taken as the output result and transferred directly to the output.
The operation instruction in Fig. 8 takes the two-dimensional/three-dimensional vector rotation
instruction as an example. In practical applications, the two-dimensional/three-dimensional vector
rotation instruction in the embodiment shown in Fig. 8 may be replaced by other vector operation
instructions, such as a vector translation instruction, a vector scaling instruction, or a vector
shearing instruction, which are not described one by one here.
The embodiments of the present application further provide a computer storage medium, where the computer
storage medium stores a computer program for electronic data exchange, and the computer program causes
a computer to perform some or all of the steps of any implementation described in the above method
embodiments.
The embodiments of the present application further provide a computer program product, where the computer
program product includes a non-transitory computer-readable storage medium storing a computer program,
and the computer program is operable to cause a computer to perform some or all of the steps of any
implementation described in the above method embodiments.
The embodiments of the present application further provide an acceleration device, including: a memory
storing executable instructions; and a processor configured to execute the executable instructions in
the memory, where, when executing the instructions, the processor operates according to the
implementations described in the above method embodiments.
The processor may be a single processing unit, or may include two or more processing units. In
addition, the processor may include a general-purpose processor (CPU) or a graphics processor (GPU); it
may also include a field-programmable gate array (FPGA) or an application-specific integrated circuit
(ASIC) for setting up and computing neural networks. The processor may also include on-chip memory for
caching purposes (that is, memory inside the processing device).
In some embodiments, a chip is also disclosed, which includes the above neural network processor for
performing the above method embodiments.
In some embodiments, a chip package structure is disclosed, which includes the above chip.
In some embodiments, a board is disclosed, which includes the above chip package structure.
In some embodiments, an electronic device is disclosed, which includes the above board.
The electronic device includes a data processing apparatus, robot, computer, printer, scanner, tablet
computer, smart terminal, mobile phone, dashboard camera, navigator, sensor, webcam, server, cloud
server, camera, video camera, projector, watch, earphones, mobile storage, wearable device, vehicle,
household appliance, and/or medical device.
The vehicle includes an aircraft, a ship and/or a motor vehicle; the household appliance includes a
television, air conditioner, microwave oven, refrigerator, rice cooker, humidifier, washing machine,
electric lamp, gas stove, and range hood; the medical device includes a nuclear magnetic resonance
instrument, a B-mode ultrasound instrument and/or an electrocardiograph.
It should be noted that, for brevity, each of the foregoing method embodiments is described as a series
of combined actions, but those skilled in the art should understand that the present application is not
limited by the described order of actions, because according to the present application some steps may
be performed in another order or at the same time. Those skilled in the art should also understand that
the embodiments described in this specification are all optional embodiments, and the actions and
modules involved are not necessarily required by the present application.
In the above embodiments, the description of each embodiment has its own emphasis. For parts that are
not described in detail in one embodiment, reference may be made to the related descriptions of other
embodiments.
In the several embodiments provided in the present application, it should be understood that the
disclosed device may be implemented in other ways. For example, the device embodiments described above
are merely illustrative. The division into units is only a division by logical function, and other
divisions are possible in an actual implementation: multiple units or components may be combined or
integrated into another system, or some features may be omitted or not performed. In addition, the
mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect
coupling or communication connection through some interfaces, devices, or units, and may be electrical
or take other forms.
The units described as separate components may or may not be physically separate, and the components
shown as units may or may not be physical units; that is, they may be located in one place or
distributed over multiple network units. Some or all of the units may be selected according to actual
needs to achieve the purpose of the solution of the embodiment.
In addition, the functional units in the embodiments of the present application may be integrated into
one processing unit, each unit may exist physically on its own, or two or more units may be integrated
into one unit. The integrated unit may be implemented in the form of hardware or in the form of a
software program module.
If the integrated unit is implemented in the form of a software program module and sold or used as an
independent product, it may be stored in a computer-readable memory. Based on this understanding, the
technical solution of the present application in essence, or the part that contributes to the prior
art, or all or part of the technical solution, may be embodied in the form of a software product. The
computer software product is stored in a memory and includes several instructions that cause a computer
device (which may be a personal computer, a server, a network device, or the like) to perform all or
some of the steps of the methods described in the embodiments of the present application. The foregoing
memory includes various media that can store program code, such as a USB flash drive, read-only memory
(ROM, Read-Only Memory), random access memory (RAM, Random Access Memory), a removable hard disk, a
magnetic disk, or an optical disc.
Those of ordinary skill in the art will understand that all or some of the steps in the methods of the
above embodiments may be completed by a program instructing the relevant hardware. The program may be
stored in a computer-readable memory, and the memory may include a flash drive, read-only memory
(Read-Only Memory, ROM), random access memory (Random Access Memory, RAM), a magnetic disk, an optical
disc, or the like.
The embodiments of the present application have been described in detail above. Specific examples are
used herein to explain the principles and implementations of the present application, and the
description of the above embodiments is intended only to help in understanding the method of the
present application and its core idea. At the same time, those of ordinary skill in the art may, based
on the idea of the present application, make changes to the specific implementations and the scope of
application. In summary, the content of this specification should not be construed as limiting the
present application.