CN109376112A - SLAM arithmetic unit and method - Google Patents
- Publication number
- CN109376112A CN109376112A CN201811521818.2A CN201811521818A CN109376112A CN 109376112 A CN109376112 A CN 109376112A CN 201811521818 A CN201811521818 A CN 201811521818A CN 109376112 A CN109376112 A CN 109376112A
- Authority
- CN
- China
- Prior art keywords
- instruction
- data
- vector
- matrix
- scalar
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/30003—Arrangements for executing specific machine instructions
- G06F9/30007—Arrangements for executing specific machine instructions to perform operations on data operands
- G06F9/3001—Arithmetic instructions
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F15/00—Digital computers in general; Data processing equipment in general
- G06F15/16—Combinations of two or more digital computers each having at least an arithmetic unit, a program unit and a register, e.g. for a simultaneous processing of several programs
- G06F15/161—Computing infrastructure, e.g. computer clusters, blade chassis or hardware partitioning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F15/00—Digital computers in general; Data processing equipment in general
- G06F15/16—Combinations of two or more digital computers each having at least an arithmetic unit, a program unit and a register, e.g. for a simultaneous processing of several programs
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/30003—Arrangements for executing specific machine instructions
- G06F9/30007—Arrangements for executing specific machine instructions to perform operations on data operands
- G06F9/30021—Compare instructions, e.g. Greater-Than, Equal-To, MINMAX
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/30003—Arrangements for executing specific machine instructions
- G06F9/30007—Arrangements for executing specific machine instructions to perform operations on data operands
- G06F9/30036—Instructions to perform operations on packed data, e.g. vector, tile or matrix operations
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Abstract
A device of a SLAM hardware accelerator comprises: an input memory module, for storing input data; a scalar operation unit, for executing the scalar operations in the Compute True Data step and in the EKF Update step; a vector operation unit, for executing the vector operations in the Compute True Data step, in the EKF Predict step, and in the EKF Update step of the extended Kalman filter (EKF) method; and a matrix operation unit, for executing the matrix operations in the EKF Predict step and in the EKF Update step. The apparatus and method of the present invention can effectively accelerate the algorithm according to different needs, and have the advantages of strong flexibility, high configurability, fast operation speed, and low energy consumption.
Description
Technical field
The present invention relates to a SLAM (Simultaneous Localization and Mapping) arithmetic unit and method, for accelerating the operations of SLAM algorithms according to different demands.
Background technique
Autonomous navigation is a basic capability of mobile robots (such as unmanned ground and aerial vehicles) in unknown environments. In a SLAM task, the localization part mainly determines the position of the robot in the map, while the mapping part builds a map of the environment from the robot's observations. When no initial map of the environment is available, the robot must construct the map in real time and simultaneously use that map to localize itself; SLAM algorithms arose to solve exactly this task. However, accurately running a SLAM algorithm under the limited computing capability and strict power budget of a mobile robot is one of the greatest practical challenges. First, because of its real-time requirement, a SLAM algorithm needs high operation speed to complete the large number of intra-frame and inter-frame operations within a short time. Second, owing to the constraints of the mobile robot, the power consumption requirements are harsh. Finally, SLAM algorithms involve a large number and wide variety of operation types, so the accelerator design must support various types of SLAM algorithms.
In the prior art, one way to realize a SLAM algorithm is to run it directly on a general-purpose processor (CPU). One disadvantage of this approach is that the operational performance of a single general-purpose processor is low and cannot satisfy the real-time requirements of common SLAM workloads; and when multiple general-purpose processors execute in parallel, the communication between them in turn becomes the performance bottleneck.
Another way to realize a SLAM algorithm is to perform the operations on a graphics processor (GPU), using general-purpose registers and general-purpose stream processing units to execute generic SIMD instructions. Although the GPU is a device specialized for graphics and image operations, the complexity of SLAM operations means this approach cannot support the later stages of the algorithm well; that is, it cannot effectively accelerate the SLAM algorithm as a whole. Moreover, the on-chip cache of a GPU is too small to satisfy the operation demands of large-scale SLAM algorithms. Furthermore, in practical applications, embedding CPU-like or GPU-like structures into a robot is relatively difficult, so up to now there has been no practical, highly flexible dedicated SLAM hardware accelerator architecture. The device we design is a dedicated SLAM hardware accelerator that satisfies these requirements, and we devise the corresponding method for this device; it can be realized as hardware such as a dedicated chip or an embedded chip, and can thus be applied in robots, computers, mobile phones, and other applications.
Summary of the invention
(1) Technical problems to be solved
The object of the present invention is to provide a device and a method of a SLAM hardware accelerator.
(2) Technical solution
According to an aspect of the present invention, a device of a SLAM hardware accelerator is provided, comprising:
a storage section, for storing input data, intermediate operation result data, final operation result data, the instruction set required by the calculation process, and/or algorithm parameter data;
an arithmetic section, connected with the storage section, for completing the calculations of SLAM-related algorithms and applications; and
a control section, connected with the storage section and the arithmetic section, for controlling and coordinating the storage section and the arithmetic section.
Preferably, the storage section includes:
an input memory module, for storing input data;
an intermediate result memory module, for storing intermediate calculation results;
a final result memory module, for storing final operation results;
an instruction memory module, for storing the instruction set required by the calculation process; and/or
a buffer memory module, for buffering data.
Preferably, the arithmetic section includes:
an acceleration arithmetic unit, designed for SLAM-related algorithms and applications, for accelerating and processing SLAM operations; and
other arithmetic units, for the operations that are included in SLAM-related algorithms and applications but cannot be completed by the acceleration arithmetic unit.
Preferably, the acceleration arithmetic unit includes a vector operation unit and a matrix operation unit.
Preferably, the other arithmetic units are used to complete operations that are used in the algorithms and applications but not completed by the acceleration arithmetic unit.
Preferably, the arithmetic section is realized by a hardware circuit.
Preferably, the control section is connected with each module of the storage section and with the arithmetic section. The control section consists of a first-in-first-out (FIFO) queue and a control processor: the FIFO queue stores control signals, and the control processor takes out the pending control signals, analyzes the control logic, and then controls and coordinates the storage section and the arithmetic section.
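As an illustrative software sketch only (not part of the claimed hardware; the class and field names are hypothetical), the FIFO-queue-plus-control-processor arrangement described above can be modeled as:

```python
from collections import deque

class ControlSection:
    """Toy model: a FIFO queue of control signals plus a control
    processor that dispatches them to the storage or arithmetic part."""

    def __init__(self):
        self.fifo = deque()  # FIFO queue storing control signals
        self.log = []        # record of coordinated actions

    def push_signal(self, signal):
        self.fifo.append(signal)

    def run(self):
        # The control processor takes out pending signals in arrival
        # order, analyzes them, then coordinates the two sections.
        while self.fifo:
            signal = self.fifo.popleft()
            self.log.append((signal["target"], signal["op"]))

ctrl = ControlSection()
ctrl.push_signal({"target": "storage", "op": "load_input"})
ctrl.push_signal({"target": "arithmetic", "op": "vector_mul"})
ctrl.run()
```

Consuming signals strictly in arrival order mirrors how the control processor analyzes pending signals before coordinating the storage and arithmetic sections.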
Preferably, the instruction set includes:
a control operation instruction class, for controlling the selection of the operation instruction to be executed;
a data operation instruction class, for controlling the transmission of data;
a macro operation instruction class, for relatively complete operations;
a multidimensional data operation instruction class, for controlling arithmetic operations on multidimensional data; and/or
a one-dimensional data operation instruction class, for controlling arithmetic operations on one-dimensional data.
Preferably, the control operation instruction class includes jump instructions and branch instructions; the jump instructions include direct jump instructions and indirect jump instructions, and the branch instructions include conditional branch instructions.
Preferably, the macro operation instruction class includes convolution operation instructions or pooling operation instructions.
Preferably, the multidimensional data operation instruction class is used to require the arithmetic unit to execute operations on multidimensional data; the operations on multidimensional data include operations between multidimensional data and multidimensional data, operations between multidimensional data and one-dimensional vector data, and operations between multidimensional data and one-dimensional scalar data.
Preferably, the one-dimensional data operation instruction class is used to require the arithmetic unit to execute operations on one-dimensional data; the one-dimensional data includes one-dimensional vectors and one-dimensional scalars.
Preferably, the operations on one-dimensional vector data include operations between one-dimensional vectors and one-dimensional vectors, and operations between one-dimensional vectors and scalars.
Preferably, the operations on one-dimensional scalar data include operations between scalars and scalars.
Preferably, the device further includes an assembler, for selecting, while the device runs, the instruction types to be used from the instruction set.
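As a hedged illustration of the instruction-class taxonomy above, the classes can be tabulated in software as follows. The mnemonics JUMP, CB, LD, ST and MOV follow the encodings named elsewhere in this description; CONV, POOL, MMUL, VDOT and SADD are hypothetical placeholders, not encodings the patent defines:

```python
from enum import Enum

class InstrClass(Enum):
    CONTROL = "control"      # program flow: jump / branch instructions
    DATA = "data"            # data transmission instructions
    MACRO = "macro"          # coarse-grained, relatively complete ops
    MULTI_DIM = "multi_dim"  # multidimensional (e.g. matrix) operations
    ONE_DIM = "one_dim"      # one-dimensional vector / scalar operations

# Hypothetical mapping from mnemonics to the five classes listed above.
OPCODE_CLASS = {
    "JUMP": InstrClass.CONTROL, "CB": InstrClass.CONTROL,
    "LD": InstrClass.DATA, "ST": InstrClass.DATA, "MOV": InstrClass.DATA,
    "CONV": InstrClass.MACRO, "POOL": InstrClass.MACRO,
    "MMUL": InstrClass.MULTI_DIM,
    "VDOT": InstrClass.ONE_DIM, "SADD": InstrClass.ONE_DIM,
}

def classify(mnemonic: str) -> InstrClass:
    """Return the instruction class of a mnemonic."""
    return OPCODE_CLASS[mnemonic]
```

An assembler such as the one claimed above could use a table of this shape to select which instruction types of the set are emitted for a given program.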
According to another aspect of the present invention, a method for performing SLAM operations with any of the devices described above is also provided, in which the control section controls the transport of data, the operations, and the running of the program through the instruction set in the storage section, including:
step 1: transporting the input data of the storage section to the arithmetic section;
step 2: executing operations in the arithmetic section according to the instruction set required by the calculation process;
step 3: transmitting and saving the operation result data; and
step 4: repeating the above process until the operations are finished.
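The four steps above can be sketched as a software loop, under the assumption of a simple dict-backed storage model (all names are illustrative, not the patent's interfaces):

```python
def run_slam_device(storage, arithmetic, instructions):
    """Behavioral sketch of the four-step operation method.

    storage:      dict-like memory holding named data blocks
    arithmetic:   callable standing in for one arithmetic-section op
    instructions: list of (input_key, output_key) work items
    """
    for input_key, output_key in instructions:
        # Step 1: transport input data from storage to arithmetic.
        operand = storage[input_key]
        # Step 2: execute the operation prescribed by the instruction.
        result = arithmetic(operand)
        # Step 3: transmit and save the operation result data.
        storage[output_key] = result
    # Step 4: the loop repeats until all operations are finished.
    return storage

mem = {"in": [1.0, 2.0, 3.0]}
run_slam_device(mem, lambda v: [x * x for x in v], [("in", "out")])
```

Here the single `arithmetic` callable stands in for whichever scalar, vector, or matrix unit the instruction dispatches to.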
(3) Beneficial effects
The device and method of the SLAM hardware accelerator provided by the invention can effectively accelerate SLAM algorithms according to different needs, are applicable to various SLAM algorithms and a variety of different input data types, satisfy the operation requirements of different demands, and have the advantages of strong flexibility, high configurability, fast operation speed, and low energy consumption.
Compared with the prior art, the apparatus and method of the present invention have the following effects:
1) the arithmetic section can operate on data of different input types according to different demands;
2) the arithmetic section can also realize a certain degree of data sharing through the buffer memory module, reducing the reuse distance of data;
3) the instruction design supports various basic operation types, so that the configurability of the device is very high;
4) the design of the matrix and vector operation units, together with the design of the scalar operation unit, can support various types of operations and significantly increase operation speed;
5) the design of the arithmetic and storage sections and the arrangement of the instructions significantly reduce the power consumption during execution.
Detailed description of the invention
Fig. 1 is a structural schematic diagram of the device of the SLAM hardware accelerator provided by one embodiment of the invention.
Fig. 2 is a structural schematic diagram of the SLAM hardware accelerator provided by a further embodiment of the invention.
Fig. 3 is a structural schematic diagram of one embodiment of the scalar operation unit of the SLAM hardware accelerator provided by one embodiment of the invention.
Fig. 4 is a structural schematic diagram of one embodiment of the vector operation unit of the SLAM hardware accelerator provided by one embodiment of the invention.
Fig. 5 is a structural schematic diagram of one embodiment of the matrix operation unit of the SLAM hardware accelerator provided by one embodiment of the invention.
Fig. 6 is a schematic diagram of an embodiment in which the SLAM hardware accelerator provided by one embodiment of the invention completes a three-dimensional coordinate L2-norm operation.
Fig. 7 is a schematic diagram of an embodiment in which the SLAM hardware accelerator provided by one embodiment of the invention completes a 16-dimensional square-matrix multiplication operation.
Fig. 8 is a schematic diagram of the configuration and realization, on this device, of the SLAM algorithm based on the extended Kalman filter (EKF) method provided by one embodiment of the invention.
Fig. 9 is a schematic diagram of the instruction types provided by one embodiment of the invention.
Fig. 10 is an application schematic diagram of a macro operation instruction provided by one embodiment of the invention.
Fig. 11 is an embodiment of a one-dimensional data operation instruction provided by one embodiment of the invention.
Fig. 12 is a schematic diagram of the configuration and realization, in this device, of a SIFT feature extraction algorithm provided by one embodiment of the invention.
Fig. 13 is a schematic diagram of the configuration and realization, in this device, of a graph optimization algorithm based on the G2O framework provided by one embodiment of the invention.
Fig. 14 is an execution flow chart of a convolution operation instruction provided by one embodiment of the invention.
Fig. 15 is an execution flow chart of an image accumulation instruction provided by one embodiment of the invention.
Fig. 16 is an execution flow chart of a filtering operation instruction provided by one embodiment of the invention.
Fig. 17 is an execution flow chart of a local extremum instruction provided by one embodiment of the invention.
Fig. 18 is an execution flow chart of a two-dimensional convolution operation provided by one embodiment of the invention.
Fig. 19 is an execution flow chart of a one-dimensional vector dot-product operation provided by one embodiment of the invention.
Specific embodiments
To make the objectives, technical solutions, and advantages of the present invention clearer, the present invention is described in more detail below in conjunction with specific embodiments and with reference to the attached drawings.
Fig. 1 is a structural schematic diagram of the device of the SLAM hardware accelerator provided by one embodiment of the invention. As shown in Fig. 1, the accelerator is broadly divided into three parts: the control section, the arithmetic section, and the storage section. The control section issues control signals to the arithmetic section and the storage section, to control the operation of the two and to coordinate the data transmission between them. The storage section is used to store related data, including input data, intermediate results, final results, instructions, caches, and so on; the specific stored data content, storage organization, and access methods can be planned differently according to demand. The arithmetic section includes a variety of arithmetic units for operating on data, comprising one or more combinations of a scalar operation unit, a vector operation unit, and a matrix operation unit; the arithmetic units can operate on data of different input types according to different demands. The arithmetic section can also realize a certain degree of data sharing through the buffer memory module, reducing the reuse distance of data.
Fig. 2 is a structural schematic diagram of the device of the SLAM hardware accelerator of another embodiment of the invention. As shown in Fig. 2, this embodiment is required to accelerate the calculation process of an image-based SLAM algorithm while reducing data exchange and saving memory space. The structure of the device is therefore as follows. The control section connects each module of the storage section and the arithmetic section; it consists of a FIFO queue and a control processor, where the FIFO queue stores control signals and the control processor takes out the pending control signals, analyzes the control logic, and then controls and coordinates the storage section and the arithmetic section. The storage section is divided into four modules: an input memory module, an output memory module, an intermediate result memory module, and a cache module. The arithmetic section is mainly used to accelerate the operations of the image processing part, the point-cloud map construction, the image matching, and the image optimization; the arithmetic unit is accordingly divided into three modules, a scalar operation module, a vector operation module, and a matrix operation module, and the three modules can execute in a pipelined manner or in parallel.
Fig. 3 is one embodiment of the present invention, describing a schematic diagram of a scalar operation unit that can be used in this device; SPE therein denotes an individual scalar processing element. The scalar operation unit is mainly used to handle the scalar parts of the SLAM algorithm and some complex operations used for acceleration, such as trigonometric function operations; it can also solve memory-access consistency problems, and is one of the important components of the accelerator. The memory modules directly related to the scalar operation unit are the intermediate result memory module and the buffer memory module. The operands needed by scalar operations may reside in the intermediate result memory module or in the buffer memory module, and the results of scalar operations may be stored in the intermediate result memory module or output to the buffer module, depending on actual needs.
Fig. 4 is one embodiment of the present invention, describing a schematic diagram of a vector operation unit that can be used in this device. The entire vector operation unit is composed of multiple basic processing elements; VPE in the figure is the basic processing element of vector operation. The vector operation unit can be used to handle the vector operation part of the SLAM algorithm and all operation parts with vector characteristics, such as the dot product of vectors, and can also realize efficient data-level and task-level parallelism. Each basic element of the vector operation unit can be configured to execute the same operation in parallel, or configured to realize different operations. The memory modules directly related to the vector operation unit are the intermediate result memory module and the buffer memory module. The operands needed by vector operations may be stored in the intermediate result memory module or in the buffer memory module, and the results of vector operations may be stored in the intermediate result memory module or output to the buffer module, depending on actual needs.
Fig. 5 is another embodiment of the present invention, describing a schematic diagram of a matrix operation unit that can be used in this device; it can satisfy the requirement of accelerating all matrix operation types and operation types similar to matrix operations. MPE therein denotes the basic processing element of the matrix operation unit. The matrix operation unit is composed of multiple basic processing elements; in the illustrated case it is an array of arithmetic elements. The matrix operation unit supports many external data exchange modes, which may be 2D exchange modes or 1D exchange modes. The arithmetic elements also support data access patterns between internal elements, which can greatly reduce the reuse distance of local data and realize efficient acceleration. The memory modules directly related to the matrix operation unit are the intermediate result memory module and the buffer memory module. The operands needed by matrix operations may be stored in the intermediate result memory module or in the buffer memory module, and the results of matrix operations may be stored in the intermediate result memory module or output to the buffer module, depending on actual needs.
Fig. 6 is one embodiment of the present invention, describing a flow for performing a three-dimensional coordinate L2-norm operation with this device. Assume the three components of the three-dimensional coordinate are stored in the intermediate result memory module. First, through configuration instructions, the operands are taken out of the intermediate result memory module and separately input to three basic processing elements (VPEs) of the vector operation unit. Each of the three VPEs executes a multiplication whose two operands are one coordinate component and itself. The results of the multiplications are then input, through the buffer memory module, into the scalar operation unit, which completes the summation of the three products and then executes a square-root operation. The final result is output, as needed, to the intermediate result memory module or the buffer memory module.
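The dataflow above can be sketched in a few lines, with the three parallel VPE multiplications modeled as elementwise squaring and the scalar unit modeled as a sum followed by a square root (a behavioral sketch only, not the hardware itself):

```python
import math

def l2_norm_3d(coord):
    """Sketch of the Fig. 6 dataflow for a 3-component coordinate."""
    # Vector stage: each of three VPEs multiplies one component
    # by itself, in parallel.
    squares = [x * x for x in coord]
    # Scalar stage: the scalar unit sums the three products,
    # then performs the square-root operation.
    return math.sqrt(sum(squares))

# Example: l2_norm_3d([3.0, 4.0, 12.0]) gives 13.0.
```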
Fig. 7 is one embodiment of the present invention, describing one possible arrangement for performing an N-dimensional square-matrix multiplication with this device, for example the case N = 16. Assume the multiplication of matrix A and matrix B is to be completed to obtain matrix C; the number of basic processing elements in the matrix operation unit in the figure is 256, each arithmetic element is responsible for computing one final result element during the calculation, and the matrix data needed by the operation are stored in the intermediate result memory module. The operation first takes each operand of A out of the intermediate result memory module into the buffer memory module, and the buffer memory module inputs the data in row order into each basic processing element (MPE) of the matrix operation unit; the operands of matrix B are likewise fetched into the buffer memory module and input step by step into each PE in column order under instruction scheduling. Each PE multiplies its element of A with its element of B; the result of each multiplication is not sent out, but is accumulated with the previous result stored in the PE's register. In this way, after all the elements of B have been input into the PEs, the value saved in each PE is exactly the element at the corresponding position of the resulting matrix C. The data of C are finally stored in the intermediate result memory module or left in the buffer memory module, as needed.
Fig. 8 is one embodiment of the present invention, describing the configuration and operation on this device of a SLAM algorithm based on the extended Kalman filter (EKF) method. The EKF algorithm can be roughly divided into three big steps: Compute True Data, EKF Predict (prediction), and EKF Update (update). In Compute True Data, the true coordinates are obtained through the motion model. In EKF Predict, the new pose of the robot is predicted from the previous predicted value and the control input. In EKF Update, the association information with the surrounding environment reference points is calculated, and the predicted pose and the covariance matrix are updated. The operations involved in Compute True Data are mainly low-dimensional vector processing operations, such as the Euclidean distance of three-dimensional coordinates, so most of them can be performed on the vector operation unit; typical scalar operations such as trigonometric functions of angles are also involved, so a small number of operations must also be performed on the scalar operation unit. The EKF Predict step involves repeated, fairly large matrix operations such as matrix multiplication; to obtain good acceleration, this part of the operations can be placed on the matrix operation unit, while some smaller vector operations also require the vector operation unit to play its role. The EKF Update step has more operation types, with various operations alternating, for example typical matrix SVD (singular value decomposition) and Cholesky decomposition; these operations are composed of fine-grained operations such as matrix multiplication, vector addition and subtraction, vector norms, and trigonometric functions, and use the matrix operation unit, the vector operation unit, and the scalar operation unit simultaneously. As for the memory modules, the input of the EKF-based SLAM algorithm is the coordinates of points such as waypoints (path points) and landmarks (environment reference points); the data volume is small, so these data need to be loaded from the input memory module only at the beginning. During the intermediate calculations, under normal circumstances and thanks to the storage design, the data volume does not exceed the size of the intermediate result memory module, so frequent data exchange with the input memory module is generally unnecessary, which reduces energy consumption and running time. Finally, the SLAM algorithm outputs the calculated results to the output memory module, completing the hardware configuration and realization of the entire algorithm.
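The predict/update structure described above can be illustrated with a deliberately simplified one-dimensional Kalman filter. The real EKF SLAM step operates on pose vectors and covariance matrices with nonlinear motion and observation models, so this is only a structural sketch under an assumed additive motion model, with all variable names illustrative:

```python
def ekf_step(x, P, u, z, q, r):
    """One predict/update cycle of a 1-D Kalman filter.

    x, P : previous state estimate and its variance
    u    : control input (assumed motion model: x' = x + u)
    z    : measurement of the state
    q, r : process and measurement noise variances
    """
    # EKF Predict: propagate the state with the motion model and
    # grow the uncertainty by the process noise.
    x_pred = x + u
    P_pred = P + q
    # EKF Update: fuse the measurement, correcting the predicted
    # state and shrinking the uncertainty.
    K = P_pred / (P_pred + r)          # Kalman gain
    x_new = x_pred + K * (z - x_pred)  # corrected state
    P_new = (1.0 - K) * P_pred         # corrected variance
    return x_new, P_new

x, P = 0.0, 1.0
x, P = ekf_step(x, P, u=1.0, z=1.2, q=0.1, r=0.1)
```

In the matrix form used by the device, the predict stage's products map onto the matrix operation unit, the residual and norm computations onto the vector operation unit, and the trigonometric terms of the motion model onto the scalar operation unit.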
Fig. 9 is a schematic diagram of the instruction types provided by the present invention.
The instruction set of the present invention includes several classes: control operation instructions, data operation instructions, macro operation instructions, multidimensional-data operation instructions, and one-dimensional-data operation instructions. Each class is further subdivided into different instructions, distinguished by the instruction encoding with which they begin. As shown in Fig. 9, several representative instructions from each class are listed together with their encodings.
The control operation instruction class is mainly used to control program flow. The instruction encoding JUMP denotes a jump instruction, used to transfer control; depending on the operation code that follows, it can be divided into direct and indirect jump instructions. The instruction encoding CB denotes a conditional branch instruction, used to perform conditional jumps.
The data operation instruction class is mainly used to control data transfers. The instruction encoding LD/ST denotes transfers between DRAM (Dynamic Random Access Memory) and SRAM (Static Random Access Memory): LD loads data read from DRAM into SRAM, and ST stores data from SRAM back into DRAM. The instruction encoding MOV denotes data transfers between SRAMs. The instruction encoding RD/WR denotes transfers between SRAM and BUFFER (buffer): RD reads data from SRAM into the BUFFER, and WR stores data from the BUFFER back into SRAM.
The macro operation instruction class, a coarse-grained data operation instruction class, is used for relatively complete operations.
The instruction encoding CONV denotes a convolution operation instruction, used to implement convolution and convolution-like operations: each input datum is multiplied by its corresponding weight and the products are summed. The instruction also exploits the local reusability of the data. Its specific execution process is as in Figure 14:
S1: As required by the instruction, image data are fetched starting from the initial address of the image data, and weight data are fetched starting from the initial address of the weight data.
S2: According to the corresponding operation, the required image data are transferred into the multidimensional operation unit, and the weight data are broadcast to every operation element (PE) in the multidimensional operation unit.
S3: Each PE multiplies its input image data by the corresponding weight data, adds the product to the value in the register inside the operation element, and stores the sum back into the register (the register must be initialized to 0).
S4: According to the transmission rules of the multidimensional operation unit, image data already present in the unit are passed along inside the unit, while image data not yet in the unit are read from the BUFFER and transferred to the designated locations. This method exploits the data reuse inherent in convolution, greatly reducing the number of data transfers.
S5: Steps S3-S4 are repeated until each PE finishes its computation, and the results are output and saved to the destination address specified by the instruction.
S6: Data are read anew and the above operations are repeated until every pixel of the output image has been computed and saved; the instruction then terminates.
The instruction encoding POOL denotes a pooling operation instruction, used to implement pooling and pooling-like operations, i.e. averaging a specified amount of data, taking its maximum/minimum, or down-sampling it; the execution flow is similar to that of the convolution operation instruction.
The instruction encoding IMGACC denotes an image accumulation instruction, used to process an image with accumulation or similar calculations. Its specific execution process is as follows, see Figure 15:
S1: As required by the instruction, image data are read starting from the initial address of the image data, and all operation elements (PEs) in the multidimensional operation unit are initialized to 0.
S2: In each clock cycle, the data already in the multidimensional operation unit are shifted up by one row in turn, a new row is passed into the unit, and each element of the newly arrived row is accumulated with the corresponding column of the former last row; the accumulated result becomes the new last row. This is repeated until the multidimensional operation unit is full.
S3: In each clock cycle, the data in the multidimensional operation unit are shifted right and accumulated in turn: in the first cycle the first column is shifted right, and the second column adds the data arriving from the first column and saves the sum; in the second cycle the second column is shifted right, and the third column adds and saves the incoming data; and so on. This finally yields the integral accumulation result of the desired image.
S4: All data in the multidimensional operation unit are saved to the destination specified by the instruction, and the bottom row and rightmost column are cached.
S5: The multidimensional operation unit is re-initialized to 0 and the next round of computation begins, until all images have been processed. Note that in subsequent rounds, when the width or height of the image exceeds what the multidimensional operation unit can process at once, the previously cached data must be accumulated in, to guarantee a correct result.
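The row-then-column accumulation of IMGACC is exactly the computation of an integral image. A reference sketch in plain NumPy, showing only the semantics and not the per-cycle hardware behavior:

```python
import numpy as np

def image_accumulate(img):
    """Semantics of IMGACC: accumulate down the rows (step S2), then
    accumulate across the columns left-to-right (step S3). Entry [i, j]
    becomes the sum of all pixels in the rectangle from the origin to
    (i, j), i.e. the integral image."""
    acc = np.cumsum(img, axis=0)   # S2: each new row adds the former last row
    acc = np.cumsum(acc, axis=1)   # S3: each column adds the column to its left
    return acc
```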
The instruction encoding BOX denotes a filter instruction, used to complete the box-filtering of an image. The algorithm works as follows: to obtain the sums of local rectangles of the image, first build an array A whose width and height equal those of the original image; then assign to each element A[i] the sum of all pixels in the rectangle formed by that point and the image origin. Afterwards, any local rectangle sum can be obtained by adding and subtracting just 4 elements of the A matrix. The macro-instruction therefore divides into two steps, as in Figure 16:
S1: As required by the instruction, the data are read from the initial address and passed into the multidimensional operation unit, where they are accumulated in turn, and the result is stored at the specified destination address 1.
S2: The data are read from destination address 1, and according to the data required by the instruction, add/subtract operations are performed on them to obtain the filter result, which is saved at destination address 2 as the required final result.
Because the accumulation phase, like the convolution operation instruction, has local data reuse, this instruction also supports passing data along inside the multidimensional operation unit.
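The two BOX steps can be sketched as follows (NumPy; the zero-padded integral image is an implementation convenience assumed here so that border rectangles need no special case):

```python
import numpy as np

def box_sum(img, r0, c0, r1, c1):
    """BOX semantics: step S1 builds the accumulated array A (the
    integral image); step S2 recovers the sum of any rectangle with
    inclusive corners (r0, c0) and (r1, c1) from just 4 elements of A."""
    A = np.cumsum(np.cumsum(img, axis=0), axis=1)   # S1: accumulate
    A = np.pad(A, ((1, 0), (1, 0)))                 # zero row/col for the border
    # S2: add/subtract 4 elements of A
    return A[r1 + 1, c1 + 1] - A[r0, c1 + 1] - A[r1 + 1, c0] + A[r0, c0]
```

Once A exists, every rectangle sum costs four lookups regardless of rectangle size, which is why the filter splits into exactly these two steps.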
The instruction encoding LOCALEXTREMA denotes a local extremum instruction, used when image processing must judge a local extremum, i.e. decide whether the datum at a designated position is an extremum within its group of data. Specifically, the macro-instruction divides into two steps, as in Figure 17:
S1: The register value in every PE of the multidimensional operation unit is initialized to a sufficiently small/large value. Data are read from the initial data address and passed into the multidimensional operation unit; each PE compares the incoming datum with the value held in its register, keeps the larger/smaller value, and writes it back to the register, until the specified data have all been compared. Each PE thus holds the maximum/minimum of its assigned data stream.
S2: According to the instruction, the data at the designated positions are read and passed into the multidimensional operation unit again; each PE compares the incoming datum with the maximum/minimum saved in its register and outputs 1 if they are identical and 0 otherwise.
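A behavioral sketch of the two passes for one PE's data stream (Python, maximum case; the streaming and the PE register are modeled with a plain loop and a variable):

```python
def is_local_maximum(stream, position):
    """Pass S1: stream the data past the PE, keeping the larger of the
    incoming value and the register (initialized 'sufficiently small').
    Pass S2: re-read the designated element and output 1 if it equals
    the stored maximum, else 0."""
    register = float("-inf")          # sufficiently small initial value
    for value in stream:              # S1: compare and keep the larger value
        if value > register:
            register = value
    return 1 if stream[position] == register else 0   # S2: compare, emit 1/0
```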
The instruction encoding COUNTCMP denotes a compare operation, used to complete comparisons with a counter: the data to be compared and a threshold are passed into the multidimensional operation unit; each PE compares the incoming data stream against the threshold one datum at a time and counts; and after the whole input has been traversed, the number of data above or below the threshold is output.
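The counting comparison can be sketched per PE as (illustrative Python, not the hardware counter itself):

```python
def count_compare(stream, threshold):
    """COUNTCMP semantics for one PE: traverse the incoming data stream,
    compare each datum with the threshold, and count how many fall
    above and how many fall below it."""
    above = sum(1 for v in stream if v > threshold)
    below = sum(1 for v in stream if v < threshold)
    return above, below
```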
The multidimensional-data operation instruction class, one of the fine-grained arithmetic instruction classes, is mainly used to control arithmetic operations on multidimensional data, i.e. data of two or more dimensions. It contains instructions that combine multidimensional data with multidimensional data, with one-dimensional vector data, and with one-dimensional scalar data. Taking matrices as an example: MMmM is a matrix-matrix multiplication instruction, one of the instructions combining multidimensional data with multidimensional data; similarly there is MMaM, a matrix-matrix addition instruction. MMmV is a matrix-vector multiplication instruction, one of the instructions combining multidimensional data with one-dimensional vector data; similarly there is MMaV, a matrix-vector addition instruction. MMmS is a matrix-scalar multiplication instruction, one of the instructions combining multidimensional data with one-dimensional scalar data; similarly there is MMaS, a matrix-scalar addition instruction. In addition, this class is also compatible with operations between one-dimensional data: for example, MVmV implements a vector-vector multiplication instruction, and MMoV implements a vector-vector outer-product instruction.
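The reference semantics of these encodings can be written down directly in NumPy (a sketch of what each instruction computes, not of its hardware execution; the variable names are illustrative):

```python
import numpy as np

M = np.arange(6.0).reshape(2, 3)     # a 2x3 matrix
N = np.arange(6.0).reshape(3, 2)     # a 3x2 matrix
v = np.array([1.0, 2.0, 3.0])        # a one-dimensional vector
s = 2.0                              # a one-dimensional scalar

mm_m_m = M @ N            # MMmM: matrix-matrix multiply
mm_a_m = M + M            # MMaM: matrix-matrix add
mm_m_v = M @ v            # MMmV: matrix-vector multiply
mm_m_s = M * s            # MMmS: matrix-scalar multiply
mv_m_v = v * v            # MVmV: elementwise vector-vector multiply
mm_o_v = np.outer(v, v)   # MMoV: vector-vector outer product (a matrix result)
```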
The one-dimensional-data operation instruction class, another fine-grained arithmetic instruction class, is mainly used to control arithmetic operations on one-dimensional data, which divide into one-dimensional vector data and one-dimensional scalar data. For example, VVmV is a vector-vector multiplication instruction, and similarly VVaV denotes a vector-vector addition instruction. VVmS is a vector-scalar multiplication instruction. SSsS denotes a one-dimensional scalar operation instruction, used to take the root of a one-dimensional scalar. SSrS denotes an operation that draws a random number. MV is a move instruction, used to fetch registers or immediates during computation.
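Similarly for the one-dimensional class (plain Python; reading SSsS as a square root follows the "root extraction" description above, and SSrS is modeled here with the standard library's random generator, both assumptions for illustration):

```python
import math
import random

v = [1.0, 2.0, 3.0]
w = [4.0, 5.0, 6.0]
s = 9.0

vv_m_v = [a * b for a, b in zip(v, w)]   # VVmV: vector-vector multiply
vv_a_v = [a + b for a, b in zip(v, w)]   # VVaV: vector-vector add
vv_m_s = [a * s for a in v]              # VVmS: vector-scalar multiply
ss_s_s = math.sqrt(s)                    # SSsS: scalar root extraction
ss_r_s = random.random()                 # SSrS: draw a random number
```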
Figure 10 is an embodiment, provided by the present invention, in which the macro-instruction CONV completes a two-dimensional convolution operation on a hardware configuration. The computation of a two-dimensional convolution is as follows: for a two-dimensional input image, a convolution kernel slides over the input image; at each position the kernel filters the data of the two-dimensional image it currently covers, i.e. the kernel and the covered image data are multiplied element by element, the products are accumulated, and the result is recorded as the required filter output. The kernel then slides to the next position, and the operation repeats until the whole computation is complete. Because convolution is used very widely and occurs in large volume, the convolution operation designed in this patent makes full use of the data reusability of the hardware configuration, distributing and passing data sensibly so that hardware utilization is maximized. To reinforce the explanation, a specific embodiment is given, as shown in Figure 10. In this embodiment the input is defined to be an image or matrix, and the output is likewise an image or matrix, stored block-wise at specified locations. The hardware configuration is exemplified by one matrix operation unit (MPU) containing m*n matrix operation elements (MPEs); each element contains the required arithmetic units and registers for temporary intermediate data. The concrete operation process, as in Figure 18, is:
S1: A convolution macro-instruction, consisting of an operation encoding and operands, is read. The operation encoding is CONV, indicating that a convolution is to be performed. There are 7 operands: DA, SA1, SA2, IX, IY, KX, and KY. DA is the destination address, i.e. the storage address of the output result; SA1 is initial address 1, the start address from which the image to be operated on is read; SA2 is initial address 2, the start address from which the convolution kernel is read; IX and IY are the sizes of the image in the X and Y directions, i.e. these two variables define the size of the image to be operated on; KX and KY are the sizes of the convolution kernel.
S2: According to the instruction, the input image data awaiting operation are read from the corresponding positions in SRAM into the BUFFER; here each MPE in the MPU is required to compute one pixel of the output image.
S3: The input image data are transferred into each MPE accordingly. Since the convolution kernel is the same in every MPE, the kernel is broadcast to all MPEs. Each MPE then multiplies the incoming input data by the corresponding kernel data and saves the product in its own register.
S4: Because the operands of a convolution have local reusability, the input image datum an MPE needs in the next beat is the datum its right-hand neighbour used in the current beat; the input image data are therefore passed leftward in turn, and only the data needed by the rightmost MPEs, which are not yet in the MPU, must be re-read from the BUFFER. Once the transfers are complete, each MPE multiplies its input image data by the corresponding kernel data, accumulates the product with the value in its register, and writes the result back into the register.
S5: Step S4 is repeated until all kernel data and the corresponding input image data have been processed; each MPE has then obtained 1 pixel of the output image, and the results are output and saved at the location defined by the destination address in the instruction.
S6: The above steps are repeated until every pixel of the output image has been computed.
Using the macro-instruction makes full use of the local reusability of the data, greatly reducing the number of data transfers and improving operation efficiency. For example, when m=3 and n=3, the structure can perform the convolution of 9 pixels simultaneously, taking 9 clock cycles.
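The result this embodiment computes matches an ordinary "valid" two-dimensional convolution, which can be sketched as the following NumPy reference model; the two inner loops run sequentially here but correspond to what the m*n MPEs compute in parallel:

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Reference output of the CONV macro-instruction: each output pixel
    is the element-wise product of the kernel and the image patch it
    covers, accumulated ('valid' sliding, no padding)."""
    kx, ky = kernel.shape
    out_h = image.shape[0] - kx + 1
    out_w = image.shape[1] - ky + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):       # one MPE per output pixel in hardware
            out[i, j] = np.sum(image[i:i + kx, j:j + ky] * kernel)
    return out
```

With a 5x5 image and a 3x3 kernel, the output is 3x3, matching the m=3, n=3 example of 9 pixels computed at once.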
Similar.We provide the operation of a large amount of macro-instruction, such as convolution, although its operation completed can have other classes
The instruction of type, which operates, to be completed, but due to the presence of macro-instruction operation, enables to operational order more succinct efficient.In addition,
Macro-instruction can be good at handling the reuse problem of data, can be improved the utilization rate of data, reduce the transmission of data, reduce function
Consumption improves performance.
Figure 11 is one embodiment of a multidimensional-data operation instruction provided by the present invention, implementing the dot product between two one-dimensional vectors; similar operations such as vector multiplication, vector addition, and vector comparison all follow a similar process. Each vector operation unit (VPU) contains mm vector operation elements (VPEs), and each VPE can operate on one pair of inputs. The detailed process, as in Figure 19, is: first, mm pairs of data awaiting operation are fed to the mm VPEs; after each performs one multiplication, the product is stored in the register inside the VPE. Then the next mm pairs of data are fed to the mm VPEs; after each performs one multiplication, the product is accumulated with the previous product held in the internal register, and the accumulated result is written back into the internal register. These operations are repeated until all inputs have been processed. The results are then passed leftward through the vector operation unit: starting from the right end, the rightmost VPE passes the value in its register directly to the VPE on its left; after a VPE receives the value passed from its right, it accumulates it with the value in its own internal register and passes the accumulated result further left, and so on. Finally, the dot-product result is obtained in the leftmost VPE and output as required.
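A behavioral model of this dot product (Python; the mm lanes are modeled as a list of accumulator registers, and the left-pass reduction is done explicitly):

```python
def vpu_dot(x, y, mm=4):
    """Dot product as Figure 11 computes it: mm VPEs each multiply and
    accumulate their share of the operand pairs into a local register,
    then partial sums are passed leftward and added, so the final
    result appears in the leftmost VPE."""
    regs = [0.0] * mm                        # per-VPE internal registers
    for i in range(0, len(x), mm):           # feed mm operand pairs per round
        for lane in range(mm):
            if i + lane < len(x):
                regs[lane] += x[i + lane] * y[i + lane]
    acc = regs[mm - 1]                       # left-pass: start at the right end
    for lane in range(mm - 2, -1, -1):
        acc = regs[lane] + acc               # each VPE adds and passes left
    return acc
```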
Figure 12 is one embodiment provided by the present invention, describing how the SIFT feature extraction algorithm is configured and realized on the present apparatus. The SIFT (Scale-Invariant Feature Transform) feature extraction algorithm is one of the key operations of the RGBD SLAM algorithm. The first step builds the image pyramid (Gaussian Pyramid); it contains basic image operations such as image smoothing and can be further decomposed, on the present apparatus, into multiple convolution and pooling (down-sampling) operations. Next comes the difference-of-Gaussians (DOG) operation, which can be regarded as matrix subtractions between different levels of the image pyramid. Once DOG is complete, the local extremum search can be completed by calling the macro-instruction LOCALEXTREMA. After the local extremum search, the feature points are determined and filtered (KP filter); this step consists of a large number of vector and scalar operations, such as vector dot products and matrix determinants. Finally, the descriptors of the key points (Key Points) are computed from histograms of neighbouring points through multiple vector and scalar operations. The histogram computation can be completed by the macro-instruction HIST, which is composed of vector operations such as vector comparison. The rotation of the pixel neighbourhood is realized by matrix-vector multiplication. Certain special function operations, such as exponentiation, are realized mainly by the scalar operation unit.
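The difference-of-Gaussians step, for instance, reduces to level-wise matrix subtraction (illustrative NumPy sketch, with the Gaussian pyramid stood in for by a list of arrays already produced by the smoothing step):

```python
import numpy as np

def difference_of_gaussians(pyramid):
    """DOG step of SIFT: subtract each pyramid level from the next,
    one matrix subtraction per adjacent pair of levels."""
    return [pyramid[i + 1] - pyramid[i] for i in range(len(pyramid) - 1)]
```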
Figure 13 is one embodiment provided by the present invention, a schematic flow diagram of configuring and realizing the G2O graph optimization algorithm on the present apparatus. G2O is a framework for solving nonlinear graph optimization problems; many typical SLAM algorithms, such as RGBD SLAM and graph-based SLAM algorithms such as ORB SLAM, are built on this framework. Given the pose constraints and initial poses of two graph nodes, the computation of the error matrix and the Jacobian matrix can be completed by matrix and vector operations, such as matrix multiplications and accumulations. A linear system for the optimization objective function is then established from the error matrix and the Jacobian matrix; this step can be completed by the matrix and vector operation units, again involving matrix multiplication, accumulation, and similar operations. This linear system is then solved; we can use the Preconditioned Conjugate Gradient (PCG) algorithm to realize it (we can also realize it by the Cholesky decomposition method, sparse matrix methods, or the upper-triangular decomposition method). PCG can be decomposed into block matrix-vector multiplications and additions, and in a concrete realization it can be carried out by the macro-instruction PCG. The final pose optimization can likewise be completed by matrix-vector multiplication, addition, and similar operations.
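A minimal conjugate-gradient solver sketch (NumPy, identity preconditioner, so strictly plain CG rather than full PCG): it shows the decomposition the text describes, since every step is either a matrix-vector multiply or a vector add/accumulate.

```python
import numpy as np

def cg_solve(A, b, tol=1e-10, max_iter=100):
    """Solve A x = b (A symmetric positive definite) by conjugate
    gradients, using only matrix-vector products and vector adds."""
    x = np.zeros_like(b)
    r = b - A @ x                  # residual: matrix-vector multiply + vector add
    p = r.copy()                   # search direction
    rs = r @ r
    for _ in range(max_iter):
        Ap = A @ p                 # the block matrix-vector multiplication
        alpha = rs / (p @ Ap)
        x = x + alpha * p          # vector accumulate
        r = r - alpha * Ap
        rs_new = r @ r
        if rs_new < tol:
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x
```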
The device and method of the embodiments of the present invention can be applied in the following (non-exhaustive) scenarios: data processing; electronic products such as robots, unmanned aerial vehicles, automatic driving, computers, printers, scanners, telephones, tablet computers, intelligent terminals, mobile phones, driving recorders, navigators, sensors, cameras, cloud servers, video cameras, projectors, watches, earphones, mobile storage, and wearable devices; vehicles of all kinds such as aircraft, ships, and automobiles; household appliances of all kinds such as televisions, air conditioners, microwave ovens, refrigerators, electric rice cookers, humidifiers, washing machines, electric lights, gas stoves, and range hoods; and medical equipment of all kinds including nuclear magnetic resonance instruments, B-mode ultrasound scanners, and electrocardiographs.
The specific embodiments described above further explain the purpose, technical solutions, and beneficial effects of the present invention in detail. It should be understood that the above are only specific embodiments of the present invention and are not intended to limit it; any modification, equivalent substitution, or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.
Claims (19)
1. An arithmetic device, characterized in that the device comprises:
a scalar operation unit, configured to execute the scalar operations in the Compute True Data step of the extended Kalman filter method and the scalar operations in the EKF Update step of the extended Kalman filter method;
a vector operation unit, configured to execute the vector operations in the Compute True Data step, the vector operations in the EKF Predict step, and the vector operations in the EKF Update step of the extended Kalman filter method;
a matrix operation unit, configured to execute the matrix operations in the EKF Predict step and the matrix operations in the EKF Update step.
2. The device according to claim 1, characterized in that the scalar operations in the Compute True Data step include at least trigonometric function operations;
the vector operations in the Compute True Data step include at least Euclidean distance operations on three-dimensional coordinates;
the matrix operations in the EKF Predict step include at least matrix multiplication operations;
the matrix operations in the EKF Update step include singular value decomposition operations and Cholesky decomposition operations.
3. The arithmetic device according to claim 1, characterized in that the device further comprises:
an input storage module, for storing input data;
an intermediate-result storage module, for storing the operation results of the scalar operation unit, the vector operation unit, and the matrix operation unit;
a buffer storage module, for caching the operation results of the scalar operation unit, the vector operation unit, and the matrix operation unit;
an output storage module, for storing final operation results;
a final-result storage module, for storing final operation result data; and/or
an instruction storage module, for storing the instruction set required by the computation process.
4. The arithmetic device according to claim 3, characterized in that the instruction set includes:
a control operation instruction class, for controlling the selection of the operation instructions to be executed;
a data operation instruction class, for controlling the transmission of data;
a macro operation instruction class, for complete operations;
a multidimensional-data operation instruction class, for controlling arithmetic operations on multidimensional data; and/or
a one-dimensional-data operation instruction class, for controlling arithmetic operations on one-dimensional data.
5. The arithmetic device according to claim 4, characterized in that the control operation instruction class includes jump instructions and branch instructions, the jump instructions including direct jump instructions and indirect jump instructions, and the branch instructions including conditional branch instructions.
6. The arithmetic device according to claim 4, characterized in that the data operation instruction class includes at least one of the following:
LD/ST instructions, for transferring data between DRAM and SRAM;
MOV instructions, for transferring data between SRAMs;
RD/WR instructions, for transferring data between SRAM and BUFFER.
7. The arithmetic device according to claim 4, wherein the macro operation instruction class includes: a convolution arithmetic instruction, a convolution operation instruction, an image accumulation operation instruction, an image BOX filtering operation instruction, a local extremum operation instruction, a counter comparison operation instruction, and/or a pooling operation instruction.
8. The arithmetic device according to claim 4, wherein the macro operation instruction class includes at least one of the following:
a matrix-matrix multiplication instruction, a matrix-matrix addition instruction, a matrix-vector multiplication instruction, a matrix-vector addition instruction, a matrix-scalar multiplication instruction, a matrix-scalar addition instruction, a vector-vector multiplication instruction, and a vector outer-product instruction.
9. The arithmetic device according to claim 4, wherein the macro operation instruction class includes at least one of the following:
a vector-vector multiplication instruction, a vector-vector addition instruction, a vector-scalar multiplication instruction, a vector-scalar addition instruction, a scalar root-extraction instruction, a scalar random-number instruction, and a move instruction.
10. The arithmetic device according to claim 4, characterized in that the multidimensional-data operation instruction class is used to require the operation units to execute operations on multidimensional data, the operations on multidimensional data including operations between multidimensional data and multidimensional data, between multidimensional data and one-dimensional vector data, and between multidimensional data and one-dimensional scalar data.
11. The arithmetic device according to claim 4, characterized in that the one-dimensional-data operation instruction class is used to require the operation units to execute operations on one-dimensional data, the one-dimensional data including one-dimensional vectors and one-dimensional scalars.
12. The arithmetic device according to claim 4, further comprising an assembler for selecting, at run time, the instruction types in the instruction set to be used.
13. An operation method, characterized in that the method comprises:
a scalar operation unit executing the scalar operations in the Compute True Data step of the extended Kalman filter method and the scalar operations in the EKF Update step of the extended Kalman filter method;
a vector operation unit executing the vector operations in the Compute True Data step, the vector operations in the EKF Predict step, and the vector operations in the EKF Update step of the extended Kalman filter method;
a matrix operation unit executing the matrix operations in the EKF Predict step and the matrix operations in the EKF Update step.
14. The method according to claim 13, characterized in that the scalar operations in the Compute True Data step include at least trigonometric function operations;
the vector operations in the Compute True Data step include at least Euclidean distance operations on three-dimensional coordinates;
the matrix operations in the EKF Predict step include at least matrix multiplication operations;
the matrix operations in the EKF Update step include singular value decomposition operations and Cholesky decomposition operations.
15. The method according to claim 13, characterized in that the method further comprises:
an input storage module storing input data;
an intermediate-result storage module storing the operation results of the scalar operation unit, the vector operation unit, and the matrix operation unit;
a buffer storage module caching the operation results of the scalar operation unit, the vector operation unit, and the matrix operation unit;
an output storage module storing final operation results;
a final-result storage module storing final operation result data; and/or
an instruction storage module storing the instruction set required by the computation process.
16. The method according to claim 15, characterized in that the instruction set includes:
a control operation instruction class, for controlling the selection of the operation instructions to be executed, the control operation instruction class including jump instructions and branch instructions, the jump instructions including direct jump instructions and indirect jump instructions, and the branch instructions including conditional branch instructions;
a data operation instruction class, for controlling the transmission of data;
a macro operation instruction class, for complete operations, the macro operation instruction class including: a convolution arithmetic instruction, a convolution operation instruction, an image accumulation operation instruction, an image BOX filtering operation instruction, a local extremum operation instruction, a counter comparison operation instruction, and/or a pooling operation instruction;
or, the macro operation instruction class includes at least one of the following:
a matrix-matrix multiplication instruction, a matrix-matrix addition instruction, a matrix-vector multiplication instruction, a matrix-vector addition instruction, a matrix-scalar multiplication instruction, a matrix-scalar addition instruction, a vector-vector multiplication instruction, and a vector outer-product instruction;
or, the macro operation instruction class includes at least one of the following:
a vector-vector multiplication instruction, a vector-vector addition instruction, a vector-scalar multiplication instruction, a vector-scalar addition instruction, a scalar root-extraction instruction, a scalar random-number instruction, and a move instruction;
a multidimensional-data operation instruction class, for controlling arithmetic operations on multidimensional data; and/or
a one-dimensional-data operation instruction class, for controlling arithmetic operations on one-dimensional data, the one-dimensional data including one-dimensional vectors and one-dimensional scalars.
17. The method according to claim 16, characterized in that the data operation instruction class includes at least one of the following:
LD/ST instructions, for transferring data between DRAM and SRAM;
MOV instructions, for transferring data between SRAMs;
RD/WR instructions, for transferring data between SRAM and BUFFER.
18. The method according to claim 16, wherein the multidimensional data operation instruction class is used to instruct the operation unit to perform operations on multidimensional data, the operations on multidimensional data including operations between multidimensional data and multidimensional data, operations between multidimensional data and one-dimensional vector data, and operations between multidimensional data and one-dimensional scalar data.
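The three operand combinations of claim 18 (matrix-matrix, matrix-vector, matrix-scalar) can be illustrated with NumPy; this sketch is purely explanatory and is not the patent's own implementation, whose operation unit is dedicated hardware.

```python
import numpy as np

# Illustrative operands for the three claim-18 operation categories.
M = np.arange(6.0).reshape(2, 3)   # multidimensional data: [[0,1,2],[3,4,5]]
N = np.ones((2, 3))                # second multidimensional operand
v = np.array([1.0, 2.0, 3.0])      # one-dimensional vector
s = 2.0                            # one-dimensional scalar

md_md = M + N        # multidimensional  op  multidimensional
md_vec = M @ v       # multidimensional  op  one-dimensional vector
md_scalar = M * s    # multidimensional  op  one-dimensional scalar
```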
19. The method according to claim 16, wherein the method further comprises:
during operation, the assembler selecting which instruction types in the instruction set to use.
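Taken together, claims 16-19 describe an instruction set partitioned into classes that an assembler chooses among at run time. The sketch below models that taxonomy with a toy decoder; the class names, mnemonics, and dispatch table are hypothetical and only echo the categories named in the claims.

```python
from enum import Enum, auto

# Hypothetical sketch of the instruction-class taxonomy in claims 16-19.
class InstrClass(Enum):
    CONTROL_FLOW = auto()   # jump / branch instructions
    DATA_MOVE = auto()      # LD/ST, MOV, RD/WR
    MACRO_OP = auto()       # convolution and matrix/vector macro operations
    MULTI_DIM = auto()      # multidimensional data operations
    ONE_DIM = auto()        # one-dimensional vector/scalar operations

# Toy decode table mapping mnemonics to instruction classes (illustrative).
DECODE = {
    "JMP": InstrClass.CONTROL_FLOW, "BEQ": InstrClass.CONTROL_FLOW,
    "LD": InstrClass.DATA_MOVE, "ST": InstrClass.DATA_MOVE,
    "MOV": InstrClass.DATA_MOVE, "RD": InstrClass.DATA_MOVE,
    "CONV": InstrClass.MACRO_OP, "MMUL": InstrClass.MACRO_OP,
    "VADD": InstrClass.ONE_DIM,
}

def classify(mnemonic: str) -> InstrClass:
    """Return the instruction class for a mnemonic (toy dispatch)."""
    return DECODE[mnemonic.upper()]
```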
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811521818.2A CN109376112B (en) | 2016-11-03 | 2016-11-03 | SLAM operation device and method |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610958847.XA CN108021528B (en) | 2016-11-03 | 2016-11-03 | SLAM operation device and method |
CN201811521818.2A CN109376112B (en) | 2016-11-03 | 2016-11-03 | SLAM operation device and method |
Related Parent Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201610958847.XA Division CN108021528B (en) | 2016-11-03 | 2016-11-03 | SLAM operation device and method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109376112A true CN109376112A (en) | 2019-02-22 |
CN109376112B CN109376112B (en) | 2022-03-15 |
Family
ID=62075642
Family Applications (12)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811529556.4A Active CN109634904B (en) | 2016-11-03 | 2016-11-03 | SLAM operation device and method |
CN201811521820.XA Active CN109376114B (en) | 2016-11-03 | 2016-11-03 | SLAM operation device and method |
CN201811545672.5A Pending CN109710558A (en) | 2016-11-03 | 2016-11-03 | SLAM arithmetic unit and method |
CN201811653568.8A Active CN109684267B (en) | 2016-11-03 | 2016-11-03 | SLAM operation device and method |
CN201811521818.2A Active CN109376112B (en) | 2016-11-03 | 2016-11-03 | SLAM operation device and method |
CN201811521819.7A Active CN109376113B (en) | 2016-11-03 | 2016-11-03 | SLAM operation device and method |
CN201811653560.1A Active CN109726168B (en) | 2016-11-03 | 2016-11-03 | SLAM operation device and method |
CN201811653558.4A Active CN109656867B (en) | 2016-11-03 | 2016-11-03 | SLAM arithmetic device and method |
CN201811529500.9A Active CN109697184B (en) | 2016-11-03 | 2016-11-03 | SLAM operation device and method |
CN201811654180.XA Pending CN109710559A (en) | 2016-11-03 | 2016-11-03 | SLAM arithmetic unit and method |
CN201610958847.XA Active CN108021528B (en) | 2016-11-03 | 2016-11-03 | SLAM operation device and method |
CN201811529557.9A Active CN109634905B (en) | 2016-11-03 | 2016-11-03 | SLAM operation device and method |
Family Applications Before (4)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811529556.4A Active CN109634904B (en) | 2016-11-03 | 2016-11-03 | SLAM operation device and method |
CN201811521820.XA Active CN109376114B (en) | 2016-11-03 | 2016-11-03 | SLAM operation device and method |
CN201811545672.5A Pending CN109710558A (en) | 2016-11-03 | 2016-11-03 | SLAM arithmetic unit and method |
CN201811653568.8A Active CN109684267B (en) | 2016-11-03 | 2016-11-03 | SLAM operation device and method |
Family Applications After (7)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811521819.7A Active CN109376113B (en) | 2016-11-03 | 2016-11-03 | SLAM operation device and method |
CN201811653560.1A Active CN109726168B (en) | 2016-11-03 | 2016-11-03 | SLAM operation device and method |
CN201811653558.4A Active CN109656867B (en) | 2016-11-03 | 2016-11-03 | SLAM arithmetic device and method |
CN201811529500.9A Active CN109697184B (en) | 2016-11-03 | 2016-11-03 | SLAM operation device and method |
CN201811654180.XA Pending CN109710559A (en) | 2016-11-03 | 2016-11-03 | SLAM arithmetic unit and method |
CN201610958847.XA Active CN108021528B (en) | 2016-11-03 | 2016-11-03 | SLAM operation device and method |
CN201811529557.9A Active CN109634905B (en) | 2016-11-03 | 2016-11-03 | SLAM operation device and method |
Country Status (2)
Country | Link |
---|---|
CN (12) | CN109634904B (en) |
WO (1) | WO2018082229A1 (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111290789A (en) * | 2018-12-06 | 2020-06-16 | 上海寒武纪信息科技有限公司 | Operation method, operation device, computer equipment and storage medium |
CN111290788A (en) * | 2018-12-07 | 2020-06-16 | 上海寒武纪信息科技有限公司 | Operation method, operation device, computer equipment and storage medium |
Families Citing this family (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111079915B (en) * | 2018-10-19 | 2021-01-26 | 中科寒武纪科技股份有限公司 | Operation method, device and related product |
CN110058884B (en) * | 2019-03-15 | 2021-06-01 | 佛山市顺德区中山大学研究院 | Optimization method, system and storage medium for computational storage instruction set operation |
CN110991291B (en) * | 2019-11-26 | 2021-09-07 | 清华大学 | Image feature extraction method based on parallel computing |
CN113112481B (en) * | 2021-04-16 | 2023-11-17 | 北京理工雷科电子信息技术有限公司 | Hybrid heterogeneous on-chip architecture based on matrix network |
CN113177211A (en) * | 2021-04-20 | 2021-07-27 | 深圳致星科技有限公司 | FPGA chip for privacy computation, heterogeneous processing system and computing method |
CN113342671B (en) * | 2021-06-25 | 2023-06-02 | 海光信息技术股份有限公司 | Method, device, electronic equipment and medium for verifying operation module |
CN113395551A (en) * | 2021-07-20 | 2021-09-14 | 珠海极海半导体有限公司 | Processor, NPU chip and electronic equipment |
US20230056246A1 (en) * | 2021-08-03 | 2023-02-23 | Micron Technology, Inc. | Parallel matrix operations in a reconfigurable compute fabric |
CN113792867A (en) * | 2021-09-10 | 2021-12-14 | 中科寒武纪科技股份有限公司 | Arithmetic circuit, chip and board card |
CN117093816B (en) * | 2023-10-19 | 2024-01-19 | 上海登临科技有限公司 | Matrix multiplication operation method and device and electronic equipment |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5926296A (en) * | 1996-02-28 | 1999-07-20 | Olympus Optical Co., Ltd. | Vector normalizing apparatus |
CN102750127A (en) * | 2012-06-12 | 2012-10-24 | 清华大学 | Coprocessor |
US20160258782A1 (en) * | 2015-02-04 | 2016-09-08 | Hossein Sadjadi | Methods and Apparatus for Improved Electromagnetic Tracking and Localization |
Family Cites Families (34)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPS60201472A (en) * | 1984-03-26 | 1985-10-11 | Nec Corp | Matrix product computing device |
US5666300A (en) * | 1994-12-22 | 1997-09-09 | Motorola, Inc. | Power reduction in a data processing system using pipeline registers and method therefor |
US7454451B2 (en) * | 2003-04-23 | 2008-11-18 | Micron Technology, Inc. | Method for finding local extrema of a set of values for a parallel processing element |
US7664810B2 (en) * | 2004-05-14 | 2010-02-16 | Via Technologies, Inc. | Microprocessor apparatus and method for modular exponentiation |
US7814297B2 (en) * | 2005-07-26 | 2010-10-12 | Arm Limited | Algebraic single instruction multiple data processing |
US8051124B2 (en) * | 2007-07-19 | 2011-11-01 | Itt Manufacturing Enterprises, Inc. | High speed and efficient matrix multiplication hardware module |
CN101609715B (en) * | 2009-05-11 | 2012-09-05 | 中国人民解放军国防科学技术大学 | Matrix register file with separated row-column access ports |
KR101395260B1 (en) * | 2009-11-30 | 2014-05-15 | 라코르스 게엠바하 | Microprocessor and method for enhanced precision sum-of-products calculation on a microprocessor |
KR101206213B1 (en) * | 2010-04-19 | 2012-11-28 | 인하대학교 산학협력단 | High speed slam system and method based on graphic processing unit |
US9146315B2 (en) * | 2010-07-26 | 2015-09-29 | Commonwealth Scientific And Industrial Research Organisation | Three dimensional scanning beam system and method |
CN101986264B (en) * | 2010-11-25 | 2013-07-31 | 中国人民解放军国防科学技术大学 | Multifunctional floating-point multiply and add calculation device for single instruction multiple data (SIMD) vector microprocessor |
CN102012893B (en) * | 2010-11-25 | 2012-07-18 | 中国人民解放军国防科学技术大学 | Extensible vector operation device |
CN102156637A (en) * | 2011-05-04 | 2011-08-17 | 中国人民解放军国防科学技术大学 | Vector crossing multithread processing method and vector crossing multithread microprocessor |
CN102353379B (en) * | 2011-07-06 | 2013-02-13 | 上海海事大学 | Environment modeling method applicable to navigation of automatic piloting vehicles |
CN104204990B (en) * | 2012-03-30 | 2018-04-10 | 英特尔公司 | Accelerate the apparatus and method of operation in the processor using shared virtual memory |
US9013490B2 (en) * | 2012-05-17 | 2015-04-21 | The United States Of America As Represented By The Administrator Of The National Aeronautics Space Administration | Hilbert-huang transform data processing real-time system with 2-D capabilities |
CN103208000B (en) * | 2012-12-28 | 2015-10-21 | 青岛科技大学 | Based on the Feature Points Extraction of local extremum fast search |
CN103150596B (en) * | 2013-02-22 | 2015-12-23 | 百度在线网络技术(北京)有限公司 | The training system of a kind of reverse transmittance nerve network DNN |
CN104252331B (en) * | 2013-06-29 | 2018-03-06 | 华为技术有限公司 | Multiply-accumulator |
US9449675B2 (en) * | 2013-10-31 | 2016-09-20 | Micron Technology, Inc. | Apparatuses and methods for identifying an extremum value stored in an array of memory cells |
CN103640018B (en) * | 2013-12-13 | 2014-09-03 | 江苏久祥汽车电器集团有限公司 | SURF (speeded up robust feature) algorithm based localization method |
CN103677741A (en) * | 2013-12-30 | 2014-03-26 | 南京大学 | Imaging method based on NCS algorithm and mixing precision floating point coprocessor |
CN103955447B (en) * | 2014-04-28 | 2017-04-12 | 中国人民解放军国防科学技术大学 | FFT accelerator based on DSP chip |
CN105212922A (en) * | 2014-06-11 | 2016-01-06 | 吉林大学 | The method and system that R wave of electrocardiosignal detects automatically are realized towards FPGA |
CN105849690B (en) * | 2014-07-02 | 2019-03-15 | 上海兆芯集成电路有限公司 | Merge product-accumulating operation processor and method |
CN104317768B (en) * | 2014-10-15 | 2017-02-15 | 中国人民解放军国防科学技术大学 | Matrix multiplication accelerating method for CPU+DSP (Central Processing Unit + Digital Signal Processor) heterogeneous system |
CN104330090B (en) * | 2014-10-23 | 2017-06-06 | 北京化工大学 | Robot distributed sign intelligent semantic map creating method |
KR102374160B1 (en) * | 2014-11-14 | 2022-03-14 | 삼성디스플레이 주식회사 | A method and apparatus to reduce display lag using scailing |
CN104391820B (en) * | 2014-11-25 | 2017-06-23 | 清华大学 | General floating-point matrix processor hardware structure based on FPGA |
CN104574508A (en) * | 2015-01-14 | 2015-04-29 | 山东大学 | Multi-resolution model simplifying method oriented to virtual reality technology |
CN104851094A (en) * | 2015-05-14 | 2015-08-19 | 西安电子科技大学 | Improved method of RGB-D-based SLAM algorithm |
CN104915322B (en) * | 2015-06-09 | 2018-05-01 | 中国人民解放军国防科学技术大学 | A kind of hardware-accelerated method of convolutional neural networks |
CN104899182B (en) * | 2015-06-09 | 2017-10-31 | 中国人民解放军国防科学技术大学 | A kind of Matrix Multiplication accelerated method for supporting variable partitioned blocks |
CN105528082B (en) * | 2016-01-08 | 2018-11-06 | 北京暴风魔镜科技有限公司 | Three dimensions and gesture identification tracking exchange method, device and system |
2016
- 2016-11-03 CN CN201811529556.4A patent/CN109634904B/en active Active
- 2016-11-03 CN CN201811521820.XA patent/CN109376114B/en active Active
- 2016-11-03 CN CN201811545672.5A patent/CN109710558A/en active Pending
- 2016-11-03 CN CN201811653568.8A patent/CN109684267B/en active Active
- 2016-11-03 CN CN201811521818.2A patent/CN109376112B/en active Active
- 2016-11-03 CN CN201811521819.7A patent/CN109376113B/en active Active
- 2016-11-03 CN CN201811653560.1A patent/CN109726168B/en active Active
- 2016-11-03 CN CN201811653558.4A patent/CN109656867B/en active Active
- 2016-11-03 CN CN201811529500.9A patent/CN109697184B/en active Active
- 2016-11-03 CN CN201811654180.XA patent/CN109710559A/en active Pending
- 2016-11-03 CN CN201610958847.XA patent/CN108021528B/en active Active
- 2016-11-03 CN CN201811529557.9A patent/CN109634905B/en active Active
2017
- 2017-02-28 WO PCT/CN2017/075134 patent/WO2018082229A1/en active Application Filing
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5926296A (en) * | 1996-02-28 | 1999-07-20 | Olympus Optical Co., Ltd. | Vector normalizing apparatus |
CN102750127A (en) * | 2012-06-12 | 2012-10-24 | 清华大学 | Coprocessor |
US20160258782A1 (en) * | 2015-02-04 | 2016-09-08 | Hossein Sadjadi | Methods and Apparatus for Improved Electromagnetic Tracking and Localization |
Non-Patent Citations (3)
Title |
---|
DANIEL TÖRTEI TERTEI et al.: "FPGA design of EKF block accelerator for 3D visual SLAM", Computer and Electrical Engineering * |
MOHD. YAMANI IDNA IDRIS et al.: "A co-processor design to accelerate sequential monocular SLAM EKF process", Measurement * |
LUO Tianhong et al.: "Industrial robot error compensation using extended Kalman filter and registration algorithms", Mechanical Science and Technology * |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111290789A (en) * | 2018-12-06 | 2020-06-16 | 上海寒武纪信息科技有限公司 | Operation method, operation device, computer equipment and storage medium |
CN111290789B (en) * | 2018-12-06 | 2022-05-27 | 上海寒武纪信息科技有限公司 | Operation method, operation device, computer equipment and storage medium |
CN111290788A (en) * | 2018-12-07 | 2020-06-16 | 上海寒武纪信息科技有限公司 | Operation method, operation device, computer equipment and storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN109656867B (en) | 2023-05-16 |
CN109376113A (en) | 2019-02-22 |
CN109697184A (en) | 2019-04-30 |
CN109376112B (en) | 2022-03-15 |
CN109634905B (en) | 2023-03-10 |
CN109710559A (en) | 2019-05-03 |
CN109634905A (en) | 2019-04-16 |
CN109726168A (en) | 2019-05-07 |
CN109376113B (en) | 2021-12-14 |
CN109656867A (en) | 2019-04-19 |
CN109697184B (en) | 2021-04-09 |
CN108021528B (en) | 2020-03-13 |
CN109634904A (en) | 2019-04-16 |
CN108021528A (en) | 2018-05-11 |
WO2018082229A1 (en) | 2018-05-11 |
CN109710558A (en) | 2019-05-03 |
CN109376114B (en) | 2022-03-15 |
CN109376114A (en) | 2019-02-22 |
CN109684267B (en) | 2021-08-06 |
CN109634904B (en) | 2023-03-07 |
CN109726168B (en) | 2021-09-21 |
CN109684267A (en) | 2019-04-26 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109376112A (en) | SLAM arithmetic unit and method | |
US11922132B2 (en) | Information processing method and terminal device | |
CN109240746B (en) | Apparatus and method for performing matrix multiplication operation | |
US20200089535A1 (en) | Data sharing system and data sharing method therefor | |
KR20200000480A (en) | Processing apparatus and processing method | |
CN107632965B (en) | Restructural S type arithmetic unit and operation method | |
CN111310904A (en) | Apparatus and method for performing convolutional neural network training | |
KR20190003610A (en) | Apparatus and method for performing convolution neural network forward operation | |
CN108334944B (en) | Artificial neural network operation device and method | |
Colleman et al. | High-utilization, high-flexibility depth-first CNN coprocessor for image pixel processing on FPGA | |
CN112817898A (en) | Data transmission method, processor, chip and electronic equipment | |
CN111860772A (en) | Device and method for executing artificial neural network posing operation | |
JP5045652B2 (en) | Correlation processing device and medium readable by correlation processing device | |
Chenini | An embedded FPGA architecture for efficient visual saliency based object recognition implementation | |
CN116185378A (en) | Optimization method of calculation graph, data processing method and related products | |
CN113918220A (en) | Assembly line control method, operation module and related product | |
CN117933327A (en) | Processing device, processing method, chip and electronic device | |
CN117933314A (en) | Processing device, processing method, chip and electronic device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
CB02 | Change of applicant information |
Address after: 100000 room 644, No. 6, No. 6, South Road, Beijing Academy of Sciences Applicant after: Zhongke Cambrian Technology Co., Ltd Address before: 100000 room 644, No. 6, No. 6, South Road, Beijing Academy of Sciences Applicant before: Beijing Zhongke Cambrian Technology Co., Ltd. |
|
CB02 | Change of applicant information | ||
GR01 | Patent grant | ||
GR01 | Patent grant |