WO2020073164A1 - Apparatus, method, processor and movable device for data storage - Google Patents

Apparatus, method, processor and movable device for data storage (数据存储的装置、方法、处理器和可移动设备)

Info

Publication number
WO2020073164A1
WO2020073164A1 · PCT/CN2018/109327 (CN2018109327W)
Authority
WO
WIPO (PCT)
Prior art keywords
data
unit
assembling
units
predetermined size
Prior art date
Application number
PCT/CN2018/109327
Other languages
English (en)
French (fr)
Inventor
韩峰 (Han Feng)
王耀杰 (Wang Yaojie)
高明明 (Gao Mingming)
Original Assignee
深圳市大疆创新科技有限公司 (SZ DJI Technology Co., Ltd.)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳市大疆创新科技有限公司 (SZ DJI Technology Co., Ltd.)
Priority to CN201880040193.XA (publication CN110770763A)
Priority to PCT/CN2018/109327
Publication of WO2020073164A1

Classifications

    • G06N 3/063: Computing arrangements based on biological models; neural networks; physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons, using electronic means
    • G06F 18/241: Pattern recognition; classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06N 3/04: Computing arrangements based on biological models; neural networks; architecture, e.g. interconnection topology
    • G06V 10/40: Arrangements for image or video recognition or understanding; extraction of image or video features
    • G06V 10/443: Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections, by matching or filtering; connectivity analysis, e.g. of connected components

Definitions

  • The present application relates to the field of information technology and, more specifically, to an apparatus, method, processor and movable device for data storage.
  • A convolutional neural network (Convolutional Neural Network, CNN) is a machine learning algorithm widely used in computer vision tasks such as object recognition, object detection, and semantic segmentation of images.
  • The output format of the calculation results of a convolutional neural network differs from the format in which data is stored in a memory such as a static random access memory (Static Random Access Memory, SRAM), so the results must be converted to the memory's storage format when they are stored. How to improve the efficiency of data storage has therefore become an urgent technical problem in the design of convolutional neural networks.
  • The embodiments of the present application provide an apparatus, method, processor and movable device for data storage, which can improve the efficiency of data storage.
  • In a first aspect, an apparatus for data storage is provided, including: an assembling module configured to obtain a calculation result of a multiply-accumulate unit after multiply-accumulation, the calculation result including data units of at least one output feature map, and to assemble the data units of each output feature map in the at least one output feature map into data unit groups of a predetermined size; and a storage module configured to store the data unit groups into a memory, wherein the predetermined size is the size of a storage unit in the memory.
  • In a second aspect, a method for data storage is provided, including: obtaining a calculation result of a multiply-accumulate unit after multiply-accumulation, the calculation result including data units of at least one output feature map; assembling the data units of each output feature map in the at least one output feature map into data unit groups of a predetermined size; and storing the data unit groups into a memory, wherein the predetermined size is the size of a storage unit in the memory.
  • In a third aspect, a processor is provided, including the apparatus for data storage of the first aspect.
  • In a fourth aspect, a movable device is provided, including the apparatus for data storage of the first aspect, or the processor of the third aspect.
  • In a fifth aspect, a computer storage medium is provided, which stores program code that may be used to instruct execution of the method of the second aspect.
  • In the technical solutions of the embodiments of the present application, the data units of each output feature map in the calculation result of the multiply-accumulate unit are assembled into data unit groups of a predetermined size and stored into the memory. Because the assembly of the data units is based on the size of the storage units in the memory, it does not occupy too many resources and makes it convenient to store the data unit groups into the memory, so the efficiency of data storage can be improved.
  • FIG. 1 is a schematic diagram of a convolution operation process of a convolutional neural network according to an embodiment of the present application.
  • FIG. 2 is an architectural diagram for applying the technical solutions of the embodiments of the present application.
  • FIG. 3 is a schematic diagram of the calculation result output by the multiply-accumulate unit of the embodiment of the present application.
  • FIG. 4 is a schematic diagram of a storage format of a feature map in a memory according to an embodiment of the present application.
  • FIG. 5 is a schematic architectural diagram of a movable device according to an embodiment of the present application.
  • FIG. 6 is a schematic diagram of a data storage device according to an embodiment of the present application.
  • FIG. 7 is a schematic diagram of a data storage device according to another embodiment of the present application.
  • FIG. 8 is a schematic diagram of a data storage device according to yet another embodiment of the present application.
  • FIG. 9 is a schematic diagram of a data storage device according to yet another embodiment of the present application.
  • FIG. 10 is a schematic flowchart of reading out data using a round-robin algorithm according to an embodiment of the present application.
  • FIG. 11 is a schematic diagram of a data storage device according to yet another embodiment of the present application.
  • FIG. 12 is a schematic diagram of data unit distribution according to an embodiment of the present application.
  • FIG. 13 is a schematic diagram of a data storage device according to yet another embodiment of the present application.
  • FIG. 14 is a schematic diagram of assembling data units according to an embodiment of the present application.
  • FIG. 15 is a schematic flowchart of a data storage method according to an embodiment of the present application.
  • The sequence numbers of the processes do not imply an order of execution; the execution order of the processes should be determined by their functions and internal logic, and should not constitute any limitation on the implementation of the embodiments of the present application.
  • FIG. 1 shows a schematic diagram of a convolution operation process of a convolutional neural network.
  • The convolution operation of a convolutional neural network computes over a set of input weight values and a set of input feature maps (Input Feature Map, IFM) and outputs a set of output feature maps (Output Feature Map, OFM).
  • the input weight value is called a filter or a convolution kernel.
  • the input feature map is the output feature map of the previous layer.
  • the output feature map is the feature map obtained by the input feature map after the current layer operation.
  • The convolution kernel and the input and output feature maps can each be expressed as a multi-dimensional matrix.
  • One convolution operation of a convolutional layer of the convolutional neural network is an inner product between at least part of the feature values (data units) of the input feature matrix and the weight values of the convolution kernel matrix.
  • The convolution operation of the convolutional layer can use a sliding-window approach: starting from the upper-left corner of the input feature value matrix and using the size of the convolution kernel as the window, the window is slid step by step to the lower-right corner of the input feature matrix, generating one complete two-dimensional output feature matrix.
  • After each slide of the window, the convolution calculation device extracts a window-sized block of input feature values from the input feature value matrix and performs an inner product with the convolution kernel to generate one output feature value.
  • After all two-dimensional output feature matrices have been generated in this way, the three-dimensional output feature matrix of the convolutional layer is obtained.
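  • As a rough illustration of this sliding-window procedure, the following Python sketch computes one output feature map from one input feature map and one kernel (a minimal sketch assuming stride 1 and no padding; the patent itself does not fix these parameters):

```python
import numpy as np

def conv2d_single(ifm: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Slide the kernel-sized window over the input feature map (IFM)
    and take an inner product at each position to produce the OFM."""
    kh, kw = kernel.shape
    oh, ow = ifm.shape[0] - kh + 1, ifm.shape[1] - kw + 1
    ofm = np.zeros((oh, ow), dtype=ifm.dtype)
    for m in range(oh):                 # window slides top-left to bottom-right
        for n in range(ow):
            window = ifm[m:m + kh, n:n + kw]
            ofm[m, n] = np.sum(window * kernel)   # inner product
    return ofm

# Example: a 5x5 input and a 3x3 kernel yield a 3x3 output feature map.
print(conv2d_single(np.arange(25).reshape(5, 5), np.ones((3, 3), dtype=int)).shape)
```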
  • FIG. 2 is an architectural diagram for applying the technical solutions of the embodiments of the present application.
  • the system 200 may include a convolution calculation device 210 and a memory 220.
  • the memory 220 is used to store data to be processed, for example, input feature maps and weight values, and store processed data, for example, output feature maps.
  • the memory 220 may be SRAM.
  • The convolution calculation device 210 includes a multiply-accumulate unit (Multiply Accumulate Unit, MAU) 211, an IFM input module 212, a weight value input module 213, and an OFM storage module 214.
  • The weight value input module 213 is responsible for reading the weight values from the memory 220 and sending them to the MAU 211 in a specific format.
  • The IFM input module 212 is responsible for reading the input feature map data from the memory 220 and sending it to the MAU 211 for the convolution operation.
  • The MAU 211 may include a systolic array and a buffer for storing intermediate calculation results.
  • When performing the convolution operation, the MAU 211 first loads the weight values sent by the weight value input module 213 into the systolic array; then, as the input feature map data is fed from the IFM input module 212 into the systolic array, it is multiply-accumulated with the preloaded weight values. If an intermediate result is buffered in the MAU 211, the output of the systolic array is further multiply-accumulated with that buffered intermediate result. If the result of the multiply-accumulation is still an intermediate result of the convolution operation, it is stored into the MAU's buffer; otherwise it is output to the downstream OFM storage module 214 for subsequent processing. The OFM storage module 214 assembles the convolution calculation results output by the MAU 211 into the data format stored in the memory 220, and then writes them into the memory 220.
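  • As a behavioral sketch of this dataflow (not the actual circuit), the Python fragment below mimics one MAU pass: multiply-accumulate the streamed input against preloaded weights, fold in any buffered intermediate result, and either keep the partial sum or emit it downstream. All names here are illustrative assumptions:

```python
import numpy as np

def mau_step(ifm_vec, weights, psum_buffer, is_last_pass):
    """One multiply-accumulate pass of a (behaviorally modeled) MAU.

    ifm_vec:      input feature values streamed in this pass
    weights:      weight values preloaded into the systolic array
    psum_buffer:  buffered intermediate result from earlier passes (or None)
    is_last_pass: True when the convolution for this output is complete
    """
    acc = np.dot(ifm_vec, weights)        # multiply-accumulate
    if psum_buffer is not None:
        acc += psum_buffer                # continue from the intermediate result
    if not is_last_pass:
        return acc, None                  # keep as intermediate result in the buffer
    return None, acc                      # final value: hand off to the OFM storage module

psum, out = mau_step(np.ones(4), np.arange(4.0), None, is_last_pass=False)
_, out = mau_step(np.ones(4), np.arange(4.0), psum, is_last_pass=True)
print(out)   # 12.0
```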
  • The calculation result output by the MAU 211 is shown in FIG. 3.
  • [k, m, n] denotes the feature value in row m, column n of the k-th feature map in the three-dimensional feature matrix.
  • In each cycle, the systolic array outputs one row of feature values in FIG. 3.
  • Each column of the systolic array outputs one two-dimensional output feature matrix, which corresponds to one output feature map.
  • The delay between the first valid feature values output by two adjacent columns is greater than or equal to one cycle.
  • In the memory 220, feature maps are stored contiguously in units of a predetermined size. The storage format is shown in FIG. 4, where [k, m, n] denotes the feature value in row m, column n of the k-th feature map in the three-dimensional feature matrix; in the example of FIG. 4 the predetermined size is the size of 32 feature values.
  • As can be seen from FIG. 3 and FIG. 4, the row of feature values output by the MAU 211 in each cycle belongs to multiple different feature maps, whereas the storage format in the memory 220 stores each feature map contiguously in units of the predetermined size. The output format of the calculation results of the MAU 211 therefore differs from the storage format in the memory 220.
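  • The mismatch is easy to see numerically. In the sketch below (assumed shapes, purely illustrative), the MAU emits one value per feature map per cycle, while the memory wants each map's values laid out contiguously in fixed-size units:

```python
import numpy as np

K, H, W = 4, 2, 8          # feature maps, rows, columns (illustrative)
GROUP = 4                  # predetermined size: one memory storage unit

ofm = np.arange(K * H * W).reshape(K, H, W)        # ofm[k, m, n]

# MAU output order: one row per cycle, each row holding one value per map.
mau_stream = [[ofm[k, m, n] for k in range(K)]
              for m in range(H) for n in range(W)]

# Memory layout: each map stored contiguously, split into GROUP-sized units.
memory_units = ofm.reshape(-1, GROUP)

print(mau_stream[0])       # cycle 0: [ofm[0,0,0], ofm[1,0,0], ofm[2,0,0], ofm[3,0,0]]
print(memory_units[0])     # unit 0:  [ofm[0,0,0], ofm[0,0,1], ofm[0,0,2], ofm[0,0,3]]
```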
  • In view of this, embodiments of the present application provide a technical solution for data storage that can efficiently assemble the convolution calculation results into the data format stored in the memory, thereby improving the efficiency of data storage.
  • In some embodiments, the technical solutions of the embodiments of the present application may be applied to movable devices.
  • The movable device may be a drone, an unmanned boat, an autonomous vehicle, a robot, or the like, which is not limited in the embodiments of the present application.
  • FIG. 5 is a schematic architectural diagram of a movable device 500 according to an embodiment of the present application.
  • The movable device 500 may include a power system 510, a control system 520, a sensing system 530 and a processing system 540.
  • The power system 510 is used to provide power for the movable device 500.
  • Taking a drone as an example, the power system of the drone may include an electronic speed controller (electronic governor), propellers, and motors corresponding to the propellers.
  • Each motor is connected between the electronic speed controller and a propeller, and the motor and the propeller are arranged on the corresponding arm.
  • The electronic speed controller is used to receive a driving signal generated by the control system and to provide a driving current to the motor according to the driving signal, so as to control the rotating speed of the motor.
  • The motor is used to drive the propeller to rotate, thereby providing power for the flight of the drone.
  • The sensing system 530 can be used to measure attitude information of the movable device 500, that is, position information and state information of the movable device 500 in space, for example its three-dimensional position, three-dimensional attitude angle, three-dimensional velocity, three-dimensional acceleration, and three-dimensional angular velocity.
  • The sensing system 530 may include, for example, at least one of sensors such as a gyroscope, an electronic compass, an inertial measurement unit (Inertial Measurement Unit, IMU), a visual sensor, a global positioning system (Global Positioning System, GPS) receiver, a barometer, and an airspeed meter.
  • The sensing system 530 can also be used to acquire images; that is, the sensing system 530 includes sensors for acquiring images, such as cameras.
  • The control system 520 is used to control the movement of the movable device 500.
  • The control system 520 can control the movable device 500 according to preset program instructions.
  • For example, the control system 520 may control the movement of the movable device 500 according to the attitude information of the movable device 500 measured by the sensing system 530.
  • The control system 520 may also control the movable device 500 according to control signals from a remote controller.
  • For a drone, the control system 520 may be a flight control system (flight controller), or a control circuit in the flight controller.
  • The processing system 540 can process the images collected by the sensing system 530.
  • For example, the processing system 540 may be an image signal processing (Image Signal Processing, ISP) chip.
  • The processing system 540 may be the system 200 in FIG. 2, or the processing system 540 may include the system 200 in FIG. 2.
  • It should be understood that the movable device 500 may further include other components not shown in FIG. 5, which is not limited in the embodiments of the present application.
  • FIG. 6 shows a schematic diagram of a data storage device 600 according to an embodiment of the present application.
  • the device 600 may be the OFM storage module 214 in FIG. 2.
  • the device 600 may include an assembling module 610 and a storage module 620.
  • It should be understood that the modules in the embodiments of the present application may be implemented by circuits; for example, the assembling module 610 may be an assembling circuit, but the embodiments of the present application are not limited thereto, and the modules may also be implemented in other ways.
  • The assembling module 610 is configured to obtain the calculation result of the multiply-accumulate unit after multiply-accumulation, the calculation result including data units of at least one output feature map, and to assemble the data units of each output feature map in the at least one output feature map into data unit groups of a predetermined size.
  • The storage module 620 is configured to store the data unit groups into a memory, wherein the predetermined size is the size of a storage unit in the memory.
  • In the embodiments of the present application, the format conversion of the calculation result is performed by the assembling module 610.
  • The assembling module 610 assembles the data units of each output feature map into data unit groups whose size equals that of a storage unit in the memory.
  • Correspondingly, the storage module 620 can store each assembled data unit group into a storage unit in the memory. Because the assembly of the data units is based on the size of the storage units in the memory, it does not occupy too many resources and makes it convenient to store the data unit groups into the memory, so the efficiency of data storage can be improved.
  • Optionally, in one embodiment of the present application, the assembling module 610 includes N assembling units 611, where each of the N assembling units 611 is used to assemble the data units of one output feature map into data unit groups of the predetermined size, and N is a positive integer greater than 1.
  • Specifically, in this embodiment, multiple assembling units 611 are used to implement the assembly of the data units.
  • Each assembling unit 611 is responsible for assembling the data units of one output feature map.
  • For example, the first assembling unit is responsible for assembling the data units of the first output feature map, the second assembling unit is responsible for assembling the data units of the second output feature map, and so on.
  • In this way, the N assembling units 611 can assemble the data units of N output feature maps.
  • If the output data width of the multiply-accumulate unit is N data units, the multiply-accumulate unit can output N data units at a time, the N data units belonging to N output feature maps, respectively. As shown in FIG. 3, the multiply-accumulate unit outputs one row of data units at a time, where each data unit belongs to one output feature map and the N data units of one row belong to N output feature maps, respectively.
  • The N assembling units 611 may correspond to the N output feature maps, respectively.
  • the device 600 further includes:
  • a distribution module 630, configured to distribute the N data units to the N assembling units 611, respectively.
  • The distribution module 630 distributes each row of N data units output by the multiply-accumulate unit to the N assembling units 611, and the data units input consecutively into each of the N assembling units 611 are assembled into data unit groups of the predetermined size.
  • That is to say, each assembling unit 611 receives one data unit of its corresponding output feature map at a time and assembles it with the previously received data units until a data unit group of the predetermined size is complete.
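  • A minimal software model of one such assembling unit is sketched below (illustrative only; the patent realizes this with caches or registers, as described next):

```python
class AssemblingUnit:
    """Collects data units of one output feature map into groups of
    a predetermined size (the size of one memory storage unit)."""

    def __init__(self, group_size: int):
        self.group_size = group_size
        self.pending = []              # plays the role of the assembly cache

    def push(self, data_unit):
        """Accept one data unit; return a full group once assembled, else None."""
        self.pending.append(data_unit)
        if len(self.pending) == self.group_size:
            group, self.pending = self.pending, []
            return group               # ready to be written to one storage unit
        return None

unit = AssemblingUnit(group_size=4)
for n in range(8):                     # stream of data units of one feature map
    group = unit.push(("fm0", n))
    if group is not None:
        print("store group:", group)
```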
  • It should be understood that the size of a storage unit in the memory, i.e. the predetermined size, is generally smaller than the size of one row of a feature map.
  • The size of one row of a feature map may be an integer multiple of the predetermined size; if it is not, the last data unit group of each row includes only the remaining data units, i.e. its size is smaller than the predetermined size.
  • If the access data width of the memory is the size of N data units, i.e. the predetermined size is the size of N data units, then the data units input N consecutive times into each of the N assembling units 611 are assembled into one data unit group of the predetermined size.
  • Optionally, in one embodiment of the present application, each of the N assembling units 611 includes a first cache 612.
  • The size of the first cache 612 may be the predetermined size.
  • Optionally, the first cache 612 may be implemented by registers.
  • The first cache 612 is used for assembling the data units.
  • Correspondingly, the storage module 620 is used to store the data unit groups assembled in the first cache 612 into the memory.
  • The size of the first cache 612 in the assembling unit 611 must be sufficient to assemble a data unit group of the predetermined size; the minimum size of the first cache 612 is therefore the predetermined size. In this case, whenever a data unit group has been assembled in the first cache 612, the storage module 620 needs to store the assembled data unit group into the memory immediately.
  • Optionally, in another embodiment of the present application, each of the N assembling units 611 includes a first cache 612 and a second cache 613.
  • The sizes of the first cache 612 and the second cache 613 are both the predetermined size.
  • The first cache 612 is used for assembling the data units and caches each assembled data unit group into the second cache 613.
  • Correspondingly, the storage module 620 is used to store the assembled data unit groups in the second cache 613 into the memory.
  • In this embodiment, the assembling unit 611 is implemented with the first cache 612 and the second cache 613, both of the predetermined size; optionally, both may be implemented by registers.
  • The first cache 612 is used for assembling, and the second cache 613 is used for buffering the assembled data unit groups.
  • It should be understood that the first cache 612 and the second cache 613 may be physically separate or integrated; that is to say, they may be two independent caches or two parts of one cache, which is not limited in the embodiments of the present application.
  • It should also be understood that the sizes of the first cache 612 and the second cache 613 may also be greater than the predetermined size, as long as data unit groups of the predetermined size can be assembled and buffered; the embodiments of the present application are not limited in this respect.
  • Owing to the second cache 613, it is convenient for the storage module 620 to store the assembled data unit groups.
  • For example, the storage module 620 may, according to a round-robin algorithm, read the assembled data unit groups in turn from the second cache 613 of each of the N assembling units 611 and store them into the memory.
  • FIG. 10 shows a schematic flowchart of reading out data using the round-robin algorithm.
  • As shown in FIG. 10, the storage module 620 can cyclically execute 1001, 1002, 1003, 1004, ..., 1005, 1006, reading the assembled data unit groups from each assembling unit 611 in turn and storing them into the memory. For example, in 1001 it is determined whether there is an assembled data unit group in the first assembling unit; if so, 1002 is executed to read out the assembled data unit group of the first assembling unit. Then 1003 is executed to determine whether there is an assembled data unit group in the second assembling unit; if so, 1004 is executed to read out the assembled data unit group of the second assembling unit, and so on.
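  • A compact sketch of that polling loop (hypothetical names; each unit is assumed to expose the double-buffered group from its second cache, if any):

```python
def round_robin_store(units, memory):
    """Visit the assembling units in fixed order; whenever a unit's second
    cache holds a completed group, read it out and append it to memory."""
    while any(u.has_group() for u in units):
        for u in units:                  # steps 1001, 1003, ...: check unit
            if u.has_group():            # steps 1002, 1004, ...: read out
                memory.append(u.pop_group())

class StubUnit:
    """Stand-in for an assembling unit holding already-assembled groups."""
    def __init__(self, groups): self.groups = list(groups)
    def has_group(self): return bool(self.groups)
    def pop_group(self): return self.groups.pop(0)

memory = []
round_robin_store([StubUnit([["a0", "a1"]]), StubUnit([["b0", "b1"]])], memory)
print(memory)   # [['a0', 'a1'], ['b0', 'b1']]
```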
  • the device 600 may further include:
  • the control module 640 is used to control the speed at which the multiply-accumulate unit outputs the calculation result.
  • Specifically, the speed at which the multiply-accumulate unit outputs calculation results may not match the speed at which the device 600 processes data. In the embodiments of the present application, the control module 640 therefore controls the speed at which the multiply-accumulate unit outputs calculation results. For example, when the multiply-accumulate unit sends data too fast, the control module 640 may assert a backpressure signal to the multiply-accumulate unit. After receiving the backpressure signal, the multiply-accumulate unit stops calculating until the backpressure signal is deasserted, and then continues calculating.
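  • In software terms, this backpressure handshake amounts to stalling the producer while the signal is asserted (a schematic model only, not the actual hardware protocol):

```python
def mau_with_backpressure(results, backpressure_asserted):
    """Yield MAU results, stalling while the backpressure signal holds.

    backpressure_asserted: callable returning True while the consumer is busy.
    """
    for r in results:
        while backpressure_asserted():   # stop calculating under backpressure
            pass                         # in hardware the pipeline simply stalls
        yield r                          # backpressure released: continue output

busy = iter([True, False, False, False])
for out in mau_with_backpressure([1, 2, 3], lambda: next(busy)):
    print(out)                           # 1, 2, 3
```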
  • In the above embodiments, the assembly of the data units is implemented by N assembling units.
  • The assembly of the data units may also be implemented in other ways; that is to say, the assembling module 610 may also adopt other implementations.
  • The implementation of another embodiment of the present application is described below.
  • Optionally, in one embodiment of the present application, the assembling module 610 includes a first assembling unit 616 and a second assembling unit 617.
  • The first assembling unit 616 is used to assemble the data units of specific odd-numbered rows into data unit groups of the predetermined size, and the second assembling unit 617 is used to assemble the data units of specific even-numbered rows into data unit groups of the predetermined size, where the specific odd-numbered rows are the odd-numbered rows of each output feature map in the at least one output feature map, and the specific even-numbered rows are the even-numbered rows of each output feature map in the at least one output feature map.
  • Specifically, in this embodiment, the first assembling unit 616 and the second assembling unit 617 are used to implement the assembly of the data units.
  • The first assembling unit 616 assembles the data units of the odd-numbered rows of each output feature map, and the second assembling unit 617 assembles the data units of the even-numbered rows of each output feature map.
  • the device 600 further includes:
  • a distribution module 635, configured to distribute the data units of the specific odd-numbered rows to the first assembling unit 616 and the data units of the specific even-numbered rows to the second assembling unit 617.
  • Specifically, the distribution module 635 may count the data units of each feature map separately and distribute the data units to the different assembling units according to the row numbers of the incoming data units. For example, as shown in FIG. 12, the data units of the odd-numbered rows of each feature map are distributed to the first assembling unit 616, and the data units of the even-numbered rows are distributed to the second assembling unit 617.
  • In FIG. 12, [k, m, n] denotes the feature value (data unit) in row m, column n of the k-th feature map in the three-dimensional feature matrix; the width of each feature map (the number of data units per row) is 56, and the number of feature maps is 32.
  • Optionally, in one embodiment of the present application, each of the first assembling unit 616 and the second assembling unit 617 includes N first-in-first-out queues (First Input First Output, FIFO).
  • Optionally, each FIFO may be a dual-port FIFO implemented with random access memory (Random Access Memory, RAM).
  • The (p*N+i)-th data unit of the data units of the specific odd-numbered rows is input into the i-th FIFO of the first assembling unit 616, and the N data units of the specific odd-numbered rows held in the N FIFOs of the first assembling unit are assembled into one data unit group of the predetermined size.
  • The (p*N+i)-th data unit of the data units of the specific even-numbered rows is input into the i-th FIFO of the second assembling unit 617, and the N data units of the specific even-numbered rows held in the N FIFOs of the second assembling unit are assembled into one data unit group of the predetermined size, where N is a positive integer greater than 1, i is a positive integer not greater than N, and p is zero or a positive integer.
  • When the output data width of the multiply-accumulate unit is N data units, the multiply-accumulate unit can output N data units at a time, the N data units belonging to N output feature maps, respectively. As shown in FIG. 3, the multiply-accumulate unit outputs one row of data units at a time, where each data unit belongs to one output feature map and the N data units of one row belong to N output feature maps, respectively.
  • In this case, the distribution module 635 is used to distribute the N data units to the corresponding FIFOs, where the (p*N+i)-th data unit of the specific odd-numbered rows is distributed to the i-th FIFO of the first assembling unit 616, and the (p*N+i)-th data unit of the specific even-numbered rows is distributed to the i-th FIFO of the second assembling unit 617.
  • Correspondingly, in this embodiment, the storage module 620 is used to store the data unit groups assembled in the N FIFOs of the first assembling unit 616 or the second assembling unit 617 into the memory.
  • The storage module 620 reads the assembled data unit groups from the two assembling units in turn according to the distribution rule of the distribution module 635 and stores them into the memory. For example, in the example above, once the 32 data units of the first row of the first feature map held in the 32 FIFOs of the first assembling unit 616, i.e. [0,0,0], [0,0,1], ..., [0,0,31], have been assembled into one data unit group, the storage module 620 reads that data unit group out and stores it into the memory.
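  • The following Python sketch models this per-feature-map distribution and group assembly (illustrative only; names are assumptions, and rows are 0-indexed here, so m = 0 corresponds to the patent's first, i.e. odd-numbered, row):

```python
import collections

N_FIFOS = 32                     # N: FIFOs per assembling unit = memory unit size

# Per-(feature map, row class) counters: the j-th (1-indexed) odd-row data unit
# of map k goes to FIFO ((j-1) mod N) of the odd-row assembling unit.
counters = collections.defaultdict(int)
fifos = {"odd": [collections.deque() for _ in range(N_FIFOS)],
         "even": [collections.deque() for _ in range(N_FIFOS)]}

def distribute(k, m, n):
    """Route data unit [k, m, n] to its FIFO (the p*N+i rule)."""
    row_class = "odd" if m % 2 == 0 else "even"   # 0-indexed m=0 is the 1st row
    i = counters[(k, row_class)] % N_FIFOS
    counters[(k, row_class)] += 1
    fifos[row_class][i].append((k, m, n))

def pop_group(row_class):
    """A group is complete when every FIFO head belongs to the same feature
    map; pop one data unit from each FIFO to form one storage-unit group."""
    bank = fifos[row_class]
    if all(bank) and len({q[0][0] for q in bank}) == 1:
        return [q.popleft() for q in bank]
    return None

for n in range(N_FIFOS):         # first row of feature map 0 arrives (units of
    distribute(0, 0, n)          # other maps would interleave into the same FIFOs)
print(pop_group("odd")[:3])      # [(0, 0, 0), (0, 0, 1), (0, 0, 2)]
```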
  • Similarly to the foregoing embodiments, in this embodiment the device 600 may also include a control module 640 for controlling the speed at which the multiply-accumulate unit outputs calculation results.
  • In this embodiment, the assembly of the data units is implemented with FIFOs, and the FIFOs can be implemented with RAM. According to the structure of the look-up tables (Look Up Table, LUT) of a field-programmable gate array (Field Programmable Gate Array, FPGA), RAM requires fewer LUT resources than registers of the same capacity, so the technical solution of this embodiment requires fewer LUT resources.
  • The data storage device of the embodiments of the present application has been described above; the data storage method of the embodiments of the present application is described below.
  • The data storage method of the embodiments of the present application is the method performed when the foregoing data storage device of the embodiments of the present application, or equipment including that device, implements the technical solutions of the embodiments of the present application; the related description may refer to the foregoing embodiments and, for brevity, is not repeated here.
  • FIG. 15 shows a schematic flowchart of a data storage method 1500 according to an embodiment of the present application.
  • The method 1500 includes: obtaining a calculation result of a multiply-accumulate unit after multiply-accumulation, the calculation result including data units of at least one output feature map; assembling the data units of each output feature map in the at least one output feature map into data unit groups of a predetermined size; and storing the data unit groups into a memory, wherein the predetermined size is the size of a storage unit in the memory.
  • Optionally, the data units of one output feature map are assembled into the data unit groups of the predetermined size by each of N assembling units, N being a positive integer greater than 1.
  • Optionally, N data units output by the multiply-accumulate unit at a time are obtained, the N data units belonging to N output feature maps, respectively; the method 1500 then further includes: distributing the N data units to the N assembling units, respectively.
  • Optionally, the data units input consecutively into each of the N assembling units are assembled into the data unit groups of the predetermined size.
  • Optionally, the predetermined size is the size of N data units, and the data units input N consecutive times into each of the N assembling units are assembled into the data unit groups of the predetermined size.
  • Optionally, each of the N assembling units includes a first cache whose size is the predetermined size; the data units of one output feature map are assembled into the data unit groups of the predetermined size through the first cache, and the data unit groups assembled in the first cache are stored into the memory.
  • Optionally, each of the N assembling units includes a first cache and a second cache, the sizes of the first cache and the second cache both being the predetermined size; the data units of one output feature map are assembled into the data unit groups of the predetermined size through the first cache; the data unit groups assembled in the first cache are cached into the second cache; and the assembled data unit groups in the second cache are stored into the memory.
  • Optionally, according to a round-robin algorithm, the assembled data unit groups are read in turn from the second cache of each of the N assembling units and stored into the memory.
  • Optionally, the data units of specific odd-numbered rows are assembled into the data unit groups of the predetermined size by a first assembling unit, where the specific odd-numbered rows are the odd-numbered rows of each output feature map in the at least one output feature map; and the data units of specific even-numbered rows are assembled into the data unit groups of the predetermined size by a second assembling unit, where the specific even-numbered rows are the even-numbered rows of each output feature map in the at least one output feature map.
  • Optionally, the method 1500 further includes: distributing the data units of the specific odd-numbered rows to the first assembling unit, and distributing the data units of the specific even-numbered rows to the second assembling unit.
  • Optionally, the first assembling unit and the second assembling unit each include N first-in-first-out queues (FIFOs); the (p*N+i)-th data unit of the data units of the specific odd-numbered rows is distributed to the i-th FIFO of the first assembling unit, and the N data units of the specific odd-numbered rows in the N FIFOs of the first assembling unit are assembled into one data unit group of the predetermined size; the (p*N+i)-th data unit of the data units of the specific even-numbered rows is distributed to the i-th FIFO of the second assembling unit, and the N data units of the specific even-numbered rows in the N FIFOs of the second assembling unit are assembled into one data unit group of the predetermined size, where N is a positive integer greater than 1, i is a positive integer not greater than N, and p is zero or a positive integer.
  • Optionally, N data units output by the multiply-accumulate unit at a time are obtained, the N data units belonging to N output feature maps, respectively; among the N data units, the (p*N+i)-th data unit of the data units of the specific odd-numbered rows is distributed to the i-th FIFO of the first assembling unit, and the (p*N+i)-th data unit of the data units of the specific even-numbered rows is distributed to the i-th FIFO of the second assembling unit.
  • Optionally, the data unit groups assembled in the N FIFOs of the first assembling unit or the second assembling unit are stored into the memory.
  • the method 1500 further includes: controlling the speed at which the multiply-accumulate unit outputs the calculation result.
  • An embodiment of the present application further provides a processor.
  • the processor includes a multiply-accumulate unit and the foregoing data storage device of the embodiment of the present application.
  • the multiply-accumulate unit is used to perform multiply-accumulate calculation and output the calculation result to the data storage device.
  • the data storage device uses the technical solution of the embodiment of the present application to store data in the memory.
  • the processor may be the convolution calculation device 210 in FIG. 2, wherein the OFM storage module 214 may be a data storage device according to an embodiment of this application.
  • An embodiment of the present application further provides a movable device.
  • The movable device may include the apparatus for data storage of the foregoing embodiments of the present application, or the processor of the foregoing embodiments of the present application.
  • An embodiment of the present application also provides a computer storage medium in which a program code is stored, and the program code may be used to instruct to execute the data storage method of the above-mentioned embodiment of the present application.
  • In the several embodiments provided in this application, it should be understood that the disclosed system, device, and method may be implemented in other ways.
  • The device embodiments described above are merely illustrative.
  • The division of the units is only a division by logical function; in actual implementation there may be other divisions: for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented.
  • the displayed or discussed mutual couplings or direct couplings or communication connections may be indirect couplings or communication connections through some interfaces, devices, or units, and may also be electrical, mechanical, or other forms of connection.
  • the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, they may be located in one place, or may be distributed on multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments of the present application.
  • each functional unit in each embodiment of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.
  • the above integrated unit can be implemented in the form of hardware or software function unit.
  • If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium.
  • Based on this understanding, the technical solution of the present application, in essence, or the part contributing to the existing technology, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions to enable a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the methods described in the embodiments of the present application.
  • The aforementioned storage media include various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM, Read-Only Memory), a random access memory (RAM, Random Access Memory), a magnetic disk, or an optical disc.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Multimedia (AREA)
  • Neurology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Image Processing (AREA)

Abstract

An apparatus (600), method, processor and movable device for data storage. The apparatus (600) includes: an assembling module (610) configured to obtain a calculation result of a multiply-accumulate unit after multiply-accumulation, the calculation result including data units of at least one output feature map, and to assemble the data units of each output feature map in the at least one output feature map into data unit groups of a predetermined size; and a storage module (620) configured to store the data unit groups into a memory, wherein the predetermined size is the size of a storage unit in the memory. The efficiency of data storage can thereby be improved.

Description

Apparatus, method, processor and movable device for data storage
Copyright notice
The disclosure of this patent document contains material that is subject to copyright protection. The copyright belongs to the copyright owner. The copyright owner has no objection to the reproduction by anyone of this patent document or this patent disclosure as it appears in the official records and files of the Patent and Trademark Office.
Technical field
The present application relates to the field of information technology and, more specifically, to an apparatus, method, processor and movable device for data storage.
Background
A convolutional neural network (Convolutional Neural Network, CNN) is a machine learning algorithm that is widely used in computer vision tasks such as object recognition, object detection, and semantic segmentation of images.
The output format of the calculation results of a convolutional neural network differs from the format in which data is stored in a memory such as a static random access memory (Static Random Access Memory, SRAM), so the results must be converted to the memory's storage format when they are stored. How to improve the efficiency of data storage has therefore become an urgent technical problem in the design of convolutional neural networks.
Summary
Embodiments of the present application provide an apparatus, method, processor and movable device for data storage, which can improve the efficiency of data storage.
In a first aspect, an apparatus for data storage is provided, including: an assembling module configured to obtain a calculation result of a multiply-accumulate unit after multiply-accumulation, the calculation result including data units of at least one output feature map, and to assemble the data units of each output feature map in the at least one output feature map into data unit groups of a predetermined size; and a storage module configured to store the data unit groups into a memory, wherein the predetermined size is the size of a storage unit in the memory.
In a second aspect, a method for data storage is provided, including: obtaining a calculation result of a multiply-accumulate unit after multiply-accumulation, the calculation result including data units of at least one output feature map; assembling the data units of each output feature map in the at least one output feature map into data unit groups of a predetermined size; and storing the data unit groups into a memory, wherein the predetermined size is the size of a storage unit in the memory.
In a third aspect, a processor is provided, including the apparatus for data storage of the first aspect.
In a fourth aspect, a movable device is provided, including the apparatus for data storage of the first aspect, or the processor of the third aspect.
In a fifth aspect, a computer storage medium is provided, in which program code is stored; the program code may be used to instruct execution of the method of the second aspect.
In the technical solutions of the embodiments of the present application, the data units of each output feature map in the calculation result of the multiply-accumulate unit after multiply-accumulation are assembled into data unit groups of a predetermined size and stored into the memory. Because the assembly of the data units is based on the size of the storage units in the memory, it does not occupy too many resources and makes it convenient to store the data unit groups into the memory, so the efficiency of data storage can be improved.
Brief description of the drawings
FIG. 1 is a schematic diagram of the convolution operation process of a convolutional neural network according to an embodiment of the present application.
FIG. 2 is an architectural diagram for applying the technical solutions of the embodiments of the present application.
FIG. 3 is a schematic diagram of the calculation result output by the multiply-accumulate unit according to an embodiment of the present application.
FIG. 4 is a schematic diagram of the storage format of feature maps in a memory according to an embodiment of the present application.
FIG. 5 is a schematic architectural diagram of a movable device according to an embodiment of the present application.
FIG. 6 is a schematic diagram of a data storage device according to an embodiment of the present application.
FIG. 7 is a schematic diagram of a data storage device according to another embodiment of the present application.
FIG. 8 is a schematic diagram of a data storage device according to yet another embodiment of the present application.
FIG. 9 is a schematic diagram of a data storage device according to yet another embodiment of the present application.
FIG. 10 is a schematic flowchart of reading out data using a round-robin algorithm according to an embodiment of the present application.
FIG. 11 is a schematic diagram of a data storage device according to yet another embodiment of the present application.
FIG. 12 is a schematic diagram of data unit distribution according to an embodiment of the present application.
FIG. 13 is a schematic diagram of a data storage device according to yet another embodiment of the present application.
FIG. 14 is a schematic diagram of assembling data units according to an embodiment of the present application.
FIG. 15 is a schematic flowchart of a data storage method according to an embodiment of the present application.
Detailed description
The technical solutions in the embodiments of the present application are described below with reference to the accompanying drawings.
It should be understood that the specific examples herein are only intended to help those skilled in the art better understand the embodiments of the present application, rather than to limit the scope of the embodiments of the present application.
It should also be understood that, in the various embodiments of the present application, the sequence numbers of the processes do not imply an order of execution; the execution order of the processes should be determined by their functions and internal logic, and should not constitute any limitation on the implementation of the embodiments of the present application.
It should also be understood that the various implementations described in this specification may be implemented individually or in combination, which is not limited in the embodiments of the present application.
The technical solutions of the embodiments of the present application can be applied to various deep learning algorithms, for example convolutional neural networks, but the embodiments of the present application are not limited thereto.
FIG. 1 shows a schematic diagram of the convolution operation process of a convolutional neural network.
As shown in FIG. 1, the convolution operation of a convolutional neural network computes over a set of input weight values and a set of input feature maps (Input Feature Map, IFM) and outputs a set of output feature maps (Output Feature Map, OFM). The input weight values are called a filter or a convolution kernel. The input feature map is the output feature map of the previous layer, and the output feature map is the feature map obtained from the input feature map after the operation of the current layer. The convolution kernel and the input and output feature maps can each be expressed as a multi-dimensional matrix, and one convolution operation of a convolutional layer of the convolutional neural network is an inner product between at least part of the feature values (data units) of the input feature matrix and the weight values of the convolution kernel matrix.
The convolution operation of the convolutional layer can use a sliding-window approach: starting from the upper-left corner of the input feature value matrix and using the size of the convolution kernel as the window, the window is slid step by step to the lower-right corner of the input feature matrix, generating one complete two-dimensional output feature matrix. After each slide of the window, the convolution calculation device extracts a window-sized block of input feature values from the input feature value matrix and performs an inner product with the convolution kernel to generate one output feature value. After all two-dimensional output feature matrices have been generated in this way, the three-dimensional output feature matrix of the convolutional layer is obtained.
FIG. 2 is an architectural diagram for applying the technical solutions of the embodiments of the present application.
As shown in FIG. 2, the system 200 may include a convolution calculation device 210 and a memory 220.
The memory 220 is used to store data to be processed, for example input feature maps and weight values, as well as processed data, for example output feature maps. The memory 220 may be an SRAM.
The convolution calculation device 210 includes a multiply-accumulate unit (Multiply Accumulate Unit, MAU) 211, an IFM input module 212, a weight value input module 213 and an OFM storage module 214. The weight value input module 213 is responsible for reading the weight values from the memory 220 and sending them to the MAU 211 in a specific format. The IFM input module 212 is responsible for reading the input feature map data from the memory 220 and sending it to the MAU 211 for the convolution operation. The MAU 211 may include a systolic array and a buffer for storing intermediate calculation results. When performing the convolution operation, the MAU 211 first loads the weight values sent by the weight value input module 213 into the systolic array; then, as the input feature map data is fed from the IFM input module 212 into the systolic array, it is multiply-accumulated with the preloaded weight values. If an intermediate result is buffered in the MAU 211, the output of the systolic array is further multiply-accumulated with that buffered intermediate result. If the result of the multiply-accumulation is still an intermediate result of the convolution operation, it is stored into the MAU's buffer; otherwise it is output to the downstream OFM storage module 214 for subsequent processing. The OFM storage module 214 assembles the convolution calculation results output by the MAU 211 into the data format stored in the memory 220, and then writes them into the memory 220.
The calculation result output by the MAU 211 is shown in FIG. 3, where [k, m, n] denotes the feature value in row m, column n of the k-th feature map in the three-dimensional feature matrix. In each cycle, the systolic array outputs one row of feature values in FIG. 3. Each column of the systolic array outputs one two-dimensional output feature matrix, corresponding to one output feature map, and the delay between the first valid feature values output by two adjacent columns is greater than or equal to one cycle.
In the memory 220, the feature maps are stored contiguously in units of a predetermined size. The storage format is shown in FIG. 4, where [k, m, n] denotes the feature value in row m, column n of the k-th feature map in the three-dimensional feature matrix; the predetermined size in the example of FIG. 4 is the size of 32 feature values.
As can be seen from FIG. 3 and FIG. 4, the row of feature values output by the MAU 211 in each cycle belongs to multiple different feature maps, whereas the storage format in the memory 220 stores each feature map contiguously in units of the predetermined size. The output format of the calculation results of the MAU 211 therefore differs from the storage format in the memory 220.
In view of this, embodiments of the present application provide a technical solution for data storage that can efficiently assemble the convolution calculation results into the data format stored in the memory, thereby improving the efficiency of data storage.
In some embodiments, the technical solutions of the embodiments of the present application may be applied to movable devices. The movable device may be a drone, an unmanned boat, an autonomous vehicle, a robot, or the like, which is not limited in the embodiments of the present application.
FIG. 5 is a schematic architectural diagram of a movable device 500 according to an embodiment of the present application.
As shown in FIG. 5, the movable device 500 may include a power system 510, a control system 520, a sensing system 530 and a processing system 540.
The power system 510 is used to provide power for the movable device 500.
Taking a drone as an example, the power system of the drone may include an electronic speed controller (electronic governor), propellers, and motors corresponding to the propellers. Each motor is connected between the electronic speed controller and a propeller, and the motor and the propeller are arranged on the corresponding arm; the electronic speed controller is used to receive a driving signal generated by the control system and to provide a driving current to the motor according to the driving signal, so as to control the rotating speed of the motor. The motor is used to drive the propeller to rotate, thereby providing power for the flight of the drone.
The sensing system 530 can be used to measure attitude information of the movable device 500, that is, position information and state information of the movable device 500 in space, for example its three-dimensional position, three-dimensional attitude angle, three-dimensional velocity, three-dimensional acceleration, and three-dimensional angular velocity. The sensing system 530 may include, for example, at least one of sensors such as a gyroscope, an electronic compass, an inertial measurement unit (Inertial Measurement Unit, IMU), a visual sensor, a global positioning system (Global Positioning System, GPS) receiver, a barometer, and an airspeed meter.
The sensing system 530 can also be used to acquire images; that is, the sensing system 530 includes sensors for acquiring images, such as cameras.
The control system 520 is used to control the movement of the movable device 500. The control system 520 can control the movable device 500 according to preset program instructions. For example, the control system 520 may control the movement of the movable device 500 according to the attitude information of the movable device 500 measured by the sensing system 530. The control system 520 may also control the movable device 500 according to control signals from a remote controller. For example, for a drone, the control system 520 may be a flight control system (flight controller), or a control circuit in the flight controller.
The processing system 540 can process the images collected by the sensing system 530. For example, the processing system 540 may be an image signal processing (Image Signal Processing, ISP) chip.
The processing system 540 may be the system 200 in FIG. 2, or the processing system 540 may include the system 200 in FIG. 2.
It should be understood that the above division and naming of the components of the movable device 500 are merely exemplary and should not be construed as limiting the embodiments of the present application.
It should also be understood that the movable device 500 may further include other components not shown in FIG. 5, which is not limited in the embodiments of the present application.
FIG. 6 shows a schematic diagram of a data storage device 600 according to an embodiment of the present application. The device 600 may be the OFM storage module 214 in FIG. 2.
As shown in FIG. 6, the device 600 may include an assembling module 610 and a storage module 620.
It should be understood that the various modules in the embodiments of the present application may be implemented by circuits; for example, the assembling module 610 may be an assembling circuit, but the embodiments of the present application are not limited thereto, and the modules may also be implemented in other ways.
The assembling module 610 is configured to obtain the calculation result of the multiply-accumulate unit after multiply-accumulation, the calculation result including data units of at least one output feature map, and to assemble the data units of each output feature map in the at least one output feature map into data unit groups of a predetermined size.
The storage module 620 is configured to store the data unit groups into a memory, wherein the predetermined size is the size of a storage unit in the memory.
In the embodiments of the present application, the format conversion of the calculation result is performed by the assembling module 610. The assembling module 610 assembles the data units of each output feature map into data unit groups whose size equals that of a storage unit in the memory. Correspondingly, the storage module 620 can store each assembled data unit group into a storage unit in the memory. Because the assembly of the data units is based on the size of the storage units in the memory, it does not occupy too many resources and makes it convenient to store the data unit groups into the memory, so the efficiency of data storage can be improved.
Optionally, in one embodiment of the present application, as shown in FIG. 7, the assembling module 610 includes N assembling units 611, where each of the N assembling units 611 is used to assemble the data units of one output feature map into data unit groups of the predetermined size, and N is a positive integer greater than 1.
Specifically, in this embodiment, multiple assembling units 611 are used to implement the assembly of the data units. Each assembling unit 611 is responsible for assembling the data units of one output feature map. For example, the first assembling unit is responsible for assembling the data units of the first output feature map, the second assembling unit is responsible for assembling the data units of the second output feature map, and so on. In this way, the N assembling units 611 can assemble the data units of N output feature maps.
If the output data width of the multiply-accumulate unit is N data units, the multiply-accumulate unit can output N data units at a time, the N data units belonging to N output feature maps, respectively. As shown in FIG. 3, the multiply-accumulate unit outputs one row of data units at a time, where each data unit belongs to one output feature map and the N data units of one row belong to N output feature maps, respectively. The N assembling units 611 may correspond to the N output feature maps, respectively.
In this case, optionally, in one embodiment of the present application, as shown in FIG. 7, the device 600 further includes:
a distribution module 630, configured to distribute the N data units to the N assembling units 611, respectively.
The distribution module 630 distributes each row of N data units output by the multiply-accumulate unit to the N assembling units 611, and the data units input consecutively into each of the N assembling units 611 are assembled into data unit groups of the predetermined size.
That is to say, each assembling unit 611 receives one data unit of its corresponding output feature map at a time and assembles it with the previously received data units until a data unit group of the predetermined size is complete.
It should be understood that the size of a storage unit in the memory, i.e. the predetermined size, is generally smaller than the size of one row of a feature map; the size of one row of a feature map may be an integer multiple of the predetermined size. If the size of one row of a feature map is not an integer multiple of the predetermined size, the last data unit group of each row includes only the remaining data units, i.e. its size is smaller than the predetermined size.
If the access data width of the memory is the size of N data units, i.e. the predetermined size is the size of N data units, then the data units input N consecutive times into each of the N assembling units 611 are assembled into one data unit group of the predetermined size.
Optionally, in one embodiment of the present application, as shown in FIG. 8, each of the N assembling units 611 includes a first cache 612.
The size of the first cache 612 may be the predetermined size. Optionally, the first cache 612 may be implemented by registers. The first cache 612 is used for assembling the data units. Correspondingly, the storage module 620 is used to store the data unit groups assembled in the first cache 612 into the memory.
The size of the first cache 612 in the assembling unit 611 must be sufficient to assemble a data unit group of the predetermined size; the minimum size of the first cache 612 is therefore the predetermined size. In this case, whenever a data unit group has been assembled in the first cache 612, the storage module 620 needs to store the assembled data unit group into the memory immediately.
Optionally, in one embodiment of the present application, as shown in FIG. 9, each of the N assembling units 611 includes a first cache 612 and a second cache 613.
The sizes of the first cache 612 and the second cache 613 are both the predetermined size. The first cache 612 is used for assembling the data units and caches the assembled data unit groups into the second cache 613. Correspondingly, the storage module 620 is used to store the assembled data unit groups in the second cache 613 into the memory.
In this embodiment, the assembling unit 611 is implemented with the first cache 612 and the second cache 613, both of the predetermined size. Optionally, the first cache 612 and the second cache 613 may be implemented by registers. The first cache 612 is used for assembling, and the second cache 613 is used for buffering the assembled data unit groups.
It should be understood that the first cache 612 and the second cache 613 may be physically separate or integrated; that is to say, they may be two independent caches or two parts of one cache, which is not limited in the embodiments of the present application.
It should also be understood that the sizes of the first cache 612 and the second cache 613 may also be greater than the predetermined size, as long as data unit groups of the predetermined size can be assembled and buffered; the embodiments of the present application are not limited in this respect.
Owing to the second cache 613, it is convenient for the storage module 620 to store the assembled data unit groups.
For example, the storage module 620 may, according to a round-robin algorithm, read the assembled data unit groups in turn from the second cache 613 of each of the N assembling units 611 and store them into the memory.
FIG. 10 shows a schematic flowchart of reading out data using the round-robin algorithm. As shown in FIG. 10, the storage module 620 can cyclically execute 1001, 1002, 1003, 1004, ..., 1005, 1006, reading the assembled data unit groups from each assembling unit 611 in turn and storing them into the memory. For example, in 1001 it is determined whether there is an assembled data unit group in the first assembling unit; if so, 1002 is executed to read out the assembled data unit group of the first assembling unit. Then 1003 is executed to determine whether there is an assembled data unit group in the second assembling unit; if so, 1004 is executed to read out the assembled data unit group of the second assembling unit, and so on.
Optionally, in one embodiment of the present application, as shown in FIG. 7, the device 600 may further include:
a control module 640, configured to control the speed at which the multiply-accumulate unit outputs calculation results.
Specifically, the speed at which the multiply-accumulate unit outputs calculation results may not match the speed at which the device 600 processes data. In the embodiments of the present application, the control module 640 therefore controls the speed at which the multiply-accumulate unit outputs calculation results. For example, when the multiply-accumulate unit sends data too fast, the control module 640 may assert a backpressure signal to the multiply-accumulate unit. After receiving the backpressure signal, the multiply-accumulate unit stops calculating until the backpressure signal is deasserted, and then continues calculating.
In the above embodiments, the assembly of the data units is implemented by N assembling units. The assembly of the data units may also be implemented in other ways; that is to say, the assembling module 610 may also adopt other implementations. The implementation of another embodiment of the present application is described below.
Optionally, in one embodiment of the present application, as shown in FIG. 11, the assembling module 610 includes a first assembling unit 616 and a second assembling unit 617.
The first assembling unit 616 is used to assemble the data units of specific odd-numbered rows into data unit groups of the predetermined size, and the second assembling unit 617 is used to assemble the data units of specific even-numbered rows into data unit groups of the predetermined size, where the specific odd-numbered rows are the odd-numbered rows of each output feature map in the at least one output feature map, and the specific even-numbered rows are the even-numbered rows of each output feature map in the at least one output feature map.
Specifically, in this embodiment, the first assembling unit 616 and the second assembling unit 617 are used to implement the assembly of the data units. The first assembling unit 616 assembles the data units of the odd-numbered rows of each output feature map, and the second assembling unit 617 assembles the data units of the even-numbered rows of each output feature map.
In this case, optionally, in one embodiment of the present application, as shown in FIG. 11, the device 600 further includes:
a distribution module 635, configured to distribute the data units of the specific odd-numbered rows to the first assembling unit 616 and the data units of the specific even-numbered rows to the second assembling unit 617.
Specifically, the distribution module 635 may count the data units of each feature map separately and distribute the data units to the different assembling units according to the row numbers of the incoming data units. For example, as shown in FIG. 12, the data units of the odd-numbered rows of each feature map are distributed to the first assembling unit 616, and the data units of the even-numbered rows are distributed to the second assembling unit 617. In FIG. 12, [k, m, n] denotes the feature value (data unit) in row m, column n of the k-th feature map in the three-dimensional feature matrix; the width of each feature map (the number of data units per row) is 56, and the number of feature maps is 32.
Optionally, in one embodiment of the present application, as shown in FIG. 13, each of the first assembling unit 616 and the second assembling unit 617 includes N first-in-first-out queues (First Input First Output, FIFO). Optionally, each FIFO may be a dual-port FIFO implemented with random access memory (Random Access Memory, RAM).
The (p*N+i)-th data unit of the data units of the specific odd-numbered rows is input into the i-th FIFO of the first assembling unit 616, and the N data units of the specific odd-numbered rows held in the N FIFOs of the first assembling unit are assembled into one data unit group of the predetermined size.
The (p*N+i)-th data unit of the data units of the specific even-numbered rows is input into the i-th FIFO of the second assembling unit 617, and the N data units of the specific even-numbered rows held in the N FIFOs of the second assembling unit are assembled into one data unit group of the predetermined size, where N is a positive integer greater than 1, i is a positive integer not greater than N, and p is zero or a positive integer.
When the output data width of the multiply-accumulate unit is N data units, the multiply-accumulate unit can output N data units at a time, the N data units belonging to N output feature maps, respectively. As shown in FIG. 3, the multiply-accumulate unit outputs one row of data units at a time, where each data unit belongs to one output feature map and the N data units of one row belong to N output feature maps, respectively.
In this case, the distribution module 635 is used to distribute the N data units to the corresponding FIFOs, where the (p*N+i)-th data unit of the specific odd-numbered rows is distributed to the i-th FIFO of the first assembling unit 616, and the (p*N+i)-th data unit of the specific even-numbered rows is distributed to the i-th FIFO of the second assembling unit 617.
For example, as shown in FIG. 14, [0,0,0] is the first data unit of the first row of the first feature map, so [0,0,0] is distributed to the 1st FIFO of the first assembling unit 616; [0,0,1] is the second data unit of the first row of the first feature map, so [0,0,1] is distributed to the 2nd FIFO of the first assembling unit 616; [1,0,0] is the first data unit of the first row of the second feature map, so [1,0,0] is distributed to the 1st FIFO of the first assembling unit 616; [0,0,2] is the third data unit of the first row of the first feature map, so [0,0,2] is distributed to the 3rd FIFO of the first assembling unit 616; [1,0,1] is the second data unit of the first row of the second feature map, so [1,0,1] is distributed to the 2nd FIFO of the first assembling unit 616; [2,0,0] is the first data unit of the first row of the third feature map, so [2,0,0] is distributed to the 1st FIFO of the first assembling unit 616; and so on. When the N-th (N = 32 in FIG. 14) data unit [0,0,31] of the first row of the first feature map has been distributed to the 32nd FIFO of the first assembling unit 616, the 32 data units of the first row of the first feature map in the 32 FIFOs of the first assembling unit 616, i.e. [0,0,0], [0,0,1], ..., [0,0,31], are assembled into one data unit group.
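The per-feature-map counting behind this example can be reproduced with a short illustrative script (an assumption-laden sketch, not the hardware implementation; rows and FIFOs are 0-indexed here, so FIFO index 0 corresponds to the 1st FIFO above):

```python
from collections import defaultdict

N = 32                                 # FIFOs per assembling unit, as in FIG. 14
counter = defaultdict(int)             # per-feature-map count of odd-row data units
fifo_of = {}                           # data unit -> FIFO index of assembling unit 616

# Arrival order of the first few odd-row data units described for FIG. 14.
arrivals = [(0, 0, 0), (0, 0, 1), (1, 0, 0), (0, 0, 2), (1, 0, 1), (2, 0, 0)]
for k, m, n in arrivals:
    fifo_of[(k, m, n)] = counter[k] % N    # the (p*N+i)-th unit of map k -> FIFO i
    counter[k] += 1

# {(0,0,0): 0, (0,0,1): 1, (1,0,0): 0, (0,0,2): 2, (1,0,1): 1, (2,0,0): 0}
print(fifo_of)
```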
Correspondingly, in this embodiment, the storage module 620 is used to store the data unit groups assembled in the N FIFOs of the first assembling unit 616 or the second assembling unit 617 into the memory.
The storage module 620 reads the assembled data unit groups from the two assembling units in turn according to the distribution rule of the distribution module 635 and stores them into the memory. For example, in the example above, once the 32 data units of the first row of the first feature map in the 32 FIFOs of the first assembling unit 616, i.e. [0,0,0], [0,0,1], ..., [0,0,31], have been assembled into one data unit group, the storage module 620 reads that data unit group out and stores it into the memory.
Similarly to the foregoing embodiments, in this embodiment, as shown in FIG. 11, the device 600 may also include a control module 640 for controlling the speed at which the multiply-accumulate unit outputs calculation results. For the related description, reference may be made to the foregoing embodiments; for brevity, it is not repeated here.
In this embodiment, the assembly of the data units is implemented with FIFOs, and the FIFOs can be implemented with RAM. According to the structure of the look-up tables (Look Up Table, LUT) of a field-programmable gate array (Field Programmable Gate Array, FPGA), RAM requires fewer LUT resources than registers of the same capacity, so the technical solution of this embodiment requires fewer LUT resources.
The data storage device of the embodiments of the present application has been described above; the data storage method of the embodiments of the present application is described below. The data storage method of the embodiments of the present application is the method performed when the foregoing data storage device of the embodiments of the present application, or equipment including that device, implements the technical solutions of the embodiments of the present application. For the related description, reference may be made to the foregoing embodiments; for brevity, it is not repeated here.
FIG. 15 shows a schematic flowchart of a data storage method 1500 according to an embodiment of the present application.
As shown in FIG. 15, the method 1500 includes:
1510, obtaining a calculation result of a multiply-accumulate unit after multiply-accumulation, the calculation result including data units of at least one output feature map;
1520, assembling the data units of each output feature map in the at least one output feature map into data unit groups of a predetermined size;
1530, storing the data unit groups into a memory, wherein the predetermined size is the size of a storage unit in the memory.
Optionally, in one embodiment of the present application, the data units of one output feature map are assembled into the data unit groups of the predetermined size by each of N assembling units, N being a positive integer greater than 1.
Optionally, in one embodiment of the present application, N data units output by the multiply-accumulate unit at a time are obtained, the N data units belonging to N output feature maps, respectively; the method 1500 further includes: distributing the N data units to the N assembling units, respectively.
Optionally, in one embodiment of the present application, the data units input consecutively into each of the N assembling units are assembled into the data unit groups of the predetermined size.
Optionally, in one embodiment of the present application, the predetermined size is the size of N data units, and the data units input N consecutive times into each of the N assembling units are assembled into the data unit groups of the predetermined size.
Optionally, in one embodiment of the present application, each of the N assembling units includes a first cache, the size of the first cache being the predetermined size; the data units of one output feature map are assembled into the data unit groups of the predetermined size through the first cache, and the data unit groups assembled in the first cache are stored into the memory.
Optionally, in one embodiment of the present application, each of the N assembling units includes a first cache and a second cache, the sizes of the first cache and the second cache both being the predetermined size; the data units of one output feature map are assembled into the data unit groups of the predetermined size through the first cache; the data unit groups assembled in the first cache are cached into the second cache; and the assembled data unit groups in the second cache are stored into the memory.
Optionally, in one embodiment of the present application, according to a round-robin algorithm, the assembled data unit groups are read in turn from the second cache of each of the N assembling units and stored into the memory.
Optionally, in one embodiment of the present application, the data units of specific odd-numbered rows are assembled into the data unit groups of the predetermined size by a first assembling unit, where the specific odd-numbered rows are the odd-numbered rows of each output feature map in the at least one output feature map; and the data units of specific even-numbered rows are assembled into the data unit groups of the predetermined size by a second assembling unit, where the specific even-numbered rows are the even-numbered rows of each output feature map in the at least one output feature map.
Optionally, in one embodiment of the present application, the method 1500 further includes: distributing the data units of the specific odd-numbered rows to the first assembling unit, and distributing the data units of the specific even-numbered rows to the second assembling unit.
Optionally, in one embodiment of the present application, the first assembling unit and the second assembling unit each include N first-in-first-out queues (FIFOs); the (p*N+i)-th data unit of the data units of the specific odd-numbered rows is distributed to the i-th FIFO of the first assembling unit; the N data units of the specific odd-numbered rows in the N FIFOs of the first assembling unit are assembled into one data unit group of the predetermined size; the (p*N+i)-th data unit of the data units of the specific even-numbered rows is distributed to the i-th FIFO of the second assembling unit; and the N data units of the specific even-numbered rows in the N FIFOs of the second assembling unit are assembled into one data unit group of the predetermined size, where N is a positive integer greater than 1, i is a positive integer not greater than N, and p is zero or a positive integer.
Optionally, in one embodiment of the present application, N data units output by the multiply-accumulate unit at a time are obtained, the N data units belonging to N output feature maps, respectively; among the N data units, the (p*N+i)-th data unit of the data units of the specific odd-numbered rows is distributed to the i-th FIFO of the first assembling unit, and the (p*N+i)-th data unit of the data units of the specific even-numbered rows is distributed to the i-th FIFO of the second assembling unit.
Optionally, in one embodiment of the present application, the data unit groups assembled in the N FIFOs of the first assembling unit or the second assembling unit are stored into the memory.
Optionally, in one embodiment of the present application, the method 1500 further includes: controlling the speed at which the multiply-accumulate unit outputs calculation results.
An embodiment of the present application further provides a processor, which includes a multiply-accumulate unit and the foregoing data storage device of the embodiments of the present application.
The multiply-accumulate unit is used to perform the multiply-accumulate calculation and to output the calculation results to the data storage device, and the data storage device stores the data into the memory using the technical solutions of the embodiments of the present application.
For example, the processor may be the convolution calculation device 210 in FIG. 2, in which the OFM storage module 214 may be a data storage device according to an embodiment of the present application.
An embodiment of the present application further provides a movable device, which may include the foregoing data storage device of the embodiments of the present application, or the foregoing processor of the embodiments of the present application.
An embodiment of the present application further provides a computer storage medium, in which program code is stored; the program code may be used to instruct execution of the foregoing data storage method of the embodiments of the present application.
Those of ordinary skill in the art will appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented in electronic hardware, computer software, or a combination of the two. To clearly illustrate the interchangeability of hardware and software, the composition and steps of the examples have been described above generally in terms of their functions. Whether these functions are executed in hardware or software depends on the specific application and design constraints of the technical solution. Skilled artisans may use different methods to implement the described functions for each particular application, but such implementations should not be considered beyond the scope of the present application.
In the several embodiments provided in the present application, it should be understood that the disclosed system, device, and method may be implemented in other ways. For example, the device embodiments described above are merely illustrative; the division of the units is only a division by logical function, and there may be other divisions in actual implementation: for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. In addition, the mutual couplings or direct couplings or communication connections shown or discussed may be indirect couplings or communication connections through some interfaces, devices, or units, and may also be electrical, mechanical, or other forms of connection.
The units described as separate components may or may not be physically separate, and the components displayed as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments of the present application.
In addition, the functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present application, in essence, or the part contributing to the existing technology, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions to enable a computer device (which may be a personal computer, a server, or a network device, etc.) to perform all or part of the steps of the methods described in the embodiments of the present application. The aforementioned storage media include various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM, Read-Only Memory), a random access memory (RAM, Random Access Memory), a magnetic disk, or an optical disc.
The above are only specific implementations of the present application, but the protection scope of the present application is not limited thereto. Any person skilled in the art can easily conceive of various equivalent modifications or replacements within the technical scope disclosed in the present application, and these modifications or replacements shall be covered by the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (30)

  1. An apparatus for data storage, comprising:
    an assembling module, configured to obtain a calculation result of a multiply-accumulate unit after multiply-accumulation, the calculation result comprising data units of at least one output feature map, and to assemble the data units of each output feature map in the at least one output feature map into data unit groups of a predetermined size;
    a storage module, configured to store the data unit groups into a memory, wherein the predetermined size is the size of a storage unit in the memory.
  2. The apparatus according to claim 1, wherein the assembling module comprises N assembling units, each of the N assembling units being used to assemble the data units of one output feature map into the data unit groups of the predetermined size, N being a positive integer greater than 1.
  3. The apparatus according to claim 2, wherein the multiply-accumulate unit outputs N data units at a time, the N data units belonging to N output feature maps, respectively;
    the apparatus further comprises:
    a distribution module, configured to distribute the N data units to the N assembling units, respectively.
  4. The apparatus according to claim 3, wherein the data units input consecutively into each of the N assembling units are assembled into the data unit groups of the predetermined size.
  5. The apparatus according to claim 4, wherein the predetermined size is the size of N data units, and the data units input N consecutive times into each of the N assembling units are assembled into the data unit groups of the predetermined size.
  6. The apparatus according to any one of claims 2 to 5, wherein each of the N assembling units comprises a first cache, the size of the first cache being the predetermined size;
    the first cache is used for assembling the data units;
    the storage module is used to store the data unit groups assembled in the first cache into the memory.
  7. The apparatus according to any one of claims 2 to 5, wherein each of the N assembling units comprises a first cache and a second cache, the sizes of the first cache and the second cache both being the predetermined size;
    the first cache is used for assembling the data units and for caching the assembled data unit groups into the second cache;
    the storage module is used to store the assembled data unit groups in the second cache into the memory.
  8. The apparatus according to claim 7, wherein the storage module is used to read, according to a round-robin algorithm, the assembled data unit groups in turn from the second cache of each of the N assembling units and to store them into the memory.
  9. The apparatus according to claim 1, wherein the assembling module comprises a first assembling unit and a second assembling unit, the first assembling unit being used to assemble data units of specific odd-numbered rows into the data unit groups of the predetermined size, and the second assembling unit being used to assemble data units of specific even-numbered rows into the data unit groups of the predetermined size, wherein the specific odd-numbered rows are the odd-numbered rows of each output feature map in the at least one output feature map, and the specific even-numbered rows are the even-numbered rows of each output feature map in the at least one output feature map.
  10. The apparatus according to claim 9, further comprising:
    a distribution module, configured to distribute the data units of the specific odd-numbered rows to the first assembling unit, and to distribute the data units of the specific even-numbered rows to the second assembling unit.
  11. The apparatus according to claim 9 or 10, wherein the first assembling unit and the second assembling unit each comprise N first-in-first-out queues (FIFOs);
    the (p*N+i)-th data unit of the data units of the specific odd-numbered rows is input into the i-th FIFO of the first assembling unit, and the N data units of the specific odd-numbered rows in the N FIFOs of the first assembling unit are assembled into one data unit group of the predetermined size;
    the (p*N+i)-th data unit of the data units of the specific even-numbered rows is input into the i-th FIFO of the second assembling unit, and the N data units of the specific even-numbered rows in the N FIFOs of the second assembling unit are assembled into one data unit group of the predetermined size, wherein N is a positive integer greater than 1, i is a positive integer not greater than N, and p is zero or a positive integer.
  12. The apparatus according to claim 11, wherein the multiply-accumulate unit outputs N data units at a time, the N data units belonging to N output feature maps, respectively;
    the distribution module is used to distribute the N data units to the corresponding FIFOs, wherein the (p*N+i)-th data unit of the specific odd-numbered rows is distributed to the i-th FIFO of the first assembling unit, and the (p*N+i)-th data unit of the specific even-numbered rows is distributed to the i-th FIFO of the second assembling unit.
  13. The apparatus according to claim 11 or 12, wherein the storage module is used to store the data unit groups assembled in the N FIFOs of the first assembling unit or the second assembling unit into the memory.
  14. The apparatus according to any one of claims 2 to 13, further comprising:
    a control module, configured to control the speed at which the multiply-accumulate unit outputs the calculation result.
  15. A method for data storage, comprising:
    obtaining a calculation result of a multiply-accumulate unit after multiply-accumulation, the calculation result comprising data units of at least one output feature map;
    assembling the data units of each output feature map in the at least one output feature map into data unit groups of a predetermined size;
    storing the data unit groups into a memory, wherein the predetermined size is the size of a storage unit in the memory.
  16. The method according to claim 15, wherein assembling the data units of each output feature map in the at least one output feature map into data unit groups of a predetermined size comprises:
    assembling the data units of one output feature map into the data unit groups of the predetermined size by each of N assembling units, N being a positive integer greater than 1.
  17. The method according to claim 16, wherein obtaining the calculation result of the multiply-accumulate unit after multiply-accumulation comprises:
    obtaining N data units output by the multiply-accumulate unit at a time, the N data units belonging to N output feature maps, respectively;
    the method further comprising:
    distributing the N data units to the N assembling units, respectively.
  18. The method according to claim 17, wherein assembling the data units of one output feature map into the data unit groups of the predetermined size by each of the N assembling units comprises:
    assembling the data units input consecutively into each of the N assembling units into the data unit groups of the predetermined size.
  19. The method according to claim 18, wherein the predetermined size is the size of N data units;
    assembling the data units input consecutively into each of the N assembling units into the data unit groups of the predetermined size comprises:
    assembling the data units input N consecutive times into each of the N assembling units into the data unit groups of the predetermined size.
  20. The method according to any one of claims 16 to 19, wherein each of the N assembling units comprises a first cache, the size of the first cache being the predetermined size;
    assembling the data units of one output feature map into the data unit groups of the predetermined size by each of the N assembling units comprises:
    assembling the data units of one output feature map into the data unit groups of the predetermined size through the first cache;
    storing the data unit groups into the memory comprises:
    storing the data unit groups assembled in the first cache into the memory.
  21. The method according to any one of claims 16 to 19, wherein each of the N assembling units comprises a first cache and a second cache, the sizes of the first cache and the second cache both being the predetermined size;
    assembling the data units of one output feature map into the data unit groups of the predetermined size by each of the N assembling units comprises:
    assembling the data units of one output feature map into the data unit groups of the predetermined size through the first cache;
    the method further comprising:
    caching the data unit groups assembled in the first cache into the second cache;
    storing the data unit groups into the memory comprises:
    storing the assembled data unit groups in the second cache into the memory.
  22. The method according to claim 21, wherein storing the assembled data unit groups in the second cache into the memory comprises:
    reading, according to a round-robin algorithm, the assembled data unit groups in turn from the second cache of each of the N assembling units and storing them into the memory.
  23. The method according to claim 15, wherein assembling the data units of each output feature map in the at least one output feature map into data unit groups of a predetermined size comprises:
    assembling the data units of specific odd-numbered rows into the data unit groups of the predetermined size by a first assembling unit, wherein the specific odd-numbered rows are the odd-numbered rows of each output feature map in the at least one output feature map;
    assembling the data units of specific even-numbered rows into the data unit groups of the predetermined size by a second assembling unit, wherein the specific even-numbered rows are the even-numbered rows of each output feature map in the at least one output feature map.
  24. The method according to claim 23, further comprising:
    distributing the data units of the specific odd-numbered rows to the first assembling unit, and distributing the data units of the specific even-numbered rows to the second assembling unit.
  25. The method according to claim 24, wherein the first assembling unit and the second assembling unit each comprise N first-in-first-out queues (FIFOs);
    distributing the data units of the specific odd-numbered rows to the first assembling unit comprises:
    distributing the (p*N+i)-th data unit of the data units of the specific odd-numbered rows to the i-th FIFO of the first assembling unit;
    assembling the data units of the specific odd-numbered rows into the data unit groups of the predetermined size by the first assembling unit comprises:
    assembling the N data units of the specific odd-numbered rows in the N FIFOs of the first assembling unit into one data unit group of the predetermined size;
    distributing the data units of the specific even-numbered rows to the second assembling unit comprises:
    distributing the (p*N+i)-th data unit of the data units of the specific even-numbered rows to the i-th FIFO of the second assembling unit;
    assembling the data units of the specific even-numbered rows into the data unit groups of the predetermined size by the second assembling unit comprises:
    assembling the N data units of the specific even-numbered rows in the N FIFOs of the second assembling unit into one data unit group of the predetermined size, wherein N is a positive integer greater than 1, i is a positive integer not greater than N, and p is zero or a positive integer.
  26. The method according to claim 25, wherein obtaining the calculation result of the multiply-accumulate unit after multiply-accumulation comprises:
    obtaining N data units output by the multiply-accumulate unit at a time, the N data units belonging to N output feature maps, respectively;
    distributing the (p*N+i)-th data unit of the data units of the specific odd-numbered rows to the i-th FIFO of the first assembling unit comprises:
    distributing the (p*N+i)-th data unit of the data units of the specific odd-numbered rows among the N data units to the i-th FIFO of the first assembling unit;
    assembling the N data units of the specific even-numbered rows in the N FIFOs of the second assembling unit into the data unit groups of the predetermined size comprises:
    distributing the (p*N+i)-th data unit of the data units of the specific even-numbered rows among the N data units to the i-th FIFO of the second assembling unit.
  27. The method according to claim 25 or 26, wherein storing the data unit groups into the memory comprises:
    storing the data unit groups assembled in the N FIFOs of the first assembling unit or the second assembling unit into the memory.
  28. The method according to any one of claims 16 to 27, further comprising:
    controlling the speed at which the multiply-accumulate unit outputs the calculation result.
  29. A processor, comprising a multiply-accumulate unit and the apparatus for data storage according to any one of claims 1 to 14.
  30. A movable device, comprising:
    the apparatus for data storage according to any one of claims 1 to 14; or,
    the processor according to claim 29.
PCT/CN2018/109327 2018-10-08 2018-10-08 Apparatus, method, processor and movable device for data storage WO2020073164A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201880040193.XA 2018-10-08 2018-10-08 Apparatus, method, processor and movable device for data storage
PCT/CN2018/109327 WO2020073164A1 (zh) 2018-10-08 2018-10-08 Apparatus, method, processor and movable device for data storage

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2018/109327 WO2020073164A1 (zh) 2018-10-08 2018-10-08 数据存储的装置、方法、处理器和可移动设备

Publications (1)

Publication Number Publication Date
WO2020073164A1 true WO2020073164A1 (zh) 2020-04-16

Family

ID=69328581

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/109327 WO2020073164A1 (zh) 2018-10-08 2018-10-08 数据存储的装置、方法、处理器和可移动设备

Country Status (2)

Country Link
CN (1) CN110770763A (zh)
WO (1) WO2020073164A1 (zh)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105260773A (zh) * 2015-09-18 2016-01-20 华为技术有限公司 一种图像处理装置以及图像处理方法
CN107844826A (zh) * 2017-10-30 2018-03-27 中国科学院计算技术研究所 神经网络处理单元及包含该处理单元的处理系统
CN108229645A (zh) * 2017-04-28 2018-06-29 北京市商汤科技开发有限公司 卷积加速和计算处理方法、装置、电子设备及存储介质
US20180253635A1 (en) * 2017-03-03 2018-09-06 Samsung Electronics Co, Ltd. Neural network devices and methods of operating the same

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TWI423682B (zh) * 2010-10-29 2014-01-11 Altek Corp Image processing method
CN108205702B (zh) * 2017-12-29 2020-12-01 National University of Defense Technology Parallel processing method for multi-input multi-output matrix convolution

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105260773A (zh) * 2015-09-18 2016-01-20 华为技术有限公司 一种图像处理装置以及图像处理方法
US20180253635A1 (en) * 2017-03-03 2018-09-06 Samsung Electronics Co, Ltd. Neural network devices and methods of operating the same
CN108229645A (zh) * 2017-04-28 2018-06-29 北京市商汤科技开发有限公司 卷积加速和计算处理方法、装置、电子设备及存储介质
CN107844826A (zh) * 2017-10-30 2018-03-27 中国科学院计算技术研究所 神经网络处理单元及包含该处理单元的处理系统

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
BAO, XIANLIANG: "Design and Implementation of a High-performance Accelerator Dedicated for Convolutional Neural Networks", China Master's Theses Full-text Database, 27 May 2018 (2018-05-27), page 34, XP055701141 *

Also Published As

Publication number Publication date
CN110770763A (zh) 2020-02-07

Similar Documents

Publication Publication Date Title
WO2020019174A1 (zh) Data access method, processor, computer system and movable device
CN107747941B (zh) Binocular vision positioning method, apparatus and system
US20200285942A1 Method, apparatus, accelerator, system and movable device for processing neural network
CN108605098B (zh) System and method for rolling shutter correction
CN107341814B (zh) Monocular visual odometry method for a quadrotor UAV based on the sparse direct method
US10726616B2 System and method for processing captured images
CN110296717B (zh) Event data stream processing method and computing device
WO2018218481A1 (zh) Neural network training method and apparatus, computer system and movable device
JPWO2019204876A5 (zh)
JP6441586B2 (ja) Information processing device and information processing method
CN112136137A (zh) Parameter optimization method and apparatus, control device, and aircraft
WO2020124678A1 (zh) Inertial navigation solution method and system based on functional iterative integration
WO2019191288A1 (en) Direct sparse visual-inertial odometry using dynamic marginalization
CN114041140A (zh) Event-driven spiking convolutional neural network
WO2020155044A1 (zh) Convolution calculation apparatus and method, processor and movable device
JP6384000B1 (ja) Control device, imaging device, imaging system, moving body, control method, and program
Müller et al. Efficient probabilistic localization for autonomous indoor airships using sonar, air flow, and IMU sensors
WO2020073164A1 (zh) Apparatus, method, processor and movable device for data storage
US20200134771A1 Image processing method, chip, processor, system, and mobile device
Watman et al. Design of a miniature, multi-directional optical flow sensor for micro aerial vehicles
CN112470138A (zh) Computing apparatus and method, processor, and movable device
CN108701348A (zh) Image processing method, integrated circuit, processor, system and movable device
Konomura et al. Visual 3D self localization with 8 gram circuit board for very compact and fully autonomous unmanned aerial vehicles
CN112129272B (zh) Implementation method and implementation apparatus of visual odometry
CN112269187A (zh) Robot state detection method, apparatus and device

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18936500

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 18936500

Country of ref document: EP

Kind code of ref document: A1