WO2021196160A1 - Data storage management apparatus and processing core - Google Patents

Data storage management apparatus and processing core

Info

Publication number
WO2021196160A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
ram
random access
access memory
processing unit
Prior art date
Application number
PCT/CN2020/083208
Other languages
French (fr)
Chinese (zh)
Inventor
罗飞
王维伟
Original Assignee
北京希姆计算科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京希姆计算科技有限公司 filed Critical 北京希姆计算科技有限公司
Priority to CN202080096316.9A priority Critical patent/CN115380292A/en
Priority to PCT/CN2020/083208 priority patent/WO2021196160A1/en
Publication of WO2021196160A1 publication Critical patent/WO2021196160A1/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology

Definitions

  • the invention relates to the technical field of processing cores, in particular to a data storage management device and a processing core.
  • The chip is the cornerstone of data processing and fundamentally determines people's ability to process data. By application area, chips follow two main routes. One is the general-purpose route, such as the CPU, which offers great flexibility but relatively low effective computing power when running algorithms from a specific field. The other is the dedicated route, such as the TPU, which delivers high effective computing power in certain fields but handles flexible, general-purpose workloads poorly or not at all.
  • Because the data of the intelligent era is highly varied and enormous in volume, a chip must be both extremely flexible, able to handle rapidly changing algorithms from different fields, and extremely powerful, able to quickly process a very large and rapidly growing amount of data.
  • the invention provides a data storage management device and a processing core, which can eliminate the decrease in calculation efficiency caused by Cache access failure, and improve the controllability of program efficiency.
  • The first aspect of the present invention provides a data storage management device, including: at least two random access memories (RAM); a control unit for receiving an instruction and generating and sending a control signal according to the instruction; and a direct memory access controller (DMAC) for accessing the data in the RAM according to the control signal.
  • The data storage management device receives and responds to instructions sent by an external processing unit and reads data from the external storage unit, so that the external processing unit can read the data required to execute a program directly from the data storage management device.
  • The external processing unit does not need to fetch data from the external storage unit through a cache, which eliminates the loss of computing efficiency caused by Cache access failures (cache misses) and makes program efficiency more controllable.
  • Preferably, the DMAC accessing the data in the RAM according to the control signal includes: the DMAC reads data from an external storage unit according to the control signal and stores the data in the RAM indicated by the control signal; or the DMAC reads data from the RAM indicated by the control signal according to the control signal and stores the data in an external storage unit.
  • the number of the random access memory RAM indicated by the control signal is one or more.
  • all the addresses of the random access memory RAM and the addresses of the external storage unit are uniformly addressed.
  • the addresses of all the random access memory RAMs are uniformly addressed.
  • the access address range of the direct memory access controller DMAC is an address segment of the random access memory RAM and an address segment of an external storage unit.
  • According to another aspect of the present invention, a processing core is provided, including a processing unit, a storage unit, and the storage management device provided in the first aspect; the processing unit is configured to send instructions that instruct the storage management device to access the data in the storage unit; the processing unit is also configured to read the data required for executing a program from any of the RAMs.
  • The instructions include a fetch instruction and a store instruction; the processing unit sends a fetch instruction, which instructs the data storage management device to fetch data from the storage unit and store the data in the RAM indicated by the fetch instruction.
  • The DMAC is configured to send a storage completion signal after completing the fetch instruction; the processing unit is configured to issue a new fetch instruction according to the storage completion signal, and to read data from the RAM indicated by the fetch instruction.
  • At the same point in time, the processing unit and the DMAC access different RAMs.
  • The processing unit and the DMAC accessing different RAMs at the same time includes: the at least two RAMs include a first RAM and a second RAM; at a first time, the processing unit reads first data from the first RAM, and the DMAC writes second data fetched from the storage unit into the second RAM; at a second time, the processing unit reads the second data from the second RAM, and the DMAC writes third data fetched from the storage unit into the first RAM.
  • The first RAM is a first group of RAMs including a plurality of RAMs, and the second RAM is a second group of RAMs including a plurality of RAMs.
  • The processing unit and the DMAC respectively access the RAMs of one group each, or, at the same time, the processing unit and the DMAC access RAMs belonging to the two groups.
  • a chip including one or more processing cores provided in the second aspect.
  • a card board which includes one or more chips provided in the third aspect.
  • an electronic device including one or more cards provided in the fourth aspect.
  • a control unit receives an instruction, and generates and sends a control signal according to the instruction; the direct memory access controller DMAC accesses the data in the random access memory RAM according to the control signal.
  • an electronic device including: a memory for storing computer-readable instructions; and one or more processors for running the computer-readable instructions so that, when run, the processors implement any of the data storage management methods of the sixth aspect.
  • a non-transitory computer-readable storage medium storing computer instructions for causing a computer to execute any of the data storage management methods of the sixth aspect.
  • a computer program product which includes computer instructions that, when executed by a computing device, enable the computing device to execute any of the data storage management methods of the sixth aspect.
  • FIG. 1 is a schematic diagram of reading data in a processing core in the prior art
  • Figure 2 is a schematic structural diagram of a data storage management device according to an embodiment of the present invention.
  • FIG. 3 is a schematic structural diagram of a processing core according to an embodiment of the present invention.
  • FIG. 4 is a schematic diagram of the structure of a neural network according to an embodiment of the present invention.
  • Fig. 5 is a schematic structural diagram of a processing core according to an embodiment of the present invention.
  • Fig. 6 is a sequence diagram of neural network calculation performed by a processing core according to an embodiment of the present invention.
  • Fig. 7 is a schematic flowchart of a data storage management method according to an embodiment of the present invention.
  • multi-core or many-core chips are often used.
  • the cores in the multi-core architecture all have a certain degree of independent processing capability, and have a relatively large internal storage space for storing their own programs, data, and weights.
  • How well the basic computing power of a single core is exploited determines the ability of the entire chip to compute neural networks.
  • How well the basic computing power of a single core is exploited depends, in turn, on the ideal computing power of the core's computing unit and on its storage access efficiency.
  • SRAM: Static Random Access Memory
  • DDR SDRAM: Double Data Rate Synchronous Dynamic Random Access Memory
  • the general concern is the access of the processing unit to the memory unit.
  • The processing unit is very fast: its main frequency is generally several hundred MHz (megahertz) to several GHz (gigahertz), that is, a clock period at the ps to ns level, whereas the access time of the memory unit is at the level of tens of ns, so the two differ greatly in speed.
  • How to solve the speed difference between the processing unit and the memory access, and effectively utilize the computing power of the processing unit, is a difficult point in modern CPU design.
  • Figure 1 is a schematic diagram of reading data in a processing core.
  • a high-speed cache is inserted between the processing unit (PU) and the storage unit memory.
  • the PU accesses the Memory in a hierarchical and indirect manner, that is, the PU directly accesses the Cache.
  • PU accesses Memory indirectly through Cache.
  • Cache is a mapping of Memory, and its content is a subset of the memory content.
  • the Cache has no independent organization space, and the address of the Cache is the same as the address of the accessed memory.
  • When the PU executes a program, it reads some data from the Memory through the Cache, and the Cache keeps a copy of that data. When the PU needs the same data again shortly afterwards, it reads the data directly from the Cache.
  • To the program, the Cache is transparent and has no functional significance: the program cannot access the Cache on its own. The program believes the PU has fetched data from the Memory, when in fact the PU fetched the data from the Cache.
  • Fig. 2 is a schematic structural diagram of a data storage management device according to an embodiment of the present invention.
  • the data storage management device includes: at least two random access memories RAM, a control unit, and a direct memory access controller (DMAC).
  • the data storage management device may be arranged in the processing core.
  • At least two RAMs include RAM_0, RAM_1...RAM_N.
  • the data storage management device has at least two RAMs, and each RAM can be accessed independently and in parallel.
  • the storage capacity of all RAMs can be the same or different.
  • The control unit is used for receiving instructions, and for generating and sending control signals C_DMAC according to the instructions. The instructions are sent by the processing unit PU located outside the data storage management device.
  • DMAC is used to realize the access to the data in the RAM according to the control signal.
  • The DMAC being used to access the data in the RAM according to a control signal includes: the DMAC reads data from an external storage unit Memory according to the control signal, and stores the data in the RAM indicated by the control signal.
  • The DMAC being used to access the data in the RAM according to a control signal also includes: the DMAC reads data from the RAM indicated by the control signal according to the control signal, and stores the data in an external memory.
  • the number of RAM indicated by the control signal is one or more.
  • The access addresses of all RAMs are addressed uniformly. More preferably, the access addresses of the RAMs are numbered contiguously to reduce the complexity of program control.
  • the addresses of all RAMs are uniformly addressed with the addresses of the external storage unit.
  • the data storage management device contains two RAMs, namely RAM_0 and RAM_1.
  • the address of RAM_0 is 0000H-0FFFH
  • the address of RAM_1 is 1000H-1FFFH
  • the address of the external memory is 2000H-FFFFH.
  • the access address of the DMAC is a full address range, specifically all RAM address segments and external memory address segments.
  • the access address of PU is the address segment of all RAM.
  • After the DMAC reads data from the external memory according to the control signal and stores the data in the RAM indicated by the control signal, it sends a storage completion signal to the external processing unit; the storage completion signal prompts the external processing unit to read data from the RAM that has just completed storage.
  • Fig. 3 is a schematic structural diagram of a processing core according to an embodiment of the present invention.
  • the processing core includes a processing unit PU, a storage unit memory, and the data storage management device provided in the above-mentioned embodiment.
  • the PU is used to send instructions, and the instructions are used to instruct the data storage management device to implement access to the data in the memory.
  • the instructions include fetch instructions and store instructions.
  • the processing unit is configured to send instructions, where the instructions are used to instruct the data storage management device to implement access to the data in the memory, including:
  • the PU is used to send a fetch instruction, and the fetch instruction is used to instruct the data storage management device to read data from the memory and store the data in the RAM indicated by the fetch instruction.
  • the DMAC sends a storage completion signal to the PU.
  • the PU is also used to read data needed to execute the program from any RAM.
  • the PU is used to issue a new fetch instruction every time after receiving a storage completion signal sent by the DMAC, and read data from the RAM that has just completed storage.
  • When the PU receives the storage completion signal sent by the DMAC, it issues a new fetch instruction and then reads the data from the RAM that has just completed storage. The DMAC reading data from the memory according to the new fetch instruction and storing it in the corresponding RAM thus proceeds in parallel with the PU fetching data from the just-filled RAM to execute the program, which improves operating efficiency.
  • the PU can first read the data from the RAM that has just completed storage, and then issue a new fetch instruction.
  • the PU and the DMAC access different RAMs.
  • the at least two RAMs include a first RAM and a second RAM.
  • At a first time, the PU reads the first data from the first RAM, and the DMAC writes the second data retrieved from the memory into the second RAM.
  • At a second time, the PU reads the second data from the second RAM, and the DMAC writes the third data retrieved from the memory into the first RAM.
  • PU and DMAC can also access the same RAM at the same time, and RAM responds to PU and DMAC serially.
  • the PU and DMAC can also access the same RAM at the same time, and the dual-port RAM responds to the PU and DMAC in parallel.
  • first RAM may be a first group of RAMs including a plurality of RAMs
  • second RAM may be a second group of RAMs including a plurality of RAMs.
  • the number of RAMs in the first group of RAMs may be the same as or different from the number of RAMs in the second group.
  • The PU and the DMAC may each access the RAMs of one group, or, at the same time, the PU and the DMAC may access RAMs belonging to the two groups.
  • At the first time, the PU reads the first data from the first group of RAMs, and the DMAC writes the second data retrieved from the memory to the second group of RAMs; at the second time, the PU reads the second data from the second group of RAMs, and the DMAC writes the third data retrieved from the memory to the first group of RAMs.
  • The processing unit or the DMAC may also access each RAM in the same group in a time-shared manner.
  • The PU fetching data from a RAM and the DMAC storing data from the memory into a RAM can proceed in parallel, which further improves the computing power of the processing core and makes it better suited to neural network operations.
  • there is no need to design a complex Cache circuit in the processing core which saves the cost of the processing core and reduces the difficulty of chip design.
  • the processing unit does not need to fetch data from external memory through the high-speed cache. The decrease in computing efficiency caused by Cache access failure is eliminated, and the processing core can directly call data from the RAM of the storage management device, which improves the controllability of program efficiency.
  • Fig. 4 is a schematic structural diagram of a neural network according to an embodiment of the present invention.
  • the neural network has two layers, the output of the first layer is used as the input of the second layer, and the output of the second layer is the output of the entire neural network.
  • Fig. 5 is a schematic structural diagram of a processing core according to an embodiment of the present invention.
  • the processing core shown in FIG. 5 is used to realize the calculation of the neural network shown in FIG. 4.
  • the processing core includes a data storage management device, a processing unit, and a storage unit.
  • the data storage management device includes RAM_0, RAM_1, DMAC and a control unit.
  • RAM_0 and RAM_1 can hold the parameters and data calculated by the next layer of neural network.
  • The PU and the DMAC access different RAMs, so that the PU executing the program and the DMAC storing data can proceed in parallel, thereby optimizing calculation and storage efficiency. For example, at the first time point, the PU accesses RAM_0 and the DMAC accesses RAM_1. At the second time point, the PU accesses RAM_1 and the DMAC accesses RAM_0, and so on.
  • The PU sends the instruction lls_dis; the control unit receives the instruction, generates a control signal C_DMAC, and sends it to the DMAC.
  • The DMAC reads the data indicated by the instruction from the memory and stores the data in RAM_0, the RAM indicated by the instruction.
  • When the DMAC has finished storing the data, it sends a storage completion signal to the PU; the PU receives the storage completion signal, issues a new instruction that instructs the DMAC to read new data from the memory and store it in RAM_1, and then reads data from RAM_0.
  • the DMAC stores the data in RAM_1, it sends a signal that the storage is complete.
  • the PU issues an instruction again and then reads the data from RAM_1.
  • the instruction issued again instructs the DMAC to read new data from memory and store it in RAM_0.
  • The DMAC storing data and the PU reading data proceed in parallel, so that both calculation and storage can reach maximum efficiency.
  • Fig. 6 is a sequence diagram of a neural network calculation performed by a processing core according to an embodiment of the present invention.
  • The program executed by the PU at t2 can be set to be the same as the program executed at t1, that is, the calculation of the first layer of the same neural network is also performed at t2; alternatively, the program executed by the PU at t2 can be set to be different from the program executed at t1. The present invention is not limited in this respect.
  • a chip including one or more processing cores provided in the foregoing embodiments.
  • a card board which includes one or more chips provided in the foregoing embodiments.
  • an electronic device including one or more of the card boards provided in the foregoing embodiments.
  • FIG. 7 shows a data storage management method provided by an embodiment of the present invention. The method includes steps S101 and S102.
  • Step S101: the control unit receives an instruction, and generates and sends a control signal according to the instruction.
  • Step S102: the direct memory access controller DMAC accesses the data in the RAM according to the control signal.
  • the DMAC implements the access to the data in the RAM according to the control signal, including: the DMAC reads data from the external memory according to the control signal, and stores the data in the RAM indicated by the control signal.
  • the DMAC implements the access to the data in the RAM according to the control signal, including: the DMAC reads data from the RAM indicated by the control signal according to the control signal, and stores the data in an external memory.
  • An embodiment of the present invention also provides a method for a processing core to process data.
  • The method includes steps S201 and S202.
  • Step S201: the processing unit sends a fetch instruction.
  • Step S202: the data storage management device reads data from the storage unit according to the fetch instruction, and stores the data in the RAM of the data storage management device indicated by the fetch instruction.
  • the data storage management device sends a storage completion signal after storing the data in the RAM of the data storage management device indicated by the fetch instruction.
  • When the processing unit receives a storage completion signal sent by the DMAC, it issues a new fetch instruction and reads data from the RAM indicated by the fetch instruction that has just been completed.
  • At the same point in time, the processing unit and the direct memory access controller DMAC access different RAMs.
  • At a first time, the processing unit reads the first data from the first RAM, and the DMAC writes the second data retrieved from the memory into the second RAM.
  • At a second time, the processing unit reads the second data from the second RAM, and the DMAC writes the third data retrieved from the memory into the first RAM.
  • an electronic device including: a memory for storing computer-readable instructions; and one or more processors for running the computer-readable instructions so that, when run, the processors implement the data storage management method of the foregoing embodiment.
  • a non-transitory computer-readable storage medium storing computer instructions for causing a computer to execute the data storage management method of the foregoing embodiment.
  • a computer program product which includes computer instructions, and when the computer instructions are executed by a computing device, the computing device can execute the data storage management method of the foregoing embodiment.

Abstract

A data storage management apparatus and a processing core. The apparatus comprises: at least two random access memories (RAM); a control unit, which receives an instruction and generates and sends a control signal according to the instruction (S101); and a direct memory access controller (DMAC), which accesses data in the RAM according to the control signal (S102). The data storage management apparatus receives and responds to an instruction sent from an external processing unit and reads data from an external storage unit, so that the external processing unit can read the data required for executing a program directly from the data storage management apparatus and does not need to fetch data from the external storage unit by means of a Cache, thereby eliminating the decrease in computing efficiency caused by Cache access failures and improving the controllability of program efficiency.

Description

Data storage management device and processing core
Technical field
The invention relates to the technical field of processing cores, and in particular to a data storage management device and a processing core.
Background
With the development of science and technology, human society is rapidly entering the intelligent era. An important feature of this era is that people obtain more and more types of data in ever larger amounts, while the required data processing speed keeps rising.
The chip is the cornerstone of data processing and fundamentally determines people's ability to process data. By application area, chips follow two main routes. One is the general-purpose route, such as the CPU, which offers great flexibility but relatively low effective computing power when running algorithms from a specific field. The other is the dedicated route, such as the TPU, which delivers high effective computing power in certain fields but handles flexible, general-purpose workloads poorly or not at all.
Because the data of the intelligent era is highly varied and enormous in volume, a chip must be both extremely flexible, able to handle rapidly changing algorithms from different fields, and extremely powerful, able to quickly process a very large and rapidly growing amount of data.
Summary of the invention
The invention provides a data storage management device and a processing core, which can eliminate the decrease in computing efficiency caused by Cache access failures and improve the controllability of program efficiency.
The first aspect of the present invention provides a data storage management device, including: at least two random access memories (RAM); a control unit for receiving an instruction and generating and sending a control signal according to the instruction; and a direct memory access controller (DMAC) for accessing the data in the RAM according to the control signal.
The data storage management device provided by the embodiment of the present invention receives and responds to instructions sent by an external processing unit and reads data from the external storage unit, so that the external processing unit can read the data required to execute a program directly from the data storage management device. The external processing unit does not need to fetch data from the external storage unit through a Cache, which eliminates the decrease in computing efficiency caused by Cache access failures and improves the controllability of program efficiency.
Preferably, the DMAC accessing the data in the RAM according to the control signal includes: the DMAC reads data from an external storage unit according to the control signal and stores the data in the RAM indicated by the control signal; or the DMAC reads data from the RAM indicated by the control signal according to the control signal and stores the data in an external storage unit.
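The two transfer directions described above can be sketched in software as follows. This is only an illustrative model, not the circuit claimed in the document: the `ControlSignal` fields, the `Direction` enum, and the use of Python lists as word-addressed memories are all assumptions made for the example, since the text only states that the control signal indicates a RAM.

```python
from dataclasses import dataclass
from enum import Enum


class Direction(Enum):
    FETCH = "external memory -> RAM"   # corresponds to a fetch instruction
    STORE = "RAM -> external memory"   # corresponds to a store instruction


@dataclass
class ControlSignal:
    """Hypothetical layout; the source does not define the signal's fields."""
    direction: Direction
    ram_index: int    # which RAM the control signal indicates
    ram_offset: int   # word offset inside that RAM
    ext_addr: int     # word offset inside the external memory
    length: int       # number of words to move


class Dmac:
    """Moves words between the external memory and the indicated RAM."""

    def __init__(self, rams, external_memory):
        self.rams = rams
        self.external_memory = external_memory

    def execute(self, sig):
        ram = self.rams[sig.ram_index]
        r, e, n = sig.ram_offset, sig.ext_addr, sig.length
        if r + n > len(ram) or e + n > len(self.external_memory):
            raise ValueError("transfer exceeds memory bounds")
        if sig.direction is Direction.FETCH:
            ram[r:r + n] = self.external_memory[e:e + n]
        else:
            self.external_memory[e:e + n] = ram[r:r + n]


# Two small RAMs and an external memory holding the values 100..131.
rams = [[0] * 8, [0] * 8]
memory = list(range(100, 132))
dmac = Dmac(rams, memory)

# Fetch: copy 4 words from external address 4 into RAM_1.
dmac.execute(ControlSignal(Direction.FETCH, ram_index=1, ram_offset=0, ext_addr=4, length=4))
fetched = rams[1][:4]

# Store: copy 2 words from RAM_0 back to external address 0.
rams[0][:2] = [7, 9]
dmac.execute(ControlSignal(Direction.STORE, ram_index=0, ram_offset=0, ext_addr=0, length=2))
stored = memory[:2]
```

In this model the processing unit never touches `memory` directly; it only reads `rams`, which mirrors the claim that the processing unit reads program data from the RAMs rather than through a cache.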
Preferably, the control signal indicates one or more RAMs.
Preferably, the addresses of all the RAMs and the addresses of the external storage unit are addressed uniformly. Alternatively, the addresses of all the RAMs are addressed uniformly.
Preferably, the access address range of the DMAC covers the address segments of the RAMs and the address segment of the external storage unit.
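A minimal sketch of uniform addressing, using the example address map given in the embodiment of this document (RAM_0 at 0000H-0FFFH, RAM_1 at 1000H-1FFFH, external memory at 2000H-FFFFH). The function names and the region label `MEMORY` are hypothetical; the rule that the PU can reach only the RAM segments while the DMAC can reach the full range follows the embodiment's statement about access addresses.

```python
# Example unified address map from the embodiment: (name, first address, last address).
REGIONS = [
    ("RAM_0", 0x0000, 0x0FFF),
    ("RAM_1", 0x1000, 0x1FFF),
    ("MEMORY", 0x2000, 0xFFFF),  # external storage unit
]


def decode(addr):
    """Return (region name, offset inside the region) for a unified address."""
    for name, start, end in REGIONS:
        if start <= addr <= end:
            return name, addr - start
    raise ValueError(f"address {addr:#06x} is outside the unified address space")


def dmac_can_access(addr):
    """The DMAC sees the full range: every RAM segment plus external memory."""
    return any(start <= addr <= end for _, start, end in REGIONS)


def pu_can_access(addr):
    """The PU sees only the RAM segments."""
    return any(
        name.startswith("RAM_") and start <= addr <= end
        for name, start, end in REGIONS
    )
```

Because the RAM segments are contiguous, a program can treat RAM_0 and RAM_1 as one linear buffer, which is the reduction in program-control complexity the embodiment mentions.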
According to another aspect of the present invention, a processing core is also provided, including a processing unit, a storage unit, and the storage management device provided in the first aspect; the processing unit is configured to send instructions that instruct the storage management device to access the data in the storage unit; the processing unit is also configured to read the data required for executing a program from any of the RAMs.
Preferably, the instructions include a fetch instruction and a store instruction; the processing unit sends a fetch instruction, which instructs the data storage management device to fetch data from the storage unit and store the data in the RAM indicated by the fetch instruction.
Preferably, the DMAC is configured to send a storage completion signal after completing the fetch instruction; the processing unit is configured to issue a new fetch instruction according to the storage completion signal, and to read data from the RAM indicated by the fetch instruction.
Preferably, at the same point in time, the processing unit and the DMAC access different RAMs.
Preferably, the processing unit and the DMAC accessing different RAMs at the same time includes: the at least two RAMs include a first RAM and a second RAM; at a first time, the processing unit reads first data from the first RAM, and the DMAC writes second data fetched from the storage unit into the second RAM; at a second time, the processing unit reads the second data from the second RAM, and the DMAC writes third data fetched from the storage unit into the first RAM.
优选的,第一随机存取存储器RAM为包括多个RAM的第一组随机存取存储器RAM,第二随机存取存储器RAM为包括多个随机存取存储器RAM的第二组随机存取存储器RAM。Preferably, the first random access memory RAM is a first group of random access memory RAM including a plurality of RAMs, and the second random access memory RAM is a second group of random access memory RAM including a plurality of random access memories RAM .
当第一随机存取存储器RAM为第一组随机存取存储器RAM,第二随机存取存储器RAM为第二随机存取存储器RAM组时,在同一时间,处理单元和直接存储器访问控制器DMAC分别访问一个组内的各个随机存取存储器RAM,或者,在同一时间,处理单元和直接存储器访问控制器DMAC访问分属于两个组的RAM。When the first random access memory RAM is the first random access memory RAM and the second random access memory RAM is the second random access memory RAM group, at the same time, the processing unit and the direct memory access controller DMAC are respectively Each random access memory RAM in a group is accessed, or, at the same time, the processing unit and the direct memory access controller DMAC access RAMs belonging to two groups.
According to a third aspect of the present invention, a chip is provided, comprising one or more processing cores as provided in the second aspect.
According to a fourth aspect of the present invention, a board card is provided, comprising one or more chips as provided in the third aspect.
According to a fifth aspect of the present invention, an electronic device is provided, comprising one or more board cards as provided in the fourth aspect.
According to a sixth aspect of the present invention, a data storage management method is provided, in which a control unit receives an instruction and generates and sends a control signal according to the instruction, and a direct memory access controller (DMAC) accesses data in the RAM according to the control signal.
According to a seventh aspect of the present invention, an electronic device is provided, comprising: a memory for storing computer-readable instructions; and one or more processors for running the computer-readable instructions, such that the processors, when running, implement any of the data storage management methods of the sixth aspect.
According to an eighth aspect of the present invention, a non-transitory computer-readable storage medium is provided, which stores computer instructions for causing a computer to execute any of the data storage management methods of the sixth aspect.
According to a ninth aspect of the present invention, a computer program product is provided, comprising computer instructions which, when executed by a computing device, enable the computing device to execute any of the data storage management methods of the sixth aspect.
The data storage management device provided by embodiments of the present invention receives and responds to instructions sent by an external processing unit and reads data from an external storage unit, so that the external processing unit can read the data required for executing a program directly from the data storage management device. The external processing unit no longer needs to fetch data from the external storage unit through a cache, which eliminates the loss of computing efficiency caused by cache misses and makes program efficiency more controllable.
Description of the Drawings
FIG. 1 is a schematic diagram of data reading in a processing core in the prior art;
FIG. 2 is a schematic structural diagram of a data storage management device according to an embodiment of the present invention;
FIG. 3 is a schematic structural diagram of a processing core according to an embodiment of the present invention;
FIG. 4 is a schematic structural diagram of a neural network according to an embodiment of the present invention;
FIG. 5 is a schematic structural diagram of a processing core according to an embodiment of the present invention;
FIG. 6 is a timing diagram of neural network computation performed by a processing core according to an embodiment of the present invention;
FIG. 7 is a schematic flowchart of a data storage management method according to an embodiment of the present invention.
Detailed Description
To make the objectives, technical solutions, and advantages of the present invention clearer, the present invention is described in further detail below with reference to specific embodiments and the accompanying drawings. It should be understood that these descriptions are exemplary only and are not intended to limit the scope of the present invention. In addition, descriptions of well-known structures and technologies are omitted below to avoid unnecessarily obscuring the concepts of the present invention.
Multi-core or many-core chips are often used in neural network computation. Each core in such a multi-core (many-core) architecture has a degree of independent processing capability and a relatively large in-core storage space for storing its own programs, data, and weights.
How well the basic computing power of a single core is utilized determines the capability of the whole chip to compute neural networks. That utilization is in turn determined by the ideal computing power of the core's computing unit and by its storage access efficiency.
Different storage units are accessed at different speeds. Registers are generally the fastest, with one access typically taking a few hundred ps (picoseconds). Next is static random access memory (SRAM), where one access typically takes from a few hundred ps to a few ns (nanoseconds). Then comes the memory unit, that is, double data rate synchronous dynamic random access memory (DDR SDRAM), where one access typically takes tens to hundreds of ns. Slowest are other storage devices accessed through IO ports, such as hard disks, whose access times are generally at the ms (millisecond) level.
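To make the gap concrete, the following sketch converts one representative latency per storage tier into clock cycles. The specific latencies and the 1 GHz clock are illustrative assumptions picked from within the ranges quoted above; they are not figures from this application.

```python
# Representative latencies in picoseconds, chosen from within the ranges quoted
# above. These exact figures are illustrative assumptions, not measurements.
LATENCY_PS = {
    "register": 300,              # "a few hundred ps"
    "sram": 1_000,                # "a few hundred ps to a few ns"
    "ddr_sdram": 50_000,          # "tens to hundreds of ns"
    "hard_disk": 1_000_000_000,   # "ms level"
}


def cycles_per_access(latency_ps: int, clock_period_ps: int) -> int:
    """Clock cycles that elapse during one access, rounded up to whole cycles."""
    return -(-latency_ps // clock_period_ps)  # integer ceiling division


if __name__ == "__main__":
    period_ps = 1_000  # assumed 1 GHz clock, i.e. a 1 ns cycle
    for name, latency in LATENCY_PS.items():
        print(f"{name:10s} {cycles_per_access(latency, period_ps):>9d} cycles")
```

Under these assumptions a single DDR SDRAM access costs about 50 cycles during which the processing unit could otherwise be computing.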
In neural network processing, the main concern is the processing unit's access to the memory unit. Processing units are very fast: their clock frequency is typically several hundred MHz (megahertz) to several GHz (gigahertz), that is, a cycle time at the ps-to-ns level, whereas a memory access takes tens of ns, so the two speeds differ considerably. Bridging this speed gap between the processing unit and memory access, so that the processing unit's computing power is used effectively, is a difficult problem in modern CPU design.
FIG. 1 is a schematic diagram of data reading in a processing core.
As shown in FIG. 1, in this processing core a cache is inserted between the processing unit (PU) and the storage unit (memory). The PU accesses the memory in a hierarchical, indirect manner: the PU accesses the cache directly and accesses the memory indirectly through the cache. The cache is a mapping of the memory, and its content is a subset of the memory content. The cache has no independent address space; the address of a cache entry is the same as the address of the memory location being accessed.
For example, when the PU reads some data from the memory through the cache while executing a program, the cache keeps a copy of that data; if the PU needs the same data again within a short time, it reads the data directly from the cache.
To the program executed by the PU, however, the cache is transparent and has no functional meaning: the program cannot access the cache on its own. The program behaves as if the PU fetched the data from the memory, whereas the PU actually fetched it from the cache.
The above scheme has the following drawbacks:
(1) Neural network computation uses a very large volume of parameters and data, usually far exceeding the capacity of the cache. The measures a cache takes to reduce its miss rate rely on the temporal and spatial locality of data, so they become ineffective, which greatly reduces the effective computing power of the processing unit.
(2) The cache circuit is complex, which makes the chip harder to design and more expensive.
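Drawback (1) can be illustrated with a small simulation. The sketch below assumes a fully associative cache with least-recently-used (LRU) replacement and a program that sweeps repeatedly over the same parameter set; the application does not specify any replacement policy, so both choices, and the function names, are assumptions made for illustration.

```python
from collections import OrderedDict


def lru_hit_rate(accesses, capacity):
    """Hit rate of a fully associative LRU cache holding `capacity` entries."""
    cache = OrderedDict()  # keys ordered from least to most recently used
    hits = 0
    for addr in accesses:
        if addr in cache:
            hits += 1
            cache.move_to_end(addr)        # mark as most recently used
        else:
            if len(cache) >= capacity:
                cache.popitem(last=False)  # evict the least recently used entry
            cache[addr] = True
    return hits / len(accesses)


def cyclic_sweeps(working_set, passes):
    """Address trace that reads addresses 0..working_set-1 in order, `passes` times."""
    return [addr for _ in range(passes) for addr in range(working_set)]


if __name__ == "__main__":
    # Working set fits in the cache: every pass after the first one hits.
    print(lru_hit_rate(cyclic_sweeps(4, 3), capacity=4))
    # Working set is twice the cache size: each entry is evicted before reuse.
    print(lru_hit_rate(cyclic_sweeps(8, 3), capacity=4))
```

In this model the hit rate drops to zero as soon as the swept data set exceeds the cache capacity, because every entry is evicted before it is reused.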
The data storage management device provided by an embodiment of the present application is described in detail below. In the description of the present invention, it should be noted that the terms “first”, “second”, “third”, and “fourth” are used for descriptive purposes only and should not be understood as indicating or implying relative importance. In addition, the technical features involved in the different embodiments described below can be combined with one another as long as they do not conflict.
FIG. 2 is a schematic structural diagram of a data storage management device according to an embodiment of the present invention.
As shown in FIG. 2, the data storage management device includes at least two random access memories (RAMs), a control unit, and a direct memory access controller (DMAC).
Preferably, the data storage management device may be arranged in a processing core.
The at least two RAMs include RAM_0, RAM_1, ..., RAM_N. Each of the RAMs in the data storage management device can be accessed independently and in parallel.
Optionally, the storage capacities of the RAMs may be the same or different.
The control unit is configured to receive an instruction and to generate and send a control signal C_DMAC according to the instruction. The instruction is sent by a processing unit (PU) located outside the data storage management device.
The DMAC is configured to access data in the RAMs according to the control signal.
In one embodiment, the DMAC being configured to access data in the RAMs according to the control signal includes: the DMAC is configured to read data from an external storage unit (memory) according to the control signal and store the data in the RAM indicated by the control signal.
In one embodiment, the DMAC being configured to access data in the RAMs according to the control signal includes: the DMAC is configured to read data from the RAM indicated by the control signal and store the data in the external memory.
Preferably, the control signal indicates one or more RAMs.
Preferably, the access addresses of all the RAMs are uniformly addressed; more preferably, the access addresses of the RAMs are contiguous, which reduces the complexity of program control.
In a preferred embodiment, the addresses of all the RAMs and the addresses of the external storage unit are mapped into one unified address space.
For example, the data storage management device contains two RAMs, RAM_0 and RAM_1. The address range of RAM_0 is 0000H-0FFFH, that of RAM_1 is 1000H-1FFFH, and that of the external memory is 2000H-FFFFH.
Further preferably, the DMAC can access the full address range, namely the address segments of all the RAMs and the address segment of the external memory.
The PU can access the address segments of all the RAMs.
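The example address map above can be sketched as a small decoder. The region table uses the address ranges from the example; the function names `decode`, `pu_can_access`, and `dmac_can_access` are hypothetical and only illustrate how a unified address resolves to a target and how the two masters see different ranges.

```python
# Unified address map from the example: (name, first address, last address).
REGIONS = [
    ("RAM_0", 0x0000, 0x0FFF),
    ("RAM_1", 0x1000, 0x1FFF),
    ("memory", 0x2000, 0xFFFF),
]


def decode(addr):
    """Map a unified address to (region name, offset inside that region)."""
    for name, first, last in REGIONS:
        if first <= addr <= last:
            return name, addr - first
    raise ValueError(f"address {addr:#06x} is outside the unified address space")


def pu_can_access(addr):
    """The PU only sees the RAM address segments."""
    return decode(addr)[0].startswith("RAM_")


def dmac_can_access(addr):
    """The DMAC sees the full range: all RAM segments plus the external memory."""
    decode(addr)  # raises ValueError if the address is not mapped at all
    return True


if __name__ == "__main__":
    for addr in (0x0FFF, 0x1000, 0x2000):
        print(f"{addr:#06x} -> {decode(addr)}, PU access: {pu_can_access(addr)}")
```

Because the RAM segments are contiguous, a program can treat RAM_0 and RAM_1 as one linear buffer or as two separate banks without changing how addresses are formed.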
In one embodiment, after the DMAC has read data from the external memory according to the control signal and stored the data in the RAM indicated by the control signal, it sends a storage-complete signal to the external processing unit. The storage-complete signal tells the external processing unit that it can read data from the RAM whose storage has just completed.
FIG. 3 is a schematic structural diagram of a processing core according to an embodiment of the present invention.
As shown in FIG. 3, the processing core includes a processing unit (PU), a storage unit (memory), and the data storage management device provided in the above embodiment.
The PU is configured to send instructions, which instruct the data storage management device to access data in the memory.
The instructions include a fetch instruction and a store instruction.
The processing unit being configured to send instructions that instruct the data storage management device to access data in the memory includes:
The PU is configured to send a fetch instruction, which instructs the data storage management device to read data from the memory and store the data in the RAM indicated by the fetch instruction. Preferably, once the data has been stored in the RAM indicated by the instruction, the DMAC sends a storage-complete signal to the PU.
In one embodiment, the PU is further configured to read the data required for executing a program from any of the RAMs.
Preferably, each time the PU receives a storage-complete signal from the DMAC, it issues a new fetch instruction and reads data from the RAM whose storage has just completed.
Specifically, after receiving the storage-complete signal from the DMAC, the PU issues a new fetch instruction and then reads data from the RAM whose storage has just completed. The DMAC reading data from the memory and storing it in the corresponding RAM under the new fetch instruction can then proceed in parallel with the PU reading the just-stored RAM to execute its program, which improves computational efficiency.
Alternatively, after receiving the storage-complete signal from the DMAC, the PU may first read data from the RAM whose storage has just completed and then issue the new fetch instruction.
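The handshake between the fetch instruction and the storage-complete signal can be modelled in software with two queues and a worker thread. This is only an analogy for the hardware behaviour: the names `dmac_worker` and `pu_run`, the use of Python lists as RAMs, and the summation standing in for the PU's program are all assumptions made for illustration.

```python
import queue
import threading


def dmac_worker(memory, rams, requests, done):
    """Model DMAC: copy the requested memory block into a RAM, then signal completion."""
    while True:
        request = requests.get()
        if request is None:           # shutdown marker, not part of the protocol
            return
        block, ram = request
        rams[ram] = list(memory[block])
        done.put(ram)                 # storage-complete signal names the filled RAM


def pu_run(memory):
    """Model PU: sum every block of `memory`, overlapping compute with the next fetch."""
    if not memory:
        return []
    rams = [None, None]               # RAM_0 and RAM_1
    requests, done = queue.Queue(), queue.Queue()
    worker = threading.Thread(target=dmac_worker, args=(memory, rams, requests, done))
    worker.start()
    results = []
    requests.put((0, 0))              # first fetch instruction: block 0 into RAM_0
    for k in range(len(memory)):
        ram = done.get()              # wait for the storage-complete signal
        if k + 1 < len(memory):
            requests.put((k + 1, 1 - ram))  # new fetch targets the other RAM
        results.append(sum(rams[ram]))      # the PU works on the RAM just filled
    requests.put(None)
    worker.join()
    return results


if __name__ == "__main__":
    print(pu_run([[1, 2], [3, 4], [5]]))
```

Issuing the new fetch before reading lets the worker fill one RAM while the PU consumes the other; since each side only touches the RAM it was handed, the two never race on the same buffer.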
In one embodiment, at any given point in time, the PU and the DMAC access different RAMs.
Specifically, the at least two RAMs include a first RAM and a second RAM. At a first time, the PU reads first data from the first RAM while the DMAC writes second data, fetched from the memory, into the second RAM. At a second time, the PU reads the second data from the second RAM while the DMAC writes third data, fetched from the memory, into the first RAM.
Optionally, the PU and the DMAC may also access the same RAM at the same time, in which case the RAM serves the PU and the DMAC serially.
Optionally, if the RAM is a dual-port RAM, the PU and the DMAC may access the same RAM at the same time, and the dual-port RAM serves the PU and the DMAC in parallel.
It should be noted that the first RAM may be a first group of RAMs comprising a plurality of RAMs, and the second RAM may be a second group of RAMs comprising a plurality of RAMs.
Optionally, the number of RAMs in the first group may be the same as or different from the number of RAMs in the second group.
When the first RAM is a first group of RAMs and the second RAM is a second group of RAMs, the PU and the DMAC may, at the same time, respectively access individual RAMs within one group, or may, at the same time, access RAMs belonging to the two different groups.
Specifically, at a first time, the PU reads first data from the first group of RAMs while the DMAC writes second data, fetched from the memory, into the second group of RAMs; at a second time, the PU reads the second data from the second group of RAMs while the DMAC writes third data, fetched from the memory, into the first group of RAMs.
Optionally, when the first RAM is a first group of RAMs and the second RAM is a second group of RAMs, the processing unit or the DMAC may also access the individual RAMs within the same group in a time-shared manner.
In the processing core provided by the embodiments of the present invention, the PU reading data from a RAM and the DMAC moving data from the memory into a RAM can proceed in parallel, which further increases the computing power of the processing core and makes it better suited to neural network computation. In addition, no complex cache circuit needs to be designed into the processing core, which reduces both the cost of the processing core and the difficulty of chip design. Because there is no cache, the processing unit does not fetch data from the external memory through a cache, which eliminates the loss of computing efficiency caused by cache misses; the processing core reads data directly from the RAMs of the storage management device, which makes program efficiency more controllable.
FIG. 4 is a schematic structural diagram of a neural network according to an embodiment of the present invention.
As shown in FIG. 4, the neural network has two layers. The output of the first layer serves as the input of the second layer, and the output of the second layer is the output of the whole neural network.
FIG. 5 is a schematic structural diagram of a processing core according to an embodiment of the present invention. The processing core shown in FIG. 5 is used to carry out the computation of the neural network shown in FIG. 4.
As shown in FIG. 5, the processing core includes a data storage management device, a processing unit, and a storage unit.
The data storage management device includes RAM_0, RAM_1, a DMAC, and a control unit.
Assume that the parameters and data of each neural network layer are smaller than the capacity of a single RAM, that is, RAM_0 and RAM_1 can each hold the parameters and data needed to compute one layer of the neural network.
The PU and the DMAC are arranged to access different RAMs at any given point in time, so that the PU executing its program and the DMAC storing data proceed in parallel, which improves computation and storage efficiency. For example, at a first time point the PU accesses RAM_0 and the DMAC accesses RAM_1; at a second time point the PU accesses RAM_1 and the DMAC accesses RAM_0; and so on in alternation.
In a specific embodiment, the PU sends the instruction lls_dis; the control unit receives the instruction, generates the control signal C_DMAC, and sends it to the DMAC. According to the instruction, the DMAC reads the data indicated by the instruction from the memory and stores it in RAM_0, the RAM indicated by the instruction. When the DMAC has finished storing the data, it sends a storage-complete signal to the PU. On receiving the signal, the PU issues a new instruction that directs the DMAC to read new data from the memory and store it in RAM_1, and then reads data from RAM_0. When the DMAC has stored the data in RAM_1, it sends a storage-complete signal; the PU issues another instruction, which directs the DMAC to read new data from the memory and store it in RAM_0, and then reads data from RAM_1. In this way, the DMAC storing data and the PU reading data proceed in parallel, so both computation and storage can operate at maximum efficiency.
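The alternation between RAM_0 and RAM_1 described above can be written out as a step-by-step schedule. The sketch below is a sequential model, not an implementation of the device: `run_ping_pong` and the trace format are hypothetical, and each loop iteration stands for one time step in which both units act.

```python
def run_ping_pong(blocks):
    """Replay the RAM_0/RAM_1 alternation for a list of memory blocks.

    Returns (consumed, trace). `consumed` lists the blocks in the order the PU
    reads them. `trace` holds one (pu_ram, dmac_ram) pair per time step, where
    None means that unit is idle in that step.
    """
    if not blocks:
        return [], []
    rams = [None, None]      # contents of RAM_0 and RAM_1
    consumed = []
    trace = []

    # Prologue: the first instruction fills RAM_0 while the PU has nothing to read.
    rams[0] = blocks[0]
    trace.append((None, 0))

    for k in range(len(blocks)):
        pu_ram = k % 2
        dmac_ram = 1 - pu_ram if k + 1 < len(blocks) else None
        if dmac_ram is not None:
            rams[dmac_ram] = blocks[k + 1]   # DMAC fills the RAM the PU is not using
        consumed.append(rams[pu_ram])        # PU reads the RAM filled one step earlier
        trace.append((pu_ram, dmac_ram))
    return consumed, trace


if __name__ == "__main__":
    consumed, trace = run_ping_pong(["layer 1", "layer 2", "layer 3"])
    print(consumed)
    print(trace)
```

In every step of the trace the PU and the DMAC touch different RAMs, which is the property the embodiment relies on.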
FIG. 6 is a timing diagram of neural network computation performed by a processing core according to an embodiment of the present invention.
As shown in FIG. 6, at t0 the PU reads data from RAM_0, that is, RAM_0 is occupied by the PU executing the program of the first layer of the neural network, while the DMAC writes data read from the memory into RAM_1. At t1, RAM_1 is occupied by the PU executing the program of the second layer of the neural network, while the DMAC writes data read from the memory into RAM_0. Because the PU and the DMAC access different addresses, the PU does not need to fetch data from the memory itself, which reduces program complexity and increases the computing power of the processing core.
It should be understood that the program executed by the PU at t2 may be set to be the same as the program executed at t1, that is, the first-layer computation of the same neural network is also performed at t2; alternatively, the program executed by the PU at t2 may be set to differ from the program executed at t1. The present invention is not limited in this respect.
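The benefit of the timing in FIG. 6 can be estimated with a simple cost model. The sketch assumes that each time step lasts as long as the slower of the two overlapped activities and that the first load cannot be hidden; the function names and the time units are hypothetical, and the numbers in the usage lines are made up for illustration.

```python
def serial_time(load, compute):
    """Total time when each load finishes before the matching compute starts."""
    return sum(load) + sum(compute)


def overlapped_time(load, compute):
    """Total time when compute on block k overlaps the load of block k + 1."""
    if len(load) != len(compute):
        raise ValueError("need one load time and one compute time per block")
    if not load:
        return 0
    total = load[0]  # the first load cannot be hidden behind any compute
    for k, busy in enumerate(compute):
        next_load = load[k + 1] if k + 1 < len(load) else 0
        total += max(busy, next_load)  # a step ends when both activities are done
    return total


if __name__ == "__main__":
    load = [4, 4]      # assumed DMAC transfer times for two layers
    compute = [6, 6]   # assumed PU compute times for the same layers
    print(serial_time(load, compute), overlapped_time(load, compute))
```

When compute time dominates, the transfers after the first one are fully hidden; when transfer time dominates, the model shows the PU waiting on the DMAC instead.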
According to an embodiment of the present invention, a chip is provided, comprising one or more processing cores as provided in the foregoing embodiments.
According to an embodiment of the present invention, a board card is provided, comprising one or more chips as provided in the foregoing embodiment.
According to an embodiment of the present invention, an electronic device is provided, comprising one or more board cards as provided in the foregoing embodiment.
FIG. 7 shows a data storage management method provided by an embodiment of the present invention. The method includes steps S101 and S102.
In step S101, the control unit receives an instruction and generates and sends a control signal according to the instruction.
In step S102, the direct memory access controller (DMAC) accesses data in the RAM according to the control signal.
The DMAC accessing data in the RAM according to the control signal includes: the DMAC reads data from the external memory according to the control signal and stores the data in the RAM indicated by the control signal.
The DMAC accessing data in the RAM according to the control signal includes: the DMAC reads data from the RAM indicated by the control signal and stores the data in the external memory.
It can be understood that the data storage management method provided in this embodiment is executed by the data storage management device provided in the foregoing embodiments of the present invention; the features they share are therefore not described again here.
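Steps S101 and S102 can be sketched as two functions, one per step. The application does not give the instruction or control-signal formats, so the fields of `Instruction` and `ControlSignal` below, and the use of `bytearray` objects as the RAMs and the external memory, are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instruction:
    """Hypothetical instruction layout; the real format is not given in the text."""
    op: str            # "fetch" (memory -> RAM) or "store" (RAM -> memory)
    ram: int           # index of the RAM the instruction indicates
    ram_offset: int
    mem_addr: int
    length: int


@dataclass(frozen=True)
class ControlSignal:
    """Stand-in for C_DMAC: the transfer the DMAC should carry out."""
    to_ram: bool
    ram: int
    ram_offset: int
    mem_addr: int
    length: int


def control_unit(instr):
    """S101: receive an instruction and turn it into a control signal."""
    if instr.op not in ("fetch", "store"):
        raise ValueError(f"unknown instruction {instr.op!r}")
    return ControlSignal(instr.op == "fetch", instr.ram, instr.ram_offset,
                         instr.mem_addr, instr.length)


def dmac(signal, rams, memory):
    """S102: move data between the indicated RAM and the external memory."""
    ram = rams[signal.ram]
    ram_span = slice(signal.ram_offset, signal.ram_offset + signal.length)
    mem_span = slice(signal.mem_addr, signal.mem_addr + signal.length)
    if signal.to_ram:
        ram[ram_span] = memory[mem_span]   # external memory -> RAM
    else:
        memory[mem_span] = ram[ram_span]   # RAM -> external memory


if __name__ == "__main__":
    memory = bytearray(range(16))
    rams = [bytearray(4), bytearray(4)]
    dmac(control_unit(Instruction("fetch", 1, 0, 8, 4)), rams, memory)
    print(list(rams[1]))
```

Keeping the decoding in `control_unit` and the data movement in `dmac` mirrors the split between the two steps of the method.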
An embodiment of the present invention further provides a method by which a processing core processes data.
The method includes steps S201 and S202.
In step S201, the processing unit sends a fetch instruction.
In step S202, the data storage management device reads data from the storage unit according to the fetch instruction and stores the data in the RAM of the data storage management device indicated by the fetch instruction.
In a preferred embodiment, after storing the data in the RAM indicated by the fetch instruction, the data storage management device issues a storage-complete signal.
Each time the processing unit receives a storage-complete signal from the DMAC, it issues a new fetch instruction and then reads data from the RAM indicated by the fetch instruction that has just completed.
In one embodiment, at any given point in time, the processing unit and the DMAC access different RAMs.
Specifically, at a first time, the processing unit reads first data from the first RAM while the DMAC writes second data, fetched from the memory, into the second RAM. At a second time, the PU reads the second data from the second RAM while the DMAC writes third data, fetched from the memory, into the first RAM.
It can be understood that the method provided in this embodiment is executed by the processing core provided in the foregoing embodiments of the present invention; the features they share are therefore not described again here.
According to an embodiment of the present invention, an electronic device is provided, comprising: a memory for storing computer-readable instructions; and one or more processors for running the computer-readable instructions, such that the processors, when running, implement the data storage management method of the foregoing embodiments.
According to an embodiment of the present invention, a non-transitory computer-readable storage medium is provided, which stores computer instructions for causing a computer to execute the data storage management method of the foregoing embodiments.
According to an embodiment of the present invention, a computer program product is provided, comprising computer instructions which, when executed by a computing device, enable the computing device to execute the data storage management method of the foregoing embodiments.
It should be understood that the above specific embodiments of the present invention are used only to illustrate or explain the principles of the present invention and do not limit it. Any modification, equivalent replacement, or improvement made without departing from the spirit and scope of the present invention shall therefore fall within its scope of protection. Furthermore, the appended claims are intended to cover all changes and modifications that fall within the scope and boundaries of the appended claims, or within equivalents of such scope and boundaries.

Claims (10)

  1. A data storage management device, characterized in that it comprises:
    at least two random access memories (RAMs);
    a control unit configured to receive an instruction, and to generate and send a control signal according to the instruction; and
    a direct memory access controller (DMAC) configured to access data in the RAMs according to the control signal.
  2. The device according to claim 1, wherein the direct memory access controller (DMAC) being configured to access data in the RAMs according to the control signal comprises:
    the DMAC being configured to read data from an external storage unit according to the control signal, and to store the data in the RAM indicated by the control signal; or
    the DMAC being configured to read data, according to the control signal, from the RAM indicated by the control signal, and to store the data in an external storage unit.
  3. The device according to claim 1 or 2, wherein the control signal indicates one or more of the RAMs.
  4. The device according to any one of claims 1-3, wherein the addresses of all of the RAMs and the addresses of the external storage unit are uniformly addressed; or the addresses of all of the RAMs are uniformly addressed.
  5. The device according to any one of claims 1-4, wherein the access address range of the DMAC covers the address segments of all of the RAMs and the address segment of the external storage unit.
  6. A processing core, characterized in that it comprises a processing unit, a storage unit, and the data storage management device according to any one of claims 1-5;
    the processing unit is configured to send an instruction, the instruction instructing the data storage management device to access data in the storage unit;
    the processing unit is further configured to read, from any of the RAMs, data required for executing a program.
  7. The processing core according to claim 6, wherein:
    the instruction includes a fetch instruction and a store instruction;
    the processing unit is configured to send the fetch instruction, the fetch instruction instructing the data storage management device to fetch data from the storage unit and to store the data in the RAM indicated by the fetch instruction.
  8. The processing core according to claim 7, wherein the DMAC is configured to send a storage completion signal after completing the fetch instruction;
    the processing unit is configured to issue a new fetch instruction according to the storage completion signal, and to read data from the RAM indicated by the fetch instruction.
  9. The processing core according to any one of claims 6-8, wherein, at the same point in time, the processing unit and the DMAC access different RAMs.
  10. The processing core according to claim 7, wherein the processing unit and the DMAC accessing different RAMs at the same time comprises:
    the at least two RAMs include a first RAM and a second RAM;
    at a first time, the processing unit reads first data from the first RAM, and the DMAC writes second data fetched from the storage unit into the second RAM;
    at a second time, the processing unit reads the second data from the second RAM, and the DMAC writes third data fetched from the storage unit into the first RAM.
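
As a reading aid for the unified addressing recited in claims 4 and 5, the sketch below lays out all RAMs and the external storage unit in one contiguous address space and decodes an address into a region and an offset. The region sizes, the contiguous layout starting at address 0, and the names `build_address_map` and `decode` are assumptions made for illustration; the claims do not specify them.

```python
def build_address_map(ram_sizes, external_size):
    """Place every RAM, then the external storage unit, in one address space.

    Returns a list of (region name, base address, size) tuples.
    """
    regions = []
    base = 0
    for i, size in enumerate(ram_sizes):
        regions.append((f"RAM{i + 1}", base, size))
        base += size
    regions.append(("external", base, external_size))
    return regions


def decode(regions, address):
    """Map a unified address to (region name, offset inside that region)."""
    for name, base, size in regions:
        if base <= address < base + size:
            return name, address - base
    raise ValueError(f"address {address:#x} is outside the unified address space")


if __name__ == "__main__":
    # Hypothetical sizes: two 4 KiB RAMs and a 64 KiB external storage unit.
    regions = build_address_map([0x1000, 0x1000], 0x10000)
    print(decode(regions, 0x1004))
    print(decode(regions, 0x2000))
```

With a single map of this kind, one address is enough for a DMAC-style component to tell whether a transfer endpoint lies in a particular RAM or in the external storage unit.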
PCT/CN2020/083208 2020-04-03 2020-04-03 Data storage management apparatus and processing core WO2021196160A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202080096316.9A CN115380292A (en) 2020-04-03 2020-04-03 Data storage management device and processing core
PCT/CN2020/083208 WO2021196160A1 (en) 2020-04-03 2020-04-03 Data storage management apparatus and processing core

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2020/083208 WO2021196160A1 (en) 2020-04-03 2020-04-03 Data storage management apparatus and processing core

Publications (1)

Publication Number Publication Date
WO2021196160A1 true WO2021196160A1 (en) 2021-10-07

Family

ID=77927304

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/083208 WO2021196160A1 (en) 2020-04-03 2020-04-03 Data storage management apparatus and processing core

Country Status (2)

Country Link
CN (1) CN115380292A (en)
WO (1) WO2021196160A1 (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1811741A (en) * 2005-01-27 2006-08-02 富士通株式会社 Direct memory access control method, direct memory access controller, information processing system, and program
CN106776360A (en) * 2017-02-28 2017-05-31 建荣半导体(深圳)有限公司 A kind of chip and electronic equipment
CN108416422A (en) * 2017-12-29 2018-08-17 国民技术股份有限公司 A kind of convolutional neural networks implementation method and device based on FPGA
US10515302B2 (en) * 2016-12-08 2019-12-24 Via Alliance Semiconductor Co., Ltd. Neural network unit with mixed data and weight size computation capability
CN110647722A (en) * 2019-09-20 2020-01-03 北京中科寒武纪科技有限公司 Data processing method and device and related product

Also Published As

Publication number Publication date
CN115380292A (en) 2022-11-22

Similar Documents

Publication Publication Date Title
US10198204B2 (en) Self refresh state machine MOP array
US9892058B2 (en) Centrally managed unified shared virtual address space
JP4322259B2 (en) Method and apparatus for synchronizing data access to local memory in a multiprocessor system
US9141173B2 (en) Thread consolidation in processor cores
US9965222B1 (en) Software mode register access for platform margining and debug
US20140181427A1 (en) Compound Memory Operations in a Logic Layer of a Stacked Memory
CN104699631A (en) Storage device and fetching method for multilayered cooperation and sharing in GPDSP (General-Purpose Digital Signal Processor)
US20220076739A1 (en) Memory context restore, reduction of boot time of a system on a chip by reducing double data rate memory training
JP7126136B2 (en) Reconfigurable cache architecture and method of cache coherency
CN108139994B (en) Memory access method and memory controller
JPH1097464A (en) Information processing system
JP2018136922A (en) Memory division for computing system having memory pool
US20130191587A1 (en) Memory control device, control method, and information processing apparatus
US11914903B2 (en) Systems, methods, and devices for accelerators with virtualization and tiered memory
KR20240004361A (en) Processing-in-memory concurrent processing system and method
WO2022068149A1 (en) Data loading and storage system and method
Guoteng et al. Design and Implementation of a DDR3-based Memory Controller
WO2021196160A1 (en) Data storage management apparatus and processing core
US9720830B2 (en) Systems and methods facilitating reduced latency via stashing in system on chips
US11899970B2 (en) Storage system and method to perform workload associated with a host
EP4060505A1 (en) Techniques for near data acceleration for a multi-core architecture
CN217588059U (en) Processor system
US20240004560A1 (en) Efficient memory power control operations
EP4160423A1 (en) Memory device, memory device operating method, and electronic device including memory device
CN113284532A (en) Processor system

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20929409

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20929409

Country of ref document: EP

Kind code of ref document: A1