CN115380292A - Data storage management device and processing core - Google Patents


Info

Publication number
CN115380292A
Authority
CN
China
Prior art keywords
data, RAM, random access, instruction
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202080096316.9A
Other languages
Chinese (zh)
Inventor
罗飞
王维伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Simm Computing Technology Co ltd
Original Assignee
Beijing Simm Computing Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Simm Computing Technology Co., Ltd.
Publication of CN115380292A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/04: Architecture, e.g. interconnection topology


Abstract

A data storage management device and a processing core. The device comprises: at least two random access memories (RAMs); a control unit that receives an instruction and generates and sends a control signal according to the instruction (S101); and a direct memory access controller (DMAC) that accesses data in the RAMs according to the control signal (S102). The data storage management device receives and responds to instructions sent by an external processing unit and reads data from an external storage unit, so that when executing a program the external processing unit can read the data it needs directly from the data storage management device, instead of fetching it from the external storage unit through a Cache. This eliminates the loss of computing efficiency caused by Cache misses and improves the controllability of program performance.

Description

Data storage management device and processing core
Technical Field
The present invention relates to the field of processing core technologies, and in particular, to a data storage management apparatus and a processing core.
Background
With the development of science and technology, human society is rapidly entering the intelligent era. Key characteristics of this era are that people acquire ever more data, in ever larger quantities, and demand ever faster processing of that data.
Chips are the cornerstone of data processing; they fundamentally determine people's ability to process data. In terms of application fields, chips follow two main routes: one is the general-purpose route, such as CPUs, which offers great flexibility but lower effective computing power on domain-specific algorithms; the other is the special-purpose route, such as TPUs, which delivers higher effective computing power in certain specific fields but has poor or even no processing capability in the flexible, varied general-purpose fields.
Because the data of the intelligent era is varied and enormous in quantity, chips are required to have both extremely high flexibility, to handle algorithms in different fields that change by the day, and extremely strong processing capability, to rapidly process extremely large and sharply growing volumes of data.
Disclosure of Invention
The present invention provides a data storage management device and a processing core, which can eliminate the loss of computing efficiency caused by Cache misses and improve the controllability of program performance.
A first aspect of the present invention provides a data storage management apparatus, comprising: at least two random access memories (RAMs); a control unit configured to receive an instruction and to generate and send a control signal according to the instruction; and a direct memory access controller (DMAC) configured to access data in the RAMs according to the control signal.
The data storage management device provided by the embodiments of the present invention receives and responds to instructions sent by an external processing unit and reads data from an external storage unit. When executing a program, the external processing unit can therefore read the data it needs directly from the data storage management device, instead of fetching it from the external storage unit through a Cache. This eliminates the loss of computing efficiency caused by Cache misses and improves the controllability of program performance.
Preferably, the DMAC being configured to access data in the RAM according to the control signal comprises: the DMAC being configured to read data from an external storage unit according to the control signal and store it in the RAM indicated by the control signal; or the DMAC being configured to read data from the RAM indicated by the control signal according to the control signal and store it in the external storage unit.
Preferably, the number of RAMs indicated by the control signal is one or more.
Preferably, the addresses of all the RAMs are uniformly addressed together with the addresses of the external storage unit; alternatively, the addresses of all the RAMs are uniformly addressed among themselves.
Preferably, the range of access addresses of the DMAC covers the address segments of the RAMs and the address segment of the external storage unit.
According to another aspect of the present invention, there is also provided a processing core, comprising a processing unit, a storage unit, and the data storage management apparatus provided in the first aspect. The processing unit is configured to send an instruction instructing the storage management device to access data in the storage unit; the processing unit is further configured to read data required for executing a program from any of the RAMs.
Preferably, the instructions include fetch instructions and store instructions. The processing unit is configured to send a fetch instruction instructing the data storage management device to read data from the storage unit and store it in the RAM indicated by the fetch instruction.
Preferably, the DMAC is configured to send a storage-completion signal after a fetch instruction is completed; and the processing unit is configured to send a new fetch instruction upon receiving the storage-completion signal and to read data from the RAM indicated by the completed fetch instruction.
Preferably, at the same point in time, the processing unit and the DMAC access different ones of the RAMs.
Preferably, the processing unit and the DMAC accessing different RAMs at the same point in time comprises: the at least two RAMs include a first RAM and a second RAM; at a first time, the processing unit reads first data from the first RAM while the DMAC writes second data fetched from the storage unit into the second RAM; at a second time, the processing unit reads the second data from the second RAM while the DMAC writes third data fetched from the storage unit into the first RAM.
Preferably, the first RAM is a first group of RAMs comprising a plurality of RAMs, and the second RAM is a second group of RAMs comprising a plurality of RAMs.
When the first RAM is a first group of RAMs and the second RAM is a second group of RAMs, the processing unit and the DMAC may each access the RAMs within their respective group at the same time, or the processing unit and the DMAC may access RAMs belonging to the two groups at the same time.
According to a third aspect of the invention, there is provided a chip comprising one or more processing cores as provided in the second aspect.
According to a fourth aspect of the present invention, there is provided a card board including one or more chips provided by the third aspect.
According to a fifth aspect of the present invention, there is provided an electronic apparatus including one or more cards provided by the fourth aspect.
According to a sixth aspect of the present invention, there is provided a data storage management method, comprising: a control unit receiving an instruction and generating and sending a control signal according to the instruction; and a direct memory access controller (DMAC) accessing data in a random access memory (RAM) according to the control signal.
According to a seventh aspect of the present invention, there is provided an electronic apparatus, comprising: a memory for storing computer-readable instructions; and one or more processors for executing the computer-readable instructions, such that, when running, the processors implement the data storage management method of the sixth aspect.
According to an eighth aspect of the present invention, there is provided a non-transitory computer readable storage medium storing computer instructions for causing a computer to perform the method of data storage management of any one of the preceding sixth aspects.
According to a ninth aspect of the present invention, there is provided a computer program product comprising computer instructions which, when executed by a computing device, enable the computing device to perform the method of data storage management of any of the preceding sixth aspects.
The data storage management device provided by the embodiments of the present invention receives and responds to instructions sent by an external processing unit and reads data from an external storage unit. When executing a program, the external processing unit can therefore read the data it needs directly from the data storage management device, instead of fetching it from the external storage unit through a Cache, which eliminates the loss of computing efficiency caused by Cache misses and improves the controllability of program performance.
Drawings
FIG. 1 is a diagram of a prior-art processing core reading data;
FIG. 2 is a schematic diagram of a data storage management device according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a processing core according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of a neural network according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of a processing core according to an embodiment of the present invention;
FIG. 6 is a timing diagram of a processing core performing neural network computations, according to an embodiment of the present invention;
FIG. 7 is a flow diagram illustrating a data storage management method according to an embodiment of the invention.
Detailed Description
To make the objects, technical solutions, and advantages of the present invention clearer, the present invention is described in further detail below with reference to the accompanying drawings and specific embodiments. It should be understood that the description is exemplary only and is not intended to limit the scope of the present invention. Moreover, descriptions of well-known structures and techniques are omitted below to avoid unnecessarily obscuring the concepts of the present invention.
In neural network computing, multi-core or many-core chips are often used. The cores in a multi-core (many-core) architecture each have a certain independent processing capability and a relatively large in-core storage space for storing the core's programs, data, and weights.
How well the basic computing power of a single core is exploited determines the neural-network computing capability of the whole chip, and it depends on the ideal computing power of the core's computing unit and on the efficiency of its storage access.
Different storage units are accessed at different speeds. Registers are fastest, typically taking hundreds of picoseconds (ps) per access. Next is static random access memory (SRAM), typically hundreds of ps to a few nanoseconds (ns) per access. Then comes double data rate synchronous dynamic random access memory (DDR SDRAM), typically tens to hundreds of ns per access. Finally, other memories accessed through IO ports, such as hard disks, are slow, typically taking milliseconds (ms) per access.
In neural network processing scenarios, the accesses of greatest interest are those from the processing unit to the memory unit. A processing unit is very fast: its clock frequency is typically several hundred megahertz (MHz) to several gigahertz (GHz), i.e., cycle times from a few ns down to hundreds of ps, while a memory-unit access takes tens of ns, so the two speeds differ greatly. Bridging this speed mismatch between the processing unit and memory, so that the processing unit's computing power is used effectively, is a difficult point of modern CPU design.
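To make the scale of this mismatch concrete, the arithmetic implied above can be sketched as follows. The latency figures are illustrative orders of magnitude taken from the text, not measurements of any particular chip, and the function name is an assumption for illustration:

```python
# Back-of-the-envelope illustration of the latency gap described
# above. All figures are rough orders of magnitude, not measurements.

ACCESS_TIME_NS = {
    "register": 0.3,    # "hundreds of picoseconds"
    "sram": 1.0,        # "hundreds of ps to a few ns"
    "ddr_sdram": 50.0,  # "tens to hundreds of ns"
}

def stall_cycles(mem: str, clock_ghz: float = 1.0) -> int:
    """Approximate processor cycles spent waiting on one access to `mem`."""
    cycle_ns = 1.0 / clock_ghz
    return round(ACCESS_TIME_NS[mem] / cycle_ns)

# A 1 GHz core waits on the order of 50 cycles per DDR access,
# but only about 1 cycle per SRAM access.
```

Under these assumed numbers, every DDR access costs a 1 GHz processing unit tens of idle cycles, which is exactly the gap the Cache (and, in this invention, the data storage management device) is meant to hide.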
FIG. 1 is a schematic diagram of a processing core reading data.
As shown in fig. 1, in the processing core a Cache is inserted between the processing unit (PU) and the memory unit. The PU accesses memory hierarchically and indirectly: the PU accesses the Cache directly and accesses the Memory indirectly through the Cache. The Cache is a mapping of the Memory, and its contents are a subset of the Memory's contents. The Cache has no independent address space of its own; its addresses are the same as the addresses of the memory being accessed.
For example, when the PU executes a program and reads some data from the Memory through the Cache, the Cache retains that data, and when the PU needs the data again shortly afterwards, it can be served directly from the Cache.
To the program executed by the PU, however, the Cache is transparent and has no functional meaning: the program cannot access the Cache separately. The program believes the PU is fetching data from the memory, when in fact the PU is fetching it from the Cache.
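The miss behaviour criticized below can be illustrated with a toy model. The following direct-mapped cache sketch is illustrative only (the class, its parameters, and the access pattern are assumptions, not the patent's design): when the working set exceeds the cache size, every line is evicted before it is reused, so locality-based caching stops helping.

```python
# Toy direct-mapped cache: illustrative only, not the patent's design.
class DirectMappedCache:
    def __init__(self, num_lines: int):
        self.num_lines = num_lines
        self.tags = [None] * num_lines   # one tag per cache line
        self.hits = 0
        self.misses = 0

    def access(self, addr: int) -> bool:
        """Return True on a hit; on a miss, refill the line from Memory."""
        line = addr % self.num_lines
        tag = addr // self.num_lines
        if self.tags[line] == tag:
            self.hits += 1
            return True
        self.tags[line] = tag            # miss: line refilled from Memory
        self.misses += 1
        return False

# A working set larger than the cache, as in neural-network
# workloads, evicts every line before it is reused.
cache = DirectMappedCache(num_lines=4)
for _ in range(2):
    for addr in range(8):                # working set of 8 > 4 lines
        cache.access(addr)
print(cache.hits, cache.misses)          # all 16 accesses miss
```

Even on the second pass over the same 8 addresses there are no hits, because each line was overwritten before its data could be reused; this is the failure of temporal locality described in defect (1) below.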
The above scheme has the following defects:
(1) Because the parameters and data used in neural network computation are enormous, they usually far exceed the capacity of the Cache. The measures the Cache uses to reduce its miss rate, which rely on the temporal and spatial locality of data, therefore become ineffective, and the computing power of the processing unit drops greatly.
(2) Cache circuits are complex, which makes chip design difficult and chip cost high.
The following describes in detail a data storage management apparatus according to an embodiment of the present application. In the description of the present invention, it should be noted that the terms "first", "second", "third", and "fourth" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance. Furthermore, the technical features involved in the different embodiments of the present invention described below may be combined with each other as long as they do not conflict with each other.
FIG. 2 is a schematic diagram of a data storage management device according to an embodiment of the present invention.
As shown in fig. 2, the data storage management apparatus includes: at least two Random Access Memories (RAMs), a control unit and a Direct Memory Access Controller (DMAC).
Preferably, the data storage management means may be provided in the processing core.
The at least two RAMs comprise RAM_0, RAM_1, …, RAM_N. The data storage management device is provided with at least two RAMs, and each RAM can be accessed independently and in parallel.
Optionally, the storage capacities of the RAMs may be the same or different.
The control unit is configured to receive an instruction and to generate and send a control signal C_DMAC according to the instruction. The instruction is sent by a processing unit (PU) located outside the data storage management device.
And the DMAC is used for realizing the access of the data in the RAM according to the control signal.
In one embodiment, the DMAC being configured to access data in the RAM according to the control signal comprises: the DMAC being configured to read data from an external memory unit (Memory) according to the control signal and store it in the RAM indicated by the control signal.
In another embodiment, the DMAC being configured to access data in the RAM according to the control signal comprises: the DMAC being configured to read data from the RAM indicated by the control signal according to the control signal and store it in the external memory.
Preferably, the number of RAMs indicated by the control signal is one or more.
Preferably, the access addresses of all the RAMs are uniformly addressed; more preferably, the RAM access addresses are contiguous, which reduces the complexity of program control.
In a preferred embodiment, the addresses of all the RAMs are uniformly addressed together with the addresses of the external memory unit.
For example, the data storage management device has two RAMs, RAM_0 and RAM_1. The address range of RAM_0 is 0000H-0FFFH, that of RAM_1 is 1000H-1FFFH, and that of the external memory is 2000H-FFFFH.
Further preferably, the DMAC's access address range is the full address range, specifically the address segments of all the RAMs plus the address segment of the external memory.
The access address range of the PU is the address segments of all the RAMs.
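Under the uniform addressing of the example above, the address map could be decoded as in the following sketch. The function and region names are illustrative assumptions; only the hexadecimal ranges come from the text:

```python
# Sketch of the unified address map in the example above: RAM_0 at
# 0000H-0FFFH, RAM_1 at 1000H-1FFFH, external memory at 2000H-FFFFH.
# The function and region names are illustrative, not from the patent.

def decode(addr: int) -> str:
    """Name the region that a unified address falls in."""
    if 0x0000 <= addr <= 0x0FFF:
        return "RAM_0"
    if 0x1000 <= addr <= 0x1FFF:
        return "RAM_1"
    if 0x2000 <= addr <= 0xFFFF:
        return "Memory"
    raise ValueError(f"address {addr:#06x} out of range")

# The DMAC may target the whole 0000H-FFFFH range, while the PU is
# limited to the RAM segments (0000H-1FFFH).
def pu_may_access(addr: int) -> bool:
    return decode(addr) in ("RAM_0", "RAM_1")
```

Contiguous, uniform addressing means a program (or the DMAC) can compute a target location with plain address arithmetic, with no separate namespace per RAM.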
In one embodiment, the DMAC sends a storage completion signal to the external processing unit after reading data from the external memory according to the control signal and storing the data in the RAM indicated by the control signal, the storage completion signal being used to indicate that the external processing unit can read data from the RAM that has just completed storing.
FIG. 3 is a schematic diagram of a processing core according to an embodiment of the present invention.
As shown in fig. 3, the processing core includes a processing unit (PU), a storage unit (Memory), and the data storage management apparatus provided in the foregoing embodiments.
The PU is configured to send an instruction instructing the data storage management device to access data in the memory.
Wherein the instructions include fetch instructions and store instructions.
The processing unit is configured to send an instruction, where the instruction is used to instruct the data storage management device to implement access to data in the memory, and the instruction includes:
the PU is used for sending an access instruction, and the access instruction is used for instructing the data storage management device to read data from the memory and store the data in the RAM indicated by the access instruction. Preferably, the DAMC signals completion of the store to the PU when the data completes the store in the RAM indicated by the instruction.
In one embodiment, the PU is also used to read data from any RAM that is needed to execute the program.
Preferably, the PU is adapted to issue a new fetch instruction each time a store complete signal from the DMAC is received, and to read data from the RAM which has just completed storing.
Specifically, after receiving a storage completion signal sent by the DMAC, the PU sends a new fetch instruction, and then reads data from the RAM that has just completed storage, so that the DMAC reads data from the memory according to the new fetch instruction and stores the data in the corresponding RAM, and the data can be fetched from the RAM that has just completed storage by the PU to execute the program in parallel, thereby improving the efficiency of operation.
Of course, it is also possible for the PU to read data from the RAM that has just completed storage, and then issue a new fetch instruction, after receiving the DMAC signal that storage is complete.
In one embodiment, the PU and DMAC access different ones of the RAMs at the same point in time.
Specifically, the at least two RAMs include a first RAM and a second RAM. At a first time, the PU reads first data from the first RAM, and the DMAC writes second data retrieved from the memory into the second RAM. At a second time, the PU reads the second data from the second RAM, and the DMAC writes the third data retrieved from the memory into the first RAM.
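This alternating (ping-pong) pattern can be sketched in a few lines. The simulation below is an illustrative model, not the patent's implementation; the function name and data values are assumptions. At each step the PU "computes" on the RAM that was filled in the previous step while the DMAC fills the other RAM with the next layer's data:

```python
# Illustrative simulation of the ping-pong scheme: the PU computes on
# one RAM while the DMAC fills the other, then the roles swap.

def ping_pong(layers):
    rams = [None, None]           # RAM_0 and RAM_1
    rams[0] = layers[0]           # DMAC preloads layer 0 into RAM_0
    results = []
    for t in range(len(layers)):
        compute_ram = t % 2       # RAM the PU reads at time t
        fill_ram = 1 - compute_ram
        if t + 1 < len(layers):   # DMAC fetches the next layer's data
            rams[fill_ram] = layers[t + 1]
        # PU computes using data already resident in compute_ram
        results.append(f"computed({rams[compute_ram]})")
    return results

# ping_pong(["L1", "L2"]) -> ["computed(L1)", "computed(L2)"]
```

Because the fill for step t+1 overlaps the computation of step t, the PU never waits on the memory unit once the first transfer has completed.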
Optionally, the PU and the DMAC may also access the same RAM simultaneously, with the RAM responding serially to the PU and the DMAC.
Optionally, if the RAM is a dual-port RAM, the PU and the DMAC may also access the same RAM at the same time, and the dual-port RAM responds to the PU and the DMAC in parallel.
It should be noted that the first RAM may be a first RAM group including a plurality of RAMs, and the second RAM may be a second RAM group including a plurality of RAMs.
Optionally, the number of RAMs in the first set of RAMs may be the same or different from the number of RAMs in the second set of RAMs.
When the first RAM is a first RAM group and the second RAM is a second RAM group, the PU and the DMAC may each access the RAMs within one group at the same time, or the PU and the DMAC may access RAMs belonging to the two groups at the same time.
Specifically, at a first time, the PU reads first data from the first group of RAMs while the DMAC writes second data fetched from the memory into the second group of RAMs; at a second time, the PU reads the second data from the second group of RAMs while the DMAC writes third data fetched from the memory into the first group of RAMs.
Optionally, when the first RAM is a first RAM group and the second RAM is a second RAM group, the processing unit or the DMAC may also access the individual RAMs within the same group in a time-shared manner.
In the processing core provided by the embodiments of the present invention, the PU fetches data from the RAMs while the DMAC stores data from the memory into the RAMs, in parallel. This further improves the computing power of the processing core and makes it better suited to neural network operations. In addition, no complex Cache circuit needs to be designed into the processing core, which saves cost and reduces the difficulty of chip design. Because there is no Cache, the processing unit does not fetch data from the external memory through a Cache, so the loss of computing efficiency caused by Cache misses is eliminated; the processing core can call data directly from the RAMs of the storage management device, which improves the controllability of program performance.
Fig. 4 is a schematic structural diagram of a neural network according to an embodiment of the present invention.
As shown in fig. 4, the neural network has 2 layers, the output result of the first layer is used as the input of the second layer, and the output result of the second layer is the output of the entire neural network.
FIG. 5 is a schematic diagram of a processing core according to an embodiment of the present invention. The processing core shown in FIG. 5 is used to implement the computation of the neural network shown in FIG. 4.
As shown in fig. 5, the processing core includes a data storage management device, a processing unit, and a storage unit.
The data storage management device comprises RAM_0, RAM_1, a DMAC, and a control unit.
It is assumed that the parameters and data of each layer of the neural network are smaller than the capacity of a single RAM; that is, RAM_0 or RAM_1 alone can hold the parameters and data for one layer's computation.
The PU and the DMAC access different RAMs at the same point in time, so that the PU's program execution and the DMAC's data storage proceed in parallel, optimizing computation and storage efficiency. For example, at a first point in time the PU accesses RAM_0 and the DMAC accesses RAM_1; at a second point in time the PU accesses RAM_1 and the DMAC accesses RAM_0; and so on in alternation.
In a specific embodiment, the PU sends an instruction lls_dis; the control unit receives the instruction, generates a control signal C_DMAC, and sends it to the DMAC; the DMAC reads the data indicated by the instruction from the memory and stores it into RAM_0 as indicated by the instruction; after the DMAC finishes storing, it sends a storage-completion signal to the PU. On receiving the storage-completion signal, the PU issues a new instruction, which instructs the DMAC to read new data from the memory and store it in RAM_1, and then reads data from RAM_0. When the DMAC has stored the data in RAM_1, it again sends a storage-completion signal; the PU issues another instruction, instructing the DMAC to read new data from the memory and store it in RAM_0, and then reads data from RAM_1. In this way, the DMAC's storing of data and the PU's reading of data proceed in parallel, and both computation and storage can run at maximum efficiency.
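The handshake in this embodiment can be summarized as an event trace. The sketch below models it with a simple loop; the trace format and function name are assumptions for clarity (the instruction lls_dis and signal C_DMAC from the text are not modelled individually):

```python
# Illustrative event trace of the fetch / store-complete handshake
# described above: the PU issues the next fetch as soon as the DMAC
# reports completion, then reads the RAM that just finished storing.

def handshake_trace(num_fetches: int):
    events = ["PU: fetch #0 -> RAM_0"]
    for i in range(num_fetches):
        ram = i % 2                  # RAM targeted by fetch #i
        events.append(f"DMAC: store-complete RAM_{ram}")
        if i + 1 < num_fetches:      # next fetch targets the other RAM
            events.append(f"PU: fetch #{i + 1} -> RAM_{1 - ram}")
        events.append(f"PU: read RAM_{ram}")
    return events
```

The trace for two fetches shows the key property: the fetch into RAM_1 is issued before the PU starts reading RAM_0, so the DMAC transfer and the PU's computation overlap.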
FIG. 6 is a timing diagram of a processing core performing neural network computations according to an embodiment of the present invention.
As shown in FIG. 6, at t0 the PU reads data from RAM_0, i.e., RAM_0 is occupied by the PU executing the program of the first layer of the neural network, while the DMAC writes the data read from the memory into RAM_1. At t1, RAM_1 is occupied by the PU executing the program of the second layer of the neural network, while the DMAC writes the data read from the memory into RAM_0. By giving the PU and the DMAC different access address ranges, the PU never needs to fetch data from the memory, which reduces program complexity and improves the computing power of the processing core.
It should be understood that the program executed by the PU at t2 may be configured to be the same as the program executed at t0, i.e., the computation of the first layer of the neural network is performed again at t2, or it may be configured to be different; the invention is not limited in this respect.
According to an embodiment of the invention, a chip is provided, which comprises one or more processing cores provided in the above embodiments.
According to an embodiment of the present invention, there is provided a card including one or more chips provided by the above embodiments.
According to an embodiment of the invention, an electronic device is provided, which includes one or more cards provided in the above embodiments.
FIG. 7 shows a data storage management method according to an embodiment of the present invention. The method includes steps S101 and S102.
in step S101, the control unit receives an instruction, and generates and transmits a control signal according to the instruction.
And step S102, the direct memory access controller DMAC accesses the data in the RAM according to the control signal.
The DMAC accessing data in the RAM according to the control signal comprises: the DMAC reading data from the external memory according to the control signal and storing it in the RAM indicated by the control signal.
Alternatively, the DMAC accessing data in the RAM according to the control signal comprises: the DMAC reading data from the RAM indicated by the control signal according to the control signal and storing it in the external memory.
It is to be understood that the data storage management method provided by this embodiment is executed by the data storage management apparatus provided by the foregoing embodiments of the present invention; features already described are therefore not repeated here.
An embodiment of the invention provides a method by which a processing core processes data.
The method comprises steps S201 and S202.
In step S201, the processing unit sends a fetch instruction.
In step S202, the data storage management device reads data from the storage unit according to the fetch instruction and stores it into the RAM of the data storage management device indicated by the fetch instruction.
In a preferred embodiment, the data storage management device sends a storage-completion signal after storing the data into the RAM of the data storage management device indicated by the fetch instruction.
Each time the processing unit receives a storage-completion signal from the DMAC, it issues a new fetch instruction and reads the data from the RAM indicated by the fetch instruction that has just completed.
In one embodiment, the processing unit and the direct memory access controller DMAC access different ones of said RAMs at the same point in time.
Specifically, at a first time, the processing unit reads first data from the first RAM, and the DMAC writes second data taken out of the memory into the second RAM. At a second time, the PU reads the second data from the second RAM, and the DMAC writes the third data retrieved from the memory into the first RAM.
It is to be understood that the method for processing data by the processing core provided in this embodiment is performed by the processing core provided in the foregoing embodiment of the present invention, and therefore the same features are not described again here.
According to an embodiment of the present invention, there is provided an electronic device including: a memory for storing computer-readable instructions; and one or more processors for executing the computer-readable instructions such that, when the instructions are run, the processors implement the data storage management method of the foregoing embodiments.
According to an embodiment of the present invention, there is provided a non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform the method of data storage management of the foregoing embodiment.
According to an embodiment of the present invention, a computer program product is provided, which includes computer instructions, and when the computer instructions are executed by a computing device, the computing device can execute the method for data storage management of the foregoing embodiment.
It is to be understood that the above-described embodiments of the present invention are merely illustrative of the principles of the invention and are not to be construed as limiting it. Any modifications, equivalents, improvements and the like made without departing from the spirit and scope of the present invention shall fall within its protection scope. Further, it is intended that the appended claims cover all such variations and modifications as fall within their scope and boundaries, or the equivalents of such scope and boundaries.

Claims (10)

  1. A data storage management apparatus, comprising:
    at least two Random Access Memories (RAMs);
    a control unit, configured to receive an instruction and to generate and send a control signal according to the instruction;
    and a direct memory access controller DMAC, configured to access data in the random access memories RAM according to the control signal.
  2. The apparatus of claim 1, wherein the direct memory access controller DMAC being configured to access data in the random access memory RAM according to the control signal comprises:
    the direct memory access controller DMAC is configured to read data from an external storage unit according to the control signal and store the data into the random access memory RAM indicated by the control signal; or
    the direct memory access controller DMAC is configured to read data from the random access memory RAM indicated by the control signal according to the control signal and store the data into the external storage unit.
  3. The apparatus according to claim 1 or 2, wherein the control signal indicates one or more of the random access memories RAM.
  4. The apparatus according to any one of claims 1 to 3, wherein the addresses of all of the random access memories RAM and the addresses of the external storage unit are unified in a single address space; or the addresses of all of the random access memories RAM are unified in a single address space.
  5. The apparatus according to any one of claims 1 to 4, wherein the access address range of the direct memory access controller DMAC covers all address segments of the random access memories RAM and of the external storage unit.
  6. A processing core comprising a processing unit, a storage unit and a data storage management apparatus according to any one of claims 1 to 5;
    the processing unit is configured to send an instruction, the instruction instructing the data storage management apparatus to access the data in the storage unit;
    the processing unit is further configured to read data required for executing a program from any one of the random access memories RAM.
  7. The processing core of claim 6,
    wherein the instructions include a fetch instruction and a store instruction;
    the processing unit is configured to send the fetch instruction, the fetch instruction instructing the data storage management apparatus to fetch data from the storage unit and store the data into the random access memory RAM indicated by the fetch instruction.
  8. The processing core of claim 7, wherein the direct memory access controller DMAC is configured to send a store-complete signal upon completion of the fetch instruction;
    and the processing unit is configured to send a new fetch instruction according to the store-complete signal and to read data from the random access memory RAM indicated by the completed fetch instruction.
  9. The processing core of any of claims 6 to 8, wherein the processing unit and the direct memory access controller DMAC access different random access memories RAM at a same point in time.
  10. The processing core of claim 7, wherein the processing unit and the direct memory access controller DMAC accessing different ones of the random access memories RAM at a same point in time comprises:
    the at least two random access memories RAM include a first random access memory RAM and a second random access memory RAM;
    at a first time, the processing unit reads first data from the first random access memory RAM, and the direct memory access controller DMAC writes second data fetched from the storage unit into the second random access memory RAM;
    at a second time, the processing unit reads the second data from the second random access memory RAM, and the direct memory access controller DMAC writes third data fetched from the storage unit into the first random access memory RAM.
CN202080096316.9A 2020-04-03 2020-04-03 Data storage management device and processing core Pending CN115380292A (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2020/083208 WO2021196160A1 (en) 2020-04-03 2020-04-03 Data storage management apparatus and processing core

Publications (1)

Publication Number Publication Date
CN115380292A 2022-11-22

Family

ID=77927304

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202080096316.9A Pending CN115380292A (en) 2020-04-03 2020-04-03 Data storage management device and processing core

Country Status (2)

Country Link
CN (1) CN115380292A (en)
WO (1) WO2021196160A1 (en)

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP4563829B2 (en) * 2005-01-27 2010-10-13 富士通株式会社 Direct memory access control method, direct memory access control device, information processing system, program
US10515302B2 (en) * 2016-12-08 2019-12-24 Via Alliance Semiconductor Co., Ltd. Neural network unit with mixed data and weight size computation capability
CN106776360B (en) * 2017-02-28 2018-04-17 建荣半导体(深圳)有限公司 A kind of chip and electronic equipment
CN108416422B (en) * 2017-12-29 2024-03-01 国民技术股份有限公司 FPGA-based convolutional neural network implementation method and device
CN110647722B (en) * 2019-09-20 2024-03-01 中科寒武纪科技股份有限公司 Data processing method and device and related products

Also Published As

Publication number Publication date
WO2021196160A1 (en) 2021-10-07


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination