CN115280272A - Data access circuit and method - Google Patents

Publication number
CN115280272A
Authority
CN
China
Prior art keywords
data
circuit
memory
internal
control
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202080098538.4A
Other languages
Chinese (zh)
Inventor
王维伟
罗飞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Simm Computing Technology Co ltd
Original Assignee
Beijing Simm Computing Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Simm Computing Technology Co ltd
Publication of CN115280272A

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00: Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06: Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers

Abstract

Embodiments of the present disclosure provide a data access circuit and method. The data access circuit comprises: a plurality of internal memories; a first data interface circuit for connecting a first internal memory of the plurality of internal memories with an external processing unit; a second data interface circuit for connecting a second internal memory of the plurality of internal memories with an external memory; a first control circuit for receiving a first control instruction to control the first data interface circuit and the second data interface circuit; and a second control circuit for receiving a second control instruction to control the data transfer direction of the second data interface circuit. Because the control circuits in the data access circuit control the interface circuits, the capacity and access efficiency of the internal memory are improved, which addresses the insufficient capacity, low access efficiency, and circuit complexity of memory in the prior art.

Description

Data access circuit and method
Technical Field
The present disclosure relates to the field of data access, and more particularly, to a data access circuit and method.
Background
With the development of science and technology, human society is rapidly entering the intelligent era. The chip is the cornerstone of data processing and fundamentally determines our ability to process data. In terms of application, chips follow two main routes. One is the general-purpose route, such as the Central Processing Unit (CPU), which offers great flexibility but is less computationally efficient on domain-specific algorithms. The other is the special-purpose route, such as the Tensor Processing Unit (TPU), which delivers higher effective computing power in certain specific fields but handles more varied, general-purpose workloads poorly or not at all. Because the data of the intelligent era is varied and enormous in volume, chips need to be highly flexible, so that they can handle algorithms that differ across fields and change constantly, and highly capable, so that they can quickly process very large and rapidly growing volumes of data.
In neural network computing, multi-core or many-core chips are often used. Each processing core in a multi-core (many-core) architecture has a certain independent processing capability and a relatively large in-core storage space for its programs, data, and weights. How well a single processing core realizes its basic computing power determines the ability of the entire chip to compute a neural network, and that in turn depends on the ideal computing power of the core's computing unit and on its storage access efficiency.
Different memory units are accessed at different speeds. Generally, registers are fastest, with an access time of hundreds of ps (picoseconds). Static Random Access Memory (SRAM) comes next, with an access time generally in the range of hundreds of ps to ns (nanoseconds). Main memory, typically Double Data Rate Synchronous Dynamic Random Access Memory (DDR SDRAM), generally takes tens to hundreds of ns. Finally, other storage accessed through an IO (input/output) port, such as a hard disk, is slow, generally at the ms (millisecond) level.
In neural network processing scenarios, the main concern is the processing unit's access to the memory unit. The processing unit is very fast: its main frequency is typically several hundred MHz (megahertz) to several GHz (gigahertz), i.e., a cycle time of ps to ns, whereas the access time of the memory unit is tens of ns, so the two speeds differ greatly. Bridging this speed gap between the processing unit and memory access, so that the computing power of the processing unit is used effectively, is a difficult point of modern CPU design.
To solve the speed matching problem between the processing unit and the memory unit, the prior art generally uses the scheme of FIG. 1. As shown in FIG. 1, PU is a processing unit, Cache is a cache, and Memory is a memory. In this scheme, a Cache is inserted between the processing unit PU and the Memory, and the PU accesses the Memory in a layered, indirect manner: the PU accesses the Cache directly, the Cache accesses the Memory directly, and the PU thus reaches the Memory only indirectly through the Cache. The Cache is a mapping of the Memory, and its content is a subset of the Memory content. The Cache is transparent to the program run by the PU, has no functional meaning, and has no independent address space; its addresses are identical to the Memory addresses being accessed, i.e., the program cannot access the Cache on its own.
In neural network computation, however, the parameters and data involved are enormous and far exceed the capacity of the Cache. The measures by which a Cache reduces its miss rate, which rely on the temporal and spatial locality of data, therefore become ineffective, and the computing power of the processing unit is greatly reduced. In addition, because Cache circuitry is complex, it greatly increases both the difficulty of chip design and the cost of the chip.
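To make the speed gap concrete, the following minimal sketch converts a clock frequency into a cycle time and counts how many clock cycles elapse during one memory access. The 1 GHz clock and the 50 ns access time are illustrative assumptions picked from the ranges quoted above; they are not figures from the disclosure, and the function names are hypothetical.

```python
import math

def cycle_time_ns(freq_hz: float) -> float:
    """Duration of one clock cycle in nanoseconds."""
    return 1e9 / freq_hz

def cycles_per_access(freq_hz: float, access_ns: float) -> int:
    """Clock cycles that elapse while one memory access completes."""
    return math.ceil(access_ns / cycle_time_ns(freq_hz))

# Assumed values: a 1 GHz processing unit and a 50 ns memory access.
stall_cycles = cycles_per_access(1e9, 50.0)
print(stall_cycles)
```

Under these assumptions a single uncached access occupies tens of processing-unit cycles, which is the mismatch the rest of the document sets out to hide.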
Disclosure of Invention
This summary is provided to introduce a selection of concepts in a simplified form that are further described below in the detailed description. This summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter.
To solve the prior-art technical problems that a cache has insufficient capacity when the data volume is large and that its circuitry is complex, embodiments of the present disclosure provide the following technical solutions:
In a first aspect, an embodiment of the present disclosure provides a data access circuit, including:
a plurality of internal memories;
a first data interface circuit for connecting a first internal memory of the plurality of internal memories with an external processing unit;
a second data interface circuit for connecting a second internal memory of the plurality of internal memories with an external memory;
a first control circuit for receiving a first control instruction to control the first data interface circuit and the second data interface circuit; and
a second control circuit for receiving a second control instruction to control the data transfer direction of the second data interface circuit.
Further, the addresses of the plurality of internal memories are the same.
Further, the first data interface circuit includes:
a first switch circuit comprising a plurality of connection states, wherein each connection state is to connect the processing unit with one of the plurality of internal memories.
Further, the second data interface circuit includes:
a second switch circuit including a plurality of connection states, wherein each connection state is used to connect the external memory with one of the plurality of internal memories.
Further, the second data interface circuit further includes:
a memory controller connected between the external memory and the plurality of internal memories for controlling the data exchange between the external memory and the plurality of internal memories.
Further, the first control circuit includes:
a switch control circuit for receiving a first control instruction sent by the external processing unit and generating a first switch control signal and a second switch control signal, wherein the first switch control signal is used for setting the connection state of the first switch circuit, and the second switch control signal is used for setting the connection state of the second switch circuit.
Further, the second control circuit includes:
an access control circuit for receiving a second control instruction sent by the external processing unit and generating a control signal, wherein the control signal is used for controlling the memory controller to store the data indicated by the control signal from the second internal memory into the external memory, or to fetch the data indicated by the control signal from the external memory into the second internal memory.
Further, the plurality of internal memories are uniformly addressed together with the external memory.
In a second aspect, an embodiment of the present disclosure provides a data access method, including:
receiving a first control instruction to determine a first internal memory connected with an external processing unit and a second internal memory connected with the external memory;
receiving a second control instruction to determine a data transfer direction between the external memory and a second internal memory;
the external processing unit acquires data from the first internal memory or transmits data to the first internal memory;
and sending the data in the second internal memory to the external memory or storing the data acquired from the external memory into the second internal memory according to the data transmission direction.
In a third aspect, an embodiment of the present disclosure provides a data access apparatus, including:
at least one data access circuit as described in any one of the first aspects.
In a fourth aspect, an embodiment of the present disclosure provides an electronic device, including: a memory for storing computer-readable instructions; and one or more processors configured to execute the computer-readable instructions such that the processors, when executing them, implement the data access method of any of the second aspects.
In a fifth aspect, the disclosed embodiments provide a non-transitory computer-readable storage medium, wherein the non-transitory computer-readable storage medium stores computer instructions for causing a computer to execute the data access method of any one of the foregoing second aspects.
In a sixth aspect, an embodiment of the present disclosure provides a computer program product comprising computer instructions which, when executed by a computing device, perform the data access method of any of the foregoing second aspects.
In a seventh aspect, an embodiment of the present disclosure provides a chip, which includes at least one data access apparatus described in the third aspect.
In an eighth aspect, an embodiment of the present disclosure provides a computing device, which includes at least one chip described in the seventh aspect.
Embodiments of the present disclosure provide a data access circuit and method. The data access circuit comprises: a plurality of internal memories; a first data interface circuit for connecting a first internal memory of the plurality of internal memories with a processing unit external to the data access circuit; a second data interface circuit for connecting a second internal memory of the plurality of internal memories with an external memory; a first control circuit for receiving a first control instruction to control the first data interface circuit and the second data interface circuit; and a second control circuit for receiving a second control instruction to control the data transfer direction of the second data interface circuit. Because the control circuits in the data access circuit control the interface circuits, the capacity and access efficiency of the internal memory are improved, which addresses the insufficient capacity, low access efficiency, and circuit complexity of memory in the prior art.
The foregoing is only an overview of the technical solutions of the present disclosure. So that the technical means of the present disclosure can be understood more clearly and implemented in accordance with the description, and so that the above and other objects, features, and advantages of the present disclosure are easier to understand, preferred embodiments are described in detail below with reference to the accompanying drawings.
Drawings
The above and other features, advantages and aspects of various embodiments of the present disclosure will become more apparent by referring to the following detailed description when taken in conjunction with the accompanying drawings. Throughout the drawings, the same or similar reference numbers refer to the same or similar elements. It should be understood that the drawings are schematic and that elements and components are not necessarily drawn to scale.
FIG. 1 is a schematic diagram of a prior art processing scheme;
FIG. 2 is a schematic diagram of an application scenario of a data access circuit according to an embodiment of the disclosure;
FIG. 3a is a schematic diagram of a data access circuit according to an embodiment of the present disclosure;
FIG. 3b is a diagram illustrating an embodiment of a data access circuit;
FIG. 4 is a flow chart illustrating a data access method according to an embodiment of the disclosure;
FIG. 5 is a flow chart illustrating another data access method according to an embodiment of the present disclosure;
FIG. 6 is a flow chart illustrating another data access method according to an embodiment of the present disclosure;
FIG. 7 is a schematic diagram of a practical application scenario of the embodiment of the present disclosure;
FIG. 8 is a timing diagram of neural network calculations performed using an embodiment of the present disclosure.
Detailed Description
Embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While certain embodiments of the present disclosure are shown in the drawings, it is to be understood that the present disclosure may be embodied in various forms and should not be construed as limited to the embodiments set forth herein, but rather are provided for a more thorough and complete understanding of the present disclosure. It should be understood that the drawings and the embodiments of the disclosure are for illustration purposes only and are not intended to limit the scope of the disclosure.
It should be understood that the various steps recited in the method embodiments of the present disclosure may be performed in a different order, and/or performed in parallel. Moreover, method embodiments may include additional steps and/or omit performing the illustrated steps. The scope of the present disclosure is not limited in this respect.
The term "include" and variations thereof as used herein are open-ended, i.e., "including but not limited to". The term "based on" is "based, at least in part, on". The term "one embodiment" means "at least one embodiment"; the term "another embodiment" means "at least one additional embodiment"; the term "some embodiments" means "at least some embodiments". Relevant definitions for other terms will be given in the following description.
It should be noted that the terms "first", "second", and the like in the present disclosure are only used for distinguishing different devices, modules or units, and are not used for limiting the order or interdependence relationship of the functions performed by the devices, modules or units.
It should be noted that references to "a" or "an" in this disclosure are illustrative rather than limiting; those skilled in the art will appreciate that they should be understood as "one or more" unless the context clearly indicates otherwise.
The names of messages or information exchanged between devices in the embodiments of the present disclosure are for illustrative purposes only, and are not intended to limit the scope of the messages or information.
Fig. 2 is a schematic view of an application scenario of a data access circuit in an embodiment of the disclosure. Fig. 2 shows a data access device including the data access circuit of the present disclosure. The device comprises a processing unit PU, a data access circuit, and an external Memory. The data access circuit is positioned between the PU and the Memory and is responsible for data transfer between the PU and the Memory.
Fig. 3a is a schematic structural diagram of a data access circuit in an embodiment of the disclosure. As shown in fig. 3a, the data access circuit 300 includes: a plurality of internal memories 301; a first data interface circuit 302 for connecting a first internal memory of the plurality of internal memories 301 with a processing unit PU external to the data access circuit; a second data interface circuit 303 for connecting a second internal memory of the plurality of internal memories 301 with an external Memory; a first control circuit 304 for receiving a first control instruction to control the first data interface circuit 302 and the second data interface circuit 303; and a second control circuit 305 for receiving a second control instruction to control the data transfer direction of the second data interface circuit 303.
For example, the plurality of internal memories are random access memories (RAM), which can exchange data directly with the processor. Although only two internal memories are shown in fig. 3a, the number of internal memories may be set arbitrarily in practical applications, which is not described further here.
Under the control of the first control circuit 304, the first data interface circuit 302 selects at least one first internal memory from the plurality of internal memories 301 and connects it with the external processing unit, so that data can be transferred between the first internal memory and the external processing unit PU. Likewise, under the control of the first control circuit 304, the second data interface circuit 303 selects at least one second internal memory from the plurality of internal memories 301 and connects it with the external memory, so that data can be transferred between the second internal memory and the external memory. The first control circuit 304 controls the first data interface circuit 302 and the second data interface circuit 303 in response to a first control instruction, which is optionally issued by the external processing unit. After receiving the first control instruction, the first control circuit 304 decodes and executes it and configures the two data interface circuits according to the parameters in the instruction, so that the first data interface circuit 302 connects the corresponding first internal memory with the external processing unit, and the second data interface circuit 303 connects the corresponding second internal memory with the external memory.
The second data interface circuit 303 determines the data transfer direction of the second data interface circuit, i.e. sending data from a second internal memory to the external memory or retrieving data from an external memory to store in the second internal memory, under the control of the second control circuit 305.
It is to be understood that said first internal memory and said second internal memory may be one and the same memory, i.e. the external processing unit and the external memory are connected to one and the same internal memory.
Optionally, the first data interface circuit 302 includes: a first switch circuit comprising a plurality of connection states, wherein each connection state is used to connect the processing unit with one of the plurality of internal memories. Fig. 3b is a schematic diagram of a specific structure of a data access circuit according to an embodiment of the disclosure. Illustratively, as shown in fig. 3b, the first data interface circuit 302 includes a first switch circuit SW_1, and the number of connection states of SW_1 equals the number of internal memories. In the example of fig. 3b there are two internal memories, E_RAM and O_RAM, so SW_1 has two connection states, 0 and 1, which connect to E_RAM and O_RAM respectively. Both connection states lead to the external processing unit, and SW_1 can be in only one connection state at any time, so the external processing unit is connected to exactly one internal memory at a time.
Optionally, the second data interface circuit 303 includes: a second switch circuit comprising a plurality of connection states, wherein each connection state is used to connect the external memory with one of the plurality of internal memories. Illustratively, as shown in fig. 3b, the second data interface circuit 303 includes a second switch circuit SW_2, and the number of connection states of SW_2 equals the number of internal memories. In the example of fig. 3b there are two internal memories, E_RAM and O_RAM, so SW_2 has two connection states, 0 and 1, which connect to E_RAM and O_RAM respectively. Both connection states lead to the external memory, and SW_2 can be in only one connection state at any time, so the external memory is connected to exactly one internal memory at a time.
Optionally, the second data interface circuit 303 further includes: a memory controller connected between the external memory and the plurality of internal memories for controlling the data exchange between the external memory and the plurality of internal memories. Illustratively, as shown in fig. 3b, the second data interface circuit 303 further includes a memory controller DMAC located between the external memory and the plurality of internal memories; specifically, it is connected between the second switch circuit SW_2 and the external memory. The DMAC controls the data exchange between the external memory and the plurality of internal memories, i.e., it moves data from the external memory to the internal memories or from the internal memories to the external memory.
Optionally, the first control circuit 304 includes: a switch control circuit for receiving a first control instruction sent by the external processing unit and generating a first switch control signal and a second switch control signal, wherein the first switch control signal sets the connection state of the first switch circuit and the second switch control signal sets the connection state of the second switch circuit. Illustratively, as shown in fig. 3b, the first control circuit 304 includes a switch control circuit SW_ctrl, which receives a first control instruction Isw_dis sent by the external processing unit PU and decodes and executes it to generate a first switch control signal C_SW1 and a second switch control signal C_SW2. The first control instruction carries parameters indicating which connection state the first switch and the second switch should each be in at the present moment; C_SW1 and C_SW2 are generated from these parameters and set the connection states of the first switch SW_1 and the second switch SW_2 respectively. This determines which internal memory the external processing unit PU is connected to at the present moment and which internal memory the external Memory is connected to. In the example of fig. 3b, the external processing unit is connected to E_RAM and the external Memory is connected to O_RAM.
Optionally, the second control circuit 305 includes: an access control circuit for receiving a second control instruction sent by the external processing unit and generating a control signal, wherein the control signal is used for controlling the memory controller to store the data indicated by the control signal from the second internal memory into the external memory, or to fetch the data indicated by the control signal from the external memory into the second internal memory. Illustratively, as shown in fig. 3b, the second control circuit 305 includes an access control circuit LS_mem_ctrl, which receives a second control instruction Ils_dis sent by the external processing unit PU and decodes and executes it to generate a control signal C_DMAC. The second control instruction Ils_dis is either a fetch instruction ld_mem, which fetches data from the external Memory and stores it in the internal memory RAM, or a store instruction st_mem, which stores data from the internal memory RAM into the external Memory. The control signal C_DMAC configures and starts the memory controller DMAC according to the parameters in the second control instruction, so that the DMAC either takes the indicated data out of the internal memory O_RAM and stores it in the external Memory, or takes the indicated data out of the external Memory and stores it in the internal memory O_RAM.
In the above-described embodiment, the first switch circuit and the second switch circuit are two independent circuits whose connection states are set by the switch control circuit. The external processing unit PU accesses the internal memory RAM through the first switch circuit, and the memory controller DMAC accesses the internal memory RAM through the second switch circuit; the two accesses can proceed independently and in parallel.
In the disclosed embodiment, the plurality of internal memories have the same addresses, so the address space seen by the PU is unchanged whichever internal memory it accesses. The internal memory may also be addressed uniformly with the external Memory; for example, a low address range is allocated to the internal memory RAM and a high address range is allocated to the external Memory. The external processing unit PU only needs to access addresses of the internal memory RAM, so its access address range is limited to the internal memory RAM. The memory controller DMAC accesses both the internal memory RAM and the external Memory, so its access address range is the full range in which the internal memory and the external Memory are uniformly addressed. Thus, when the external processing unit PU reads and writes the internal memory, it always uses the same addresses; from the PU's perspective there is always a single internal memory RAM, and it does not need to distinguish which specific internal memory RAM is in use. Similarly, the memory controller DMAC always uses the same addresses for the internal memory (e.g., the low address range of the unified address space) and does not need to distinguish which specific internal memory it is accessing. The program therefore does not need to account for the parallel execution of computation and storage and can be written as a conventional serial program, which greatly reduces programming complexity.
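The behaviour of the two switch circuits can be sketched in software as a selector that holds exactly one connection state at a time. This is only a behavioural model under stated assumptions: the class name `SwitchCircuit`, its methods, and the 16-byte memory size are hypothetical and do not come from the disclosure.

```python
class SwitchCircuit:
    """Selector with one connection state per internal memory."""

    def __init__(self, memories):
        self.memories = list(memories)
        self.state = 0  # exactly one connection state is active at a time

    def set_state(self, state: int) -> None:
        """Select a connection state; reject states with no memory behind them."""
        if not 0 <= state < len(self.memories):
            raise ValueError(f"invalid connection state {state}")
        self.state = state

    def connected(self):
        """Return the internal memory currently reachable through the switch."""
        return self.memories[self.state]

# Two internal memories as in the two-memory example; the size is arbitrary.
E_RAM = bytearray(16)
O_RAM = bytearray(16)

SW_1 = SwitchCircuit([E_RAM, O_RAM])  # processing-unit side
SW_2 = SwitchCircuit([E_RAM, O_RAM])  # external-memory side

SW_1.set_state(0)  # processing unit <-> E_RAM
SW_2.set_state(1)  # external memory <-> O_RAM
```

Because each switch is a separate object with its own state, the two sides can point at different internal memories at the same moment.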
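As a rough illustration of the decode step, the sketch below turns a first control instruction into the two switch control signals. The disclosure says only that the instruction carries parameters selecting the connection states; the two-field layout, the class name `IswDis`, and the function name `sw_ctrl` are assumptions made for this example.

```python
from typing import NamedTuple, Tuple

class IswDis(NamedTuple):
    """Hypothetical layout of the first control instruction."""
    sw1_state: int  # requested connection state of the first switch
    sw2_state: int  # requested connection state of the second switch

def sw_ctrl(instr: IswDis, num_memories: int = 2) -> Tuple[int, int]:
    """Decode the instruction and return the control signals (C_SW1, C_SW2)."""
    for state in instr:
        if not 0 <= state < num_memories:
            raise ValueError(f"connection state {state} out of range")
    return instr.sw1_state, instr.sw2_state

# Processing unit on internal memory 0, external memory on internal memory 1.
c_sw1, c_sw2 = sw_ctrl(IswDis(sw1_state=0, sw2_state=1))
```

A single instruction sets both switches here, matching the text's description that one first control instruction yields both control signals.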
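The two transfer directions can be sketched as a single copy routine whose direction is chosen by the instruction. This is a simplified software model, not the DMAC hardware: the function name `dmac_transfer`, its parameter names, and the buffer sizes are hypothetical, and byte arrays stand in for the memories.

```python
def dmac_transfer(op, external, internal, ext_addr, int_addr, size):
    """Move `size` bytes in the direction selected by the instruction.

    op == "ld_mem": external memory -> internal memory
    op == "st_mem": internal memory -> external memory
    """
    if ext_addr + size > len(external) or int_addr + size > len(internal):
        raise ValueError("transfer exceeds memory bounds")
    if op == "ld_mem":
        internal[int_addr:int_addr + size] = external[ext_addr:ext_addr + size]
    elif op == "st_mem":
        external[ext_addr:ext_addr + size] = internal[int_addr:int_addr + size]
    else:
        raise ValueError(f"unknown instruction {op!r}")

memory = bytearray(range(32))  # stand-in for the external Memory
o_ram = bytearray(8)           # stand-in for the internal memory on the DMAC side

# Fetch four bytes, modify one, then store the block back elsewhere.
dmac_transfer("ld_mem", memory, o_ram, ext_addr=4, int_addr=0, size=4)
o_ram[0] = 0xFF
dmac_transfer("st_mem", memory, o_ram, ext_addr=16, int_addr=0, size=4)
```

The bounds check is an addition for the sketch; the disclosure does not describe how out-of-range transfers are handled.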
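The unified addressing scheme can be illustrated with a small address decoder in which the low range maps to the internal memory and the high range maps to the external memory. The region sizes and the function names `decode` and `pu_access` are assumptions for this sketch; the disclosure gives the low/high split only as an example and specifies no sizes.

```python
INTERNAL_SIZE = 0x1000    # assumed size of one internal memory RAM
EXTERNAL_SIZE = 0x10000   # assumed size of the external Memory

def decode(addr: int):
    """Map a unified address to (region, offset within that region)."""
    if 0 <= addr < INTERNAL_SIZE:
        return "internal", addr
    if INTERNAL_SIZE <= addr < INTERNAL_SIZE + EXTERNAL_SIZE:
        return "external", addr - INTERNAL_SIZE
    raise ValueError(f"address {addr:#x} outside the unified range")

def pu_access(addr: int) -> int:
    """The processing unit may only address the internal range."""
    region, offset = decode(addr)
    if region != "internal":
        raise PermissionError("PU access is limited to the internal memory range")
    return offset

# The same offset is produced whichever internal memory the switch selects,
# so the program never names a specific internal memory.
offset = pu_access(0x20)
```

Note that the decoder returns only "internal", never a specific RAM: which physical memory answers is decided by the switch state, not by the address.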
Fig. 4 is a flowchart illustrating a data access method according to an embodiment of the disclosure. As shown in fig. 4, the data access method includes:
step S401, receiving a first control instruction to determine a first internal memory connected with an external processing unit from a plurality of internal memories;
step S402, the external processing unit acquires data from the first internal memory or transmits data to the first internal memory.
In step S401, after receiving the first control instruction sent by the external processing unit PU to the data access circuit 300, the switch control circuit of the data access circuit decodes and executes it to generate the control signal of the first switch, which sets the connection state of the first switch circuit so that the external processing unit is connected to the first internal memory of the plurality of internal memories. Then, in step S402, the external processing unit performs its operations, acquiring data from the first internal memory or sending data to the first internal memory.
Fig. 5 is a flowchart illustrating another data access method according to an embodiment of the disclosure. As shown in fig. 5, the data access method includes:
step S501, receiving a first control instruction to determine a second internal memory connected with the external memory from the plurality of internal memories;
step S502, receiving a second control instruction to determine the data transmission direction between the external memory and the second internal memory;
step S503, sending the data in the second internal memory to the external memory or storing the data obtained from the external memory into the second internal memory according to the data transfer direction.
In step S501, after receiving the first control instruction sent by the external processing unit PU to the data access circuit 300, the switch control circuit of the data access circuit decodes the first control instruction and executes a control signal for generating a second switch to control the connection state of the second switch circuit so that the external processing unit is connected to a second internal memory of the plurality of internal memories; then, in step S502, the access control circuit of the data access circuit decodes and executes generation of a control signal for controlling the memory controller after receiving the first control instruction sent by the external processing unit PU to the data access circuit 300, so as to determine the data transfer direction between the external memory and the second internal memory; then, in step S503, the memory controller sends the data indicated in the second control command from the second internal memory to the external memory or obtains the data indicated in the second control command from the external memory and stores the data in the internal memory according to the data transfer direction.
The two data access methods above are executed by the circuits at the two ends of the data access circuit, respectively; they can be executed independently and in parallel, and can also be combined to complete more complex data access tasks.
Therefore, an embodiment of the present disclosure further provides a data access method, including:
step S601, receiving a first control instruction to determine a first internal memory connected to an external processing unit and a second internal memory connected to an external memory;
step S602, receiving a second control instruction to determine a data transfer direction between the external memory and the second internal memory;
step S603, the external processing unit acquires data from the first internal memory or transmits data to the first internal memory;
step S604, sending the data in the second internal memory to the external memory or storing the data obtained from the external memory into the second internal memory according to the data transfer direction.
An example of the above method is as follows. As shown in fig. 3b, the external processing unit PU sends a first control instruction Isw_dis to the data access circuit 300; the switch control circuit of the data access circuit receives the instruction and generates a first switch control signal SW_1 and a second switch control signal SW_2, wherein SW_1 sets the connection state of the first switch to 0, so that the external processing unit is connected to the E_RAM, and SW_2 sets the connection state of the second switch to 1, so that the DMAC is connected to the O_RAM. The external processing unit PU then sends a second control instruction Ils_dis to the data access circuit 300, where Ils_dis is taken to be a fetch instruction ld_mem. After receiving the instruction ld_mem, the access control circuit of the data access circuit 300 decodes and executes it to obtain a control signal C_DMAC of the memory controller DMAC, which configures the read address, the write address and the data size of the DMAC, so that the data block determined by the instruction ld_mem is read from the storage section at the read address in the external Memory and written into the storage section at the write address of the O_RAM; both the read address and the write address may be head addresses. At the same time, the external processing unit PU may execute its own instructions, reading the operands of an instruction from the E_RAM or storing the execution result of an instruction into the E_RAM.
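The example above can be traced in a toy model. The sketch below assumes that connection state 0 selects the E_RAM and state 1 selects the O_RAM on both switches, as in the example; the class `DataAccessCircuit` and its method signatures are hypothetical, memories are byte arrays, and the concurrency between the PU and the DMAC is not modeled (the calls simply run one after another).

```python
class DataAccessCircuit:
    """Toy model of data access circuit 300 with two RAMs and one DMAC."""

    E_RAM, O_RAM = 0, 1  # connection states, following the example's convention

    def __init__(self, ram_size, memory):
        self.rams = [bytearray(ram_size), bytearray(ram_size)]
        self.memory = memory          # external Memory
        self.sw_1 = self.E_RAM        # first switch: which RAM the PU sees
        self.sw_2 = self.O_RAM        # second switch: which RAM the DMAC sees

    def isw_dis(self, sw_1, sw_2):
        """First control instruction: set both switch control signals."""
        for state in (sw_1, sw_2):
            if state not in (self.E_RAM, self.O_RAM):
                raise ValueError(f"invalid connection state {state}")
        self.sw_1, self.sw_2 = sw_1, sw_2

    def ld_mem(self, read_addr, write_addr, size):
        """Second control instruction: copy a block from Memory to the DMAC-side RAM."""
        ram = self.rams[self.sw_2]
        if (min(read_addr, write_addr, size) < 0
                or read_addr + size > len(self.memory)
                or write_addr + size > len(ram)):
            raise ValueError("transfer exceeds memory bounds")
        ram[write_addr:write_addr + size] = self.memory[read_addr:read_addr + size]

    def pu_write(self, addr, value):
        self.rams[self.sw_1][addr] = value

    def pu_read(self, addr):
        return self.rams[self.sw_1][addr]


memory = bytearray(b"layer2-weights")
circuit = DataAccessCircuit(ram_size=16, memory=memory)
circuit.isw_dis(sw_1=0, sw_2=1)                               # PU <-> E_RAM, DMAC <-> O_RAM
circuit.ld_mem(read_addr=0, write_addr=0, size=len(memory))   # fills O_RAM
circuit.pu_write(0, 0x5A)                                     # PU result goes to E_RAM
```

After these calls the O_RAM holds the block fetched from Memory while the E_RAM holds only what the PU wrote, which is the separation the example describes.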
Fig. 7 is a schematic view of an actual application scenario of the embodiment of the present disclosure, in which the data access circuit is used to let the external processing unit PU perform the calculation of a neural network. As shown in fig. 7, the neural network has two layers, and the parameters and data used by each layer are smaller than the capacity of a single RAM, i.e., both the E_RAM and the O_RAM can accommodate the parameters and data needed to calculate the next layer. The E_RAM corresponds to the first layer, layer1, and the O_RAM corresponds to the second layer, layer2, so that the external processing unit PU can switch between the E_RAM and the O_RAM at different times to execute the programs of the two layers.
FIG. 8 is a timing diagram of neural network calculations performed using an embodiment of the present disclosure. As shown in fig. 8, at time t0 the first switch circuit selects the E_RAM to be connected to the PU and the second switch circuit selects the O_RAM to be connected to the DMAC; the PU then obtains the operands of layer1 from the E_RAM and performs the calculation of layer1, or stores the result of layer1 into the E_RAM, while the DMAC updates the data in the O_RAM according to the second control instruction of the PU, for example by storing data from the Memory into the O_RAM. At time t1, the PU sends a first control instruction and the switch control circuit generates a first switch control signal and a second switch control signal, so that the first switch circuit selects the O_RAM and the second switch circuit selects the E_RAM; the PU then obtains the operands of layer2 from the O_RAM and performs the calculation of layer2, or stores the result of layer2 into the O_RAM, while the DMAC updates the data in the E_RAM according to the second control instruction of the PU, for example by storing data from the Memory into the E_RAM. At time t2, the PU sends a first control instruction and the switch control circuit generates a first switch control signal and a second switch control signal, so that the first switch circuit selects the E_RAM and the second switch circuit selects the O_RAM; the PU then obtains the operands of layer1 from the E_RAM and performs the calculation of layer1, or stores the result of layer1 into the E_RAM, while the DMAC updates the data in the O_RAM according to the second control instruction of the PU, for example by storing data from the Memory into the O_RAM. These steps alternate cyclically until the calculation task of the neural network is completed.
As can be seen from FIG. 8, while the E_RAM is occupied by the calculation of the PU, the parameters and data of the next calculation are being updated in the O_RAM; similarly, while the O_RAM is occupied by the calculation of the PU, the parameters and data of the next calculation are being updated in the E_RAM. In this way, calculation and parameter/data update are performed in parallel, and the computing power can be exploited to the maximum extent.
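The alternation shown in the timing diagram can be written down as a simple schedule generator. This is a sketch of the pattern only: the function name `ping_pong_schedule` and the dictionary keys are hypothetical, and the phase labels t0, t1, t2 follow the description.

```python
RAMS = ("E_RAM", "O_RAM")
LAYERS = ("layer1", "layer2")   # E_RAM serves layer1, O_RAM serves layer2


def ping_pong_schedule(num_phases):
    """Return, per phase, which RAM the PU uses and which RAM the DMAC updates."""
    schedule = []
    for k in range(num_phases):
        pu_side = k % 2            # the first switch alternates every phase
        dmac_side = 1 - pu_side    # the second switch always selects the other RAM
        schedule.append({
            "time": f"t{k}",
            "pu_ram": RAMS[pu_side],
            "computes": LAYERS[pu_side],
            "dmac_ram": RAMS[dmac_side],
        })
    return schedule


for phase in ping_pong_schedule(3):
    print(phase)
```

In every phase the PU and the DMAC work on different RAMs, so the update of one buffer is hidden behind the computation on the other.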
It is understood that the RAM need not be switched during the above operation; in that case the calculation and the parameter/data update are performed on the same RAM, so that calculation and storage are performed serially on that RAM.
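The difference between the two modes can be made concrete with an idealized timing model. The model and its symbols are assumptions introduced here for illustration and are not part of the disclosure: let $c_k$ be the PU calculation time and $u_k$ the parameter/data update time in phase $k$ of $K$ phases, and ignore the switching overhead and the initial load.

```latex
\[
T_{\text{serial}} = \sum_{k=1}^{K} \left( c_k + u_k \right), \qquad
T_{\text{parallel}} = \sum_{k=1}^{K} \max\left( c_k, u_k \right), \qquad
T_{\text{parallel}} \le T_{\text{serial}} \le 2\, T_{\text{parallel}} .
\]
```

Under these assumptions, switching between the two RAMs can at most halve the total time, and it reaches that bound when calculation and update take equally long in every phase.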
The embodiment of the present disclosure further provides a data access apparatus, which includes: at least one data access circuit as described in any of the above embodiments.
The data access device is, for example, a processing core.
An embodiment of the present disclosure further provides a chip comprising at least one data access circuit as described in any one of the above embodiments.
An embodiment of the present disclosure provides a computer program product comprising computer instructions which, when executed by a computing device, perform the data access method of any of the preceding embodiments.
An embodiment of the present disclosure further provides a non-transitory computer-readable storage medium storing computer instructions for causing a computer to execute the data access method of any of the preceding embodiments.
The embodiment of the present disclosure provides a computing device, which includes at least one chip described in any one of the foregoing embodiments.
The flowchart and block diagrams in the figures of the present disclosure illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems that perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units described in the embodiments of the present disclosure may be implemented by software or hardware. The name of a unit does not, in some cases, constitute a limitation on the unit itself.
The functions described herein above may be performed, at least in part, by one or more hardware logic components. For example, without limitation, exemplary types of hardware logic components that may be used include: Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems on a Chip (SOCs), Complex Programmable Logic Devices (CPLDs), and the like.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.

Claims (10)

  1. A data access circuit, comprising:
    a plurality of internal memories;
    a first data interface circuit for connecting a first internal memory of the plurality of internal memories with an external processing unit;
    a second data interface circuit for connecting a second internal memory of the plurality of internal memories with an external memory;
    a first control circuit for receiving a first control instruction to control the first data interface circuit and the second data interface circuit;
    and a second control circuit for receiving a second control instruction to control the data transfer direction of the second data interface circuit.
  2. The data access circuit of claim 1, wherein the addresses of the plurality of internal memories are the same.
  3. A data access circuit according to claim 1 or 2, wherein the first data interface circuit comprises:
    a first switch circuit comprising a plurality of connection states, wherein each connection state is to connect the processing unit with one of the plurality of internal memories.
  4. A data access circuit according to any of claims 1-3, wherein the second data interface circuit comprises:
    a second switch circuit including a plurality of connection states, wherein each connection state is used to connect the external memory with one of the plurality of internal memories.
  5. The data access circuit of claim 4, wherein the second data interface circuit further comprises:
    a memory controller connected between the external memory and the plurality of internal memories for controlling the data exchange between the external memory and the plurality of internal memories.
  6. A data access circuit according to any of claims 1-4, wherein the first control circuit comprises:
    a switch control circuit for receiving a first control instruction sent by the external processing unit to generate a first switch control signal and a second switch control signal, wherein the first switch control signal is used for setting the connection state of the first switch circuit, and the second switch control signal is used for setting the connection state of the second switch circuit.
  7. The data access circuit of any of claims 1-6, wherein the second control circuit comprises:
    an access control circuit for receiving a second control instruction sent by the external processing unit to generate a control signal, wherein the control signal is used for controlling the memory controller to transfer the data indicated by the control signal from the second internal memory to the external memory, or for controlling the memory controller to transfer the data indicated by the control signal from the external memory to the second internal memory.
  8. The data access circuit of claim 1, wherein the plurality of internal memories are addressed collectively with the external memory.
  9. A method for accessing data, comprising:
    receiving a first control instruction to determine a first internal memory connected with an external processing unit and a second internal memory connected with the external memory;
    receiving a second control instruction to determine a data transfer direction between the external memory and a second internal memory;
    the external processing unit acquires data from the first internal memory or transmits data to the first internal memory;
    and sending the data in the second internal memory to the external memory or storing the data acquired from the external memory into the second internal memory according to the data transmission direction.
  10. A chip comprising at least one data access circuit as claimed in any one of claims 1 to 8.
CN202080098538.4A 2020-04-03 2020-04-03 Data access circuit and method Pending CN115280272A (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2020/083195 WO2021196158A1 (en) 2020-04-03 2020-04-03 Data access circuit and method

Publications (1)

Publication Number Publication Date
CN115280272A true CN115280272A (en) 2022-11-01

Family

ID=77927131

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202080098538.4A Pending CN115280272A (en) 2020-04-03 2020-04-03 Data access circuit and method

Country Status (2)

Country Link
CN (1) CN115280272A (en)
WO (1) WO2021196158A1 (en)

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5745785A (en) * 1995-05-09 1998-04-28 Sofmap Future Design, Inc. System for alternatively transferring data from external memory to memory device and from memory device to internal memory depending upon processing unit's operational
US7035953B2 (en) * 2002-05-03 2006-04-25 Hewlett-Packard Development Company, L.P. Computer system architecture with hot pluggable main memory boards
CN107239823A (en) * 2016-08-12 2017-10-10 北京深鉴科技有限公司 A kind of apparatus and method for realizing sparse neural network
CN107239825B (en) * 2016-08-22 2021-04-09 赛灵思电子科技(北京)有限公司 Deep neural network compression method considering load balance
CN108875920A (en) * 2018-02-12 2018-11-23 北京旷视科技有限公司 Operation method, device, system and the storage medium of neural network

Also Published As

Publication number Publication date
WO2021196158A1 (en) 2021-10-07

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination