CN116521096A - Memory access circuit, memory access method, integrated circuit, and electronic device - Google Patents

Memory access circuit, memory access method, integrated circuit, and electronic device

Info

Publication number
CN116521096A
Authority
CN
China
Prior art keywords
queue
data
processing unit
return
access
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202310807727.XA
Other languages
Chinese (zh)
Other versions
CN116521096B (en
Inventor
Name not published at the inventor's request
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Moore Threads Technology Co Ltd
Original Assignee
Moore Threads Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Moore Threads Technology Co Ltd filed Critical Moore Threads Technology Co Ltd
Priority to CN202310807727.XA priority Critical patent/CN116521096B/en
Publication of CN116521096A publication Critical patent/CN116521096A/en
Application granted granted Critical
Publication of CN116521096B publication Critical patent/CN116521096B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0628Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0655Vertical data movement, i.e. input-output transfer; data movement between one or more hosts and one or more storage devices
    • G06F3/0656Data buffering arrangements
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0602Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0602Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/061Improving I/O performance
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/48Program initiating; Program switching, e.g. by interrupt
    • G06F9/4806Task transfer initiation or dispatching
    • G06F9/4843Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
    • G06F9/4881Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5011Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resources being hardware resources other than CPUs, Servers and Terminals
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Human Computer Interaction (AREA)
  • Memory System (AREA)

Abstract

The present disclosure relates to a memory access circuit, a memory access method, an integrated circuit, and an electronic device in the field of electronic technology. The memory access circuit is configured to access a plurality of different memory units and includes a plurality of processing units, each containing a queue module, together with an arbiter, a scheduling queue, and a scheduling selector corresponding to each memory unit. The queue module records the order in which its processing unit issues requests to the plurality of memory units, stores the return data of the memory units, and delivers the return data from the queue module to the processing unit in the recorded request order. Embodiments of the disclosure can shorten the return time of return data and improve the flexibility and scalability of the memory access circuit.

Description

Memory access circuit, memory access method, integrated circuit, and electronic device
Technical Field
The present disclosure relates to the field of electronic technologies, and in particular, to a memory access circuit, a memory access method, an integrated circuit, and an electronic device.
Background
With the rapid development of the integrated circuit industry, processor chips are ever more widely used across industries, in scenarios such as network communication, mobile phones, set-top boxes, LCD televisions, medical equipment, security equipment, industrial control equipment, smart meters, wearables, the Internet of Things, and automotive electronics.
In various processor chips, a processing unit may access a memory unit to accomplish various target tasks. Memory access circuitry within the processor chip can be used to manage communication between the processing unit and the memory unit, and it directly affects the operating efficiency and stability of the whole processor chip.
Disclosure of Invention
The present disclosure proposes a memory access technique.
According to an aspect of the present disclosure, there is provided a memory access circuit for accessing a plurality of different storage units, the memory access circuit comprising: a plurality of processing units, each of which includes a queue module, together with an arbiter, a scheduling queue, and a scheduling selector corresponding to each storage unit. The output end of any processing unit is connected to the input end of each arbiter, and the output end of each arbiter is connected to the input end of a corresponding scheduling queue and the input end of a corresponding storage unit. The processing units are configured to generate request information for accessing the plurality of storage units; the queue module of each processing unit is configured to record the request order in which that processing unit accesses the plurality of storage units; the arbiter is configured to arbitrate request information from the plurality of processing units, send the request information to the storage unit in arbitration order, and write the processing unit identifiers of the plurality of processing units into the scheduling queue in that arbitration order. The input end of any scheduling selector is connected to the output end of a corresponding storage unit and the output end of a corresponding scheduling queue, and the output end of the scheduling selector is connected to the input end of each processing unit. The scheduling selector is configured to transmit the return data of the storage unit to the processing unit indicated by the processing unit identifier read from the scheduling queue; the queue module of the processing unit is configured to store the return data of the plurality of storage units, and the return data is fetched from the queue module to the processing unit according to the request order in which the processing unit accessed the plurality of storage units.
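As a hedged illustration of the request path just described, the sketch below models one storage unit's arbiter, scheduling queue (here `dispatch_queue`), and scheduling selector in Python. All function and variable names are mine, not the patent's, and the fixed-priority policy is only one possible arbitration order; the disclosure does not prescribe one.

```python
from collections import deque

def arbitrate(requests):
    """Fixed-priority arbitration: grant pending requests in ascending
    processing-unit ID order (one possible policy; the patent does not
    fix a specific one). `requests` maps unit ID -> request payload."""
    return sorted(requests.items())

def serve_storage_unit(requests, storage_read):
    """Forward arbitrated requests to the storage unit, log each winner's
    ID in the scheduling queue, then route replies back using that log."""
    dispatch_queue = deque()   # scheduling queue of winner IDs
    responses = deque()
    for pu_id, req in arbitrate(requests):
        dispatch_queue.append(pu_id)          # record winner, in arbitration order
        responses.append(storage_read(req))   # storage unit replies in order
    routed = {}                               # the scheduling selector's job:
    while dispatch_queue:                     # pair each reply with its logged ID
        routed.setdefault(dispatch_queue.popleft(), []).append(responses.popleft())
    return routed
```

Because the storage unit answers requests in the order the arbiter forwarded them, pairing each response with the identifier at the head of the scheduling queue is enough to steer it back to the correct processing unit.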
In a possible implementation manner, the processing unit is configured to: generate a plurality of pieces of request information simultaneously, each for accessing a different storage unit, where the simultaneously generated request information shares the same request order; and/or receive a plurality of pieces of return data simultaneously, each from a different storage unit, where the queue module of the processing unit stores the return data of the plurality of storage units simultaneously.
In one possible implementation, the queue module includes an access queue, a return queue, and a data queue; the access queue is used for recording the request sequence of the processing unit for accessing the plurality of storage units; the return queue is used for recording the return sequence of the return data received by the processing unit from the plurality of storage units; the data queue is used for storing the return data of a plurality of storage units according to the request sequence and the instruction of the return sequence, and fetching the return data from the data queue to the processing unit.
In a possible implementation manner, the access queue includes a plurality of rows and a plurality of columns, the access queue records the request sequence of the processing unit for accessing a plurality of storage units by performing setting operations on different rows of different columns, wherein the request sequence corresponding to each row of the access queue is the same, and each column of the access queue is used for recording the request sequence of the processing unit for accessing a different storage unit.
In one possible implementation manner, the return queue includes a plurality of rows and a plurality of columns, and the return queue records the order in which the processing unit receives return data from the plurality of storage units by performing set operations on different rows of different columns, where the return order corresponding to each row of the return queue is the same, and each column of the return queue records the return order of return data received from a different storage unit.
In a possible implementation, the data queue includes a plurality of rows and a plurality of columns, and the data queue records that the processing unit receives the return data of a plurality of storage units by performing write operations on different rows of different columns, where each column of the data queue is used to store the return data of a corresponding storage unit.
In a possible implementation manner, the processing unit is configured to: when the data queue receives return data from any storage unit, fetch the column corresponding to that storage unit from the return queue and invert it to obtain first data, the first data representing the free space available for that storage unit's return data in the data queue; fetch from the access queue the column that records the request order for that storage unit as second data; perform an AND operation on the first data and the second data to obtain third data; and write the return data to the write location indicated by the third data in the storage unit's column of the data queue, while performing a set operation at that write location in the storage unit's column of the return queue.
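The inversion-and-AND step above can be sketched as a small Python function. The names and the oldest-slot assumption are mine: a single storage unit is assumed to answer its requests in issue order, so the lowest set bit of the third data is taken as the write location.

```python
def find_write_row(return_col, access_col):
    """Compute the data-queue row for newly arrived return data.

    return_col -- bits of this storage unit's return-queue column
                  (1 = return data already written in that row)
    access_col -- bits of this storage unit's access-queue column
                  (1 = that row's request targets this storage unit)
    """
    first = [1 - b for b in return_col]              # invert: free slots
    second = access_col                              # slots expecting this unit
    third = [f & s for f, s in zip(first, second)]   # outstanding slots
    return third.index(1)  # assumed: oldest outstanding row is written first

# Rows 0 and 2 requested this unit; row 0 already got its data,
# so the next arrival belongs to row 2.
row = find_write_row(return_col=[1, 0, 0, 0], access_col=[1, 0, 1, 0])
```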
In a possible implementation, the processing unit is further configured to: in response to return data being written into the data queue, obtain the first-line data of the return queue and the first-line data of the access queue; when the first-line data of the return queue is identical to the first-line data of the access queue, fetch the first line of return data in the data queue to the processing unit; the data queue then dequeues its first line of return data, the access queue dequeues its first-line data, and the return queue dequeues its first-line data.
In a possible implementation manner, the access queue, return queue, and data queue of the same processing unit have the same depth, where the depth is the number of lines in each queue and is determined by the concurrency of the processing unit, i.e., the number of requests the processing unit can send to the storage units without waiting for their return data. The bit width of the data queue is determined by the number of storage units and the bit width of the return data; the access queue and the return queue share the same bit width, which is determined by the number of storage units.
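Tying the three queues, the write procedure, and the sizing rule together, here is a minimal Python model. The class and method names are my own invention, the matrices are plain lists rather than hardware registers, and a queue depth equal to the unit's concurrency is assumed as stated above.

```python
class QueueModule:
    """Toy model (my naming, not the patent's) of one processing unit's
    access queue, return queue, and data queue as row-by-column matrices.
    depth = the unit's concurrency; columns = number of storage units."""

    def __init__(self, depth, num_units):
        self.depth, self.num_units = depth, num_units
        self.access, self.ret, self.data = [], [], []

    def record_request(self, unit_ids):
        """One request cycle: simultaneously issued requests share a row,
        i.e. they share the same request order."""
        assert len(self.access) < self.depth, "concurrency exhausted"
        self.access.append([int(u in unit_ids) for u in range(self.num_units)])
        self.ret.append([0] * self.num_units)
        self.data.append([None] * self.num_units)

    def accept_return(self, unit, payload):
        """Write procedure: NOT(return column) AND access column marks the
        outstanding rows; the oldest one receives the arriving data."""
        third = [(1 - r[unit]) & a[unit] for r, a in zip(self.ret, self.access)]
        row = third.index(1)
        self.data[row][unit] = payload
        self.ret[row][unit] = 1

    def try_dequeue(self):
        """In-order delivery: pop the head row only once its return-queue
        line matches its access-queue line (all requested units answered)."""
        if self.access and self.ret[0] == self.access[0]:
            self.access.pop(0)
            self.ret.pop(0)
            return self.data.pop(0)
        return None
```

For instance, if storage units 0 and 1 are requested in one cycle and unit 1 replies first, `try_dequeue` keeps returning `None` until unit 0's data also lands, so delivery follows the request order rather than the return order.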
In a possible implementation manner, the memory access circuit further includes at least one first buffer, at least one second buffer, and an output end of each arbiter is connected to an input end of a corresponding memory unit through the at least one first buffer, and an output end of each memory unit is connected to an input end of a corresponding schedule selector through the at least one second buffer.
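As a rough sketch of what such a buffer contributes, the toy register stage below forwards an item one `tick()` after it is pushed, decoupling producer and consumer timing. This single-entry model is my simplification, not the patent's circuit.

```python
from collections import deque

class BufferStage:
    """Single-entry register stage: whatever is pushed this cycle
    emerges one tick() later, then the stage drains."""

    def __init__(self):
        self.slot = deque(maxlen=1)

    def push(self, item):
        self.slot.append(item)

    def tick(self):
        # Forward the held item, or nothing if the stage is empty.
        return self.slot.popleft() if self.slot else None

# A request crosses one first buffer on its way to the memory unit:
stage = BufferStage()
stage.push({"addr": 0x40})
```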
According to an aspect of the present disclosure, there is provided a memory access method applied to a memory access circuit, the memory access circuit including: a plurality of processing units, each of which includes a queue module, together with a scheduling queue and a scheduling selector corresponding to each storage unit; the input end of any scheduling selector is connected to the output end of the corresponding storage unit and the output end of the corresponding scheduling queue, and the output end of the scheduling selector is connected to the input end of each processing unit. The memory access method includes: the storage unit generating return data in response to request information from a processing unit; the scheduling selector transmitting the return data of the storage unit to the processing unit indicated by the processing unit identifier read from the scheduling queue; and the queue module of the processing unit storing the return data of a plurality of storage units and fetching the return data from the queue module to the processing unit according to the request order in which the processing unit accessed the plurality of storage units.
In one possible implementation, the queue module of the processing unit stores return data of a plurality of storage units, including: the processing unit receives a plurality of return data simultaneously, each return data from a different storage unit, and the queue module of the processing unit is used for storing the return data of the plurality of storage units simultaneously.
In one possible implementation, the queue module includes an access queue, a return queue, and a data queue; the access queue is used for recording the request sequence of the processing unit for accessing the plurality of storage units; the return queue is used for recording the return sequence of the return data received by the processing unit from the plurality of storage units; the data queue is used for storing the return data of a plurality of storage units according to the request sequence and the instruction of the return sequence, and fetching the return data from the data queue to the processing unit.
In a possible implementation manner, the access queue includes a plurality of rows and a plurality of columns and records the request order in which the processing unit accesses the plurality of storage units by performing set operations on different rows of different columns, where the request order corresponding to each row of the access queue is the same, and each column of the access queue records the request order for a different storage unit; the return queue includes a plurality of rows and a plurality of columns and records the order in which the processing unit receives return data from the plurality of storage units by performing set operations on different rows of different columns, where the return order corresponding to each row of the return queue is the same, and each column of the return queue records the return order of return data received from a different storage unit; the data queue includes a plurality of rows and a plurality of columns and records the return data the processing unit receives from the plurality of storage units by performing write operations on different rows of different columns, where each column of the data queue stores the return data of a corresponding storage unit.
In one possible implementation, storing the return data of a plurality of storage units in the queue module of the processing unit includes: when the data queue receives return data from any storage unit, fetching the column corresponding to that storage unit from the return queue and inverting it to obtain first data, the first data representing the free space available for that storage unit's return data in the data queue; fetching from the access queue the column that records the request order for that storage unit as second data; performing an AND operation on the first data and the second data to obtain third data; and writing the return data to the write location indicated by the third data in the storage unit's column of the data queue, while performing a set operation at that write location in the storage unit's column of the return queue.
In one possible implementation, the memory access method further includes: responding to the returned data to write into the data queue, and acquiring the first line data of the returned queue and the first line data of the access queue; under the condition that the first line data of the return queue is the same as the first line data of the access queue, the return data of the first line in the data queue is taken out to a processing unit; the data queue dequeues the return data of the first line, the access queue dequeues the first line data, and the return queue dequeues the first line data.
In a possible implementation manner, the access queue, the return queue and the data queue of the same processing unit have the same depth, wherein the depth represents the number of lines of the access queue, the return queue and the data queue, the depth is determined according to the concurrency of the processing unit, and the concurrency of the processing unit represents the number of times that the processing unit continuously sends request information to a storage unit without waiting for the return data of the request information; the bit width of the data queue is determined according to the number of the storage units and the bit width of the return data, the bit widths of the access queue and the return queue are the same, and the bit widths of the access queue and the return queue are determined according to the number of the storage units.
According to an aspect of the present disclosure, there is provided a memory access method applied to a memory access circuit, the memory access circuit including: a plurality of processing units, each of which includes a queue module, together with an arbiter and a scheduling queue corresponding to each storage unit; the output end of any processing unit is connected to the input end of each arbiter, and the output end of each arbiter is connected to the input end of a corresponding scheduling queue and the input end of a corresponding storage unit. The memory access method includes: each processing unit sending generated request information for accessing a storage unit to the arbiter of the storage unit indicated by that request information, where the queue module of the processing unit records the request order in which the processing unit accesses the plurality of storage units; and the arbiter arbitrating the request information from the plurality of processing units, sending the request information to the storage unit in arbitration order, and writing the processing unit identifiers of the plurality of processing units into the scheduling queue in that arbitration order.
In a possible implementation, the processing unit is configured to generate a plurality of request information simultaneously, each request information being configured to access a different storage unit, where the plurality of request information generated simultaneously has the same request order.
In one possible implementation, the queue module includes an access queue configured to record the request order in which the processing unit accesses the plurality of storage units; the access queue includes a plurality of rows and a plurality of columns and records that order by performing set operations on different rows of different columns, where the request order corresponding to each row is the same and each column records the request order for a different storage unit.
According to an aspect of the present disclosure, there is provided an integrated circuit comprising a memory access circuit as described above.
According to an aspect of the present disclosure, there is provided an electronic device comprising a memory access circuit as described above.
According to an aspect of the present disclosure, there is provided a computer readable storage medium having stored thereon computer program instructions which, when executed by a processor, implement the above-described method.
The memory access circuit of the embodiments of the disclosure enables a plurality of processing units to access different storage units (for example, storage units deployed in different scheduling modules). When the plurality of processing units access the storage units, the scheduling queue stores the processing unit identifiers and thereby records the access order of the plurality of processing units; when the storage units reply with return data, the return data can be accurately transmitted to each processing unit according to the arbitration order recorded in the scheduling queue.
In addition, in the memory access circuit of the embodiments of the disclosure, a queue module is provided for each processing unit. In the stage in which the processing unit accesses the storage units, the queue module records the request order in which its processing unit accesses the plurality of storage units; in the stage in which the processing unit receives return data from the storage units, the queue module stores the return data of the plurality of storage units, and the return data is fetched from the queue module to the processing unit according to the recorded request order. By providing the queue module, return data that arrives for later requests is cached in the queue module while return data for earlier requests is still awaited, and in-order delivery of return data is handled within each processing unit. This reduces the waste of hardware resources, shortens the return time of return data, and improves the flexibility and scalability of the memory access circuit.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure. Other features and aspects of the present disclosure will become apparent from the following detailed description of exemplary embodiments, which proceeds with reference to the accompanying drawings.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the disclosure and together with the description, serve to explain the technical aspects of the disclosure.
Fig. 1 shows a schematic diagram of a memory access circuit according to an embodiment of the present disclosure.
Fig. 2 shows a schematic diagram of another memory access circuit according to an embodiment of the present disclosure.
Fig. 3 shows a schematic diagram of another memory access circuit according to an embodiment of the present disclosure.
Fig. 4 shows a schematic diagram of an access queue according to an embodiment of the present disclosure.
Fig. 5 shows a schematic diagram of a return queue according to an embodiment of the present disclosure.
FIG. 6 shows a schematic diagram of a data queue according to an embodiment of the present disclosure.
Fig. 7 shows a schematic diagram of an access order of access queue records according to an embodiment of the disclosure.
Fig. 8 shows a flow chart of a memory access method according to an embodiment of the present disclosure.
Fig. 9 shows a flowchart of another memory access method according to an embodiment of the present disclosure.
Detailed Description
Various exemplary embodiments, features and aspects of the disclosure will be described in detail below with reference to the drawings. In the drawings, like reference numbers indicate identical or functionally similar elements. Although various aspects of the embodiments are illustrated in the accompanying drawings, the drawings are not necessarily drawn to scale unless specifically indicated.
The word "exemplary" is used herein to mean "serving as an example, embodiment, or illustration." Any embodiment described herein as "exemplary" is not necessarily to be construed as preferred or advantageous over other embodiments.
The term "and/or" herein merely describes an association relationship between associated objects, indicating that three relationships may exist; for example, A and/or B may represent: A exists alone, A and B exist together, or B exists alone. In addition, the term "at least one" herein means any one of a plurality, or any combination of at least two of a plurality; for example, including at least one of A, B, and C may mean including any one or more elements selected from the set consisting of A, B, and C.
Furthermore, numerous specific details are set forth in the following detailed description in order to provide a better understanding of the present disclosure. It will be understood by those skilled in the art that the present disclosure may be practiced without some of these specific details. In some instances, methods, means, elements, and circuits well known to those skilled in the art have not been described in detail in order not to obscure the present disclosure.
Fig. 1 shows a schematic diagram of a memory access circuit according to an embodiment of the present disclosure. As shown in Fig. 1, the memory access circuit is configured to access a plurality of different memory units 0 and includes: a plurality of processing units 1, each including a queue module 11, together with an arbiter 2, a schedule queue 3, and a schedule selector 7 corresponding to each memory unit 0.
The output end of any processing unit 1 is respectively connected with the input end of each arbiter 2, the output end of each arbiter 2 is connected with the input end of a corresponding scheduling queue 3 and the input end of a corresponding storage unit 0, the processing unit 1 is used for generating request information for accessing a plurality of storage units 0, a queue module 11 of the processing unit 1 is used for recording the request sequence of the processing unit 1 for accessing the plurality of storage units 0, the arbiter 2 is used for arbitrating the request information from the plurality of processing units 1, sending the request information to the storage units 0 according to the arbitration sequence, and writing the processing unit identifiers of the plurality of processing units 1 into the scheduling queue 3 according to the arbitration sequence.
The input end of any scheduling selector 7 is connected with the output end of a corresponding storage unit 0 and the output end of a corresponding scheduling queue 3, the output end of the scheduling selector 7 is respectively connected with the input end of each processing unit 1, the scheduling selector 7 is used for transmitting the return data of the storage unit 0 to the processing unit 1 indicated by the processing unit identification according to the processing unit identification read from the scheduling queue 3, the queue module 11 of the processing unit is used for storing the return data of a plurality of storage units 0, and the return data is taken out from the queue module 11 to the processing unit 1 according to the request sequence of the processing unit 1 for accessing a plurality of storage units 0.
In one possible implementation, the memory access circuitry of embodiments of the present disclosure may be integrated into a processor chip for accessing a plurality of different memory locations 0 within the processor chip.
Wherein the processor chip includes, for example: a central processing unit (Central Processing Unit, CPU), a graphics processing unit (Graphics Processing Unit, GPU), a general-purpose graphics processing unit (General-Purpose Computing on Graphics Processing Units, GPGPU), a multi-core processor (Multi-Core Processor), a digital signal processor (Digital Signal Processor, DSP), an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), a tensor processing unit (Tensor Processing Unit, TPU), a field-programmable gate array (Field Programmable Gate Array, FPGA), or other programmable logic device, which is not limited by the present disclosure.
The storage unit 0 may include a random access memory (Random Access Memory, RAM) disposed inside the processor chip, such as a dynamic RAM (Dynamic RAM, DRAM), a static RAM (Static RAM, SRAM), a synchronous DRAM (Synchronous DRAM, SDRAM), a cached DRAM (Cached DRAM, CDRAM), an enhanced DRAM (Enhanced DRAM, EDRAM), etc.; the present disclosure does not limit the type of the storage unit 0.
Illustratively, the multiple processing units 1 may be multiple computing cores of a multi-core processor chip, and may access different memory units 0 in the multi-core processor chip, so as to improve the efficiency of multi-core parallel processing.
Alternatively, the multiple processing units 1 may be multithreaded modules disposed in the same computing core in the multi-core processor chip, and may access different memory units 0 (may be memory units located in the computing core or memory units located outside the computing core) in the multi-core processor chip, so as to improve the efficiency of multithreaded parallel processing.
Alternatively, the processor chip internally comprises a plurality of processing units 1 and a plurality of scheduling modules, and different storage units 0 are deployed in different scheduling modules, and the scheduling modules can be used for executing scheduling tasks (including operation data scheduling and operation program scheduling). In this case, the scheduling module may be configured to receive request information (e.g., an operation instruction) of the processing unit 1, and provide the processing unit 1 with required resources (return data) according to the request information.
In one possible implementation, the memory access circuit may be used to access M (M ≥ 1) different storage units 0, and the memory access circuit may include: N (N ≥ 1) processing units 1, each including a queue module 11; and M arbiters 2, M scheduling queues 3, and M scheduling selectors 7, which correspond to the M storage units 0 respectively. It should be understood that the embodiments of the present disclosure do not limit the number M of storage units 0 or the number N of processing units 1, and the values of M and N may be set according to the actual application scenario.
As shown in FIG. 1, M memory units 0 may be memory units 0_1 through 0_M, and N processing units 1 may be processing units 1_1 through 1_N, respectively.
The processing unit 1_1 is internally provided with a queue module 11_1. In the access stage, the queue module 11_1 records the request order in which the processing unit 1_1 accesses the storage units 0_1 to 0_M; in the stage of receiving return data, the queue module 11_1 stores the return data from the storage units 0_1 to 0_M, and the return data is fetched from the queue module 11_1 into the processing unit 1_1 according to the request order recorded by the queue module 11_1.

The processing unit 1_2 is internally provided with a queue module 11_2. In the access stage, the queue module 11_2 records the request order in which the processing unit 1_2 accesses the storage units 0_1 to 0_M; in the stage of receiving return data, the queue module 11_2 stores the return data from the storage units 0_1 to 0_M, and the return data is fetched from the queue module 11_2 into the processing unit 1_2 according to the request order recorded by the queue module 11_2.

Similarly, the processing unit 1_N is provided with a queue module 11_N. In the access stage, the queue module 11_N records the request order in which the processing unit 1_N accesses the storage units 0_1 to 0_M; in the stage of receiving return data, the queue module 11_N stores the return data from the storage units 0_1 to 0_M, and the return data is fetched from the queue module 11_N into the processing unit 1_N according to the request order recorded by the queue module 11_N.
The M arbiters 2 may be arbiters 2_1 to 2_M, respectively; the M scheduling queues 3 may be scheduling queues 3_1 to 3_M, respectively; and the M scheduling selectors 7 may be scheduling selectors 7_1 to 7_M, respectively. The arbiter 2_1, the scheduling queue 3_1, and the scheduling selector 7_1 correspond to the storage unit 0_1; the arbiter 2_2, the scheduling queue 3_2, and the scheduling selector 7_2 correspond to the storage unit 0_2; and so on, until the arbiter 2_M, the scheduling queue 3_M, and the scheduling selector 7_M correspond to the storage unit 0_M.
As shown in fig. 1, the output of any processing unit 1 is connected to the input of each arbiter 2. For example, the output ends of the processing units 1_1 are respectively connected with the input ends of the arbiters 2_1-2_M; the output end of the processing unit 1_2 is respectively connected with the input end of the arbiter 2_1 to the input end of the arbiter 2_M; similarly, the output terminals of the processing unit 1_N are respectively connected to the input terminal of the arbiter 2_1 to the input terminal of the arbiter 2_m.
In an example, the processing unit 1_1 may generate request information for accessing any one of the storage units 0_1 to 0_M, and the request information may include a scheduling identifier for indicating the storage unit 0. For example, assume there are scheduling identifiers 1 to M: scheduling identifier 1 may be used to indicate the storage unit 0_1, scheduling identifier 2 may be used to indicate the storage unit 0_2, and so on, with scheduling identifier M used to indicate the storage unit 0_M. By carrying the scheduling identifier in the request information, the request information can be accurately transmitted to the storage unit 0 indicated by the scheduling identifier.
In an example, assuming that the number M of storage units 0 is 3, i.e., there are 3 storage units 0, the scheduling identifier may be binary data with a bit width of 2 bits: a scheduling identifier of 00 indicates access to the storage unit 0_1; a scheduling identifier of 01 indicates access to the storage unit 0_2; and a scheduling identifier of 10 indicates access to the storage unit 0_3. The present disclosure does not limit the bit width of the scheduling identifier, which may be determined according to the number of storage units 0. For example, assuming the number of storage units 0 is M and the bit width of the scheduling identifier is W, W may be determined by solving the inequality 2^W ≥ M, where 2^W denotes 2 raised to the power W.
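As a minimal sketch of this relationship (the function name is illustrative and not part of the disclosure), the bit width W of the scheduling identifier is the smallest integer satisfying 2^W ≥ M:

```python
import math

def schedule_id_width(num_storage_units: int) -> int:
    """Smallest bit width W such that 2**W >= M; at least 1 bit is kept."""
    return max(1, math.ceil(math.log2(num_storage_units)))

# For M = 3 storage units, 2-bit identifiers (00, 01, 10) suffice.
```

For example, with M = 3 this yields W = 2, matching the 2-bit identifiers 00, 01, and 10 above.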
The request information may further include a request type (e.g., including a read request and/or a write request), an access address (e.g., an address of a read request, or an address of a write request), an enable signal (e.g., a signal that enables the current request information to have read and write rights of a certain memory location), etc., which is not particularly limited by the present disclosure.
In this case, if the request information generated by the processing unit 1_1 includes the scheduling identifier 1, the request information may be sent to the arbiter 2_1 corresponding to the storage unit 0_1 indicated by the scheduling identifier 1; if the request information generated by the processing unit 1_1 includes the scheduling identifier 2, the request information may be sent to the arbiter 2_2 corresponding to the storage unit 0_2 indicated by the scheduling identifier 2; similarly, if the request information generated by the processing unit 1_1 includes the scheduling identifier M, the request information may be sent to the arbiter 2_M corresponding to the storage unit 0_M indicated by the scheduling identifier M.
Similarly, the processing units 1_2 to 1_N may also generate request information for accessing any one of the storage units 0_1 to 0_M, and the processing units 1_2 to 1_N may send the generated request information including the scheduling identifier to the arbiter 2 corresponding to the storage unit 0 indicated by the scheduling identifier, which is not described herein.
As shown in fig. 1, the output terminal of each arbiter 2 is connected to the input terminal of a corresponding scheduling queue 3 and the input terminal of a corresponding storage unit 0. For example, the output terminal of the arbiter 2_1 is connected to the input terminal of the scheduling queue 3_1 and the input terminal of the storage unit 0_1, respectively; the output terminal of the arbiter 2_2 is connected to the input terminal of the scheduling queue 3_2 and the input terminal of the storage unit 0_2, respectively; and so on, until the output terminal of the arbiter 2_M is connected to the input terminal of the scheduling queue 3_M and the input terminal of the storage unit 0_M, respectively.
In this way, the arbiter 2_1 may perform round robin arbitration (round robin) on different request information from the processing units 1_1 to 1_N accessing the storage unit 0_1, select one target request information of the current round from the N request information, send the selected target request information of the current round to the storage unit 0_1, and write the identification information of the processing unit to which the target request information of the current round belongs, that is, the processing unit identification, into the dispatch queue 3_1. Since the dispatch queue 3_1 can record one processing unit identifier in each round of arbitration, the dispatch queue 3_1 can record the arbitration sequence of the arbiter 2_1, that is, the access sequence of different processing units accessing the same memory unit 0_1 after multiple rounds of arbitration.
The mechanism for writing into the scheduling queue 3_1 is as follows: whenever the arbiter 2_1 selects a request, the corresponding processing unit identifier is written into the scheduling queue 3_1. For example, the arbiter 2_1 may continuously poll the request information from the processing units 1_1 to 1_N. If the first round selects the request information from the processing unit 1_N, the processing unit identifier N indicating the processing unit 1_N may be written into the scheduling queue 3_1; if the second round selects the request information from the processing unit 1_2, the processing unit identifier 2 indicating the processing unit 1_2 may be written into the scheduling queue 3_1; if the third round selects the request information from the processing unit 1_1, the processing unit identifier 1 indicating the processing unit 1_1 may be written into the scheduling queue 3_1. In this case, the scheduling queue 3_1 sequentially stores the processing unit identifiers N, 2, and 1 in the direction from the head of the queue to the tail of the queue, thereby recording the arbitration order of the arbiter 2_1 (that is, the processing unit 1_N indicated by identifier N accesses the storage unit 0_1 first, then the processing unit 1_2 indicated by identifier 2, and then the processing unit 1_1 indicated by identifier 1).
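Behaviorally, the interplay between the round-robin arbiter and its scheduling queue described above can be sketched as follows. This is a simplified Python model under assumed names, not the disclosed circuit itself, which is hardware logic rather than software:

```python
from collections import deque

class RoundRobinArbiter:
    """Simplified model of one arbiter 2 together with its scheduling queue 3."""

    def __init__(self, num_processing_units: int):
        self.num = num_processing_units
        self.pointer = 0                # round-robin starting position
        self.dispatch_queue = deque()   # scheduling queue 3: winners in arbitration order

    def arbitrate(self, pending: dict):
        """pending maps processing-unit id -> request info; returns the round's winner."""
        for i in range(self.num):
            pu = (self.pointer + i) % self.num
            if pu in pending:
                self.pointer = (pu + 1) % self.num   # advance pointer for fairness
                self.dispatch_queue.append(pu)       # record arbitration order
                return pu, pending[pu]               # request forwarded to the storage unit
        return None                                  # no request pending this round
```

Each call models one arbitration round; after several rounds, `dispatch_queue` holds the access order of the processing units, mirroring how the scheduling queue 3_1 records identifiers N, 2, 1 in the example above.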
Similarly, the arbiters 2_2 to 2_M may also perform round-robin arbitration on the request information from the processing units 1_1 to 1_N, send the request information to the corresponding storage units 0_2 to 0_M according to their respective arbitration orders, and write the processing unit identifiers into the corresponding scheduling queues 3_2 to 3_M according to their respective arbitration orders, which will not be described herein.
In this way, when a plurality of different processing units 1 access the same memory unit 0, the order of accessing the memory unit 0 can be determined in an arbitration order by the corresponding arbiter 2, and the arbitration order can be recorded by the schedule queue 3.
As shown in fig. 1, an input end of any one of the schedule selectors 7 is connected to an output end of a corresponding storage unit 0 and an output end of a corresponding schedule queue 3, and an output end of any one of the schedule selectors 7 is connected to input ends of a plurality of the processing units 1, respectively.
For example, the output end of the storage unit 0_1 and the output end of the scheduling queue 3_1 are respectively connected to the input end of the scheduling selector 7_1, and the output end of the scheduling selector 7_1 is connected to the input ends of the processing units 1_1 to 1_N in a one-to-many manner; the output end of the storage unit 0_2 and the output end of the scheduling queue 3_2 are respectively connected to the input end of the scheduling selector 7_2, and the output end of the scheduling selector 7_2 is connected to the input ends of the processing units 1_1 to 1_N in a one-to-many manner; and so on, the output end of the storage unit 0_M and the output end of the scheduling queue 3_M are respectively connected to the input end of the scheduling selector 7_M, and the output end of the scheduling selector 7_M is connected to the input ends of the processing units 1_1 to 1_N in a one-to-many manner.
Thus, when the storage unit 0_1 transmits the return data to the schedule selector 7_1 in response to the request information, the schedule selector 7_1 can transmit the received return data to the processing unit 1 indicated by the processing unit identification read out from the head of the schedule queue 3_1.
For example, assume that the scheduling queue 3_1 stores the processing unit identifier N, the processing unit identifier 2, and the processing unit identifier 1 in this order in the direction from the head of the queue to the tail of the queue. The scheduling selector 7_1 receives the return data sent from the storage unit 0_1, reads the processing unit identifier N from the head of the scheduling queue 3_1, and sends the received return data to the processing unit 1_N indicated by the processing unit identifier N. At this point, the processing unit identifier N is dequeued, and the processing unit identifier 2 becomes the head of the scheduling queue 3_1.

The scheduling selector 7_1 then receives further return data from the storage unit 0_1, reads the processing unit identifier 2 from the head of the scheduling queue 3_1, and sends the received return data to the processing unit 1_2 indicated by the processing unit identifier 2. At this point, the processing unit identifier 2 is dequeued, and the processing unit identifier 1 becomes the head of the scheduling queue 3_1.
Then, the schedule selector 7_1 receives the return data sent from the storage unit 0_1 again, and can continue to read the processing unit identifier 1 from the head of the schedule queue 3_1, and send the received return data to the processing unit 1_1 indicated by the processing unit identifier 1.
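The forwarding step just described can be sketched as follows (the names are illustrative): pop the identifier at the head of the scheduling queue and route the storage unit's return data to that processing unit's input.

```python
from collections import deque

def route_return_data(return_data, scheduling_queue: deque, pu_inputs: dict):
    """Model of a scheduling selector 7: deliver return data to the processing
    unit indicated by the head of the scheduling queue, then dequeue it."""
    pu_id = scheduling_queue.popleft()     # read head of scheduling queue 3
    pu_inputs[pu_id].append(return_data)   # data enters that unit's queue module
    return pu_id
```

Calling this once per received datum reproduces the example above: identifiers are consumed from the head of the queue in arbitration order, so each datum reaches the processing unit whose request that datum answers.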
Similarly, when the storage units 0_2 to 0_M respectively send the return data to the corresponding schedule selectors 7_2 to 7_M in response to the respective request information, the schedule selectors 7_2 to 7_M may send the received return data to the processing unit 1 indicated by the processing unit identifier according to the processing unit identifier read out from the head of the corresponding schedule queue 3_2 to 3_M, which is not described herein.
In response to receiving the return data, each processing unit 1 may store the return data in its queue module 11, and the return data is fetched from the queue module 11 into the processing unit 1 according to the request order recorded by the queue module 11.
In the stage in which the processing unit 1_1 accesses the storage units 0_1 to 0_M, the queue module 11_1 of the processing unit 1_1 records the access order in which the processing unit 1_1 accesses the storage units 0_1 to 0_M. In the stage in which the processing unit 1_1 receives the return data from the storage units 0_1 to 0_M, the processing unit 1_1 buffers the received return data in the queue module 11_1, and the queue module 11_1 then fetches the return data into the processing unit 1_1 according to the recorded access order, so that the return data can be subsequently processed.
Thus, when the processing unit 1_1 receives the return data of request information sent later, it first buffers that return data in the queue module 11_1 and waits for the return data of request information initiated earlier. The return data of the later request information is not fetched into the processing unit 1_1 until the return data of the earlier request information has been fetched from the queue module 11_1 into the processing unit 1_1.
For example, assume the processing unit 1_1 records that it first sent request information for accessing the storage unit 0_1, and then sent request information for accessing the storage unit 0_2. When the processing unit 1_1 receives the return data from the storage unit 0_2, the queue module 11_1 determines whether the return data from the storage unit 0_1 has already been fetched into the processing unit 1_1. If not, the queue module 11_1 continues to buffer the return data from the storage unit 0_2 until the return data from the storage unit 0_1 has been fetched into the processing unit 1_1; once the return data from the storage unit 0_1 has been fetched, the queue module 11_1 then fetches the return data from the storage unit 0_2 into the processing unit 1_1.
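This order-preserving behavior can be sketched as follows. The sketch is an assumed simplification (per-storage-unit buffers released strictly in the recorded request order), not a statement of the disclosed hardware structure:

```python
from collections import deque

class QueueModule:
    """Sketch of queue module 11: buffers return data and releases it to the
    processing unit only in the originally recorded request order."""

    def __init__(self):
        self.request_order = deque()   # storage-unit ids, in access order
        self.buffers = {}              # storage-unit id -> buffered return data

    def record_request(self, unit_id):
        self.request_order.append(unit_id)

    def store_return(self, unit_id, data):
        self.buffers.setdefault(unit_id, deque()).append(data)

    def fetch(self):
        """Return the next in-order datum, or None if it has not arrived yet."""
        if not self.request_order:
            return None
        head = self.request_order[0]
        buf = self.buffers.get(head)
        if buf:
            self.request_order.popleft()
            return buf.popleft()
        return None   # later data stays buffered until earlier data arrives
```

In the example above, data returned early from the storage unit 0_2 stays in the buffer until the data from the storage unit 0_1 has been fetched.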
It should be understood that the functions of the queue modules 11_2 to 11_N of the processing units 1_2 to 1_N are the same as those of the queue module 11_1 of the processing unit 1_1, and will not be described herein.
The memory access circuit of the embodiments of the present disclosure enables a plurality of processing units 1 to access different storage units 0 (including, for example, storage units deployed in different scheduling modules). When the plurality of processing units 1 access a storage unit 0, the access order of the plurality of processing units 1 may be recorded by storing the processing unit identifiers in the scheduling queue 3; when the storage unit 0 replies with return data, the return data may be accurately transmitted to each processing unit 1 according to the arbitration order recorded by the scheduling queue 3.
In the memory access circuit according to the embodiments of the present disclosure, a queue module 11 is provided for each processing unit 1. In the stage in which the processing unit 1 accesses the storage units 0, the queue module 11 records the request order in which its processing unit 1 accesses the plurality of storage units 0; in the stage in which the processing unit 1 receives return data from the storage units 0, the queue module 11 may store the return data of a plurality of storage units 0, and the return data is fetched from the queue module 11 into the processing unit 1 according to the recorded request order. By providing the queue module 11, the return data of request information sent later can first be buffered in the queue module 11 while waiting for the return data of request information initiated earlier, so that order-preserving processing of the return data is performed within each processing unit 1. This reduces the waste of hardware resources, shortens the return time of the return data, and improves the flexibility and scalability of the memory access circuit.
In a possible implementation, the processing unit 1 is configured to generate a plurality of request information simultaneously, each request information being configured to access a different storage unit 0, where the plurality of request information generated simultaneously has the same request sequence; and/or receiving a plurality of return data simultaneously, each return data from a different storage unit 0, wherein the queue module of the processing unit is used for storing the return data of the plurality of storage units 0 simultaneously.
In an example, the processing unit 1_1 may simultaneously generate request information for accessing the storage units 0_1 to 0_M each time: the processing unit 1_1 may send the request information for accessing the storage unit 0_1 to the arbiter 2_1, send the request information for accessing the storage unit 0_2 to the arbiter 2_2, and so on, sending the request information for accessing the storage unit 0_M to the arbiter 2_M. According to the indication of the scheduling identifiers in the request information, the processing unit 1_1 may synchronously record the plurality of storage units 0 accessed this time in the queue module 11_1. Thus, after multiple recordings, the queue module 11_1 can record the request order of the processing unit 1_1 accessing the plurality of storage units 0: the plurality of request information generated by the processing unit 1_1 in the same round have the same request order, while request information generated by the processing unit 1_1 in different rounds has different request orders.

The processing unit 1_2 may likewise simultaneously generate request information for accessing the storage units 0_1 to 0_M each time: the processing unit 1_2 may send the request information for accessing the storage unit 0_1 to the arbiter 2_1, send the request information for accessing the storage unit 0_2 to the arbiter 2_2, and so on, sending the request information for accessing the storage unit 0_M to the arbiter 2_M. According to the indication of the scheduling identifiers in the request information, the processing unit 1_2 may synchronously record the plurality of storage units 0 accessed this time in the queue module 11_2. Thus, after multiple recordings, the queue module 11_2 can record the request order of the processing unit 1_2 accessing the plurality of storage units 0: the plurality of request information generated by the processing unit 1_2 in the same round have the same request order, while request information generated by the processing unit 1_2 in different rounds has different request orders.

Similarly, the processing unit 1_N may simultaneously generate request information for accessing the storage units 0_1 to 0_M each time: the processing unit 1_N may send the request information for accessing the storage unit 0_1 to the arbiter 2_1, send the request information for accessing the storage unit 0_2 to the arbiter 2_2, and so on, sending the request information for accessing the storage unit 0_M to the arbiter 2_M. According to the indication of the scheduling identifiers in the request information, the processing unit 1_N may synchronously record the plurality of storage units 0 accessed this time in the queue module 11_N. Thus, after multiple recordings, the queue module 11_N can record the request order of the processing unit 1_N accessing the plurality of storage units 0: the plurality of request information generated by the processing unit 1_N in the same round have the same request order, while request information generated by the processing unit 1_N in different rounds has different request orders.
In an example, the processing unit 1_1 may simultaneously receive the return data from the storage units 0_1-0_M, and the queue module 11_1 of the processing unit 1_1 may simultaneously store the return data from the storage units 0_1-0_M in response to the processing unit 1_1 simultaneously receiving the return data from the storage units 0_1-0_M.
The processing unit 1_2 may also receive the return data from the storage units 0_1 to 0_M at the same time, and the queue module 11_2 of the processing unit 1_2 may store the return data from the storage units 0_1 to 0_M at the same time in response to the processing unit 1_2 receiving the return data from the storage units 0_1 to 0_M at the same time.
Similarly, the processing unit 1_N may simultaneously receive the return data from the storage units 0_1 to 0_M, and the queue module 11_N of the processing unit 1_N may simultaneously store the return data from the storage units 0_1 to 0_M in response to the processing unit 1_N simultaneously receiving that return data.
In this way, the memory access circuit of the embodiments of the present disclosure enables a plurality of processing units 1 to access different storage units 0 (including, for example, storage units deployed in different scheduling modules) at the same time. When the plurality of processing units 1 access a storage unit 0, the access order of the plurality of processing units 1 may be recorded by storing the processing unit identifiers in the scheduling queue 3; when the storage unit 0 replies with return data, the return data may be accurately transmitted to each processing unit 1 according to the arbitration order recorded by the scheduling queue 3.
In the memory access circuit according to the embodiments of the present disclosure, a queue module 11 is provided for each processing unit 1. In the stage in which the processing unit 1 accesses the storage units 0, the queue module 11 records the request order in which its processing unit 1 accesses the plurality of storage units 0; in the stage in which the processing unit 1 receives return data from the storage units 0, the queue module 11 may simultaneously store the return data of a plurality of storage units 0, and the return data is fetched from the queue module 11 into the processing unit 1 according to the recorded request order. By providing the queue module 11, the return data of request information sent later can first be buffered in the queue module 11 while waiting for the return data of request information initiated earlier, so that order-preserving processing of the return data is performed within each processing unit 1. This helps each processing unit 1 receive the return data of each storage unit 0 at the same time, and reduces the waste of hardware resources, the transmission delay, and the beat insertion (the number of clock cycles of signal delay).
Fig. 2 shows a schematic diagram of a memory access circuit according to an embodiment of the present disclosure, used for accessing a plurality of different storage units 0. As shown in Fig. 2, the memory access circuit includes: a plurality of processing units 1, each including a queue module 11 that comprises an access queue 4, a return queue 5, and a data queue 6; and, corresponding to each storage unit 0, an arbiter 2, a scheduling queue 3, and a scheduling selector 7.
The output end of any processing unit 1 is respectively connected with the input end of each arbiter 2, the output end of each arbiter 2 is connected with the input end of a corresponding scheduling queue 3 and the input end of a corresponding storage unit 0, the processing unit 1 is used for generating request information for accessing a plurality of storage units 0, the access queue 4 of the processing unit 1 is used for recording the request sequence of the processing unit 1 for accessing the plurality of storage units 0, the arbiter 2 is used for arbitrating the request information from the plurality of processing units 1, sending the request information to the storage units 0 according to the arbitration sequence, and writing the processing unit identifiers of the plurality of processing units 1 into the scheduling queue 3 according to the arbitration sequence.
The input end of any scheduling selector 7 is connected with the output end of a corresponding storage unit 0 and the output end of a corresponding scheduling queue 3, the output end of the scheduling selector 7 is respectively connected with the input end of each processing unit 1, the scheduling selector 7 is used for transmitting the return data of the storage unit 0 to the processing unit 1 indicated by the processing unit identification according to the processing unit identification read from the scheduling queue 3, the return queue 5 of the processing unit 1 is used for recording the return sequence of the return data of a plurality of storage units 0, the data queue 6 of the processing unit 1 is used for storing the return data of a plurality of storage units 0 according to the request sequence and the indication of the return sequence, and the return data is fetched from the data queue 6 to the processing unit 1.
As shown in fig. 2, an access queue 4_1, a return queue 5_1, and a data queue 6_1 are provided in the processing unit 1_1; the access queue 4_1 is used for recording the request sequence of the processing unit 1_1 for accessing the storage units 0_1 to 0_M, the return queue 5_1 is used for recording the return sequence of the return data of the storage units 0_1 to 0_M to the processing unit 1_1, and the data queue 6_1 is used for storing the return data of the storage units 0_1 to 0_M to the processing unit 1_1.
An access queue 4_2, a return queue 5_2 and a data queue 6_2 are arranged in the processing unit 1_2; the access queue 4_2 is used for recording the request sequence of the processing unit 1_2 to access the storage units 0_1 to 0_M, the return queue 5_2 is used for recording the return sequence of the return data of the storage units 0_1 to 0_M to the processing unit 1_2, and the data queue 6_2 is used for storing the return data of the storage units 0_1 to 0_M to the processing unit 1_2.
Similarly, the processing unit 1_N is provided with an access queue 4_N, a return queue 5_N, and a data queue 6_N. The access queue 4_N is used for recording the request order of the processing unit 1_N accessing the storage units 0_1 to 0_M, the return queue 5_N is used for recording the return order of the return data of the storage units 0_1 to 0_M to the processing unit 1_N, and the data queue 6_N is used for storing the return data of the storage units 0_1 to 0_M to the processing unit 1_N.
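One way the three queues could cooperate is sketched below. How the request order and the return order are combined is an assumption of this model rather than a statement of the disclosed hardware: the access queue records where each request went, the return queue and data queue record each datum's source and payload in arrival order, and a datum is released only when its source matches the head of the access queue.

```python
from collections import deque

class SplitQueueModule:
    """Illustrative model of the Fig. 2 queue module: access queue 4,
    return queue 5, and data queue 6."""

    def __init__(self):
        self.access_queue = deque()  # queue 4: storage-unit ids in request order
        self.return_queue = deque()  # queue 5: storage-unit ids in return order
        self.data_queue = deque()    # queue 6: return data in return order

    def record_request(self, unit_id):
        self.access_queue.append(unit_id)

    def store_return(self, unit_id, data):
        self.return_queue.append(unit_id)
        self.data_queue.append(data)

    def fetch(self):
        """Release the datum whose source matches the head of the access queue."""
        if not self.access_queue:
            return None
        wanted = self.access_queue[0]
        for i, source in enumerate(self.return_queue):
            if source == wanted:
                self.access_queue.popleft()
                data = self.data_queue[i]
                del self.return_queue[i]
                del self.data_queue[i]
                return data
        return None  # the in-order datum has not returned yet
```

As in the Fig. 1 embodiment, data that returns early stays buffered in the data queue until all earlier-requested data has been fetched into the processing unit.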
In the stage in which each processing unit 1 accesses the storage units 0, the access queue 4 of each processing unit 1 may record the request order in which its own processing unit 1 accesses the storage units 0_1 to 0_M.
In an example, the processing unit 1_1 may simultaneously generate request information for accessing the storage units 0_1 to 0_M each time: the processing unit 1_1 may send the request information for accessing the storage unit 0_1 to the arbiter 2_1, send the request information for accessing the storage unit 0_2 to the arbiter 2_2, and so on, until it sends the request information for accessing the storage unit 0_M to the arbiter 2_M; the processing unit 1_1 may also synchronously write the plurality of storage units 0 accessed this time into the access queue 4_1 according to the indication of the schedule identifier in the request information. Thus, after a plurality of such records, the access queue 4_1 records the order of the requests by which the processing unit 1_1 accesses the plurality of storage units 0.
Likewise, the processing unit 1_2 may simultaneously generate request information for accessing the storage units 0_1 to 0_M each time: the processing unit 1_2 may send the request information for accessing the storage unit 0_1 to the arbiter 2_1, send the request information for accessing the storage unit 0_2 to the arbiter 2_2, and so on, until it sends the request information for accessing the storage unit 0_M to the arbiter 2_M; the processing unit 1_2 may also synchronously write the plurality of storage units 0 accessed this time into the access queue 4_2 according to the indication of the schedule identifier in the request information. Thus, after a plurality of such records, the access queue 4_2 records the order of the requests by which the processing unit 1_2 accesses the plurality of storage units 0.
Similarly, the processing unit 1_N may simultaneously generate request information for accessing the storage units 0_1 to 0_M each time: the processing unit 1_N may send the request information for accessing the storage unit 0_1 to the arbiter 2_1, send the request information for accessing the storage unit 0_2 to the arbiter 2_2, and so on, until it sends the request information for accessing the storage unit 0_M to the arbiter 2_M; the processing unit 1_N may also synchronously write the plurality of storage units 0 accessed this time into the access queue 4_N according to the indication of the schedule identifier in the request information. Thus, after a plurality of such records, the access queue 4_N records the order of the requests by which the processing unit 1_N accesses the plurality of storage units 0.
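The per-cycle recording described above can be summarized in a minimal Python sketch (an illustrative software model only, not the claimed hardware; the class name AccessQueue and the convention that bit i of a row corresponds to storage unit 0_(i+1) are assumptions made for illustration):

```python
from collections import deque

class AccessQueue:
    """Software model of an access queue 4: each row is an M-bit mask
    whose set bits (the set-1 operation) mark the storage units that
    the processing unit requested in the same cycle."""
    def __init__(self, num_units):
        self.num_units = num_units
        self.rows = deque()  # oldest request row at the head

    def record(self, unit_indices):
        # Synchronously record all storage units accessed this cycle:
        # column i (bit i) corresponds to storage unit 0_(i+1).
        mask = 0
        for i in unit_indices:
            mask |= 1 << i
        self.rows.append(mask)

q = AccessQueue(3)
q.record([0, 1, 2])  # request 0_1, 0_2 and 0_3 together -> row 111
q.record([0])        # request 0_1 alone -> row 001
assert list(q.rows) == [0b111, 0b001]
```

After a number of record calls, the queue holds the complete request order, one row per issue cycle, matching the behavior described for the access queue 4_1.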
In the stage where each processing unit 1 receives return data, in response to the return data being received, the data queue 6 of each processing unit 1 may store the return data of each storage unit 0 according to the request order recorded by the access queue 4 and the indication of the return order recorded by the return queue 5, and the return data is fetched from the data queue 6 to the processing unit 1.
For example, each time the processing unit 1_1 receives return data, the data queue 6_1 of the processing unit 1_1 may store the return data of the storage units 0_1 to 0_M according to the request order recorded by the access queue 4_1 and the return order recorded by the return queue 5_1, and the return data is fetched from the data queue 6_1 to the processing unit 1_1 when the request order recorded by the head row of the access queue 4_1 and the return order recorded by the head row of the return queue 5_1 are the same.

Likewise, each time the processing unit 1_2 receives return data, the data queue 6_2 of the processing unit 1_2 may store the return data of the storage units 0_1 to 0_M according to the request order recorded by the access queue 4_2 and the return order recorded by the return queue 5_2, and the return data is fetched from the data queue 6_2 to the processing unit 1_2 when the request order recorded by the head row of the access queue 4_2 and the return order recorded by the head row of the return queue 5_2 are the same.

Similarly, each time the processing unit 1_N receives return data, the data queue 6_N of the processing unit 1_N may store the return data of the storage units 0_1 to 0_M according to the request order recorded by the access queue 4_N and the return order recorded by the return queue 5_N, and the return data is fetched from the data queue 6_N to the processing unit 1_N when the request order recorded by the head row of the access queue 4_N and the return order recorded by the head row of the return queue 5_N are the same.
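The cooperation of the access queue, the return queue, and the data queue inside one processing unit can be sketched as follows (a simplified illustrative model, not the claimed circuit: it tracks only the head row of the return and data queues, i.e. it assumes returns for a later request do not arrive before the earlier request completes; all names are hypothetical):

```python
from collections import deque

class OrderKeeper:
    """Release return data only when the returns recorded at the head of
    the return queue match the requests recorded at the head of the
    access queue, so order is preserved inside the processing unit."""
    def __init__(self, num_units):
        self.num_units = num_units
        self.access = deque()    # request-order masks (access queue 4)
        self.returned_mask = 0   # head row of the return queue 5
        self.data = {}           # head row of the data queue 6: column -> word

    def request(self, units):
        mask = 0
        for u in units:
            mask |= 1 << u
        self.access.append(mask)

    def on_return(self, unit, word):
        # Return data arrives from storage unit `unit`; buffer it, then
        # release the whole row once request and return orders agree.
        self.returned_mask |= 1 << unit
        self.data[unit] = word
        if self.access and self.returned_mask == self.access[0]:
            mask = self.access.popleft()
            self.returned_mask = 0
            return [self.data.pop(u) for u in range(self.num_units)
                    if mask & (1 << u)]
        return None  # still waiting: data stays buffered in the data queue

k = OrderKeeper(3)
k.request([0, 2])                            # one cycle: access 0_1 and 0_3
assert k.on_return(2, "d3") is None          # 0_3 returns first: buffered
assert k.on_return(0, "d1") == ["d1", "d3"]  # row complete: released
```

The early-arriving word from storage unit 0_3 is held in the data queue until the outstanding word from 0_1 arrives, which is the buffering-and-release behavior described above.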
In this way, in the memory access circuit of the embodiment of the present disclosure, an access queue 4, a return queue 5, and a data queue 6 are provided for each processing unit 1: the access queue 4 records the order of the requests by which the processing unit 1 accesses the plurality of storage units 0, the return queue 5 records the order in which the return data of the plurality of storage units 0 arrive at the processing unit 1, and the data queue 6 stores the return data of the plurality of storage units 0 for the processing unit 1. Through the cooperation of the access queue 4, the return queue 5, and the data queue 6, return data that arrives early is buffered in the data queue 6 while waiting for the return data of request information that was sent earlier, and the order-preserving processing of the return data is performed inside each processing unit 1. There is therefore no need to receive the return data of every storage unit 0 at the same time, which reduces the waste of hardware resources, the transmission delay, and the pipelining registers (the number of clock cycles of signal delay).
The memory access circuit of the embodiment of the present disclosure will be described below by taking the case where the number of storage units 0 and the number of processing units 1 are each 3 as an example. It should be understood that the numbers of the storage units 0 and the processing units 1 are not particularly limited in this disclosure, and may be set according to an actual application scenario.
Fig. 3 shows a schematic diagram of another memory access circuit according to an embodiment of the present disclosure. The memory access circuit shown in fig. 3 is used for a scenario in which three processing units 1 access storage units 0 located in three different scheduling modules. The three storage units 0 may be respectively: a storage unit 0_1 disposed in the scheduling module 1, a storage unit 0_2 disposed in the scheduling module 2, and a storage unit 0_3 disposed in the scheduling module 3. The three processing units 1 may be respectively: a processing unit 1_1, a processing unit 1_2, and a processing unit 1_3.
Wherein, the processing unit 1_1 is internally provided with an access queue 4_1, a return queue 5_1 and a data queue 6_1. The access queue 4_1 is used for recording the request sequence of the processing unit 1_1 for accessing the storage units 0_1 to 0_3, the return queue 5_1 is used for recording the return sequence of the return data of the storage units 0_1 to 0_3 to the processing unit 1_1, and the data queue 6_1 is used for storing the return data of the storage units 0_1 to 0_3 to the processing unit 1_1.
In a possible implementation manner, the access queue 4_1 includes a plurality of rows and a plurality of columns, the access queue 4_1 records the request sequence of the processing unit 1_1 for accessing the plurality of storage units 0 by performing a set operation (for example, a set 1 operation) on different rows of different columns, where the request sequence corresponding to each row of the access queue 4_1 is the same, and each column of the access queue 4_1 is used for recording the request sequence of the processing unit 1_1 for accessing one different storage unit 0.
By way of example, fig. 4 shows a schematic diagram of an access queue 4_1 according to an embodiment of the present disclosure. As shown in fig. 4, the access queue 4_1 may be a 2-row 3-column queue, the access queue 4_1 including 2×3 cells, each cell storing 1 bit of information. That is, the bit width of each cell is 1 bit, and the bit width of each access queue 4 is 3 bits. It should be understood that, in fig. 4, the access queue 4_1 with 2 rows and 3 columns is taken only as an example; the numbers of rows and columns of the access queue 4_1 are not limited in this disclosure, and may be set according to an actual application scenario. The number of columns of the access queue 4_1 is the same as the number of the storage units 0, and each column of the access queue 4_1 records the request sequence of the processing unit 1_1 for accessing a different storage unit 0. The number of lines of the access queue 4_1 is related to the concurrency of the processing unit 1_1 (i.e. the number of times that request information can at most be issued without waiting for the return data of that request information, where a plurality of pieces of request information can be issued at the same time): the greater the concurrency of the processing unit 1_1, the more lines the access queue 4_1 will have.
As shown in fig. 4, in the access queue 4_1, the 1 st column is for the request order of the recording process unit 1_1 to access the storage unit 0_1, the 2 nd column is for the request order of the recording process unit 1_1 to access the storage unit 0_2, and the 3 rd column is for the request order of the recording process unit 1_1 to access the storage unit 0_3.
By accessing the queue 4, the order of requests by their corresponding processing units 1 to access the plurality of memory units 0 can be more efficiently and accurately recorded.
In a possible implementation manner, the return queue 5_1 includes a plurality of rows and a plurality of columns, and the return queue 5_1 records the return sequence in which the processing unit 1_1 receives the return data of a plurality of storage units 0 by performing a set operation (for example, a set-1 operation) on different rows of different columns, where the return sequence corresponding to each row of the return queue 5_1 is the same, and each column of the return queue 5_1 is used to record the return sequence in which the processing unit 1_1 receives the return data of a different storage unit 0.
By way of example, fig. 5 shows a schematic diagram of a return queue 5_1 according to an embodiment of the present disclosure, as shown in fig. 5, the return queue 5_1 may be a 2-row 3-column queue, the return queue 5_1 including 2×3 cells, each cell storing 1 bit (bit) of information. That is, each cell has a bit width of 1 bit and the return queue has a bit width of 3 bits. It should be understood that, in fig. 5, only the return queue 5_1 is taken as an example with 2 rows and 3 columns, and the number of rows and columns of the return queue 5_1 are not limited in this disclosure, and may be set according to an actual application scenario. Wherein the number of columns of the return queue 5_1 is the same as the number of the storage units 0, and any column of the return queue 5_1 can record the return sequence of the return data transmitted from a different storage unit 0 to the processing unit 1_1; the number of lines of the return queue 5_1 is related to the concurrency of the processing unit 1_1 (i.e. the number of times that the request information can be issued at most without waiting for the return data of the request information, and a plurality of request information can be issued at the same time), and the more the concurrency of the processing unit 1_1, the more the number of lines the return queue 5_1 will have.
As shown in fig. 5, the return queue 5_1 may be used to record the return order of the return data of the storage units 0_1 to 0_3, wherein the 1 st column may be used to record the return order of the return data of the storage unit 0_1, the 2 nd column may be used to record the return order of the return data of the storage unit 0_2, and the 3 rd column may be used to record the return order of the return data of the storage unit 0_3.
In a possible implementation, the data queue 6_1 includes a plurality of rows and a plurality of columns, and the data queue 6_1 stores the return data that the processing unit 1_1 receives from a plurality of storage units 0 by performing write operations on different rows of different columns, wherein each column is used to store the return data of a corresponding storage unit 0.
By way of example, fig. 6 shows a schematic diagram of a data queue 6_1 according to an embodiment of the present disclosure. As shown in fig. 6, the data queue 6_1 may be a 2-row 3-column queue, the data queue 6_1 including 2×3 cells, each of which may store 30 bits of information. That is, the bit width of each cell is 30 bits, and the bit width of each data queue is 90 bits. It should be understood that, in fig. 6, the data queue 6_1 with 2 rows, 3 columns, and 30-bit cells is taken only as an example; the number of rows, the number of columns, and the bit width of each cell of the data queue 6_1 are not limited in this disclosure, and may be set according to an actual application scenario. The number of columns of the data queue 6_1 is the same as the number of the storage units 0, and each column of the data queue 6_1 can store return data transmitted from a different storage unit 0 to the processing unit 1_1. The number of lines of the data queue 6_1 is related to the concurrency of the processing unit 1_1 (i.e. the number of times that request information can at most be issued without waiting for the return data of that request information, where a plurality of pieces of request information can be issued at the same time): the greater the concurrency of the processing unit 1_1, the more lines the data queue 6_1 will have.
As shown in fig. 6, the data queue 6_1 may be used to store the return data of the storage units 0_1 to 0_3, wherein the 1 st column may be used to record the return data of the storage unit 0_1, the 2 nd column may be used to record the return data of the storage unit 0_2, and the 3 rd column may be used to record the return data of the storage unit 0_3.
Similarly, the processing unit 1_2 is provided therein with an access queue 4_2, a return queue 5_2, and a data queue 6_2. The access queue 4_2 is used for recording the request sequence of the processing unit 1_2 for accessing the storage units 0_1 to 0_3, the return queue 5_2 is used for recording the return sequence of the return data of the storage units 0_1 to 0_3 to the processing unit 1_2, and the data queue 6_2 is used for storing the return data of the storage units 0_1 to 0_3 to the processing unit 1_2. An access queue 4_3, a return queue 5_3, and a data queue 6_3 are provided in the processing unit 1_3. The access queue 4_3 is used for recording the request sequence of the processing unit 1_3 for accessing the storage units 0_1 to 0_3, the return queue 5_3 is used for recording the return sequence of the return data of the storage units 0_1 to 0_3 to the processing unit 1_3, and the data queue 6_3 is used for storing the return data of the storage units 0_1 to 0_3 to the processing unit 1_3. For details, refer to the above description of the access queue 4_1, the return queue 5_1, and the data queue 6_1 of the processing unit 1_1, which is not repeated here.
In a possible implementation manner, the access queue 4, the return queue 5, and the data queue 6 of the same processing unit 1 have the same depth, where the depth represents the number of lines of the access queue 4, the return queue 5, and the data queue 6. The depth is determined according to the concurrency of the processing unit 1, and the concurrency of the processing unit 1 represents the number of times the processing unit 1 can continuously send request information to the storage units 0 without waiting for the return data of that request information. The bit width of the data queue 6 is determined according to the number of the storage units 0 and the bit width of the return data (for example, the bit width of the data queue 6 is the product of the number of the storage units 0 and the bit width of the return data). The bit widths of the access queue 4 and the return queue 5 are the same, and both are determined according to the number of the storage units 0 (for example, the number of the storage units 0 is taken as the bit width of the access queue 4 and of the return queue 5).
In this way, the depth and width of the access queue 4, the return queue 5 and the data queue 6 can be set more flexibly, so as to be suitable for more application scenes.
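Under the sizing rules above, the queue dimensions can be computed as in the following sketch (the function name and return convention are illustrative assumptions; the example values match the 2-row, 3-column, 30-bit figures discussed for figs. 4 to 6):

```python
def queue_sizes(num_units, concurrency, return_data_bits):
    """Depth follows the processing unit's concurrency; the access and
    return queues carry one flag bit per storage unit; the data queue
    carries one return-data word per storage unit."""
    depth = concurrency                       # rows of all three queues
    access_bits = num_units                   # bit width of access queue 4
    return_bits = num_units                   # bit width of return queue 5
    data_bits = num_units * return_data_bits  # bit width of data queue 6
    return depth, access_bits, return_bits, data_bits

# 3 storage units, concurrency of 2, 30-bit return data (figs. 4 to 6)
assert queue_sizes(3, 2, 30) == (2, 3, 3, 90)
```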
As shown in fig. 3, the output end of the processing unit 1_1 may be connected to the input ends of the arbiters 2_1 to 2_3, respectively; the output end of the processing unit 1_2 may be connected to the input ends of the arbiters 2_1 to 2_3, respectively; and the output end of the processing unit 1_3 may be connected to the input ends of the arbiters 2_1 to 2_3, respectively.
For example, suppose the processing unit 1_1 initiates request information twice. If, at the first time (for example, at time T0), the request information for accessing the storage units 0_1 to 0_3 is generated in synchronization, the processing unit 1_1 may send the request information for accessing the storage unit 0_1 to the arbiter 2_1, send the request information for accessing the storage unit 0_2 to the arbiter 2_2, send the request information for accessing the storage unit 0_3 to the arbiter 2_3, and write 111 to the first line of the access queue 4_1.
Assuming that the request information of the access memory unit 0_1 and the memory unit 0_2 is generated in synchronization for the first time (for example, at the time T0), the processing unit 1_1 may transmit the request information of the access memory unit 0_1 to the arbiter 2_1, transmit the request information of the access memory unit 0_2 to the arbiter 2_2, and write 011 to the first line of the access queue 4_1.
Assuming that the request information of the access storage unit 0_1 and the storage unit 0_3 is generated in synchronization for the first time (for example, at time T0), the processing unit 1_1 may transmit the request information of the access storage unit 0_1 to the arbiter 2_1, transmit the request information of the access storage unit 0_3 to the arbiter 2_3, and write 101 to the first line of the access queue 4_1.
Assuming that the request information of the access storage unit 0_2 and the storage unit 0_3 is generated in synchronization for the first time (for example, at time T0), the processing unit 1_1 may transmit the request information of the access storage unit 0_2 to the arbiter 2_2, transmit the request information of the access storage unit 0_3 to the arbiter 2_3, and write 110 to the first line of the access queue 4_1.
Assuming that the request information for accessing the memory unit 0_1 is generated for the first time (for example, at time T0), the processing unit 1_1 may transmit the request information for accessing the memory unit 0_1 to the arbiter 2_1 and write 001 to the first line of the access queue 4_1.
Assuming that the request information for accessing the memory unit 0_2 is generated for the first time (for example, at time T0), the processing unit 1_1 may transmit the request information for accessing the memory unit 0_2 to the arbiter 2_2 and write 010 to the first line of the access queue 4_1.
Assuming that the request information of the access storage unit 0_3 is generated in synchronization for the first time (for example, at time T0), the processing unit 1_1 can send the request information of the access storage unit 0_3 to the arbiter 2_3 and write 100 to the first line of the access queue 4_1.
Assuming that the request information of the access memory unit 0_1 to the memory unit 0_3 is generated in synchronization at the second time (for example, at time T1), the processing unit 1_1 may transmit the request information of the access memory unit 0_1 to the arbiter 2_1, transmit the request information of the access memory unit 0_2 to the arbiter 2_2, transmit the request information of the access memory unit 0_3 to the arbiter 2_3, and write 111 to the second line of the access queue 4_1.
Assuming that the request information of the access memory unit 0_1 and the memory unit 0_2 is generated in synchronization at the second time (for example, at the time T1), the processing unit 1_1 may transmit the request information of the access memory unit 0_1 to the arbiter 2_1, transmit the request information of the access memory unit 0_2 to the arbiter 2_2, and write 011 to the second line of the access queue 4_1.
Assuming that the request information of the access storage unit 0_1 and the storage unit 0_3 is generated in synchronization at the second time (for example, at the time T1), the processing unit 1_1 may transmit the request information of the access storage unit 0_1 to the arbiter 2_1, transmit the request information of the access storage unit 0_3 to the arbiter 2_3, and write 101 to the second line of the access queue 4_1.
Assuming that the request information of the access storage unit 0_2 and the storage unit 0_3 is generated in synchronization at the second time (for example, at time T1), the processing unit 1_1 may transmit the request information of the access storage unit 0_2 to the arbiter 2_2, transmit the request information of the access storage unit 0_3 to the arbiter 2_3, and write 110 to the second line of the access queue 4_1.
Assuming that the request information for accessing the memory unit 0_1 is generated a second time (for example, at time T1), the processing unit 1_1 may transmit the request information for accessing the memory unit 0_1 to the arbiter 2_1 and write 001 to the second line of the access queue 4_1.
Assuming that the request information for accessing the memory unit 0_2 is generated a second time (for example, at time T1), the processing unit 1_1 may transmit the request information for accessing the memory unit 0_2 to the arbiter 2_2 and write 010 to the second line of the access queue 4_1.
Assuming that the request information of the access storage unit 0_3 is generated in synchronization at the second time (e.g., at time T1), the processing unit 1_1 can send the request information of the access storage unit 0_3 to the arbiter 2_3 and write 100 to the second line of the access queue 4_1.
It can be seen that the access queue 4_1 of the processing unit 1_1 is used to record the order of the requests by which the processing unit 1_1 accesses the plurality of storage units 0. The processing unit 1_1 can synchronously generate request information for accessing a plurality of storage units 0 at a time, and each time the processing unit 1_1 transmits the request information for accessing the plurality of storage units 0 to the arbiters 2 of the corresponding storage units 0, the processing unit 1_1 synchronously writes the plurality of storage units 0 accessed at that time into the access queue 4_1.
Similarly, the access queue 4_2 of the processing unit 1_2 is used to record the order of the requests by which the processing unit 1_2 accesses the plurality of storage units 0. The processing unit 1_2 can synchronously generate request information for accessing a plurality of storage units 0 at a time, and each time the processing unit 1_2 transmits the request information for accessing the plurality of storage units 0 to the arbiters 2 of the corresponding storage units 0, the processing unit 1_2 synchronously writes the plurality of storage units 0 accessed at that time into the access queue 4_2. The access queue 4_3 of the processing unit 1_3 is used to record the order of the requests by which the processing unit 1_3 accesses the plurality of storage units 0. The processing unit 1_3 can synchronously generate request information for accessing a plurality of storage units 0 at a time, and each time the processing unit 1_3 transmits the request information for accessing the plurality of storage units 0 to the arbiters 2 of the corresponding storage units 0, the processing unit 1_3 synchronously writes the plurality of storage units 0 accessed at that time into the access queue 4_3. For details, refer to the description of the processing unit 1_1 above, which is not repeated here.
In a possible implementation, the memory access circuit further comprises at least one first buffer 8, the output of each arbiter 2 being connected to the input of the corresponding memory cell 0 via at least one first buffer 8. The first buffer 8 is arranged in the memory access circuit, so that the driving force of the arbiter 2 for data transmission to the memory unit 0 is enhanced, and the probability of insufficient signal driving capability caused by overlong wiring in the wiring (floorplan) process is reduced. For example, in practical applications, in a scenario where the connection between the arbiter 2 and the memory unit 0 is relatively long, if the first buffer 8 is not provided, the signal may be attenuated along with the trace on the chip (or the circuit board), which may result in a situation where the system frequency cannot be increased and the performance is degraded. By providing the first buffer 8 between the arbiter 2 and the memory unit 0, a stronger driving force can be provided for signal transmission, so that the request information sent by the arbiter 2 can be correctly transmitted to the memory unit 0.
Furthermore, the first buffer 8 is provided in the memory access circuit, and data may be stored in the first buffer 8, so that an access request (for example, request information that has been arbitrated by the arbiter 2) is buffered; a handshake (Handshake) mechanism may be used between the first buffer 8 and the arbiter 2.
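The handshake between the arbiter 2 and a first buffer 8 can be modeled as a one-entry buffer stage with valid/ready-style flow control (an illustrative sketch under that assumption, not the claimed hardware; the class and method names are hypothetical):

```python
class HandshakeBuffer:
    """One-entry model of a first buffer 8: accepts an arbitrated
    request when empty (ready) and holds it until the downstream side
    takes it, decoupling the arbiter from a long wire to the storage
    unit while applying back-pressure when full."""
    def __init__(self):
        self.slot = None

    def ready(self):
        return self.slot is None  # can accept a request when empty

    def push(self, req):
        assert self.ready()       # handshake: only push when ready
        self.slot = req           # latch the arbitrated request

    def pop(self):
        req, self.slot = self.slot, None
        return req                # forward toward the storage unit

buf = HandshakeBuffer()
assert buf.ready()
buf.push("req-A")
assert not buf.ready()            # back-pressure toward the arbiter
assert buf.pop() == "req-A"
```

Chaining two such stages between an arbiter 2 and a storage unit 0 corresponds to the two first buffers 8 shown on each path in fig. 3.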
As shown in fig. 3, the output of each arbiter 2 is also connected to the input of a corresponding dispatch queue 3. For example, the output of the arbiter 2_1 is connected to the input of the dispatch queue 3_1; the output end of the arbiter 2_2 is connected with the input end of the scheduling queue 3_2; the output of the arbiter 2_3 is connected to the input of the dispatch queue 3_3.
Wherein, the dispatch queue 3 is used for storing a processing unit identifier selected by the arbiter 2, the processing unit identifier being used for indicating a processing unit 1. For example, assume that the processing unit identifier 001 indicates the processing unit 1_1, the processing unit identifier 010 indicates the processing unit 1_2, and the processing unit identifier 100 indicates the processing unit 1_3. When the three processing units 1_1 to 1_3 simultaneously initiate an access request to a certain storage unit 0, if the arbiter 2 selects the request information from the processing unit 1_1, the arbiter 2 writes the processing unit identifier 001 of the processing unit 1_1 into the dispatch queue 3; if the arbiter 2 selects the request information from the processing unit 1_2, the arbiter 2 writes the processing unit identifier 010 of the processing unit 1_2 into the dispatch queue 3; and if the arbiter 2 selects the request information from the processing unit 1_3, the arbiter 2 writes the processing unit identifier 100 of the processing unit 1_3 into the dispatch queue 3. In this process, the dispatch queue 3 is written each time the arbiter 2 selects one of the processing units 1.
In an example, the arbiter 2_1 may perform round robin arbitration (round robin) on different request information from the processing units 1_1 to 1_3 accessing the storage unit 0_1, select one target request information of the current round from the 3 request information, send the target request information to the storage unit 0_1 through the two first buffers 8, and write the processing unit identifier corresponding to the target request information of the current round into the dispatch queue 3_1. Since the dispatch queue 3_1 can record the processing unit identification of one processing unit in each round of arbitration, the dispatch queue 3_1 can record the arbitration sequence of the arbiter 2_1, that is, the access sequence of different processing units to access the memory unit 0_1, through multiple rounds of arbitration.
For example, the processing units 1_1 to 1_3 simultaneously initiate access requests to the storage unit 0_1 in the scheduling module 1, the arbiter 2_1 receives request information from the processing units 1_1 to 1_3 at the same time, and the arbiter 2_1 may arbitrate the request information of different processing units (processing units 1_1 to 1_3) accessing the same scheduling module 1 by adopting a round robin (round robin) manner.
If the arbiter 2_1 selects the request information from the processing unit 1_1, the arbiter 2_1 sends the request information from the processing unit 1_1 to the storage unit 0_1 via the two first buffers 8, and the arbiter 2_1 writes the processing unit identifier 001 of the processing unit 1_1 into the dispatch queue 3_1; if the arbiter 2_1 selects the request information from the processing unit 1_2, the arbiter 2_1 sends the request information from the processing unit 1_2 to the storage unit 0_1 through the two first buffers 8, and the arbiter 2_1 writes the processing unit identifier 010 of the processing unit 1_2 into the dispatch queue 3_1; if the arbiter 2_1 selects the request information from the processing unit 1_3, the arbiter 2_1 sends the request information from the processing unit 1_3 to the storage unit 0_1 via the two first buffers 8, and the arbiter 2_1 writes the processing unit identification 100 of the processing unit 1_3 into the dispatch queue 3_1.
In an example, the arbiter 2_2 may perform round robin arbitration (round robin) on different request information from the processing units 1_1 to 1_3 accessing the storage unit 0_2, select one target request information of the current round from the 3 request information, send the target request information to the storage unit 0_2 through the two first buffers 8, and write the processing unit identifier corresponding to the target request information of the current round into the dispatch queue 3_2. Since the dispatch queue 3_2 can record the processing unit identification of one processing unit in each round of arbitration, the dispatch queue 3_2 can record the arbitration sequence of the arbiter 2_2, that is, the access sequence of different processing units to access the memory unit 0_2, through multiple rounds of arbitration.
For example, the processing units 1_1 to 1_3 simultaneously initiate access requests to the storage units 0_2 in the scheduling module 2, the arbiter 2_2 receives request information from the processing units 1_1 to 1_3 at the same time, and the arbiter 2_2 may arbitrate the request information of different processing units (processing units 1_1 to 1_3) accessing the same scheduling module 2 by adopting a round robin (round robin) manner.
If the arbiter 2_2 selects the request information from the processing unit 1_1, the arbiter 2_2 sends the request information from the processing unit 1_1 to the storage unit 0_2 via the two first buffers 8, and the arbiter 2_2 writes the processing unit identifier 001 of the processing unit 1_1 into the dispatch queue 3_2; if the arbiter 2_2 selects the request information from the processing unit 1_2, the arbiter 2_2 sends the request information from the processing unit 1_2 to the storage unit 0_2 via the two first buffers 8, and the arbiter 2_2 writes the processing unit identifier 010 of the processing unit 1_2 into the dispatch queue 3_2; if the arbiter 2_2 selects the request information from the processing unit 1_3, the arbiter 2_2 sends the request information from the processing unit 1_3 to the storage unit 0_2 via the two first buffers 8, and the arbiter 2_2 writes the processing unit identification 100 of the processing unit 1_3 into the dispatch queue 3_2.
In an example, the arbiter 2_3 may perform round-robin arbitration on the request information from the processing units 1_1 to 1_3 accessing the storage unit 0_3, select one target request information of the current round from the 3 pieces of request information, send the target request information to the storage unit 0_3 through the two first buffers 8, and write the processing unit identifier corresponding to the target request information of the current round into the dispatch queue 3_3. Since the dispatch queue 3_3 records the processing unit identifier of one processing unit in each round of arbitration, over multiple rounds of arbitration the dispatch queue 3_3 records the arbitration sequence of the arbiter 2_3, that is, the order in which the different processing units access the storage unit 0_3.
For example, suppose the processing units 1_1 to 1_3 simultaneously initiate access requests to the storage unit 0_3 in the scheduling module 3, so that the arbiter 2_3 receives request information from the processing units 1_1 to 1_3 at the same time. The arbiter 2_3 may arbitrate, in a round-robin manner, among the request information of the different processing units (processing units 1_1 to 1_3) accessing the same storage unit 0_3.
If the arbiter 2_3 selects the request information from the processing unit 1_1, the arbiter 2_3 sends the request information from the processing unit 1_1 to the storage unit 0_3 via the two first buffers 8, and the arbiter 2_3 writes the processing unit identifier 001 of the processing unit 1_1 into the dispatch queue 3_3; if the arbiter 2_3 selects the request information from the processing unit 1_2, the arbiter 2_3 sends the request information from the processing unit 1_2 to the storage unit 0_3 via the two first buffers 8, and the arbiter 2_3 writes the processing unit identifier 010 of the processing unit 1_2 into the schedule queue 3_3; if the arbiter 2_3 selects the request information from the processing unit 1_3, the arbiter 2_3 sends the request information from the processing unit 1_3 to the storage unit 0_3 via the two first buffers 8, and the arbiter 2_3 writes the processing unit identifier 100 of the processing unit 1_3 to the dispatch queue 3_3.
In this way, when a plurality of different processing units 1 access the same memory unit 0, the access order of accessing the memory unit 0 can be determined in the arbitration order by the corresponding arbiter 2, and can be recorded by the schedule queue 3.
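The round-robin selection and order recording described above can be sketched in software. The sketch below is illustrative only (the class and method names are assumptions, not part of the disclosure): each call to `arbitrate` picks the next pending requester after the previous winner, forwards its request toward the storage unit, and appends the winner's identifier to a FIFO dispatch queue.

```python
from collections import deque

class RoundRobinArbiter:
    """Per-storage-unit arbiter: picks one pending requester per round
    and records the winner's identifier in a FIFO dispatch queue."""

    def __init__(self, unit_ids):
        self.unit_ids = list(unit_ids)   # e.g. ['001', '010', '100']
        self.ptr = 0                     # round-robin pointer
        self.dispatch_queue = deque()    # records the arbitration order

    def arbitrate(self, requests):
        """requests maps a processing-unit identifier to its pending
        request. Returns (winner_id, request), or None if none pending."""
        n = len(self.unit_ids)
        for i in range(n):
            cand = self.unit_ids[(self.ptr + i) % n]
            if cand in requests:
                self.ptr = (self.ptr + i + 1) % n  # resume after the winner
                self.dispatch_queue.append(cand)   # e.g. into dispatch queue 3_2
                return cand, requests[cand]        # forward toward the storage unit
        return None
```

With all three units pending in every round, the winners cycle 001, 010, 100, and the dispatch queue accumulates exactly that arbitration order.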
The process in which each processing unit 1 transmits the request information to each storage unit 0 is described above, and the process in which each storage unit 0 returns the return data to each processing unit 1 in response to the request information of each processing unit 1 is described below.
In a possible implementation, the memory access circuit further comprises at least one second buffer 9, the output of each memory cell 0 being connected to the input of the corresponding schedule selector 7 via at least one second buffer 9.
Providing the second buffer 9 in the memory access circuit enhances the driving force of the memory unit 0 for data transmission to the corresponding scheduling selector 7, reducing the probability of insufficient signal driving capability caused by overly long traces during physical layout (floorplan). For example, in a practical application where the connection between the storage unit 0 and the schedule selector 7 is relatively long, if the second buffer 9 is not provided, the signal may be attenuated along the trace on the chip (or the circuit board), so that the system frequency cannot be increased and performance may be reduced. By providing the second buffer 9 between the memory unit 0 and the schedule selector 7, a stronger driving force can be provided for signal transmission, so that the return data sent by the memory unit 0 can be correctly transmitted to the corresponding schedule selector 7.
Furthermore, with the second buffer 9 provided in the memory access circuit, data may be stored in the second buffer 9, so that return data (for example, the return data sent by the storage unit 0 in response to the request information) can be buffered; the second buffer 9 and the schedule selector 7 may exchange data in a handshake manner.
As shown in fig. 3, the input end of any schedule selector 7 is connected to the output end of the corresponding memory cell 0 through two second buffers 9, and the input end of any schedule selector 7 is also connected to the output end of the corresponding schedule queue 3. For example, the output of the memory cell 0_1 is connected to the input of the schedule selector 7_1 via two second buffers 9, and the input of the schedule selector 7_1 is also connected to the output of the schedule queue 3_1; the output end of the memory cell 0_2 is connected with the input end of the dispatch selector 7_2 through two second buffers 9, and the input end of the dispatch selector 7_2 is also connected with the output end of the dispatch queue 3_2; the output of the memory unit 0_3 is connected to the input of the schedule selector 7_3 via two second buffers 9, and the input of the schedule selector 7_3 is also connected to the output of the schedule queue 3_3.
As shown in fig. 3, the output of any schedule selector 7 is connected to the inputs of the plurality of processing units 1, respectively. For example, the output of the schedule selector 7_1 is connected to the inputs of the processing units 1_1 to 1_3, respectively; the output of the schedule selector 7_2 is connected to the inputs of the processing units 1_1 to 1_3, respectively; and the output of the schedule selector 7_3 is connected to the inputs of the processing units 1_1 to 1_3, respectively.
In an example, when the storage unit 0_1 transmits the return data to the schedule selector 7_1 in response to the request information, the schedule selector 7_1 may transmit the received return data to the processing unit 1 indicated by the processing unit identification according to the processing unit identification read out from the head of the schedule queue 3_1; for example, assume that processing unit identifier 001 indicates processing unit 1_1, processing unit identifier 010 indicates processing unit 1_2, and processing unit identifier 100 indicates processing unit 1_3. In this case, if the processing unit identifier read out from the head of the dispatch queue 3_1 by the dispatch selector 7_1 is 001, the received return data may be transmitted to the processing unit 1_1 indicated by the processing unit identifier 001; if the processing unit identifier read out from the head of the dispatch queue 3_1 by the dispatch selector 7_1 is 010, the received return data may be sent to the processing unit 1_2 indicated by the processing unit identifier 010; if the processing unit identifier read out by the schedule selector 7_1 from the head of the schedule queue 3_1 is 100, the received return data may be sent to the processing unit 1_3 indicated by the processing unit identifier 100.
Since the storage mechanism of the dispatch queue 3_1 is first-in first-out, if the write information (a plurality of processing unit identifiers) recorded by the dispatch queue 3_1 for the arbiter 2_1 is 001, 100, 010, this indicates that the arbiter 2_1 selected the processing units in the arbitration order of processing unit 1_1, processing unit 1_3, processing unit 1_2; the return data should likewise be returned to the processing units 1_1, 1_3, 1_2 in the arbitration order recorded at the time of writing. The schedule selector 7_1 may sequentially read out the processing unit identifier 001, the processing unit identifier 100, and the processing unit identifier 010 from the head of the dispatch queue 3_1 in the arbitration order, and sequentially transmit the received return data to the processing unit 1_1, the processing unit 1_3, and the processing unit 1_2 in that order.
In an example, when the storage unit 0_2 transmits the return data to the schedule selector 7_2 in response to the request information, the schedule selector 7_2 may transmit the received return data to the processing unit 1 indicated by the processing unit identification according to the processing unit identification read out from the head of the schedule queue 3_2; for example, assume that processing unit identifier 001 indicates processing unit 1_1, processing unit identifier 010 indicates processing unit 1_2, and processing unit identifier 100 indicates processing unit 1_3. In this case, if the processing unit identifier read out from the head of the dispatch queue 3_2 by the dispatch selector 7_2 is 001, the received return data may be transmitted to the processing unit 1_1 indicated by the processing unit identifier 001; if the processing unit identifier read out by the schedule selector 7_2 from the head of the schedule queue 3_2 is 010, the received return data may be sent to the processing unit 1_2 indicated by the processing unit identifier 010; if the processing unit identifier read out by the dispatch selector 7_2 from the head of the dispatch queue 3_2 is 100, the received return data may be sent to the processing unit 1_3 indicated by the processing unit identifier 100.
Since the storage mechanism of the dispatch queue 3_2 is first-in first-out, if the write information (a plurality of processing unit identifiers) recorded by the dispatch queue 3_2 for the arbiter 2_2 is 001, 010, 100, this indicates that the arbiter 2_2 selected the processing units in the arbitration order of processing unit 1_1, processing unit 1_2, processing unit 1_3; the return data should likewise be returned to the processing units 1_1, 1_2, 1_3 in the arbitration order recorded at the time of writing. The schedule selector 7_2 may sequentially read out the processing unit identifier 001, the processing unit identifier 010, and the processing unit identifier 100 from the head of the dispatch queue 3_2 in the arbitration order, and sequentially transmit the received return data to the processing unit 1_1, the processing unit 1_2, and the processing unit 1_3 in that order.
In an example, when the storage unit 0_3 transmits the return data to the schedule selector 7_3 in response to the request information, the schedule selector 7_3 may transmit the received return data to the processing unit 1 indicated by the processing unit identification read out from the head of the schedule queue 3_3. For example, assume that processing unit identifier 001 indicates processing unit 1_1, processing unit identifier 010 indicates processing unit 1_2, and processing unit identifier 100 indicates processing unit 1_3. In this case, if the processing unit identifier read out from the head of the schedule queue 3_3 by the schedule selector 7_3 is 001, the received return data may be transmitted to the processing unit 1_1 indicated by the processing unit identifier 001; if the processing unit identifier read out from the head of the dispatch queue 3_3 by the dispatch selector 7_3 is 010, the received return data may be sent to the processing unit 1_2 indicated by the processing unit identifier 010; if the processing unit identifier read out by the schedule selector 7_3 from the head of the schedule queue 3_3 is 100, the received return data may be sent to the processing unit 1_3 indicated by the processing unit identifier 100.
Since the storage mechanism of the dispatch queue 3_3 is first-in first-out, if the write information (a plurality of processing unit identifiers) recorded by the dispatch queue 3_3 for the arbiter 2_3 is 100, 010, 001, this indicates that the arbiter 2_3 selected the processing units in the arbitration order of processing unit 1_3, processing unit 1_2, processing unit 1_1; the return data should likewise be returned to the processing units 1_3, 1_2, 1_1 in the arbitration order recorded at the time of writing. The schedule selector 7_3 may sequentially read out the processing unit identifier 100, the processing unit identifier 010, and the processing unit identifier 001 from the head of the dispatch queue 3_3 in the arbitration order, and sequentially transmit the received return data to the processing unit 1_3, the processing unit 1_2, and the processing unit 1_1 in that order.
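The return-path behavior of the schedule selector — pop the identifier at the head of the FIFO dispatch queue and steer the return data to the processing unit it names — can be sketched as follows (a simplified software model; the function and variable names are assumptions, not part of the disclosure):

```python
from collections import deque

def route_return_data(dispatch_queue, return_data, processing_units):
    """Pop the oldest arbitration winner's identifier from the FIFO
    dispatch queue and deliver the return data to that unit's inbox."""
    pu_id = dispatch_queue.popleft()            # head of the dispatch queue
    processing_units[pu_id].append(return_data)
    return pu_id
```

For the 001, 100, 010 example above, three successive return data items would be delivered to processing units 1_1, 1_3 and 1_2 in exactly that order.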
Upon receipt of the return data, each processing unit 1 stores the return data of each storage unit 0 in its data queue 6 according to the request order recorded by the access queue 4 and the return order recorded by the return queue 5, and fetches the return data from the data queue 6 into the processing unit 1.
In one possible implementation, the processing unit 1 is configured to: in the case that the data queue 6 receives the return data of any storage unit 0, fetch the column data corresponding to that storage unit 0 from the return queue 5 and perform a negation operation (for example, a bitwise NOT) to obtain first data, where the first data indicates which slots in the corresponding column of the data queue 6 are still free for the return data of the storage unit 0; fetch, from the access queue 4, the column of data recording the request order for the storage unit 0 as second data; perform an AND operation (for example, a bitwise AND) on the first data and the second data to obtain third data; write the return data to the write location indicated by the third data in the column of the data queue 6 corresponding to the storage unit 0, and perform a set operation (for example, a set-to-1 operation) at the location indicated by the third data in the column of the return queue 5 corresponding to the storage unit 0. In this way, the return order of the return data can be recorded more efficiently and accurately.
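The slot-selection computation just described (bitwise NOT of the return-queue column, bitwise AND with the access-queue column, then find-first-1) can be written as a short sketch. The depth of 2 rows and the bit convention (row 1 = least significant bit) follow the worked example in this disclosure; the function name is an assumption:

```python
def locate_write_row(return_col: int, access_col: int, depth: int = 2) -> int:
    """Return the 1-based row into which newly arrived return data should
    be written in this storage unit's column of the data queue."""
    first = ~return_col & ((1 << depth) - 1)  # free slots (bitwise NOT)
    third = first & access_col                # free AND expected (bitwise AND)
    row = 1                                   # find-first-1 from the LSB upward
    while third and third & 1 == 0:
        third >>= 1
        row += 1
    return row
```

For instance, with return-queue column 01 and access-queue column 11, the third data is 10 and the data lands in row 2, matching the T1 step of the walkthrough.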
Illustratively, fig. 7 shows a schematic diagram of the access sequence recorded by the access queue 4_1 according to an embodiment of the present disclosure. As shown in fig. 7, the processing unit 1_1 initiates request information twice: the first batch is request information to access the storage unit 0_1 and the storage unit 0_2, and the second batch is request information to access the storage unit 0_1 and the storage unit 0_3. Here, fig. 7 is only an example; the access sequence recorded in the access queue 4_1 is not particularly limited and may be recorded according to the actual application scenario.
It is assumed that the processing unit 1_1 receives, for the first time (for example, at time T0), the return data of the storage unit 0_1 forwarded by the schedule selector 7_1; for the second time (for example, at time T1), the return data of the storage unit 0_1 forwarded by the schedule selector 7_1 together with the return data of the storage unit 0_2 forwarded by the schedule selector 7_2; and for the third time (for example, at time T2), the return data of the storage unit 0_3 forwarded by the schedule selector 7_3.
In this case, at time T0, since the processing unit 1_1 receives the return data of the storage unit 0_1, the data queue 6_1 may fetch the column data corresponding to the storage unit 0_1 from the return queue 5_1, that is, 00, and perform the bitwise NOT operation to obtain the first data, that is, 11; and fetch the column data recording the request order for the storage unit 0_1 (for example, the column data of the first column of fig. 7) from the access queue 4_1 as the second data, that is, 11. A bitwise AND operation is performed on the first data and the second data to obtain the third data, namely 11 & 11 = 11. Then, a find-first-1 operation (leading-one detection) is performed on the third data 11 to calculate the row in which the current return data is to be stored in the data queue 6_1, and the return data is written into the corresponding row of the 1st column (i.e., the column corresponding to the storage unit 0_1) of the data queue 6_1.
The find-first-1 operation locates the position of the first 1 in the third data, scanning from the least significant bit to the most significant bit. For example, if the third data is 11, the first 1 is at position 1, and the return data can be written to column 1, row 1 in the data queue 6_1.
Since the processing unit 1_1 has not received any return data before time T0, the return queue 5_1 stores all zeros (each of its columns is 00); therefore, the column data corresponding to the memory cell 0_1 at time T0 is 00. In response to writing the return data to column 1, row 1 in the data queue 6_1, the bit at column 1, row 1 in the return queue 5_1 is set to 1, so that the first row of the return queue 5_1 becomes 001.
at time T1, since the processing unit 1_1 receives the return data of the storage unit 0_1 and the return data of the storage unit 0_2, the data queue 6_1 may fetch the column data corresponding to the storage unit 0_1, that is, 01, from the return queue 5_1, and perform the bit inversion operation to obtain the first data, that is: 10. and fetches column data (for example, column data of the first column of fig. 7) recording the return order of the memory cell 0_1 from the access queue 4_1 as second data, that is, 11. And performing bit-wise AND operation on the first data and the second data to obtain third data, namely 10& 11=10. Then, a 1-seeking operation (loading one) is performed on the third data 10, so as to calculate the row information of the current return data stored in the data queue 6_1, and the return data is written into the corresponding row of the 1 st column (i.e. the corresponding column of the storage unit 0_1) of the data queue 6_1.
The find-first-1 operation locates the position of the first 1 in the third data, scanning from the least significant bit to the most significant bit. For example, if the third data is 10, the first 1 is at position 2, and the return data can be written to column 1, row 2 in the data queue 6_1.
Synchronously, the data queue 6_1 may fetch the column data corresponding to the memory cell 0_2 from the return queue 5_1, that is, 00, and perform the bitwise NOT operation to obtain the first data, that is, 11; and fetch the column data recording the request order for the memory cell 0_2 (for example, the column data of the second column of fig. 7) from the access queue 4_1 as the second data, that is, 01. A bitwise AND operation is performed on the first data and the second data to obtain the third data, namely 11 & 01 = 01. Then, a find-first-1 operation (leading-one detection) is performed on the third data 01 to calculate the row in which the current return data is to be stored in the data queue 6_1, and the return data is written into the corresponding row of the 2nd column (i.e., the column corresponding to the storage unit 0_2) of the data queue 6_1.
The find-first-1 operation locates the position of the first 1 in the third data, scanning from the least significant bit to the most significant bit. For example, if the third data is 01, the first 1 is at position 1, and the return data can be written to column 2, row 1 in the data queue 6_1.
In response to writing the return data of the memory cell 0_1 to column 1, row 2 in the data queue 6_1, and the return data of the memory cell 0_2 to column 2, row 1 in the data queue 6_1, the bits at column 1, row 2 and column 2, row 1 in the return queue 5_1 are set to 1, so that the first row of the return queue 5_1 becomes 011 and the second row becomes 001.
at time T2, since the processing unit 1_1 receives the return data from the storage unit 0_3, the data queue 6_1 may fetch the column data corresponding to the storage unit 0_3, i.e. 00, from the return queue 5_1, and perform the bit inversion operation to obtain the first data, i.e.: 11. and fetches the column data (for example, the column data of the third column of fig. 7) of the return order of the record storage unit 0_3 from the access queue 4_1 as the second data, that is, 10. And performing bit-wise AND operation on the first data and the second data to obtain third data, namely 11& 10=10. Then, a 1-seeking operation (loading one) is performed on the third data 10, the row information of the current return data stored in the data queue 6_1 is calculated, and the return data is written into the corresponding row of the 3 rd column (i.e., the corresponding column of the storage unit 0_3) of the data queue 6_1.
The find-first-1 operation locates the position of the first 1 in the third data, scanning from the least significant bit to the most significant bit. For example, if the third data is 10, the first 1 is at position 2, and the return data can be written to column 3, row 2 in the data queue 6_1.
In response to writing the return data of the memory cell 0_3 to column 3, row 2 in the data queue 6_1, the bit at column 3, row 2 in the return queue 5_1 is set to 1, so that the second row of the return queue 5_1 becomes 101.
it should be understood that the operation of each queue in the processing units 1_2 and 1_3 may refer to the processing unit 1_1, which is not described herein.
In this way, the return queue 5 records the return sequence of the return data more efficiently and accurately, and the data queue 6 can determine the storage position of the current return data in the data queue 6 more efficiently and accurately, which is beneficial for arranging the return data in the data queue 6 according to the request sequence.
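The T0 to T2 walkthrough can be replayed in a few lines to check the intermediate states of the return queue 5_1. This is an illustrative model only: each row is encoded as a 3-bit value with column 1 as the least significant bit, and for simplicity the dequeue of the completed first batch is applied before the T2 write, so the surviving row is then indexed as row 1.

```python
# Rows of return queue 5_1, two outstanding request batches deep.
ret_q = [0b000, 0b000]

def set_bit(row: int, col: int) -> None:
    """Set the bit at (row, col), both 1-based, in return queue 5_1."""
    ret_q[row - 1] |= 1 << (col - 1)

set_bit(1, 1)                    # T0: data from unit 0_1 -> column 1, row 1
assert ret_q[0] == 0b001         # first line reads 001

set_bit(2, 1)                    # T1: second data from unit 0_1 -> column 1, row 2
set_bit(1, 2)                    # T1: data from unit 0_2 -> column 2, row 1
assert ret_q == [0b011, 0b001]   # head row 011 matches access-queue row 011

ret_q.pop(0)                     # first batch complete: dequeue the head row
set_bit(1, 3)                    # T2: data from unit 0_3 -> column 3, surviving row
assert ret_q[0] == 0b101         # head row 101 matches access-queue row 101
```

The three asserted states 001, 011 and 101 are exactly the first-line values the disclosure reads out at times T0, T1 and T2.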
In a possible implementation, the processing unit 1 is further configured to: acquiring the first line data of the return queue 5 and the first line data of the access queue 4 in response to the return data being written into the data queue 6; when the first line data of the return queue 5 is the same as the first line data of the access queue 4, the first line return data in the data queue 6 is fetched to the processing unit 1; the data queue 6 dequeues the first line of return data, the access queue 4 dequeues the first line of data, and the return queue 5 dequeues the first line of data.
When the first line data of the return queue 5 is different from the first line data of the access queue 4, the data queue 6 continues to maintain the current state, and waits for storing the next return data.
Illustratively, fig. 7 shows a schematic diagram of the access sequence recorded by the access queue 4_1 according to an embodiment of the present disclosure. As shown in fig. 7, the processing unit 1_1 initiates request information twice: the first batch is request information to access the storage unit 0_1 and the storage unit 0_2, and the second batch is request information to access the storage unit 0_1 and the storage unit 0_3.
It is assumed that the processing unit 1_1 receives, for the first time (for example, at time T0), the return data of the storage unit 0_1 forwarded by the schedule selector 7_1; for the second time (for example, at time T1), the return data of the storage unit 0_1 forwarded by the schedule selector 7_1 together with the return data of the storage unit 0_2 forwarded by the schedule selector 7_2; and for the third time (for example, at time T2), the return data of the storage unit 0_3 forwarded by the schedule selector 7_3.
As described above, at time T0, in response to writing the return data of the memory cell 0_1 to the data queue 6_1, the first line data of the return queue 5_1 is 001. This differs from the first line data 011 of the access queue 4_1, indicating that among the plurality of pieces of request information issued first by the processing unit 1_1, some have not yet received return data; the return data of the first batch of request information has therefore not all been returned, and the data queue 6_1 keeps waiting.
At time T1, in response to writing the return data of the storage unit 0_1 and the return data of the storage unit 0_2 into the data queue 6_1, the first line data of the return queue 5_1 is 011, which is the same as the first line data 011 of the access queue 4_1, so the return data of the first line in the data queue 6_1 can be fetched to the processing unit 1_1.
After the return data of the first line in the data queue 6_1 is fetched to the processing unit 1_1, the data queue 6_1 dequeues the return data of the first line, the access queue 4_1 dequeues the data of the first line, and the return queue 5_1 dequeues the data of the first line.
At time T2, in response to writing the return data of the storage unit 0_3 to the data queue 6_1, and with the dequeue operations of the data queue 6_1, the access queue 4_1 and the return queue 5_1 having been performed, the first line data of the return queue 5_1 becomes 101 (i.e., the original second line data), which is the same as the first line data 101 of the access queue 4_1 (i.e., the original second line data), so the return data of the first line (i.e., the original second line) in the data queue 6_1 can be fetched to the processing unit 1_1.
Similarly, after the return data of the first line in the data queue 6_1 is fetched to the processing unit 1_1, the data queue 6_1 dequeues the return data of the first line, the access queue 4_1 dequeues its first line of data, and the return queue 5_1 dequeues its first line of data.
It should be understood that the operation of each queue in the processing units 1_2 and 1_3 may refer to the processing unit 1_1, which is not described herein.
In this way, by determining whether the first line data of the return queue 5 is the same as the first line data of the access queue 4, it can be decided whether to fetch the return data from the data queue 6 to the processing unit 1, so that the return data can be provided to the processing unit 1 efficiently, accurately, and in an order-preserving manner. After the return data of the first line in the data queue 6 is fetched to the processing unit 1, the data queue 6 dequeues the return data of the first line, the access queue 4 dequeues its first line of data, and the return queue 5 dequeues its first line of data. This helps clean up useless data in the access queue 4, the return queue 5 and the data queue 6 (for example, return data already sent to each processing unit 1, and the request order and return order associated with that data), providing more storage space for new return data and improving the working efficiency and resource utilization of the memory access circuit.
In summary, the memory access circuit of the embodiments of the present disclosure enables a plurality of processing units 1 to simultaneously access different memory units 0 (for example, memory units disposed in different scheduling modules). When the plurality of processing units 1 access a memory unit 0, their access order may be recorded by storing the processing unit identifiers in the scheduling queue 3; when the memory unit 0 replies with return data, the return data may be accurately transmitted to each processing unit 1 according to the arbitration order recorded by the scheduling queue 3. In the memory access circuit of the embodiments of the present disclosure, an access queue 4, a return queue 5 and a data queue 6 are provided for each processing unit 1: the access queue 4 records the request order in which the processing unit 1 accesses the plurality of storage units 0, the return queue 5 records the return order of the return data of the plurality of storage units 0, and the data queue 6 stores the return data of the plurality of storage units 0. Through the cooperation of the access queue 4, the return queue 5 and the data queue 6, return data arriving for later request information is first buffered in the data queue 6 while waiting for the return data of earlier request information, implementing an order-preserving design inside the processing unit 1. In this way, the processing units 1 can receive the return data of the storage units 0 simultaneously, reducing the waste of hardware resources, the transmission latency, and the extra pipeline beats (the number of clock cycles of signal delay).
Fig. 8 shows a flowchart of a memory access method according to an embodiment of the present disclosure. The method is applied to the portion of the memory access circuit shown in fig. 1, for accessing a plurality of different memory units 0, in which the plurality of processing units 1 access the memory units 0. The memory access circuit comprises: a plurality of processing units 1 each including a queue module 11, and an arbiter 2 and a dispatch queue 3 corresponding to each storage unit 0.
The output end of any processing unit 1 is respectively connected with the input end of each arbiter 2, and the output end of each arbiter 2 is connected with the input end of the corresponding scheduling queue 3 and the input end of the corresponding storage unit 0.
As shown in fig. 8, the memory access method includes: in step S11, each processing unit 1 sends the generated request information for accessing the storage unit 0 to the arbiter 2 of the storage unit 0 indicated by the request information, wherein the queue module 11 of the processing unit 1 is configured to record the request order of the processing unit 1 for accessing the plurality of storage units 0.
In step S12, the arbiter 2 is configured to arbitrate the request information from the plurality of processing units 1, send the request information to the storage unit 0 in an arbitration order, and write the processing unit identifications of the plurality of processing units 1 to the dispatch queue 3 in the arbitration order.
In a possible implementation, the processing unit 1 is configured to generate a plurality of request messages simultaneously, each request message being configured to access a different storage unit 0, where the plurality of request messages generated simultaneously have the same request sequence.
In a possible implementation manner, the queue module 11 includes an access queue 4, where the access queue 4 is configured to record an order of requests of the processing unit 1 to access the plurality of storage units 0, the access queue 4 includes a plurality of rows and a plurality of columns, and the access queue 4 records an order of requests of the processing unit 1 to access the plurality of storage units 0 by performing a set operation on different rows of different columns, where each row corresponds to the same order of requests, and each column is configured to record an order of requests of the processing unit 1 to access a different storage unit 0.
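Recording the request order by set operations on rows and columns of the access queue 4 can be sketched as follows (an illustrative model only: column i, counted from 0, stands for storage unit 0_(i+1), and the function name is an assumption):

```python
def record_request(access_queue_rows, accessed_columns):
    """Append one row for a batch of simultaneously issued requests,
    setting the bit of each column whose storage unit is accessed."""
    row = 0
    for col in accessed_columns:   # 0-based column index per storage unit
        row |= 1 << col
    access_queue_rows.append(row)
    return row
```

Replaying the fig. 7 example, a first batch accessing units 0_1 and 0_2 records the row 011 and a second batch accessing units 0_1 and 0_3 records the row 101.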
In a possible implementation, the memory access circuit further comprises at least one first buffer 8, the output of each arbiter 2 being connected to the input of a corresponding memory cell 0 via at least one first buffer 8.
Fig. 9 shows a flowchart of a memory access method according to an embodiment of the present disclosure. The method is applied to the portion of the memory access circuit shown in fig. 1, for accessing a plurality of different memory units 0, in which the plurality of memory units 0 return data in response to the access requests of the processing units 1. The memory access circuit comprises: a plurality of processing units 1 each including a queue module 11, and a schedule queue 3 and a schedule selector 7 corresponding to each storage unit 0.
The input end of any one of the schedule selectors 7 is connected with the output end of the corresponding storage unit 0 and the output end of the corresponding schedule queue 3, and the output end of the schedule selector 7 is respectively connected with the input end of each processing unit 1.
The memory access method comprises the following steps: in step S13, the storage unit 0 generates return data in response to the request information from the processing unit 1.
In step S14, the schedule selector 7 is configured to transmit, according to the processing unit identifier read from the schedule queue 3, the return data of the storage unit 0 to the processing unit 1 indicated by the processing unit identifier.
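The routing in step S14 can be sketched as follows. This is a minimal illustration, and the names (`schedule_queue`, `inboxes`, `deliver`) are hypothetical rather than from the disclosure: the arbiter wrote processing-unit identifiers into the schedule queue in arbitration order, and the schedule selector reads one identifier per piece of return data.

```python
# Hypothetical sketch of step S14 (names illustrative). The schedule queue
# holds processing-unit identifiers in arbitration order; the schedule
# selector pops one identifier per return and steers the data accordingly.
from collections import deque

schedule_queue = deque([2, 0])   # identifiers recorded at request time, oldest first
inboxes = {0: [], 1: [], 2: []}  # one input per processing unit

def deliver(return_data):
    pu_id = schedule_queue.popleft()    # read the identifier from the schedule queue
    inboxes[pu_id].append(return_data)  # transmit the data to that processing unit

deliver("first_return")   # routed to processing unit 2
deliver("second_return")  # routed to processing unit 0
print(inboxes[2], inboxes[0])  # ['first_return'] ['second_return']
```

Because the identifiers were written in arbitration order, returns from one storage unit are automatically delivered to the requesters in the order their requests were granted.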
In step S15, the queue module 11 of the processing unit 1 stores the return data of the plurality of storage units 0, and fetches the return data from the queue module 11 to the processing unit 1 according to the order of requests of the processing unit 1 to access the plurality of storage units 0.
In one possible implementation, the queue module 11 of the processing unit 1 stores return data of a plurality of storage units 0, including: the processing unit 1 receives a plurality of return data simultaneously, each from a different memory unit 0, and the queue module 11 of the processing unit 1 is configured to store the return data of the plurality of memory units simultaneously.
In one possible implementation, the queue module 11 includes an access queue 4, a return queue 5, and a data queue 6; the access queue 4 is used for recording the request sequence of the processing unit 1 for accessing the plurality of storage units 0; the return queue 5 is used for recording the return sequence of the return data received by the processing unit 1 from the plurality of storage units 0; the data queue 6 is used for storing the return data of a plurality of storage units 0 according to the request sequence and the instruction of the return sequence, and retrieving the return data from the data queue 6 to the processing unit 1.
In a possible implementation manner, the access queue 4 includes a plurality of rows and a plurality of columns, the access queue 4 records the request sequence of the processing unit 1 for accessing a plurality of storage units 0 by performing a set operation on different rows of different columns, where the request sequence corresponding to each row of the access queue 4 is the same, and each column of the access queue 4 is used for recording the request sequence of the processing unit 1 for accessing a different storage unit 0.
In a possible implementation manner, the return queue 5 includes a plurality of rows and a plurality of columns, and the return queue 5 records the return order in which the processing unit 1 receives the return data of the plurality of storage units 0 by performing a set operation on different rows of different columns, where the return order corresponding to each row of the return queue 5 is the same, and each column of the return queue 5 is used to record the return order of the return data that the processing unit 1 receives from a different storage unit 0.
In a possible implementation, the data queue 6 includes a plurality of rows and a plurality of columns, and the data queue 6 records the return data of the plurality of storage units 0 received by the processing unit 1 by performing write operations on different rows of different columns, where each column of the data queue 6 is used to store the return data of a corresponding storage unit 0.
In one possible implementation, the queue module 11 of the processing unit 1 stores return data of a plurality of storage units, including: in the case that the data queue 6 receives the return data of any storage unit 0, the column data corresponding to the storage unit 0 is taken out from the return queue 5, and a negation operation (for example, a bit negation operation) is performed, so as to obtain first data, where the first data is used for representing spatial information of the return data of the storage unit 0 in the data queue 6; fetching a list of data recording the return order of the storage unit 0 from the access queue 4 as second data; performing AND operation (such as bit AND operation) on the first data and the second data to obtain third data; writing the return data to a write location indicated by the third data in the corresponding column of the memory cell 0 of the data queue 6, and performing a set operation (e.g., a set 1 operation) at the write location indicated by the third data in the corresponding column of the memory cell 0 of the return queue 5.
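The negation/fetch/AND/write sequence above can be sketched as follows. This is a minimal model with illustrative names only; it assumes the write location indicated by the third data is its first set bit, since returns from a single storage unit arrive in request order.

```python
# Hypothetical sketch of the steps above (all names illustrative). The write
# location is taken as the first set bit of the third data, assuming returns
# from one storage unit arrive in request order.
NUM_UNITS = 2

access_queue = [[1, 1], [1, 0]]  # two request orders recorded (row 1 only uses unit 0)
return_queue = [[0, 0], [0, 0]]  # no return data recorded yet
data_queue = [[None, None], [None, None]]

def accept_return(unit, payload):
    column = lambda q: [row[unit] for row in q]
    first = [bit ^ 1 for bit in column(return_queue)]  # negation: free rows in this column
    second = column(access_queue)                      # rows that requested this unit
    third = [a & b for a, b in zip(first, second)]     # AND: rows still awaiting this unit
    write_row = third.index(1)                         # oldest awaiting row = write location
    data_queue[write_row][unit] = payload              # write the return data
    return_queue[write_row][unit] = 1                  # set operation marks it returned

accept_return(0, "data_for_request_0")
accept_return(0, "data_for_request_1")  # lands one row below the first return
print(data_queue)  # [['data_for_request_0', None], ['data_for_request_1', None]]
```

The AND of "free slot in the return queue column" with "requested in the access queue column" pinpoints the oldest outstanding request for that storage unit, so out-of-order returns across units still land in the rows matching their original request order.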
In one possible implementation, the memory access method further includes: acquiring the first line data of the return queue 5 and the first line data of the access queue 4 in response to the return data being written into the data queue 6; when the first line data of the return queue 5 is the same as the first line data of the access queue 4, the first line return data in the data queue 6 is fetched to the processing unit 1; the data queue 6 dequeues the first line of return data, the access queue 4 dequeues the first line of data, and the return queue 5 dequeues the first line of data.
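The first-line comparison and dequeue described above can be sketched as follows (a hypothetical minimal model; names are illustrative).

```python
# Hypothetical sketch of the first-line comparison above (names illustrative):
# the oldest request is complete when the first rows of the return queue and
# the access queue match, and all three queues then dequeue their first line.
access_queue = [[1, 1]]
return_queue = [[1, 1]]  # return data from both storage units has arrived
data_queue = [["d_unit0", "d_unit1"]]

def try_dequeue():
    if access_queue and return_queue[0] == access_queue[0]:
        access_queue.pop(0)
        return_queue.pop(0)
        return data_queue.pop(0)  # complete first line goes to the processing unit
    return None

oldest = try_dequeue()
print(oldest)  # ['d_unit0', 'd_unit1'] — fetched in request order
```

If any requested unit has not yet returned its data, the first rows differ and the check simply fails, so the processing unit never observes partially returned or reordered lines.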
In a possible implementation manner, the access queue 4, the return queue 5 and the data queue 6 of the same processing unit 1 have the same depth, the depth represents the number of lines of the access queue 4, the return queue 5 and the data queue 6, the depth is determined according to the concurrency of the processing unit 1, and the concurrency of the processing unit 1 represents the number of times that the processing unit 1 continuously sends request information to the storage unit 0 without waiting for the return data of the request information; the bit width of the data queue 6 is determined according to the number of the memory units 0 and the bit width of the return data, the bit widths of the access queue 4 and the return queue 5 are the same, and the bit widths of the access queue 4 and the return queue 5 are determined according to the number of the memory units 0.
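The sizing rule above amounts to simple arithmetic; the concrete numbers below are hypothetical examples, not values from the disclosure.

```python
# Hypothetical sizing arithmetic for the rule above; the concrete numbers
# are illustrative examples, not values from the disclosure.
num_storage_units = 4   # number of storage units 0
return_data_width = 32  # bit width of one unit's return data
concurrency = 8         # requests sent without waiting for their return data

depth = concurrency                      # lines in the access, return and data queues
access_queue_width = num_storage_units   # one bit per storage unit column
return_queue_width = access_queue_width  # same bit width as the access queue
data_queue_width = num_storage_units * return_data_width

print(depth, access_queue_width, data_queue_width)  # 8 4 128
```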
In a possible implementation, the memory access circuit further comprises at least one second buffer 9, the output of each memory cell 0 being connected to the input of the corresponding schedule selector 7 via at least one second buffer 9.
It may be understood that the above-mentioned method embodiments of the present disclosure can be combined with one another to form combined embodiments without departing from their principles and logic; owing to space limitations, such combinations are not described one by one in the present disclosure.
In addition, the present disclosure further provides an integrated circuit and an electronic device that include the above memory access circuit, as well as a computer program product, any of which may be used to implement the memory access methods provided in the present disclosure; for the corresponding technical solutions and descriptions, refer to the corresponding descriptions in the method parts, which are not repeated here.
Embodiments of the present disclosure also provide an integrated circuit including a memory access circuit as described above.
Embodiments of the present disclosure also provide an electronic device comprising a memory access circuit as described above. The electronic device may be provided as a terminal, a server, or another form of device. For example, the electronic device may be a user equipment (UE), a mobile device, a user terminal, a cellular phone, a cordless phone, a personal digital assistant (Personal Digital Assistant, PDA), a handheld device, a computing device, an in-vehicle device, a wearable device, etc., which is not limited by the present disclosure.
Embodiments of the present disclosure also provide a computer program product comprising computer readable code, or a non-transitory computer readable storage medium carrying computer readable code, which when run in a processor of an electronic device, performs the above method.
The computer program product may be realized in particular by means of hardware, software or a combination thereof. In an alternative embodiment, the computer program product is embodied as a computer storage medium, and in another alternative embodiment, the computer program product is embodied as a software product, such as a software development kit (Software Development Kit, SDK), or the like.
The foregoing description of the various embodiments focuses on the differences between them; for parts that are identical or similar, the embodiments may be referred to one another, and the details are not repeated herein for brevity.
It will be appreciated by those skilled in the art that in the above-described method of the specific embodiments, the written order of steps is not meant to imply a strict order of execution but rather should be construed according to the function and possibly inherent logic of the steps.
The foregoing description of the embodiments of the present disclosure has been presented for purposes of illustration and description, and is not intended to be exhaustive or limited to the embodiments disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the various embodiments described. The terminology used herein was chosen in order to best explain the principles of the embodiments, the practical application, or the improvement of technology in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.

Claims (22)

1. A memory access circuit for accessing a plurality of different memory cells, the memory access circuit comprising: a plurality of processing units, each processing unit comprising a queue module; and, corresponding to each storage unit, an arbiter, a scheduling queue and a scheduling selector;
the output end of any processing unit is respectively connected with the input end of each arbiter, the output end of each arbiter is connected with the input end of a corresponding scheduling queue and the input end of a corresponding storage unit, the processing units are used for generating request information for accessing a plurality of storage units, the queue module of each processing unit is used for recording the request sequence of the processing unit for accessing the plurality of storage units, the arbiter is used for arbitrating the request information from the plurality of processing units, sending the request information to the storage units according to the arbitration sequence, and writing the processing unit identifiers of the plurality of processing units into the scheduling queue according to the arbitration sequence;
The input end of any scheduling selector is connected with the output end of a corresponding storage unit and the output end of a corresponding scheduling queue, the output end of the scheduling selector is respectively connected with the input end of each processing unit, the scheduling selector is used for transmitting the return data of the storage unit to the processing unit indicated by the processing unit identification according to the processing unit identification read from the scheduling queue, the queue module of the processing unit is used for storing the return data of a plurality of storage units, and the return data is taken out from the queue module to the processing unit according to the request sequence of the processing unit for accessing the plurality of storage units.
2. The memory access circuit of claim 1, wherein the processing unit is configured to:
simultaneously generating a plurality of request messages, each request message being used to access a different memory unit, wherein the plurality of request messages generated simultaneously have the same request sequence;
and/or receiving a plurality of return data simultaneously, each return data from a different storage unit, wherein the queue module of the processing unit is used for storing the return data of a plurality of storage units simultaneously.
3. The memory access circuit of claim 1 or 2, wherein the queue module comprises an access queue, a return queue, a data queue;
the access queue is used for recording the request sequence of the processing unit for accessing the plurality of storage units;
the return queue is used for recording the return sequence of the return data received by the processing unit from the plurality of storage units;
the data queue is used for storing the return data of a plurality of storage units according to the request sequence and the instruction of the return sequence, and fetching the return data from the data queue to the processing unit.
4. The memory access circuit of claim 3, wherein the access queue comprises a plurality of rows and a plurality of columns, the access queue records the order of requests by the processing unit to access the plurality of memory cells by performing a set operation on different rows of different columns, wherein each row of the access queue corresponds to the same order of requests, and each column of the access queue is respectively used for recording the order of requests by the processing unit to access a different memory cell.
5. A memory access circuit according to claim 3, wherein the return queue comprises a plurality of rows and a plurality of columns, the return queue recording a return order in which the processing unit receives return data of a plurality of memory cells by performing a set operation on different rows of different columns, wherein the return order corresponding to each row of the return queue is the same, and wherein each column of the return queue is respectively used to record a return order in which the processing unit receives return data from a different memory cell.
6. The memory access circuit of claim 3 wherein the data queue comprises a plurality of rows and a plurality of columns, the data queue recording the return data of the plurality of memory cells received by the processing unit by performing write operations on different rows of different columns, wherein each column of the data queue is for storing the return data of a corresponding memory cell.
7. The memory access circuit of claim 3, wherein the processing unit is configured to:
under the condition that the data queue receives the return data of any storage unit, column data corresponding to the storage unit is taken out from the return queue, and inversion operation is carried out to obtain first data, wherein the first data is used for representing the space information of the return data of the storage unit in the data queue;
taking out a column of data recording the return sequence of the storage unit from the access queue as second data;
performing AND operation on the first data and the second data to obtain third data;
writing the return data to a write location indicated by the third data in a corresponding column of the storage unit of the data queue, and performing a set operation on the write location indicated by the third data in the corresponding column of the storage unit of the return queue.
8. The memory access circuit of claim 7, wherein the processing unit is further configured to: responding to the returned data to write into the data queue, and acquiring the first line data of the returned queue and the first line data of the access queue;
under the condition that the first line data of the return queue is the same as the first line data of the access queue, the return data of the first line in the data queue is taken out to a processing unit;
the data queue dequeues the return data of the first line, the access queue dequeues the first line data, and the return queue dequeues the first line data.
9. A memory access circuit according to claim 3, wherein the access queue, the return queue, the data queue of the same processing unit are the same in depth, the depth representing the number of lines of the access queue, the return queue, the data queue, the depth being determined in accordance with the concurrency of the processing unit, the concurrency of the processing unit representing the number of times the processing unit continues to send request information to a storage unit without waiting for return data of the request information;
The bit width of the data queue is determined according to the number of the storage units and the bit width of the return data, the bit widths of the access queue and the return queue are the same, and the bit widths of the access queue and the return queue are determined according to the number of the storage units.
10. The memory access circuit according to claim 1 or 2, further comprising at least one first buffer, at least one second buffer, the output of each arbiter being connected to the input of a corresponding memory cell through the at least one first buffer, the output of each memory cell being connected to the input of a corresponding schedule selector through the at least one second buffer.
11. A memory access method, wherein the memory access method is applied to a memory access circuit, the memory access circuit comprising: a plurality of processing units, each processing unit comprising a queue module; and a scheduling queue and a scheduling selector corresponding to each storage unit;
the input end of any scheduling selector is connected with the output end of the corresponding storage unit and the output end of the corresponding scheduling queue, and the output end of the scheduling selector is respectively connected with the input end of each processing unit;
The memory access method comprises the following steps:
the storage unit generates return data in response to the request information from the processing unit;
the dispatch selector is used for transmitting the return data of the storage unit to the processing unit indicated by the processing unit identifier according to the processing unit identifier read from the dispatch queue;
the queue module of the processing unit stores return data of a plurality of storage units, and fetches the return data from the queue module to the processing unit according to a request order of the processing unit to access the plurality of storage units.
12. The memory access method of claim 11, wherein the queue module of the processing unit stores return data for a plurality of memory cells, comprising:
the processing unit receives a plurality of return data simultaneously, each return data from a different storage unit, and the queue module of the processing unit is used for storing the return data of the plurality of storage units simultaneously.
13. The memory access method according to claim 11 or 12, wherein the queue module comprises an access queue, a return queue, a data queue;
the access queue is used for recording the request sequence of the processing unit for accessing the plurality of storage units;
The return queue is used for recording the return sequence of the return data received by the processing unit from the plurality of storage units;
the data queue is used for storing the return data of a plurality of storage units according to the request sequence and the instruction of the return sequence, and fetching the return data from the data queue to the processing unit.
14. The memory access method according to claim 13, wherein the access queue includes a plurality of rows and a plurality of columns, the access queue records a request sequence of the processing unit for accessing the plurality of memory units by performing a set operation on different rows of different columns, wherein the request sequence corresponding to each row of the access queue is the same, and each column of the access queue is used for recording a request sequence of the processing unit for accessing a different memory unit;
the return queue comprises a plurality of rows and a plurality of columns, the return queue records the return order in which the processing unit receives the return data of a plurality of storage units by performing a set operation on different rows of different columns, wherein the return order corresponding to each row of the return queue is the same, and each column of the return queue is respectively used for recording the return order of the return data that the processing unit receives from a different storage unit;
The data queue comprises a plurality of rows and a plurality of columns, the data queue records the return data of a plurality of storage units received by the processing unit by executing writing operation on different rows of different columns, wherein each column of the data queue is used for storing the return data of a corresponding storage unit.
15. The memory access method of claim 13, wherein the queue module of the processing unit stores return data for a plurality of memory cells, comprising:
under the condition that the data queue receives the return data of any storage unit, column data corresponding to the storage unit is taken out from the return queue, and inversion operation is carried out to obtain first data, wherein the first data is used for representing the space information of the return data of the storage unit in the data queue;
taking out a column of data recording the return sequence of the storage unit from the access queue as second data;
performing AND operation on the first data and the second data to obtain third data;
writing the return data to a write location indicated by the third data in a corresponding column of the storage unit of the data queue, and performing a set operation on the write location indicated by the third data in the corresponding column of the storage unit of the return queue.
16. The memory access method of claim 15, wherein the memory access method further comprises:
responding to the returned data to write into the data queue, and acquiring the first line data of the returned queue and the first line data of the access queue;
under the condition that the first line data of the return queue is the same as the first line data of the access queue, the return data of the first line in the data queue is taken out to a processing unit;
the data queue dequeues the return data of the first line, the access queue dequeues the first line data, and the return queue dequeues the first line data.
17. The memory access method according to claim 13, wherein the access queue, the return queue, the data queue of the same processing unit have the same depth, the depth indicating the number of lines of the access queue, the return queue, the data queue, the depth being determined according to the concurrency of the processing unit, the concurrency of the processing unit indicating the number of times the processing unit continues to transmit request information to a storage unit without waiting for return data of the request information;
The bit width of the data queue is determined according to the number of the storage units and the bit width of the return data, the bit widths of the access queue and the return queue are the same, and the bit widths of the access queue and the return queue are determined according to the number of the storage units.
18. A memory access method, wherein the memory access method is applied to a memory access circuit, the memory access circuit comprising: a plurality of processing units, each processing unit comprising a queue module; and an arbiter and a scheduling queue corresponding to each storage unit;
the output end of any processing unit is respectively connected with the input end of each arbiter, and the output end of each arbiter is connected with the input end of a corresponding scheduling queue and the input end of a corresponding storage unit;
the memory access method comprises the following steps:
each processing unit sends the generated request information for accessing the storage unit to an arbiter of the storage unit indicated by the request information, wherein the queue module of the processing unit is used for recording the request sequence of the processing unit for accessing a plurality of storage units;
the arbiter is configured to arbitrate request information from a plurality of processing units, send the request information to a storage unit in an arbitration order, and write processing unit identifications of the plurality of processing units to a dispatch queue in the arbitration order.
19. The memory access method of claim 18, wherein the processing unit is configured to generate a plurality of request messages simultaneously, each request message being configured to access a different memory location, wherein the plurality of request messages generated simultaneously have the same request sequence.
20. The memory access method according to claim 18 or 19, wherein the queue module comprises an access queue for recording the order of requests by the processing unit to access a plurality of memory units,
the access queue comprises a plurality of rows and a plurality of columns, the access queue records the request sequence of the processing unit for accessing a plurality of storage units by executing setting operation on different rows of different columns, wherein the request sequence corresponding to each row is the same, and each column is respectively used for recording the request sequence of the processing unit for accessing a different storage unit.
21. An integrated circuit comprising the memory access circuit of any one of claims 1 to 10.
22. An electronic device comprising the memory access circuit of any one of claims 1 to 10.
CN202310807727.XA 2023-07-03 2023-07-03 Memory access circuit, memory access method, integrated circuit, and electronic device Active CN116521096B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310807727.XA CN116521096B (en) 2023-07-03 2023-07-03 Memory access circuit, memory access method, integrated circuit, and electronic device


Publications (2)

Publication Number Publication Date
CN116521096A true CN116521096A (en) 2023-08-01
CN116521096B CN116521096B (en) 2023-09-22

Family

ID=87392579

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310807727.XA Active CN116521096B (en) 2023-07-03 2023-07-03 Memory access circuit, memory access method, integrated circuit, and electronic device

Country Status (1)

Country Link
CN (1) CN116521096B (en)


Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103201725A (en) * 2010-11-25 2013-07-10 国际商业机器公司 Memory access device for memory sharing among plurality of processors, and access method for same
CN103620570A (en) * 2011-06-24 2014-03-05 Arm有限公司 A memory controller and method of operation of such a memory controller
CN105144128A (en) * 2013-04-23 2015-12-09 Arm有限公司 Memory access control
US9626309B1 (en) * 2014-07-02 2017-04-18 Microsemi Storage Solutions (U.S.), Inc. Method and controller for requesting queue arbitration and coalescing memory access commands
CN114450672A (en) * 2020-11-06 2022-05-06 深圳市大疆创新科技有限公司 Access control method and device of memory and storage medium


Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116719479A (en) * 2023-07-03 2023-09-08 摩尔线程智能科技(北京)有限责任公司 Memory access circuit, memory access method, integrated circuit, and electronic device
CN116719479B (en) * 2023-07-03 2024-02-20 摩尔线程智能科技(北京)有限责任公司 Memory access circuit, memory access method, integrated circuit, and electronic device
CN118467418A (en) * 2024-07-08 2024-08-09 杭州登临瀚海科技有限公司 Storage access system and storage access scheduling method

Also Published As

Publication number Publication date
CN116521096B (en) 2023-09-22

Similar Documents

Publication Publication Date Title
CN116521096B (en) Memory access circuit, memory access method, integrated circuit, and electronic device
CN116578245B (en) Memory access circuit, memory access method, integrated circuit, and electronic device
CN111913652A (en) Memory device including processing circuit, memory controller, and memory system
CN116661703B (en) Memory access circuit, memory access method, integrated circuit, and electronic device
WO2006012284A2 (en) An apparatus and method for packet coalescing within interconnection network routers
US9529651B2 (en) Apparatus and method for executing agent
CN111400212B (en) Transmission method and device based on remote direct data access
CN116737083B (en) Memory access circuit, memory access method, integrated circuit, and electronic device
CN110825436A (en) Calculation method applied to artificial intelligence chip and artificial intelligence chip
CN115964319A (en) Data processing method for remote direct memory access and related product
CN115129480A (en) Scalar processing unit and access control method thereof
US8127110B2 (en) Method, system, and medium for providing interprocessor data communication
JPH0358150A (en) Memory controller
CN115391053B (en) Online service method and device based on CPU and GPU hybrid calculation
CN114924794B (en) Address storage and scheduling method and device for transmission queue of storage component
CN116521097B (en) Memory access circuit, memory access method, integrated circuit, and electronic device
CN116719479B (en) Memory access circuit, memory access method, integrated circuit, and electronic device
WO2022227563A1 (en) Hardware circuit, data migration method, chip, and electronic device
EP3495960A1 (en) Program, apparatus, and method for communicating data between parallel processor cores
CN116594570B (en) Memory access circuit, memory access method, integrated circuit, and electronic device
CN113157603A (en) Data reading device, method, chip, computer equipment and storage medium
CN116820344B (en) Memory access circuit, memory access method, integrated circuit, and electronic device
JP2013539577A (en) Interrupt-based command processing
CN112486872B (en) Data processing method and device
CN117908959A (en) Method for performing atomic operations and related products

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant