CN116578245B - Memory access circuit, memory access method, integrated circuit, and electronic device - Google Patents


Publication number
CN116578245B
Authority
CN
China
Prior art keywords
data
queue
return
storage unit
scheduling
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202310807731.6A
Other languages
Chinese (zh)
Other versions
CN116578245A (en
Inventor
Name withheld at the inventor's request
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Moore Threads Technology Co Ltd
Original Assignee
Moore Threads Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Moore Threads Technology Co Ltd filed Critical Moore Threads Technology Co Ltd
Priority to CN202310807731.6A priority Critical patent/CN116578245B/en
Publication of CN116578245A publication Critical patent/CN116578245A/en
Application granted granted Critical
Publication of CN116578245B publication Critical patent/CN116578245B/en
Legal status: Active


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F 3/0601 Interfaces specially adapted for storage systems
    • G06F 3/0628 Interfaces specially adapted for storage systems making use of a particular technique
    • G06F 3/0655 Vertical data movement, i.e. input-output transfer; data movement between one or more hosts and one or more storage devices
    • G06F 3/0656 Data buffering arrangements
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F 3/0601 Interfaces specially adapted for storage systems
    • G06F 3/0602 Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F 3/0601 Interfaces specially adapted for storage systems
    • G06F 3/0602 Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F 3/061 Improving I/O performance
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00 Arrangements for program control, e.g. control units
    • G06F 9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/46 Multiprogramming arrangements
    • G06F 9/48 Program initiating; Program switching, e.g. by interrupt
    • G06F 9/4806 Task transfer initiation or dispatching
    • G06F 9/4843 Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
    • G06F 9/4881 Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00 Arrangements for program control, e.g. control units
    • G06F 9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/46 Multiprogramming arrangements
    • G06F 9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F 9/5005 Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F 9/5011 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resources being hardware resources other than CPUs, servers and terminals
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The present disclosure relates to a memory access circuit, a memory access method, an integrated circuit, and an electronic device in the field of electronic technology. The memory access circuit is configured to access different storage units and includes a scheduling queue, a return queue, a data queue, processing units, a scheduling selector corresponding to each processing unit, and an arbiter corresponding to each storage unit. The scheduling queue records, according to the arbitration order, the request order in which the processing units access the storage units; the return queue records the order in which the storage units return data; and the data queue stores the returned data of the storage units as indicated by the request order and the return order, from which the returned data is fetched to the corresponding processing units. According to embodiments of the disclosure, each processing unit receives its returned data in the order in which it sent the request information, which improves the accuracy and stability of the memory access circuit.

Description

Memory access circuit, memory access method, integrated circuit, and electronic device
Technical Field
The present disclosure relates to the field of electronic technologies, and in particular, to a memory access circuit, a memory access method, an integrated circuit, and an electronic device.
Background
With the rapid development of the integrated circuit industry, processor chips of various kinds are applied ever more widely across industries, in scenarios such as network communication, mobile phones, set-top boxes, liquid crystal televisions, medical equipment, security equipment, industrial control equipment, smart meters, wearables, the Internet of Things, and automotive electronics.
In such processor chips, a processing unit may access a storage unit to accomplish various target tasks. The memory access circuitry within the processor chip manages communication between the processing unit and the storage unit, and its design directly affects the operating efficiency and stability of the whole chip.
Disclosure of Invention
The present disclosure proposes a memory access technique.
According to an aspect of the present disclosure, there is provided a memory access circuit for accessing different storage units, the memory access circuit including: a scheduling queue, a return queue, a data queue, processing units, a scheduling selector corresponding to each processing unit, and an arbiter corresponding to each storage unit. The input end of any scheduling selector is connected to the output end of the corresponding processing unit, and the output end of any scheduling selector is connected to the input end of each arbiter; the scheduling selector sends the request information received from the processing unit to the arbiter of the storage unit indicated by that request information. The output end of each arbiter is connected to the input end of the corresponding storage unit and to the input end of the scheduling queue; the arbiter arbitrates the request information from the processing units and sends it to the storage unit according to the arbitration order, and the scheduling queue records, according to the arbitration order, the request order in which the processing units access the storage units. The output end of each storage unit, the output end of the scheduling queue, and the output end of the return queue are connected to the input end of the data queue, and the output end of the data queue is connected to the input end of each processing unit; the return queue records the return order of the returned data of the storage units, the data queue stores the returned data of the storage units as indicated by the request order and the return order, and the returned data is fetched from the data queue to the corresponding processing unit.
In one possible implementation, the scheduling queue records the request order of multiple processing units accessing multiple storage units by setting bits in different rows of different columns. Each column of the scheduling queue records the request order of one processing unit accessing one storage unit; the columns corresponding to the same storage unit share a row pointer, which self-increments with each access to that storage unit and indicates the current row to be set in those columns.
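A rough software model of this row-pointer bookkeeping may help (a hypothetical Python sketch; `SchedulingQueue` and its fields are illustrative names, not from the patent):

```python
class SchedulingQueue:
    """Model of the scheduling queue: one column per
    (processing unit, storage unit) pair; columns of the same storage
    unit share a row pointer that advances once per arbitrated access."""

    def __init__(self, depth, num_pus, num_sus):
        # depth x (num_pus * num_sus) bit matrix, initially all zero
        self.rows = [[0] * (num_pus * num_sus) for _ in range(depth)]
        self.num_pus = num_pus
        # one shared row pointer per storage unit
        self.row_ptr = [0] * num_sus

    def record(self, pu, su):
        """The arbiter of storage unit `su` granted a request from `pu`:
        set the bit in the current row of that pair's column, then
        self-increment the shared row pointer."""
        row = self.row_ptr[su]
        self.rows[row][su * self.num_pus + pu] = 1
        self.row_ptr[su] += 1
```

For example, with two processing units and two storage units, if unit 1_1 and then unit 1_2 access storage unit 0_1, bits are set in rows 0 and 1 of the two columns belonging to storage unit 0_1, preserving the arbitration order.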
In one possible implementation, each column of the return queue records a return order of a corresponding storage unit.
In one possible implementation, each column of the data queue stores the returned data of a corresponding storage unit, and the data queue is configured to: in response to receiving returned data from any storage unit, fetch the columns corresponding to that storage unit from the scheduling queue and OR them together to obtain first data, and fetch from the return queue the column recording that storage unit's return order and invert it (bitwise NOT) to obtain second data; AND the first data with the second data to obtain third data; and write the returned data into the write position indicated by the third data in the column of the data queue corresponding to that storage unit.
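With each column modeled as an integer whose bit i corresponds to row i, the first/second/third-data computation above can be sketched as follows (a hypothetical sketch; taking the lowest set bit of the third data as the write position is an assumption, since the text only says the third data indicates the position):

```python
def write_position(sched_cols, return_col):
    """Compute where newly returned data of one storage unit is written.
    sched_cols: that unit's scheduling-queue columns (bit i set means
    row i holds an outstanding request for it); return_col: bit i set
    means row i's data has already returned."""
    first = 0
    for col in sched_cols:       # OR of the unit's scheduling columns
        first |= col             # -> first data
    second = ~return_col         # NOT of its return column -> second data
    third = first & second       # AND -> rows still awaiting data
    # assumed: write into the earliest such row (lowest set bit)
    return (third & -third).bit_length() - 1
```

For instance, if rows 0 and 2 hold requests for the unit (first data 0b101) and row 0 has already returned (return column 0b001), the third data is 0b100 and the new data lands in row 2.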
In one possible implementation, the return queue is configured to: in response to writing the return data to the write location indicated by the third data, the write location indicated by the third data is set in a corresponding column of the storage unit of the return queue.
In one possible implementation, the data queue is configured to: obtain a return-order result of the return queue, where the return-order result is obtained by performing an OR operation on the first-row data of the return queue; and, in a case where the return-order result is a preset result, send the returned data of the first row in the data queue to the corresponding processing units, respectively, as indicated by the first-row information of the scheduling queue.
In one possible implementation, after the returned data of the first row in the data queue is sent to the corresponding processing units, the data queue is configured to dequeue that first-row returned data from the data queue; the scheduling queue is configured to dequeue the first-row data of the columns corresponding to the storage units that sent the returned data; and the return queue is configured to dequeue the first-row data of the columns corresponding to the storage units that sent the returned data.
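The head-of-line check and the three dequeues can be modeled together (a hypothetical sketch; interpreting the "preset result" as the OR of the return queue's first row equaling 1, meaning head-of-line data has returned, which the text does not spell out):

```python
def try_dispatch(sched_rows, return_rows, data_rows, preset=1):
    """If the OR of the return queue's first row equals the preset
    result, hand the data queue's first row to the processing units
    (routing per the scheduling queue's first row, not modeled here),
    then dequeue the first row of all three queues."""
    result = 0
    for bit in return_rows[0]:   # OR over the first-row return flags
        result |= bit
    if result != preset:
        return None              # head-of-line data not yet returned
    head = data_rows.pop(0)      # returned data for the processing units
    sched_rows.pop(0)            # drop first-row scheduling info
    return_rows.pop(0)           # drop first-row return flags
    return head
```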
In a possible implementation manner, the memory access circuit further comprises at least one first buffer and at least one second buffer, wherein an output end of each arbiter is connected to an input end of a corresponding memory unit through the at least one first buffer, and an output end of each memory unit is connected to an input end of the data queue through the at least one second buffer.
According to an aspect of the present disclosure, there is provided a memory access method including: each processing unit sends the generated request information for accessing the storage unit to a corresponding scheduling selector; the scheduling selector transmits the request information received from the processing unit to an arbiter of a storage unit indicated by the request information; the arbiter arbitrates the request information from the processing units and sends the request information to the corresponding storage units according to the arbitration sequence; the scheduling queue records the request sequence of the processing unit for accessing the storage unit according to the arbitration sequence.
The memory access method is applied to a memory access circuit, and the memory access circuit is used for accessing different memory units, and comprises the following steps: the system comprises a scheduling queue, a plurality of processing units, a scheduling selector corresponding to each processing unit and an arbiter corresponding to each storage unit; the input end of any scheduling selector is connected with the output end of the corresponding processing unit, the output end of any scheduling selector is connected with the input end of each arbiter, and the output end of each arbiter is connected with the input end of the corresponding storage unit and the input end of the scheduling queue.
In one possible implementation, the scheduling queue recording, according to the arbitration order, the request order in which the processing units access the storage units includes: the scheduling queue records the request order of multiple processing units accessing multiple storage units by setting bits in different rows of different columns, where each column of the scheduling queue records the request order of one processing unit accessing one storage unit, the columns corresponding to the same storage unit share a row pointer, the row pointer self-increments with each access to that storage unit, and the row pointer indicates the current row to be set in those columns.
In a possible implementation, the memory access circuit further includes at least one first buffer, and the output terminal of each arbiter is connected to the input terminal of the corresponding memory cell through the at least one first buffer.
According to an aspect of the present disclosure, there is provided a memory access method including: the data queue stores the returned data of the storage unit according to the request sequence recorded by the scheduling queue and the returned sequence of the returned data of the storage unit recorded by the return queue; and taking out the returned data from the data queue to the corresponding processing unit.
The memory access method is applied to a memory access circuit, and the memory access circuit is used for accessing different memory units, and comprises the following steps: scheduling queue, return queue, data queue, multiple processing units; the output end of each storage unit, the output end of the scheduling queue and the output end of the return queue are connected with the input end of the data queue, and the output end of the data queue is connected with the input end of each processing unit.
In one possible implementation, each column of the return queue records the return order of a corresponding storage unit, each column of the data queue stores the returned data of a corresponding storage unit, and the data queue storing the returned data of the storage units according to the request order recorded by the scheduling queue and the return order recorded by the return queue includes: in response to receiving returned data from any storage unit, the data queue fetches the columns corresponding to that storage unit from the scheduling queue and ORs them together to obtain first data, and fetches from the return queue the column recording that storage unit's return order and inverts it (bitwise NOT) to obtain second data; ANDs the first data with the second data to obtain third data; writes the returned data into the write position indicated by the third data in the column of the data queue corresponding to that storage unit; and, in response to writing the returned data to the write position indicated by the third data, sets that write position in the column of the return queue corresponding to that storage unit.
In one possible implementation, fetching the returned data from the data queue to the corresponding processing unit includes: obtaining a return-order result of the return queue, where the return-order result is obtained by performing an OR operation on the first-row data of the return queue; and, in a case where the return-order result is a preset result, sending the returned data of the first row in the data queue to the corresponding processing units, respectively, as indicated by the first-row information of the scheduling queue.
In one possible implementation, after the returned data of the first row in the data queue is sent to the corresponding processing units, the memory access method further includes: dequeuing, from the data queue, the first-row returned data sent to the corresponding processing units; dequeuing, in the scheduling queue, the first-row data of the columns corresponding to the storage units that sent the returned data; and dequeuing, in the return queue, the first-row data of the columns corresponding to the storage units that sent the returned data.
In a possible implementation, the memory access circuit further comprises at least one second buffer, and the output terminal of each memory cell is connected to the input terminal of the data queue through the at least one second buffer.
According to an aspect of the present disclosure, there is provided an integrated circuit comprising a memory access circuit as described above.
According to an aspect of the present disclosure, there is provided an electronic device comprising a memory access circuit as described above.
According to an aspect of the present disclosure, there is provided a computer readable storage medium having stored thereon computer program instructions which, when executed by a processor, implement the above-described method.
In the embodiments of the present disclosure, the processing units access different storage units (for example, storage units deployed in different scheduling modules). When multiple processing units access multiple storage units, the request order in which the processing units access the storage units can be recorded in the scheduling queue. When the storage units reply with returned data, the returned data of each storage unit can be stored in the data queue as indicated by the request order recorded in the scheduling queue and the return order recorded in the return queue, and then fetched from the data queue to the corresponding processing unit. In this way, returned data for request information sent later can be buffered in the data queue while waiting for the returned data of request information sent earlier, so that each processing unit obtains its returned data in order (that is, each processing unit receives the corresponding returned data in the order in which it sent the request information), which improves the accuracy and stability of the memory access circuit.
In addition, the memory access circuit of the embodiments of the present disclosure does not need to configure, for each processing unit, a queue recording the order in which that unit sends requests to different storage units, nor to configure, for each storage unit, a queue recording the order in which different processing units access it. A single scheduling queue suffices to record the accesses of different processing units to different storage units, which reduces the complexity of the memory access circuit and improves its applicability. Moreover, the returned data of multiple processing units can share one data queue, reducing the resource waste that would result from each processing unit exclusively occupying its own storage.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure. Other features and aspects of the present disclosure will become apparent from the following detailed description of exemplary embodiments, which proceeds with reference to the accompanying drawings.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the disclosure and together with the description, serve to explain the technical aspects of the disclosure.
Fig. 1 shows a schematic diagram of a memory access circuit according to an embodiment of the present disclosure.
Fig. 2 shows a schematic diagram of another memory access circuit according to an embodiment of the present disclosure.
FIG. 3 shows a schematic diagram of a dispatch queue according to an embodiment of the present disclosure.
Fig. 4 shows a schematic diagram of a return queue according to an embodiment of the present disclosure.
FIG. 5 shows a schematic diagram of a data queue according to an embodiment of the present disclosure.
Fig. 6 shows a flow chart of a memory access method according to an embodiment of the present disclosure.
Fig. 7 illustrates a flow chart of another memory access method according to an embodiment of the present disclosure.
Detailed Description
Various exemplary embodiments, features and aspects of the disclosure will be described in detail below with reference to the drawings. In the drawings, like reference numbers indicate identical or functionally similar elements. Although various aspects of the embodiments are illustrated in the accompanying drawings, the drawings are not necessarily drawn to scale unless specifically indicated.
The word "exemplary" is used herein to mean "serving as an example, embodiment, or illustration." Any embodiment described herein as "exemplary" is not necessarily to be construed as preferred or advantageous over other embodiments.
The term "and/or" herein merely describes an association relationship between associated objects and indicates that three relationships may exist; for example, A and/or B may represent three cases: A exists alone, both A and B exist, or B exists alone. In addition, the term "at least one" herein means any one of a plurality, or any combination of at least two of a plurality; for example, including at least one of A, B, and C may mean including any one or more elements selected from the set consisting of A, B, and C.
Furthermore, numerous specific details are set forth in the following detailed description in order to provide a better understanding of the present disclosure. It will be understood by those skilled in the art that the present disclosure may be practiced without some of these specific details. In some instances, methods, means, elements, and circuits well known to those skilled in the art have not been described in detail in order not to obscure the present disclosure.
Fig. 1 shows a schematic diagram of a memory access circuit according to an embodiment of the present disclosure. As shown in Fig. 1, the memory access circuit is used for accessing different storage units 0 and includes: a processing unit 1, a scheduling selector 2 corresponding to each processing unit 1, an arbiter 3 corresponding to each storage unit 0, a scheduling queue 4, a return queue 5, and a data queue 6;
the input end of any scheduling selector 2 is connected with the output end of the corresponding processing unit 1, the output end of any scheduling selector 2 is connected with the input end of each arbiter 3, and the scheduling selector 2 is used for sending the request information received from the processing unit 1 to the arbiter 3 of the storage unit 0 indicated by the request information;
the output end of each arbiter 3 is connected with the input end of the corresponding storage unit 0 and the input end of the scheduling queue 4, the arbiter 3 is used for arbitrating the request information from the processing unit 1, the request information is sent to the storage unit 0 according to the arbitration sequence, and the scheduling queue 4 is used for recording the request sequence of the processing unit 1 for accessing the storage unit 0 according to the arbitration sequence;
The output end of each storage unit 0, the output end of the scheduling queue 4 and the output end of the return queue 5 are connected with the input end of the data queue 6, the output end of the data queue 6 is connected with the input end of each processing unit 1, the return queue 5 is used for recording the return sequence of the return data of the storage units 0, the data queue 6 is used for storing the return data of the storage units 0 according to the request sequence and the indication of the return sequence, and the return data is taken out from the data queue 6 to the corresponding processing units 1.
In the memory access circuit of the embodiments of the present disclosure, the scheduling queue 4, the return queue 5, and the data queue 6 have the same depth, where the depth is the number of rows of each queue. The depth is determined by the concurrency of the processing unit 1, that is, the maximum number of times the processing unit 1 can send request information to a storage unit 0 consecutively without waiting for the returned data of earlier request information. The bit width of the data queue 6 is determined by the number of storage units and the bit width of the returned data; the scheduling queue 4 and the return queue 5 have the same bit width, determined by the number of storage units 0 and the number of processing units 1.
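The dimensioning rule can be summarized in a small helper (a hypothetical sketch; the exact products are an assumption consistent with the one-column-per-(processing unit, storage unit) layout described later, since the text states only which quantities each width depends on):

```python
def queue_sizes(concurrency, num_sus, num_pus, data_width):
    """Dimension the three queues per the rule above.
    depth: rows in all three queues, equal to the processing unit's
    concurrency (max outstanding requests without waiting).
    data_bits: data-queue width, one data column per storage unit.
    flag_bits: assumed scheduling/return queue width, one flag column
    per (processing unit, storage unit) pair."""
    depth = concurrency
    data_bits = num_sus * data_width
    flag_bits = num_sus * num_pus
    return depth, data_bits, flag_bits
```

For example, a processing unit with a concurrency of 8 accessing 3 storage units that return 32-bit data would, under these assumptions, give queues of depth 8, a 96-bit-wide data queue, and 12-bit-wide scheduling and return queues.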
In one possible implementation, the memory access circuitry of embodiments of the present disclosure may be integrated into a processor chip for accessing a plurality of different memory locations 0 within the processor chip.
The processor chip includes, for example: a central processing unit (Central Processing Unit, CPU), a graphics processor (Graphics Processing Unit, GPU), a general-purpose graphics processing unit (General-Purpose Computing on Graphics Processing Units, GPGPU), a multi-core processor (Multi-Core Processor), a digital signal processor (Digital Signal Processor, DSP), an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), a tensor processor (Tensor Processing Unit, TPU), a field-programmable gate array (Field Programmable Gate Array, FPGA), or another programmable logic device, which is not limited by the present disclosure.
The memory unit 0 may include a random access memory (Random Access Memory, RAM) disposed inside the processor chip, such as a Dynamic RAM (DRAM), a Static RAM (Static RAM), a Synchronous DRAM (SDRAM), a cache memory (Cached DRAM, CDRAM), an enhanced DRAM (Extended Data Out DRAM, EDRAM), etc., and the present disclosure does not limit the type of the memory unit 0.
Illustratively, the multiple processing units 1 may be multiple computing cores of a multi-core processor chip, and may access different memory units 0 in the multi-core processor chip, so as to improve the efficiency of multi-core parallel processing.
Alternatively, the multiple processing units 1 may be multithreaded modules disposed in the same computing core in the multi-core processor chip, and may access different memory units 0 (may be memory units located in the computing core or memory units located outside the computing core) in the multi-core processor chip, so as to improve the efficiency of multithreaded parallel processing.
Alternatively, the processor chip internally comprises a plurality of processing units 1 and a plurality of scheduling modules, and different storage units 0 are deployed in different scheduling modules, and the scheduling modules can be used for executing scheduling tasks (including operation data scheduling and operation program scheduling). In this case, the scheduling module may be configured to receive request information (e.g., an operation instruction) of the processing unit 1, and provide the processing unit 1 with required resources (return data) according to the request information.
In one possible implementation, as shown in fig. 1, the memory access circuit is configured to access M different memory cells 0, and the memory access circuit includes: n processing units 1, N schedule selectors 2 corresponding to the N processing units 1, M arbiters 3 corresponding to the M storage units 0, one schedule queue 4, one return queue 5, one data queue 6, M, N being integers greater than 1.
The M storage units 0 may be storage units 0_1 to 0_M, respectively, and the N processing units 1 may be processing units 1_1 to 1_N, respectively.
The N schedule selectors 2 may be schedule selectors 2_1 to 2_N, respectively, that is: a schedule selector 2_1 corresponding to the processing unit 1_1, a schedule selector 2_2 corresponding to the processing unit 1_2, and so on, a schedule selector 2_N corresponding to the processing unit 1_N.
The M arbiters 3 may be respectively an arbiter 3_1 to an arbiter 3_M, that is: an arbiter 3_1 corresponding to the memory cell 0_1, an arbiter 3_2 corresponding to the memory cell 0_2, and so on, an arbiter 3_M corresponding to the memory cell 0_M.
As shown in fig. 1, the input end of any one of the schedule selectors 2 is connected to the output end of a corresponding one of the processing units 1, and the output ends of any one of the schedule selectors 2 are respectively connected to the input ends of M arbiters 3.
For example, the output end of the processing unit 1_1 is connected to the input end of the schedule selector 2_1, and the output end of the schedule selector 2_1 is respectively connected to the input end of the arbiter 3_1 to the input end of the arbiter 3_M; the output end of the processing unit 1_2 is connected with the input end of the scheduling selector 2_2, and the output end of the scheduling selector 2_2 is respectively connected with the input end of the arbiter 3_1 to the input end of the arbiter 3_M; similarly, the output of the processing unit 1_N is connected to the input of the schedule selector 2_N, and the output of the schedule selector 2_N is connected to the input of the arbiter 3_1 to the input of the arbiter 3_M, respectively.
In an example, the processing unit 1_1 may generate request information to access any one of the storage units 0_1 to 0_M, and the request information may include a schedule identifier indicating the storage unit 0 to be accessed. For example, assume that there are schedule identifiers 1 to M: schedule identifier 1 can be used to indicate storage unit 0_1, schedule identifier 2 can be used to indicate storage unit 0_2, and so on, and schedule identifier M can be used to indicate storage unit 0_M. By setting the schedule identifier in the request information, the request information can be accurately transmitted to the storage unit 0 indicated by the schedule identifier.
In an example, assuming that M is 3, i.e., there are 3 storage units 0, the schedule identifier may be binary data with a bit width of 2 bits: a schedule identifier of 00 indicates that storage unit 0_1 is accessed; a schedule identifier of 01 indicates that storage unit 0_2 is accessed; and a schedule identifier of 10 indicates that storage unit 0_3 is accessed. The present disclosure does not limit the bit width of the schedule identifier; the bit width may be determined according to the number of storage units 0. For example, assuming that the number of storage units 0 is M and the bit width of the schedule identifier is W, W may be determined as the smallest integer satisfying the inequality 2^W ≥ M, where 2^W represents W multiplications of 2.
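As a small illustration of the bit-width rule above (the function name is hypothetical, not from the patent), the smallest W with 2^W ≥ M can be computed directly:

```python
def schedule_id_width(m):
    """Smallest bit width W such that 2**W >= m, i.e. W bits can
    encode at least m distinct schedule identifiers."""
    w = 1
    while 2 ** w < m:
        w += 1
    return w

# With M = 3 storage units, a 2-bit identifier (00, 01, 10) suffices.
```

For M = 4 the 2-bit width is still enough (2^2 = 4), while M = 5 already requires 3 bits.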
In an example, the request information may also include a request type (e.g., including a read request and/or a write request), an access address (e.g., an address of a read request, or an address of a write request), an enable signal (e.g., a signal that causes the current request information to have read and write rights for a certain memory location), etc., which is not particularly limited by the present disclosure.
In this case, the schedule selector 2_1 receives the request information transmitted from the processing unit 1_1, and can determine a target arbiter from the arbiters 3_1 to 3_M according to the indication of the schedule flag in the request information, and transmit the request information to the target arbiter, that is, the arbiter 3 corresponding to the memory unit 0 indicated by the schedule flag. For example, if the schedule selector 2_1 receives the request information including the schedule identifier 1, the request information may be transmitted to the arbiter 3_1 corresponding to the memory location 0_1 indicated by the schedule identifier 1; if the schedule selector 2_1 receives the request information including the schedule identifier 2, the request information may be sent to the arbiter 3_2 corresponding to the memory location 0_2 indicated by the schedule identifier 2; similarly, if the schedule selector 2_1 receives the request information including the schedule identifier M, the request information may be sent to the arbiter 3_M corresponding to the storage unit 0_M indicated by the schedule identifier M.
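A minimal sketch of this routing decision follows; the request layout (a dict with a `schedule_id` field) and the 1-based identifier encoding are illustrative assumptions, not taken from the patent:

```python
def route_request(request, arbiter_inputs):
    """Forward a request to the arbiter of the storage unit indicated
    by its schedule identifier (identifier k indicates storage unit
    0_k, i.e. arbiter index k - 1)."""
    arbiter_inputs[request["schedule_id"] - 1].append(request)

arbiter_inputs = [[], [], []]  # inputs of arbiters 3_1 .. 3_3
route_request({"schedule_id": 2, "addr": 0x40}, arbiter_inputs)
# the request now sits at the input of arbiter 3_2 only
```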
Similarly, the processing units 1_2 to 1_N may also generate request information for accessing any one of the storage units 0_1 to 0_M, the processing units 1_2 to 1_N may send the generated request information including the scheduling identifier to the corresponding scheduling selector 2_2 to 2_N, and the scheduling selectors 2_2 to 2_N may send the request information to the arbiter 3 corresponding to the storage unit 0 indicated by the scheduling identifier according to the indication of the scheduling identifier in the request information in response to the received request information, which is specifically referred to above and will not be described herein.
As shown in fig. 1, the output terminal of each arbiter 3 is connected to the input terminal of the corresponding memory cell 0, and the output terminals of M arbiters 3 are connected to the input terminal of the dispatch queue 4.
For example, the output terminal of the arbiter 3_1 is connected to the input terminal of the storage unit 0_1, the output terminal of the arbiter 3_2 is connected to the input terminal of the storage unit 0_2, and so on, and the output terminal of the arbiter 3_M is connected to the input terminal of the storage unit 0_M. The output terminals of the arbiters 3_1 to 3_M are connected to the input terminal of the dispatch queue 4 in a many-to-one manner.
In an example, the arbiter 3_1 may perform a round robin arbitration (round robin) on different request information from the processing units 1_1 to 1_N accessing the storage unit 0_1, select one target request information of the current round from the N request information, and send the selected target request information to the storage unit 0_1. Similarly, the arbiters 3_2 to 3_M can also perform round robin arbitration (round robin) on the request information from the processing units 1_1 to 1_N, and send the request information to the corresponding storage units 0_2 to 0_M according to the respective arbitration orders, which is not described herein.
In the above-described process, each time any one of the arbiters 3_1 to 3_M transmits request information to the corresponding memory unit 0, the identification information of the processing unit 1 currently accessing the memory unit 0 may be recorded in the schedule queue 4. Since the schedule queue 4 may record the identification information once in each round of arbitration performed by the M arbiters 3, the schedule queue 4 may record the arbitration order of the M arbiters 3, that is, the request order of the N processing units 1 to access the M memory units 0, through multiple rounds of arbitration.
In an example, the dispatch queue 4 may include multiple rows, where each cell of any row corresponds to 1 bit, and the dispatch queue 4 records the order of requests by the N processing units to access the M storage units by marking the 1-bit cells in different rows and columns.
The number of columns of the dispatch queue 4 is determined by the number N of processing units 1 and the number M of storage units 0, i.e., M×N columns. Thus, each column of the dispatch queue 4 records the order of requests by one processing unit 1 to access one storage unit 0. The number of rows (depth) of the dispatch queue 4 is related to the concurrency of the processing units 1 (i.e., the number of request messages that can be issued continuously without waiting for return data); the greater the concurrency of the processing units 1, the more rows the dispatch queue 4 requires.
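Under these sizing rules the queue shape follows directly; in this sketch, `concurrency` is an assumed name for the number of outstanding requests a processing unit may have:

```python
def dispatch_queue_shape(m, n, concurrency):
    """Rows = per-unit concurrency (queue depth); columns = one column
    per (processing unit, storage unit) pair, i.e. M * N."""
    return (concurrency, m * n)

# Matches the 6-row, 9-column dispatch queue of fig. 3 (M = N = 3).
```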
As shown in fig. 1, the output ends of the M storage units 0, the output end of the dispatch queue 4, and the output end of the return queue 5 are connected to the input end of the data queue 6, and the output end of the data queue 6 is connected to the input ends of the N processing units 1.
For example, the output ends of the storage units 0_1 to 0_M, the output end of the dispatch queue 4, and the output end of the return queue 5 are connected to the input end of the data queue 6 in a many-to-one manner, and the output end of the data queue 6 is connected to the input ends of the processing units 1_1 to 1_N in a one-to-many manner.
The number of columns of the return queues 5 is the same as the number M of the memory units 0, and the number of rows of the return queues 5 may be the same as the number of rows of the dispatch queues 4. Any column of any row in the return queue 5 may correspond to a 1-bit (bit) unit, and the return queue 5 may record the return order of the return data of M storage units 0 by marking the 1-bit units of different rows of different columns, where each row may correspond to a different return order of the return data of storage units 0.
The number of columns of the data queues 6 is the same as the number M of the memory units 0, and the number of rows of the data queues 6 may be the same as the number of rows of the schedule queue 4. Any column of any row of the data queue 6 may correspond to a multi-bit (bit) cell (e.g., 30 bits), which is not limiting of the present disclosure. Wherein each row may store return data for a different memory cell 0.
In an example, when any storage unit 0 sends return data in response to the request information it has received (which may come from different processing units 1), and the data queue 6 receives the return data of that storage unit 0, the return data can be stored at the position in the data queue 6 indicated by the request order recorded in the dispatch queue 4 and the return order recorded in the return queue 5, and the corresponding position of the return queue 5 is marked to update the return order recorded in the return queue 5.
Each time return data is stored in the data queue 6 and the return queue 5 has synchronously updated the return order, whether the return data of the earliest request information has been returned can be judged by querying whether the first row of the return queue 5 carries a flag. If the first row of the return queue 5 is marked, the return data of the earliest request information is already stored in the data queue 6, and the return data can be fetched from the data queue 6 to the corresponding processing unit 1 according to the indication of the first-row data of the dispatch queue 4.
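This first-row check can be sketched as below; the fig. 3 column layout (column index = unit index × M + storage index) and all names are illustrative assumptions:

```python
def deliver_oldest_returns(dq_row0, rq_row0, data_row0, m, n):
    """For each storage unit whose flag is set in the first row of the
    return queue, look up which processing unit made the earliest
    request to it (first row of the dispatch queue) and hand it the
    buffered return data. Returns {unit_index: data}."""
    delivered = {}
    for s in range(m):
        if rq_row0[s]:                    # data of the oldest request returned
            for u in range(n):
                if dq_row0[u * m + s]:    # which unit requested it first
                    delivered[u] = data_row0[s]
    return delivered

# unit 1_1's request to storage unit 0_1 has returned:
deliver_oldest_returns([1] + [0] * 8, [1, 0, 0], ["d", None, None], 3, 3)
```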
The memory access circuit of the embodiment of the present disclosure enables a plurality of processing units 1 to access different storage units 0 (for example, including storage units deployed in different scheduling modules). When the plurality of processing units 1 access the storage units 0, the request order of the N processing units accessing the M storage units can be recorded through the dispatch queue 4. When the storage units 0 reply with return data, the return data of each storage unit 0 can be stored in the data queue 6 according to the request order recorded by the dispatch queue 4 and the return order recorded by the return queue 5, and then fetched from the data queue 6 to the corresponding processing unit 1. In this way, return data is first buffered in the data queue 6 until the return data of the earliest request information arrives, which enables order-preserving delivery of the return data to each processing unit 1 (i.e., each processing unit 1 receives return data in the order in which it sent the request information) and improves the accuracy and stability of the memory access circuit.
In addition, the memory access circuit of the embodiment of the present disclosure does not need to configure, for each processing unit 1, a queue recording the order in which that processing unit 1 sends requests to the different storage units 0, nor to configure, for each storage unit 0, a queue recording the order in which the different processing units 1 access it; the single dispatch queue 4 suffices to record the accesses of the different processing units to the different storage units, thereby reducing the complexity of the memory access circuit and improving its applicability. Furthermore, the return data of the plurality of processing units 1 can share one data queue 6, avoiding the need to set up dedicated storage resources exclusively for each processing unit 1.
The memory access circuit of the embodiment of the present disclosure will be described below with the number of memory cells 0 and the number of processing cells 1 each being 3 as an example. It should be understood that the number of the storage units 0 and the processing units 1 is not particularly limited in this disclosure, and may be set according to an actual application scenario.
Fig. 2 shows a schematic diagram of a memory access circuit according to an embodiment of the present disclosure; the memory access circuit shown in fig. 2 is used to enable 3 different processing units 1 to access storage units 0 in 3 different scheduling modules. The 3 processing units 1 may be, respectively: processing unit 1_1, processing unit 1_2, and processing unit 1_3; the 3 storage units 0 may be, respectively: storage unit 0_1 deployed in scheduling module 1, storage unit 0_2 deployed in scheduling module 2, and storage unit 0_3 deployed in scheduling module 3.
As shown in fig. 2, the output end of the processing unit 1_1 is connected to the input end of the schedule selector 2_1, and the output end of the schedule selector 2_1 may be connected, in a one-to-many manner, to the input ends of the arbiters 3_1 to 3_3; the output end of the processing unit 1_2 is connected to the input end of the schedule selector 2_2, and the output end of the schedule selector 2_2 may be connected, in a one-to-many manner, to the input ends of the arbiters 3_1 to 3_3; the output end of the processing unit 1_3 is connected to the input end of the schedule selector 2_3, and the output end of the schedule selector 2_3 may be connected, in a one-to-many manner, to the input ends of the arbiters 3_1 to 3_3.
In a possible implementation, the memory access circuit further comprises at least one first buffer 7, the output of each arbiter 3 being connected to the input of the corresponding memory cell 0 via at least one first buffer 7. The first buffer 7 is arranged in the memory access circuit, so that the driving force of the arbiter 3 for data transmission to the memory unit 0 is enhanced, and the probability of insufficient signal driving capability caused by overlong wiring in the wiring (floorplan) process is reduced. For example, in practical applications, in a scenario where the connection between the arbiter 3 and the memory unit 0 is relatively long, if the first buffer 7 is not provided, the signal may be attenuated along with the wiring on the chip (or the circuit board), which may result in a situation where the system frequency cannot be increased and the performance is degraded. By providing the first buffer 7 between the arbiter 3 and the memory unit 0, a stronger driving force can be provided for signal transmission, so that the request information issued by the arbiter 3 can be correctly transmitted to the memory unit 0.
Furthermore, with the first buffer 7 provided in the memory access circuit, data may be stored in the first buffer 7 so that an access request (for example, the request information arbitrated by the arbiter 3) is buffered, where the first buffer 7 and the arbiter 3 may perform data interaction in a handshake manner.
As shown in fig. 2, the output terminal of the arbiter 3_1 is connected to the input terminal of the memory cell 0_1 through two first buffers 7, the output terminal of the arbiter 3_2 is connected to the input terminal of the memory cell 0_2 through two first buffers 7, and the output terminal of the arbiter 3_3 is connected to the input terminal of the memory cell 0_3 through two first buffers 7.
The output terminals of the arbiters 3_1 to 3_3 may be connected to the input terminal of the dispatch queue 4 in a many-to-one manner.
In a possible implementation manner, the dispatch queue 4 records the request order of the plurality of processing units 1 accessing the plurality of storage units 0 by setting cells in different rows of different columns, where each column of the dispatch queue 4 is used to record the request order of one processing unit 1 accessing one storage unit 0, and a row pointer is used to indicate the current row to be set in each column; the row pointer is shared by the multiple columns corresponding to the same storage unit 0 and is incremented according to the number of accesses to that storage unit 0. The setting operation may include setting to 1, setting a preset mark, setting a preset letter, and the like, which is not limited in this disclosure.
In the example, the dispatch queue 4 includes M×N columns, each column including a plurality of rows; the dispatch queue 4 records the request order of the N processing units 1 accessing the M storage units 0 by setting 1s at different row positions of different columns, where each column is used to record the request order of one processing unit 1 accessing one storage unit 0. The N columns corresponding to the same storage unit 0 share a row pointer, which is incremented according to the number of accesses to that storage unit 0 and indicates the current row to be set to 1 in each column. Thus, by providing one dispatch queue 4 in the memory access circuit, the order of requests by the plurality of processing units 1 to access the plurality of storage units 0 can be recorded more efficiently and accurately.
In the memory access circuit of the embodiment of the present disclosure, there is no need to configure a queue for recording the order of requests sent by the processing unit 1 to the different storage units 0 for each processing unit 1, or to configure a queue for recording the order of accesses by the different processing units 1 for each storage unit 0, respectively. By setting a scheduling queue 4, the request sequence of a plurality of processing units for accessing a plurality of storage units can be recorded, the complexity of the memory access circuit is reduced, and the applicability of the memory access circuit is improved.
By way of example, fig. 3 shows a schematic diagram of a dispatch queue 4 according to an embodiment of the present disclosure, where the dispatch queue 4 may be a 6 row 9 column queue, and the dispatch queue 4 includes 6×9 cells, each cell may store 1 bit (bit) of information, as shown in fig. 3. It should be understood that fig. 3 only takes the example that the scheduling queue 4 is 6 rows and 9 columns, and the present disclosure does not limit the number of rows and columns of the scheduling queue 4, and may be set according to an actual application scenario.
As shown in fig. 3, in the dispatch queue 4, the 1st column is used to record the request order of processing unit 1_1 accessing storage unit 0_1, the 2nd column records processing unit 1_1 accessing storage unit 0_2, the 3rd column records processing unit 1_1 accessing storage unit 0_3, the 4th column records processing unit 1_2 accessing storage unit 0_1, the 5th column records processing unit 1_2 accessing storage unit 0_2, the 6th column records processing unit 1_2 accessing storage unit 0_3, the 7th column records processing unit 1_3 accessing storage unit 0_1, the 8th column records processing unit 1_3 accessing storage unit 0_2, and the 9th column records processing unit 1_3 accessing storage unit 0_3.
It should be understood that each column is used to record the request order of one processing unit 1 accessing one storage unit 0; the order between columns in fig. 3 is only an example, and other orders are possible. For example, in the dispatch queue 4, column 1 may record the requests of processing unit 1_1 accessing storage unit 0_1, column 2 those of processing unit 1_2 accessing storage unit 0_1, column 3 those of processing unit 1_3 accessing storage unit 0_1, column 4 those of processing unit 1_1 accessing storage unit 0_2, column 5 those of processing unit 1_2 accessing storage unit 0_2, column 6 those of processing unit 1_3 accessing storage unit 0_2, column 7 those of processing unit 1_1 accessing storage unit 0_3, column 8 those of processing unit 1_2 accessing storage unit 0_3, and column 9 those of processing unit 1_3 accessing storage unit 0_3; the present disclosure does not limit the ordering of the columns. For a clearer description of the aspects of the present disclosure, the following description adopts the column ordering illustrated in fig. 3.
In the example, a row pointer is used to indicate the current row to be set to 1 in each column, where wr_ptr1, wr_ptr2, and wr_ptr3 are the row pointers of the storage units 0_1, 0_2, and 0_3, respectively. When the processing unit 1_1 accesses the storage unit 0_1 deployed in the scheduling module 1, the cell at the 1st column, row wr_ptr1 of the dispatch queue 4 is set to 1; when the processing unit 1_1 accesses the storage unit 0_2 deployed in the scheduling module 2, the cell at the 2nd column, row wr_ptr2 is set to 1; when the processing unit 1_1 accesses the storage unit 0_3 deployed in the scheduling module 3, the cell at the 3rd column, row wr_ptr3 is set to 1; when the processing unit 1_2 accesses the storage unit 0_1, the cell at the 4th column, row wr_ptr1 is set to 1; when the processing unit 1_2 accesses the storage unit 0_2, the cell at the 5th column, row wr_ptr2 is set to 1; when the processing unit 1_2 accesses the storage unit 0_3, the cell at the 6th column, row wr_ptr3 is set to 1; when the processing unit 1_3 accesses the storage unit 0_1, the cell at the 7th column, row wr_ptr1 is set to 1; when the processing unit 1_3 accesses the storage unit 0_2, the cell at the 8th column, row wr_ptr2 is set to 1; and when the processing unit 1_3 accesses the storage unit 0_3, the cell at the 9th column, row wr_ptr3 is set to 1.
The 1st, 4th, and 7th columns, corresponding to the storage unit 0_1, share the row pointer wr_ptr1; in response to each access to the storage unit 0_1 by any one of the processing units 1_1 to 1_3, the row pointer wr_ptr1 is incremented by 1, that is: wr_ptr1 = wr_ptr1 + 1. The row pointer wr_ptr1 is incremented according to the number of accesses to the storage unit 0_1; for example, when the processing units 1_1 to 1_3 access the storage unit 0_1 at the same time, wr_ptr1 is incremented three times, once for each of the 3 accesses.

Similarly, the 2nd, 5th, and 8th columns, corresponding to the storage unit 0_2, share the row pointer wr_ptr2, which is incremented by 1 (wr_ptr2 = wr_ptr2 + 1) in response to each access to the storage unit 0_2 by any one of the processing units 1_1 to 1_3; when the processing units 1_1 to 1_3 access the storage unit 0_2 at the same time, wr_ptr2 is incremented three times according to the 3 accesses.

Likewise, the 3rd, 6th, and 9th columns, corresponding to the storage unit 0_3, share the row pointer wr_ptr3, which is incremented by 1 (wr_ptr3 = wr_ptr3 + 1) in response to each access to the storage unit 0_3 by any one of the processing units 1_1 to 1_3; when the processing units 1_1 to 1_3 access the storage unit 0_3 at the same time, wr_ptr3 is incremented three times according to the 3 accesses.
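The shared row pointer can be sketched as below (class and method names are hypothetical); each access marks the current row of the accessing unit's column and then advances the pointer shared by that storage unit's columns:

```python
class StorageUnitColumns:
    """The N dispatch-queue columns belonging to one storage unit share
    a single row pointer; each access sets a 1 at the current row of
    the accessing unit's column and increments the pointer by 1."""
    def __init__(self, rows, n):
        self.wr_ptr = 0
        self.cols = [[0] * rows for _ in range(n)]

    def record_access(self, unit):
        self.cols[unit][self.wr_ptr] = 1
        self.wr_ptr += 1  # wr_ptr = wr_ptr + 1

cols_0_1 = StorageUnitColumns(rows=6, n=3)
for unit in (0, 1, 2):           # units 1_1 .. 1_3 all access storage unit 0_1
    cols_0_1.record_access(unit)  # pointer ends at 3 after the 3 accesses
```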
In a possible implementation, the memory access circuit further comprises at least one second buffer 8, the output of each memory cell 0 being connected to the input of the data queue 6 via at least one second buffer 8.
The second buffer 8 is arranged in the memory access circuit, so that the driving force of the memory unit 0 for data transmission to the data queue 6 is enhanced, and the probability of insufficient signal driving capability caused by overlong wiring in the wiring (floorplan) process is reduced. For example, in practical applications, in a scenario where the connection between the storage unit 0 and the data queue 6 is relatively long, if the second buffer 8 is not provided, the signal may be attenuated along with the trace on the chip (or the circuit board), which may cause the system frequency to be unable to be improved and the performance to be degraded. By providing the second buffer 8 between the storage unit 0 and the data queue 6, a stronger driving force can be provided for signal transmission, so that the return data sent by the storage unit 0 can be correctly transmitted to the data queue 6.
Furthermore, with the second buffer 8 provided in the memory access circuit, data may be stored in the second buffer 8 so that return data (for example, the return data sent by the storage unit 0 in response to the request information) is buffered, where the second buffer 8 and the data queue 6 may perform data interaction in a handshake manner.
As shown in fig. 2, the output ends of the storage units 0_1 to 0_3 are each connected to the input end of the data queue 6 via two second buffers 8.
The output end of the return queue 5 and the output end of the dispatch queue 4 are also connected to the input end of the data queue 6, and the output end of the data queue 6 is connected to the input ends of the processing units 1_1 to 1_3, respectively.
In a possible implementation, each column of the return queue 5 records the return order of the corresponding storage unit 0. For example, the return queue 5 includes M columns, each of which records a return order of a corresponding memory cell 0. The return sequence of the return data of all the storage units 0 can be recorded by setting one return queue 5, and a queue for recording the sequence is not required to be set for each storage unit 0, so that the complexity of the system can be reduced, and the resource utilization rate can be improved.
By way of example, fig. 4 shows a schematic diagram of a return queue 5 according to an embodiment of the present disclosure, as shown in fig. 4, the return queue 5 may be a 6-row 3-column queue, the return queue 5 including 6×3 cells, each cell storing 1 bit (bit) of information. It should be understood that fig. 4 only takes the return queue 5 as 6 rows and 3 columns as an example, and the number of rows and columns of the return queue 5 are not limited in this disclosure, and may be set according to an actual application scenario.
The return queue 5 may be used to record the return order of the return data of the storage units 0_1 to 0_3, wherein column 1 may be used to record the return order of the return data of the storage unit 0_1, column 2 may be used to record the return order of the return data of the storage unit 0_2, and column 3 may be used to record the return order of the return data of the storage unit 0_3.
In a possible implementation, each column of the data queue 6 is configured to store the return data of a corresponding memory unit 0, for example, the data queue 6 includes M columns, each of which is configured to store the return data of a corresponding memory unit.
The data queue 6 is configured to: in response to receiving the return data of any storage unit, fetch the multiple columns (e.g., N columns) of data corresponding to that storage unit from the dispatch queue 4 and perform a bitwise OR operation on them to obtain first data; fetch the column of data recording the return order of that storage unit from the return queue 5 and perform a bitwise NOT operation on it to obtain second data; perform a bitwise AND operation on the first data and the second data to obtain third data; and, in the column of the data queue 6 corresponding to that storage unit, write the return data to the write location indicated by the third data.
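A sketch of this write-location computation follows, using 0-based row indices and LSB-first bit lists (so the text's "position 2" corresponds to row index 1 here); the function name and data layout are illustrative assumptions:

```python
def write_row(sched_cols, return_col):
    """first = OR of the storage unit's N dispatch-queue columns,
    second = NOT of its return-queue column, third = first AND second;
    the write row is the lowest row at which third has a 1."""
    rows = len(return_col)
    first = [0] * rows
    for col in sched_cols:                    # bitwise OR of the N columns
        first = [a | b for a, b in zip(first, col)]
    second = [1 - b for b in return_col]      # bitwise NOT
    third = [a & b for a, b in zip(first, second)]
    for row, bit in enumerate(third):         # find the first 1, low order up
        if bit:
            return row
    return None

# Rows 0 and 1 carry requests, row 0's data already returned:
# third = 0,1,0,... so the new return data goes to row index 1.
write_row([[1, 1, 0, 0, 0, 0], [0] * 6, [0] * 6], [1, 0, 0, 0, 0, 0])
```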
By way of example, fig. 5 shows a schematic diagram of a data queue 6 according to an embodiment of the present disclosure, as shown in fig. 5, the data queue 6 may be a 6 row 3 column queue, the data queue 6 including 6×3 cells, each cell storing 30 bits (bits) of information. It should be understood that, in fig. 5, taking the data queue 6 as 6 rows and 3 columns, and each unit stores 30 bits of data as an example, the present disclosure is not limited to the number of rows and columns of the data queue 6, and the bit width of the maximum storable data per unit, and may be set according to the actual application scenario.
Wherein, column 1 may be used to store the return data of storage unit 0_1 deployed in dispatch module 1, column 2 may be used to store the return data of storage unit 0_2 deployed in dispatch module 2, and column 3 may be used to store the return data of storage unit 0_3 deployed in dispatch module 3.
When the storage unit 0_1 deployed in the scheduling module 1 sends return data to the data queue 6 in response to request information, the data queue 6 may fetch the data in the 1st, 4th, and 7th columns of the dispatch queue 4 and perform a bitwise OR operation to obtain the first data D11, and fetch the data in the 1st column of the return queue 5 and perform a bitwise NOT operation to obtain the second data D12. Then, a bitwise AND operation is performed on the first data D11 and the second data D12 to obtain the third data D13. Next, a find-1 operation is performed on the third data D13 to calculate the row at which the return data is to be stored in the data queue 6, and the return data is written into the corresponding row of the 1st column of the data queue 6.
Here, a seek 1 operation (seek one) indicates that the first 1 position of the third data D13 is found from the lower order upward for the third data D13. For example, assuming that the third data D13 is 000110, the first 1 is at position 2, the return data may be written to column 1, row 2 in the data queue 6.
When the storage unit 0_2 deployed in the scheduling module 2 sends return data to the data queue 6 in response to the request information, the data queue 6 may fetch the data in the 2nd, 5th and 8th columns of the scheduling queue 4 and perform a bitwise OR operation on them to obtain first data D21, and fetch the data in the 2nd column of the return queue 5 and perform a bitwise inversion operation to obtain second data D22. Then, a bitwise AND operation is performed on the first data D21 and the second data D22 to obtain third data D23. A find-first-one operation is then performed on the third data D23 to calculate the row at which the return data is to be stored in the data queue 6, and the return data is written into the corresponding row of the 2nd column of the data queue 6.
Here, the find-first-one operation means searching the third data D23 from the least significant bit upward for the position of the first 1. For example, assuming the third data D23 is 000110, the first 1 is at position 2, so the return data may be written to row 2 of column 2 in the data queue 6.
When the storage unit 0_3 deployed in the scheduling module 3 sends return data to the data queue 6 in response to the request information, the data queue 6 may fetch the data in the 3rd, 6th and 9th columns of the scheduling queue 4 and perform a bitwise OR operation on them to obtain first data D31, and fetch the data in the 3rd column of the return queue 5 and perform a bitwise inversion operation to obtain second data D32. Then, a bitwise AND operation is performed on the first data D31 and the second data D32 to obtain third data D33. A find-first-one operation is then performed on the third data D33 to calculate the row at which the return data is to be stored in the data queue 6, and the return data is written into the corresponding row of the 3rd column of the data queue 6.
Here, the find-first-one operation means searching the third data D33 from the least significant bit upward for the position of the first 1. For example, assuming the third data D33 is 000110, the first 1 is at position 2, so the return data may be written to row 2 of column 3 in the data queue 6.
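The write-location computation repeated above for the three storage units can be sketched in code as follows. This is a simplified behavioral model only, assuming 6-row queues represented as Python integers with row 1 as the least significant bit; the names `find_first_one` and `find_write_row` are illustrative and do not come from the patent.

```python
def find_first_one(bits: int, width: int = 6) -> int:
    """Return the 1-based position of the lowest set bit, or 0 if none."""
    for pos in range(1, width + 1):
        if bits & (1 << (pos - 1)):
            return pos
    return 0

def find_write_row(sched_cols, return_col, width: int = 6) -> int:
    """Row (1-based) where the next return data of a storage unit is written.

    sched_cols: the scheduling-queue columns belonging to the storage unit
    return_col: the return-queue column of the same storage unit
    Each column is an integer whose bit k-1 represents row k.
    """
    first = 0
    for col in sched_cols:          # first data: bitwise OR of the unit's columns
        first |= col
    mask = (1 << width) - 1
    second = ~return_col & mask     # second data: bitwise inversion of the return column
    third = first & second          # third data: requested rows not yet returned
    return find_first_one(third, width)

# The document's example: third data 000110 -> first 1 at position 2
assert find_first_one(0b000110) == 2
```

Under this model, a set bit of the third data marks a row that was requested (scheduling queue) but has not yet received return data (inverted return queue), so the lowest such row is the next free slot in request order.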
In this way, the storage location of the current return data in the data queue 6 can be determined more efficiently and accurately, which facilitates ordering the return data in the data queue 6 in the order of the requests.
In one possible implementation, the return queue 5 is configured to: in response to the return data being written to the write location indicated by the third data, set the write location indicated by the third data in the corresponding column of the storage unit in the return queue 5. In this way, the return order of the return data of each storage unit 0 can be recorded more efficiently and accurately.
For example, assuming the return data is written to the write location at row i, column j, then in response to that write, a 1 is placed at row i, column j of the return queue 5, where i and j may be integers greater than or equal to 1; the present disclosure is not limited in this respect. The return queue 5 may be initialized as an all-0 queue, indicating that no return data to be returned to the processing units 1 has yet appeared in the memory access circuit. The setting operation may include setting a 1, setting a preset mark, setting a preset letter, and the like, which is not limited in the present disclosure.
In one possible implementation, the data queue 6 is configured to: in response to return data being written into the data queue 6, obtain a return order result of the return queue 5, wherein the return order result is obtained by performing a bitwise OR operation on the first-row data of the return queue 5; and, in a case where the return order result is a preset result (for example, 1), send the return data of the first row in the data queue 6 to the corresponding processing units 1 according to the indication of the first-row information of the scheduling queue 4. It should be understood that the preset result differs as the content of the setting operation differs; in a case where the setting operation is a set-to-1 operation, the preset result may be 1.
In an example, each time return data is written into the data queue 6, the data in the first row of the return queue 5 may be bitwise ORed to obtain the return order result. If the return order result is 1, there is return data ready, and it may be returned to the corresponding processing unit 1. If the return order result is 0, the return data corresponding to the processing unit 1 that initiated its request information earliest has not yet been returned, and it is necessary to continue waiting.
When the data of the 1st row in the return queue 5 is bitwise ORed and the return order result is 1, the return data may be sent to the corresponding processing units 1 according to the data in columns 1 to 9 of row 1 of the scheduling queue 4 (see fig. 3), for example, as follows:
If row 1, column 1 of the scheduling queue 4 is 1, the return data at row 1, column 1 of the data queue 6 is sent to the processing unit 1_1.
If row 1, column 4 of the scheduling queue 4 is 1, the return data at row 1, column 1 of the data queue 6 is sent to the processing unit 1_2.
If row 1, column 7 of the scheduling queue 4 is 1, the return data at row 1, column 1 of the data queue 6 is sent to the processing unit 1_3.
If row 1, column 2 of the scheduling queue 4 is 1, the return data at row 1, column 2 of the data queue 6 is sent to the processing unit 1_1.
If row 1, column 5 of the scheduling queue 4 is 1, the return data at row 1, column 2 of the data queue 6 is sent to the processing unit 1_2.
If row 1, column 8 of the scheduling queue 4 is 1, the return data at row 1, column 2 of the data queue 6 is sent to the processing unit 1_3.
If row 1, column 3 of the scheduling queue 4 is 1, the return data at row 1, column 3 of the data queue 6 is sent to the processing unit 1_1.
If row 1, column 6 of the scheduling queue 4 is 1, the return data at row 1, column 3 of the data queue 6 is sent to the processing unit 1_2.
If row 1, column 9 of the scheduling queue 4 is 1, the return data at row 1, column 3 of the data queue 6 is sent to the processing unit 1_3.
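The nine cases above follow a single mapping rule: scheduling-queue column j corresponds to data-queue column ((j - 1) mod 3) + 1 and processing unit ((j - 1) div 3) + 1. A compact restatement, assuming the 3x3 configuration of this embodiment (the function name is ours, not the patent's):

```python
N_STORE = 3   # three storage units in this example; columns come in groups of three

def dispatch_first_row(sched_row1):
    """Map the set bits of the scheduling queue's first row to
    (data-queue column, processing unit) pairs.

    sched_row1: list of 9 ints (0/1), columns 1..9 of row 1.
    Column j was set when processing unit ((j-1)//3)+1 accessed
    storage unit ((j-1)%3)+1, whose return data sits in the
    data-queue column of the same number.
    """
    sends = []
    for j, bit in enumerate(sched_row1, start=1):
        if bit:
            data_col = (j - 1) % N_STORE + 1
            proc_unit = (j - 1) // N_STORE + 1
            sends.append((data_col, proc_unit))
    return sends

# The worked example's row-1 data 100000001: columns 1 and 9 are set,
# so data column 1 goes to unit 1_1 and data column 3 goes to unit 1_3.
assert dispatch_first_row([1, 0, 0, 0, 0, 0, 0, 0, 1]) == [(1, 1), (3, 3)]
```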
In this way, for the request information initiated first among the requests sent to the storage units 0, the return data returned by each storage unit 0 is transferred in order to the corresponding processing unit 1, which further improves the accuracy and stability of the memory access circuit.
In one possible implementation, after the return data of the first row in the data queue 6 is sent to the corresponding processing units 1, the data queue 6 is configured to: dequeue, in the data queue 6, the return data of the first row that has been sent to the corresponding processing unit 1.
The scheduling queue 4 is configured to: dequeue, in the scheduling queue 4, the first-row data of the plurality of columns (for example, N columns) corresponding to the storage unit 0 that sent the return data;
the return queue 5 is configured to: dequeue, in the return queue 5, the first-row data of the column corresponding to the storage unit 0 that sent the return data.
In an example, if the return data at row 1, column 1 in the data queue 6 is sent to the corresponding processing unit 1, then columns 1, 4 and 7 of the scheduling queue 4 and column 1 of the return queue 5 may be shifted down by 1 bit, and column 1 of the data queue 6 may be shifted down by one entry.
If the return data at row 1, column 2 in the data queue 6 is sent to the corresponding processing unit 1, then columns 2, 5 and 8 of the scheduling queue 4 and column 2 of the return queue 5 may be shifted down by 1 bit, and column 2 of the data queue 6 may be shifted down by one entry.
If the return data at row 1, column 3 in the data queue 6 is sent to the corresponding processing unit 1, then columns 3, 6 and 9 of the scheduling queue 4 and column 3 of the return queue 5 may be shifted down by 1 bit, and column 3 of the data queue 6 may be shifted down by one entry.
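The shift-down dequeue described above can be sketched as follows. This is a behavioral model under the same assumptions as before (bit columns as integers with row 1 as the least significant bit, the data-queue column as a list whose head is row 1); `dequeue_unit` is an illustrative name.

```python
def dequeue_unit(sched_cols, return_col, data_col):
    """Dequeue one entry for a storage unit after its row-1 return data
    has been sent: shift the unit's scheduling-queue columns, its
    return-queue column, and its data-queue column down by one row.

    A right shift of a bit column moves every row down by one, because
    row 1 is the least significant bit.
    """
    sched_cols = [col >> 1 for col in sched_cols]
    return_col >>= 1
    data_col = data_col[1:] + [None]   # row 1 leaves; a free row appears at the tail
    return sched_cols, return_col, data_col

cols, ret, data = dequeue_unit([0b000101, 0b000010, 0b000000],
                               0b000001, ["D", None, None])
assert cols == [0b000010, 0b000001, 0b000000]
assert ret == 0b000000
assert data == [None, None, None]
```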
In this way, the stale data in the scheduling queue 4, the return queue 5 and the data queue 6 (e.g., return data that has already been sent to each processing unit 1, and the request order and return order associated with that return data) can be cleaned up, providing more storage space for new return data and improving the efficiency and resource utilization of the memory access circuit.
The memory access circuit of the embodiment of the present disclosure is described below with a specific example.
Assuming that at time T0, processing unit 1_1 accesses storage unit 0_1 disposed in scheduling module 1, processing unit 1_2 also accesses storage unit 0_1 disposed in scheduling module 1, and processing unit 1_3 accesses storage unit 0_3 disposed in scheduling module 3; at time T1, the processing unit 1_1 accesses the storage unit 0_1 disposed in the scheduling module 1.
At time T0, the arbiter 3_1 corresponding to the storage unit 0_1 may simultaneously receive the request information from the processing unit 1_1 and the request information from the processing unit 1_2; the arbiter 3_1 performs round-robin arbitration on the two requests and selects the request information from the processing unit 1_1 to send to the storage unit 0_1. At the same time, the arbiter 3_3 corresponding to the storage unit 0_3 receives the request information from the processing unit 1_3 and sends it to the storage unit 0_3.
In this case, the write situation of the scheduling queue 4 is: row 1 of column 1 becomes 1, and row 1 of column 9 becomes 1. The row pointer of the storage unit 0_1 is updated as wr_ptr1 = wr_ptr1 + 1, and the row pointer of the storage unit 0_3 is updated as wr_ptr3 = wr_ptr3 + 1.
At time T1, the arbiter 3_1 corresponding to the storage unit 0_1 receives the request information from the processing unit 1_1, performs round-robin arbitration between this request and the request information from the processing unit 1_2 left over from time T0, and selects the request information from the processing unit 1_2 to send to the storage unit 0_1.
In this case, the write situation of the scheduling queue 4 is: row 2 of column 4 becomes 1, and the row pointer of the storage unit 0_1 is updated as wr_ptr1 = wr_ptr1 + 1.
At time T2, the arbiter 3_1 receives no new request information and sends the request information from the processing unit 1_1 left over from time T1 to the storage unit 0_1.
In this case, the write situation of the scheduling queue 4 is: row 3 of column 1 becomes 1, and the row pointer of the storage unit 0_1 is updated as wr_ptr1 = wr_ptr1 + 1. The updated set-to-1 state of the scheduling queue 4 is shown in fig. 3.
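The arbitration at times T0 to T2 can be modeled with a simple round-robin arbiter. This is a behavioral sketch only, not RTL, and the class name `RoundRobinArbiter` is ours; it assumes each requester keeps its ungranted requests pending, as in the timeline above.

```python
from collections import deque

class RoundRobinArbiter:
    """Behavioral model: pending requests per requester, round-robin grant."""

    def __init__(self, n_requesters):
        self.pending = [deque() for _ in range(n_requesters)]
        self.last = n_requesters - 1   # start so that requester 0 has priority

    def request(self, requester, info):
        self.pending[requester].append(info)

    def grant(self):
        """Return (requester, info) of the granted request, or None if idle."""
        n = len(self.pending)
        for step in range(1, n + 1):
            r = (self.last + step) % n
            if self.pending[r]:
                self.last = r
                return r, self.pending[r].popleft()
        return None

# T0: requests from processing units 1_1 and 1_2 arrive; 1_1 is granted first.
arb = RoundRobinArbiter(2)
arb.request(0, "req from 1_1 at T0")
arb.request(1, "req from 1_2 at T0")
assert arb.grant() == (0, "req from 1_1 at T0")
# T1: a new request from 1_1 arrives; round robin now favors 1_2's leftover.
arb.request(0, "req from 1_1 at T1")
assert arb.grant() == (1, "req from 1_2 at T0")
# T2: no new requests; the remaining T1 request is granted.
assert arb.grant() == (0, "req from 1_1 at T1")
```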
When the data queue 6 receives the return data sent from the storage unit 0_1 deployed in the scheduling module 1, the 1st, 4th and 7th columns corresponding to the storage unit 0_1 in the scheduling queue 4 are fetched, namely 000101, 000010 and 000000, and a bitwise OR operation 000101 | 000010 | 000000 is performed; the first data obtained is 000111. The 1st column corresponding to the storage unit 0_1 in the return queue 5, namely 000000, is fetched and bitwise inverted, so the second data is 111111. A bitwise AND operation 000111 & 111111 is performed on the first data 000111 and the second data 111111 to obtain the third data 000111. Then, a find-first-one operation is performed on the third data 000111; the position of the first 1 is found to be 1, so the received return data is written into position 1 (i.e., row 1) of the 1st column corresponding to the storage unit 0_1 in the data queue 6, row 1 of the 1st column of the return queue 5 is marked as 1, and the 1st column of the return queue 5 becomes 000001.
When the data queue 6 receives the return data sent from the storage unit 0_3 deployed in the scheduling module 3, the 3rd, 6th and 9th columns corresponding to the storage unit 0_3 in the scheduling queue 4 are fetched, namely 000000, 000000 and 000001, and a bitwise OR operation 000000 | 000000 | 000001 is performed; the first data obtained is 000001. The 3rd column corresponding to the storage unit 0_3 in the return queue 5, namely 000000, is fetched and bitwise inverted, so the second data is 111111. A bitwise AND operation 000001 & 111111 is performed on the first data 000001 and the second data 111111 to obtain the third data 000001. Then, a find-first-one operation is performed on the third data 000001; the position of the first 1 is found to be 1, so the received return data is written into position 1 (i.e., row 1) of the 3rd column corresponding to the storage unit 0_3 in the data queue 6, row 1 of the 3rd column of the return queue 5 is marked as 1, and the 3rd column of the return queue 5 becomes 000001.
When the data queue 6 again receives return data sent from the storage unit 0_1 deployed in the scheduling module 1, the first data is 000111 and the second data is 111110 (the bitwise inversion of the 1st column of the return queue 5, 000001). A bitwise AND operation 000111 & 111110 is performed on the first data 000111 and the second data 111110 to obtain the third data 000110. Then, a find-first-one operation is performed on the third data 000110; the position of the first 1 is found to be 2, so the received return data is written into position 2 (i.e., row 2) of the 1st column corresponding to the storage unit 0_1 in the data queue 6, row 2 of the 1st column of the return queue 5 is marked as 1, and the 1st column of the return queue 5 becomes 000011.
When the data queue 6 once more receives return data sent from the storage unit 0_1 deployed in the scheduling module 1, the first data is 000111 and the second data is 111100 (the bitwise inversion of the 1st column of the return queue 5, 000011). A bitwise AND operation 000111 & 111100 is performed on the first data 000111 and the second data 111100 to obtain the third data 000100. Then, a find-first-one operation is performed on the third data 000100; the position of the first 1 is found to be 3, so the received return data is written into position 3 (i.e., row 3) of the 1st column corresponding to the storage unit 0_1 in the data queue 6, row 3 of the 1st column of the return queue 5 is marked as 1, and the 1st column of the return queue 5 becomes 000111.
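The three consecutive writes for the storage unit 0_1 can be checked numerically, using the same integer encoding as before (6-row queues, row 1 as the least significant bit; all values are taken from the example above, and the helper name `write_row` is ours):

```python
MASK = 0b111111                           # 6-row queues

def write_row(sched_cols, return_col):
    first = 0
    for c in sched_cols:
        first |= c                        # first data: OR of the unit's columns
    second = ~return_col & MASK           # second data: inverted return column
    third = first & second                # third data
    return (third & -third).bit_length()  # position of the first 1 (0 if none)

sched = [0b000101, 0b000010, 0b000000]    # columns 1, 4, 7 for unit 0_1
ret_col = 0b000000

# First return: third data 000111 -> row 1
row = write_row(sched, ret_col); assert row == 1
ret_col |= 1 << (row - 1)                 # return-queue column becomes 000001
# Second return: third data 000110 -> row 2
row = write_row(sched, ret_col); assert row == 2
ret_col |= 1 << (row - 1)                 # ... becomes 000011
# Third return: third data 000100 -> row 3
row = write_row(sched, ret_col); assert row == 3
ret_col |= 1 << (row - 1)
assert ret_col == 0b000111                # matches the example's final column
```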
In the above process, each time return data is written into the data queue 6, a 1 is synchronously recorded at the corresponding position in the return queue 5. At this point, the first-row data 101 of the return queue 5 is bitwise ORed (1 | 0 | 1) to obtain the return order result 1, indicating that there is return data in the data queue 6, and the return data stored in row 1 of the data queue 6 can be sent to the corresponding processing units 1 according to the row-1 data 100000001 of the scheduling queue 4. In the row-1 data 100000001 of the scheduling queue 4, the data of column 1 is 1, so the return data at row 1, column 1 of the data queue 6 may be sent to the processing unit 1_1; the data of column 9 is 1, so the return data at row 1, column 3 of the data queue 6 may be sent to the processing unit 1_3.
Then, after sending the return data at row 1, column 1 of the data queue 6 to the processing unit 1_1, columns 1, 4 and 7 of the scheduling queue 4 and column 1 of the return queue 5 may be shifted down by 1 bit, and column 1 of the data queue 6 may be shifted down by one entry. After sending the return data at row 1, column 3 of the data queue 6 to the processing unit 1_3, columns 3, 6 and 9 of the scheduling queue 4 and column 3 of the return queue 5 may be shifted down by 1 bit, and column 3 of the data queue 6 may be shifted down by one entry.
In summary, the memory access circuit of the embodiments of the present disclosure enables the plurality of processing units 1 to access different storage units 0 (for example, including storage units deployed in different scheduling modules). When the plurality of processing units 1 access the storage units 0, the request order in which the N processing units access the M storage units may be recorded through the scheduling queue 4. When the storage units 0 reply with return data, the return data of each storage unit 0 can be stored in the data queue 6 according to the request order recorded by the scheduling queue 4 and the return order recorded by the return queue 5, and then fetched from the data queue 6 to the corresponding processing unit 1. In this way, the return data of the request information sent by each processing unit 1 is first buffered in the data queue 6 while waiting for the return data of the request information initiated earliest, which facilitates the order-preserving reply of the return data obtained by each processing unit 1 (that is, each processing unit 1 receives the corresponding return data in the order in which it sent the request information), improving the accuracy and stability of the memory access circuit.
In addition, the memory access circuit of the disclosed embodiments does not need to configure, for each processing unit 1, a queue recording the order of the requests it sends to the different storage units 0, nor, for each storage unit 0, a queue recording the order of the accesses from the different processing units 1; the accesses of the different processing units 1 to the different storage units 0 can be recorded with the scheduling queue 4 alone, reducing the complexity of the memory access circuit and improving its applicability. In addition, the return data of the plurality of processing units 1 can share one data queue 6, reducing the need to set up storage resources exclusive to each processing unit 1.
Fig. 6 illustrates a flowchart of a memory access method according to an embodiment of the present disclosure, applied to the portion of the memory access circuit for accessing different storage units 0 shown in fig. 1 in which the plurality of processing units 1 access the plurality of storage units 0. The memory access circuit comprises: a scheduling queue 4, a plurality of processing units 1, a scheduling selector 2 corresponding to each processing unit 1, and an arbiter 3 corresponding to each storage unit 0; the input end of any scheduling selector 2 is connected with the output end of the corresponding processing unit 1, the output end of any scheduling selector 2 is connected with the input end of each arbiter 3, and the output end of each arbiter 3 is connected with the input end of the corresponding storage unit 0 and the input end of the scheduling queue 4.
As shown in fig. 6, the memory access method includes: in step S11, each processing unit 1 transmits the generated request information for accessing the storage unit 0 to the corresponding schedule selector 2.
In step S12, the schedule selector 2 transmits the request information received from the processing unit 1 to the arbiter 3 of the storage unit 0 indicated by the request information.
In step S13, the arbiter 3 arbitrates the request information from the processing unit 1, and sends the request information to the corresponding storage unit 0 according to the arbitration order;
in step S14, the schedule queue 4 records the request order of the processing unit 1 to access the memory unit 0 in the arbitration order.
In one possible implementation, step S14 may include: the scheduling queue 4 records the request order in which the plurality of processing units 1 access the plurality of storage units 0 by setting different rows in different columns, wherein each column of the scheduling queue 4 is used to record the request order of a different processing unit 1 accessing a different storage unit 0, the plurality of columns corresponding to the same storage unit 0 share a row pointer, the row pointer can self-increment according to the number of accesses to that storage unit 0, and the row pointer is used to indicate the current row to be set in each column.
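The shared-row-pointer recording of step S14 can be sketched as follows. This is a behavioral model assuming the 3x3 configuration and 6-row depth of the earlier example; the class name `ScheduleQueue` and the column layout (columns of one processing unit are contiguous) follow the example's FIG. 3 layout and are illustrative.

```python
class ScheduleQueue:
    """N*M columns; columns of the same storage unit share one row pointer."""

    def __init__(self, n_proc=3, n_store=3, depth=6):
        self.n_store = n_store
        self.cols = [[0] * depth for _ in range(n_proc * n_store)]
        self.wr_ptr = [0] * n_store        # one shared pointer per storage unit

    def record(self, proc_unit, store_unit):
        """Record, in arbitration order, that proc_unit accessed store_unit.

        Columns j (1-based) with (j - 1) % n_store == store_unit - 1 all
        belong to store_unit, so they share store_unit's row pointer.
        """
        col = (proc_unit - 1) * self.n_store + (store_unit - 1)  # 0-based
        row = self.wr_ptr[store_unit - 1]
        self.cols[col][row] = 1
        self.wr_ptr[store_unit - 1] += 1   # self-incrementing shared pointer

# Replaying the worked example's timeline:
q = ScheduleQueue()
q.record(1, 1)   # T0: unit 1_1 -> storage 0_1: column 1, row 1
q.record(3, 3)   # T0: unit 1_3 -> storage 0_3: column 9, row 1
q.record(2, 1)   # T1: unit 1_2 -> storage 0_1: column 4, row 2
assert q.cols[0][0] == 1 and q.cols[8][0] == 1 and q.cols[3][1] == 1
assert q.wr_ptr == [2, 0, 1]
```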
In a possible implementation, the memory access circuit further comprises at least one first buffer, through which the output of each arbiter 3 is connected to the input of the corresponding memory cell 0.
Fig. 7 shows a flowchart of another memory access method according to an embodiment of the present disclosure, applied to the portion of the memory access circuit for accessing different storage units 0 shown in fig. 1 in which the plurality of storage units 0 return data in response to the access requests of the processing units 1. The memory access circuit comprises: a scheduling queue 4, a return queue 5, a data queue 6, and a plurality of processing units 1; the output end of each storage unit 0, the output end of the scheduling queue 4 and the output end of the return queue 5 are connected with the input end of the data queue 6, and the output end of the data queue 6 is connected with the input end of each processing unit 1.
The memory access method comprises the following steps: in step S15, the data queue 6 stores the return data of the storage unit 0 according to the request order recorded by the schedule queue 4 and the return order of the return data of the storage unit 0 recorded by the return queue 5; in step S16, return data is fetched from the data queue 6 to the corresponding processing unit 1.
In one possible implementation, each column of the return queue 5 is used to record the return order of the corresponding storage unit 0, and each column of the data queue 6 is used to store the return data of the corresponding storage unit 0, and step S15 may include: the data queue 6, in response to receiving the return data of any storage unit 0, fetches the plurality of columns of data corresponding to that storage unit 0 from the scheduling queue 4 and performs a bitwise OR operation to obtain first data, and fetches the column of data recording the return order of that storage unit 0 from the return queue 5 and performs a bitwise inversion operation to obtain second data; performs a bitwise AND operation on the first data and the second data to obtain third data; writes the return data to the write location indicated by the third data in the corresponding column of that storage unit 0 in the data queue 6; and, in response to the return data being written to the write location indicated by the third data, sets the write location indicated by the third data in the corresponding column of that storage unit 0 in the return queue 5.
In one possible implementation, step S16 may include: obtaining a return order result of the return queue 5, wherein the return order result is obtained by performing a bitwise OR operation on the first-row data of the return queue 5; and, in a case where the return order result is a preset result, sending the return data of the first row in the data queue 6 to the corresponding processing units 1 according to the indication of the first-row information of the scheduling queue 4.
In one possible implementation, after the return data of the first row in the data queue 6 is sent to the corresponding processing units 1, the memory access method further includes: dequeuing, in the data queue 6, the return data of the first row that has been sent to the corresponding processing unit 1; dequeuing, in the scheduling queue 4, the first-row data of the plurality of columns corresponding to the storage unit 0 that sent the return data; and dequeuing, in the return queue 5, the first-row data of the column corresponding to the storage unit 0 that sent the return data.
In a possible implementation, the memory access circuit further comprises at least one second buffer, the output of each memory cell 0 being connected to the input of the data queue 6 via at least one second buffer.
It will be appreciated that the above-mentioned method embodiments of the present disclosure may be combined with each other to form combined embodiments without departing from principle and logic; for reasons of space, details are not repeated in the present disclosure. It will be appreciated by those skilled in the art that, in the methods of the above embodiments, the specific order of execution of the steps should be determined by their function and possible inherent logic.
In addition, the present disclosure further provides an integrated circuit, an electronic device, and a computer program product, each of which encapsulates the above memory access circuit and may be used to implement any of the memory access methods provided in the present disclosure; for the corresponding technical solutions and descriptions, refer to the method sections, which are not repeated here.
Embodiments of the present disclosure also provide an integrated circuit including a memory access circuit as described above.
The disclosed embodiments also propose an electronic device comprising a memory access circuit as described above. The electronic device may be provided as a terminal, server or other form of device. For example, the electronic device may be a User Equipment (UE), a mobile device, a User terminal, a cellular phone, a cordless phone, a personal digital assistant (Personal Digital Assistant, PDA), a handheld device, a computing device, an in-vehicle device, a wearable device, etc., which is not limited by the present disclosure.
Embodiments of the present disclosure also provide a computer program product comprising computer readable code, or a non-transitory computer readable storage medium carrying computer readable code, which when run in a processor of an electronic device, performs the above method.
The computer program product may be realized in particular by means of hardware, software or a combination thereof. In an alternative embodiment, the computer program product is embodied as a computer storage medium, and in another alternative embodiment, the computer program product is embodied as a software product, such as a software development kit (Software Development Kit, SDK), or the like.
The foregoing description of the various embodiments tends to emphasize the differences between them; for their identical or similar aspects, the embodiments may refer to one another, which is not repeated herein for brevity.
It will be appreciated by those skilled in the art that in the above-described method of the specific embodiments, the written order of steps is not meant to imply a strict order of execution but rather should be construed according to the function and possibly inherent logic of the steps.
The foregoing description of the embodiments of the present disclosure has been presented for purposes of illustration and description, and is not intended to be exhaustive or limited to the embodiments disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the various embodiments described. The terminology used herein was chosen in order to best explain the principles of the embodiments, the practical application, or the improvement of technology in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.

Claims (15)

1. A memory access circuit for accessing different storage units, the memory access circuit comprising: a scheduling queue, a return queue, a data queue, a plurality of processing units, a scheduling selector corresponding to each processing unit, and an arbiter corresponding to each storage unit;
the input end of any scheduling selector is connected with the output end of the corresponding processing unit, the output end of any scheduling selector is connected with the input end of each arbiter, and the scheduling selector is used for sending the request information received from the processing unit to the arbiter of the storage unit indicated by the request information;
the output end of each arbiter is connected with the input end of the corresponding storage unit and the input end of the scheduling queue, the arbiter is used for arbitrating the request information from the processing unit, the request information is sent to the storage unit according to the arbitration sequence, and the scheduling queue is used for recording the request sequence of the processing unit for accessing the storage unit according to the arbitration sequence;
the output end of each storage unit, the output end of the scheduling queue and the output end of the return queue are connected with the input end of the data queue, the output end of the data queue is connected with the input end of each processing unit, the return queue is used for recording the return sequence of the return data of the storage units, the data queue is used for storing the return data of the storage units according to the request sequence and the indication of the return sequence, and the return data is taken out from the data queue to the corresponding processing unit.
2. The memory access circuit of claim 1, wherein the scheduling queue records the request order in which the plurality of processing units access the plurality of storage units by setting different rows in different columns,
each column of the scheduling queue being used to record the request order of a different processing unit accessing a different storage unit, wherein a plurality of columns corresponding to the same storage unit share a row pointer, the row pointer can self-increment according to the number of accesses to the same storage unit, and the row pointer is used to indicate the current row set in each column.
3. The memory access circuit of claim 1, wherein each column of the return queue records a return order of a corresponding storage unit.
4. A memory access circuit according to claim 2 or 3, wherein each column of the data queue is for storing return data for a corresponding memory cell,
the data queue is used for:
in response to receiving the returned data of any storage unit, taking out a plurality of columns of data corresponding to the storage unit from the scheduling queue, performing OR operation to obtain first data, and
taking out a row of data for recording the return sequence of the storage unit from the return queue, and performing inverting operation to obtain second data;
Performing AND operation on the first data and the second data to obtain third data;
and writing the return data into a writing position indicated by the third data in a corresponding column of the storage unit of the data queue.
5. The memory access circuit of claim 4, wherein the return queue is to: in response to writing the return data to the write location indicated by the third data, the write location indicated by the third data is set in a corresponding column of the storage unit of the return queue.
6. The memory access circuit of claim 5, wherein the data queue is to:
obtaining a return sequence result of the return queue, wherein the return sequence result is obtained by performing or operation on first line data of the return queue;
and under the condition that the return sequence result is a preset result, respectively sending the return data of the first line in the data queue to the corresponding processing unit according to the indication of the first line information of the scheduling queue.
7. The memory access circuit of claim 6, wherein, after the head-row return data in the data queue is sent to the corresponding processing unit,
the data queue is used for: dequeuing, in the data queue, the head-row return data sent to the corresponding processing unit;
the scheduling queue is used for: dequeuing, in the scheduling queue, the head-row data of the plurality of columns corresponding to the storage unit whose return data was sent;
and the return queue is used for: dequeuing, in the return queue, the head-row data of the column corresponding to the storage unit whose return data was sent.
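The bit operations recited in claims 4 and 5 can be illustrated with a short sketch. This is a non-authoritative model, assuming each queue column is an integer bitmask in which bit i corresponds to row i, and assuming the "write position indicated by the third data" is its lowest set bit (the oldest pending request row); the function and parameter names are invented for illustration and do not appear in the patent.

```python
def compute_write_position(sched_columns, return_column, depth):
    """Model of the first/second/third data computation in claim 4."""
    # First data: OR of the scheduling-queue columns for this storage unit
    # (rows in which some processing unit has an outstanding request).
    first = 0
    for col in sched_columns:
        first |= col
    # Second data: inversion of the return-queue column for this storage
    # unit (rows whose return data has not yet arrived).
    mask = (1 << depth) - 1
    second = (~return_column) & mask
    # Third data: rows that are requested AND not yet returned.
    third = first & second
    # Assumed interpretation: write at the lowest such row (request order).
    return third & -third
```

For example, with `sched_columns = [0b0101, 0b0010]` and `return_column = 0b0001`, the first data is `0b0111`, the second data is `0b1110`, the third data is `0b0110`, and the return data is written at the row marked by `0b0010`.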
8. A memory access circuit according to any of claims 1-3, further comprising at least one first buffer and at least one second buffer, wherein the output of each arbiter is connected to the input of a corresponding storage unit through the at least one first buffer, and the output of each storage unit is connected to the input of the data queue through the at least one second buffer.
9. A memory access method, applied to a memory access circuit for accessing different storage units, the memory access circuit comprising: a scheduling queue, a plurality of processing units, a scheduling selector corresponding to each processing unit, and an arbiter corresponding to each storage unit; wherein the input of any scheduling selector is connected to the output of a corresponding processing unit, the output of any scheduling selector is connected to the input of each arbiter, and the output of each arbiter is connected to the input of a corresponding storage unit and to the input of the scheduling queue; the memory access method comprising:
each processing unit sends the generated request information for accessing a storage unit to the corresponding scheduling selector;
the scheduling selector transmits the request information received from the processing unit to the arbiter of the storage unit indicated by the request information;
the arbiter arbitrates the request information from the processing units and sends the request information to the corresponding storage unit in the arbitration order;
and the scheduling queue records, in the arbitration order, the request order of the processing units accessing the storage units.
10. The memory access method of claim 9, wherein the scheduling queue recording, in the arbitration order, the request order of the processing units accessing the storage units comprises:
the scheduling queue recording the request order of the plurality of processing units accessing the plurality of storage units by setting entries in different rows of different columns,
wherein each column of the scheduling queue records the request order in which a different processing unit accesses a different storage unit; a plurality of columns corresponding to the same storage unit share a row pointer, the row pointer self-increments according to the number of accesses to that storage unit, and the row pointer indicates the current row to be set in each column.
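The shared row pointer described in claims 2 and 10 can be sketched as follows. This is a minimal model, assuming each (storage unit, processing unit) pair owns one column and that "setting" means writing a 1-bit flag; the class and member names are invented for illustration.

```python
class SchedulingQueue:
    def __init__(self, num_units, num_procs, depth):
        # one column per (storage unit, processing unit) pair, `depth` rows each
        self.cols = [[0] * depth for _ in range(num_units * num_procs)]
        self.num_procs = num_procs
        # columns targeting the same storage unit share one row pointer
        self.row_ptr = [0] * num_units

    def record(self, unit, proc):
        """Record, in arbitration order, that `proc` accesses `unit`."""
        row = self.row_ptr[unit]
        self.cols[unit * self.num_procs + proc][row] = 1
        # the shared pointer self-increments with each access to this unit
        self.row_ptr[unit] += 1
```

Because the pointer is shared per storage unit rather than per column, two requests from different processing units to the same unit land in different rows, preserving their relative arbitration order.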
11. A memory access method, applied to a memory access circuit for accessing different storage units, the memory access circuit comprising: a scheduling queue, a return queue, a data queue, and a plurality of processing units; wherein the output of each storage unit, the output of the scheduling queue, and the output of the return queue are connected to the input of the data queue, and the output of the data queue is connected to the input of each processing unit; the memory access method comprising:
the data queue storing the return data of the storage units according to the request order recorded by the scheduling queue and the return order, recorded by the return queue, of the return data of the storage units;
and fetching the return data from the data queue to the corresponding processing units.
12. The memory access method of claim 11, wherein each column of the return queue records the return order of a corresponding storage unit and each column of the data queue stores the return data of the corresponding storage unit,
and wherein the data queue storing the return data of the storage units according to the request order recorded by the scheduling queue and the return order of the return data recorded by the return queue comprises:
the data queue, in response to receiving the return data of any storage unit, fetching from the scheduling queue the plurality of columns of data corresponding to the storage unit and performing an OR operation on them to obtain first data, and
fetching from the return queue the column of data recording the return order of the storage unit and performing an inversion (bitwise NOT) operation on it to obtain second data;
performing an AND operation on the first data and the second data to obtain third data;
writing the return data into the write position indicated by the third data, in the column of the data queue corresponding to the storage unit;
and in response to the return data being written to the write position indicated by the third data, setting the write position indicated by the third data in the column of the return queue corresponding to the storage unit.
13. The memory access method of claim 11, wherein fetching the return data from the data queue to the corresponding processing unit comprises:
obtaining a return-order result of the return queue, wherein the return-order result is obtained by performing an OR operation on the head-row data of the return queue;
and, in the case where the return-order result is a preset result, sending the head-row return data in the data queue to the corresponding processing units as indicated by the head-row information of the scheduling queue.
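The dispatch condition of claims 6 and 13 can be sketched as below. The claims leave the preset result unspecified; this sketch assumes it is 1 (i.e. the OR of the return queue's head-row bits equals 1 once head-row data is available to dispatch), and the function name is invented for illustration.

```python
def ready_to_dispatch(return_head_row, preset=1):
    """OR the head-row bits of the return queue and compare to a preset."""
    result = 0
    for bit in return_head_row:
        result |= bit
    # dispatch the data queue's head row only when the preset is reached
    return result == preset
```

When the condition holds, the head-row return data would be routed to processing units according to the head-row information of the scheduling queue, then all three queues dequeue their head rows per claim 7.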
14. An integrated circuit comprising the memory access circuit of any of claims 1-8.
15. An electronic device comprising the memory access circuit of any one of claims 1-8.
CN202310807731.6A 2023-07-03 2023-07-03 Memory access circuit, memory access method, integrated circuit, and electronic device Active CN116578245B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310807731.6A CN116578245B (en) 2023-07-03 2023-07-03 Memory access circuit, memory access method, integrated circuit, and electronic device


Publications (2)

Publication Number Publication Date
CN116578245A CN116578245A (en) 2023-08-11
CN116578245B true CN116578245B (en) 2023-11-17

Family

ID=87541616


Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116756063B (en) * 2023-08-16 2023-12-15 深圳砺驰半导体科技有限公司 Data transmission circuit, method and system-level chip

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101609438A (en) * 2008-06-19 2009-12-23 索尼株式会社 Accumulator system, its access control method and computer program
CN105144128A (en) * 2013-04-23 2015-12-09 Arm有限公司 Memory access control
CN116204456A (en) * 2021-11-30 2023-06-02 华为技术有限公司 Data access method and computing device
CN116324744A (en) * 2020-08-24 2023-06-23 超威半导体公司 Memory controller having multiple command sub-queues and corresponding arbiters

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8838853B2 (en) * 2010-01-18 2014-09-16 Marvell International Ltd. Access buffer
US20150154132A1 (en) * 2013-12-02 2015-06-04 Sandisk Technologies Inc. System and method of arbitration associated with a multi-threaded system
US11481342B2 (en) * 2019-06-25 2022-10-25 Seagate Technology Llc Data storage system data access arbitration


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Memory access queue of a multi-core multi-threaded processor for real-time stream processing; Tian Hangpei et al.; Journal of Computer Research and Development, No. 10; full text *


Similar Documents

Publication Publication Date Title
CN107657581B (en) Convolutional neural network CNN hardware accelerator and acceleration method
CN116578245B (en) Memory access circuit, memory access method, integrated circuit, and electronic device
CN110825436B (en) Calculation method applied to artificial intelligence chip and artificial intelligence chip
CN116521096B (en) Memory access circuit, memory access method, integrated circuit, and electronic device
CN116661703B (en) Memory access circuit, memory access method, integrated circuit, and electronic device
CN111400212A (en) Transmission method and device based on remote direct data access
CN115964319A (en) Data processing method for remote direct memory access and related product
US9030570B2 (en) Parallel operation histogramming device and microcomputer
US8127110B2 (en) Method, system, and medium for providing interprocessor data communication
JPH0358150A (en) Memory controller
CN115391053B (en) Online service method and device based on CPU and GPU hybrid calculation
CN116521097B (en) Memory access circuit, memory access method, integrated circuit, and electronic device
CN116719479B (en) Memory access circuit, memory access method, integrated circuit, and electronic device
CN116737083B (en) Memory access circuit, memory access method, integrated circuit, and electronic device
CN113157603A (en) Data reading device, method, chip, computer equipment and storage medium
CN116594570B (en) Memory access circuit, memory access method, integrated circuit, and electronic device
CN111984202A (en) Data processing method and device, electronic equipment and storage medium
CN111638979A (en) Call request processing method and device, electronic equipment and readable storage medium
CN116820344B (en) Memory access circuit, memory access method, integrated circuit, and electronic device
JP2013539577A (en) Interrupt-based command processing
CN116049032B (en) Data scheduling method, device and equipment based on ray tracing and storage medium
CN116360708B (en) Data writing method and device, electronic equipment and storage medium
CN112506815B (en) Data transmission method and data transmission device
CN113849867B (en) Encryption chip

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant