CN109710309B - Method for reducing memory bank conflict - Google Patents

Method for reducing memory bank conflict

Info

Publication number
CN109710309B
CN109710309B CN201811580102.XA CN201811580102A CN109710309B CN 109710309 B CN109710309 B CN 109710309B CN 201811580102 A CN201811580102 A CN 201811580102A CN 109710309 B CN109710309 B CN 109710309B
Authority
CN
China
Prior art keywords
data
memory
bank
banks
shift
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201811580102.XA
Other languages
Chinese (zh)
Other versions
CN109710309A (en
Inventor
丘正前
孙锦鸿
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
ARM Technology China Co Ltd
Original Assignee
ARM Technology China Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by ARM Technology China Co Ltd
Priority to CN201811580102.XA
Publication of CN109710309A
Priority to PCT/CN2019/126552 (WO2020135209A1)
Application granted
Publication of CN109710309B
Legal status: Active

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00: Arrangements for program control, e.g. control units
    • G06F9/06: Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30: Arrangements for executing machine instructions, e.g. instruction decode

Abstract

The invention relates to a method for reducing memory bank conflicts, comprising: determining the binary addresses of a plurality of data that would cause a bank conflict when a memory containing N banks is accessed in parallel; determining a shift step, which represents the number of banks by which data in a bank is moved and is the value represented by n bits selected from the binary address of the data in the memory, where N = 2^n, such that after the data in the memory is moved by the shift step, the conflicting data are no longer located in the same bank; and moving the data in the memory according to the shift step, with the row address of the data inside the bank unchanged. The invention ensures that data that may conflict during access are no longer located in the same bank, which reduces bank conflicts and improves memory performance.

Description

Method for reducing memory bank conflict
Technical Field
The invention relates to a method for reducing memory bank conflict, and belongs to the technical field of memories.
Background
Recently, with the increasing demand for the ability of processors to process data, some processors have been designed to access memory in parallel. In the case of a memory having multiple memory banks (banks), the processor can access the multiple banks simultaneously in parallel without conflict.
However, in some cases the processor needs to access, in parallel, data stored in the same memory bank. Because the addresses of data in the memory are normally fixed, a bank conflict occurs: the data in the same bank must be accessed one by one. For example, in the memory shown in fig. 3, each column represents a bank, i.e., the memory has 8 banks, Bank0, …, Bank 7. If data at multiple addresses in the same bank are accessed at once, e.g., 0, 8, 16, 24 are hit in Bank0 (shown in gray in fig. 3), Bank0 must be accessed one address at a time over multiple cycles, which reduces memory performance.
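The conflict in the fig. 3 example can be reproduced with a minimal Python sketch. It assumes that the addresses in the figure are word addresses, that the low 3 bits select the bank, and that each bank serves one access per cycle; the function names are hypothetical and not taken from the patent.

```python
from collections import Counter

NUM_BANKS = 8  # N = 8 banks, as in the fig. 3 example


def bank_of(addr):
    """Bank selected by the low 3 bits of the address (no shifting applied)."""
    return addr & (NUM_BANKS - 1)


def conflicting_groups(addrs):
    """Group addresses by bank and keep only the banks hit more than once."""
    groups = {}
    for a in addrs:
        groups.setdefault(bank_of(a), []).append(a)
    return {bank: hits for bank, hits in groups.items() if len(hits) > 1}


def access_cycles(addrs):
    """Cycles needed, assuming each bank serves one access per cycle."""
    return max(Counter(bank_of(a) for a in addrs).values())


print(conflicting_groups([0, 8, 16, 24]))  # {0: [0, 8, 16, 24]}
print(access_cycles(range(8)))             # 1
print(access_cycles([0, 8, 16, 24]))       # 4
```

Eight accesses spread over eight banks complete in one cycle, while four accesses to Bank0 take four.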
Disclosure of Invention
The present invention is directed to a micro-architectural design of LSU (Load/Store Unit) of a processor, which provides a scheme for reducing bank conflicts to improve memory performance.
A first aspect of the present invention provides a method for reducing bank conflicts, comprising:
determining binary addresses of a plurality of data which can generate bank conflict when a memory containing N banks is accessed in parallel;
determining a shift step, the shift step representing the number of banks by which data in a bank is to be moved and being the value represented by n bits selected from the binary address of the data in the memory, where N = 2^n and N and n are both natural numbers, such that after the data in the memory is moved by the respective shift step, the plurality of data with the bank conflict are no longer located in the same bank; and
moving the data in the memory according to the shift step, wherein the row address value of the data inside the bank is unchanged.
The invention ensures that the data which possibly conflict during access is not positioned in the same memory bank any more through address conversion, reduces the memory bank conflict and improves the performance of the memory.
Further, the shift step sizes of the plurality of data in which the bank conflict occurs are different from each other, so that the plurality of data in which the bank conflict occurs will be located in different banks, respectively, after the data in the memory is moved by the shift step size. All conflicts are eliminated as much as possible by moving all the data with conflicts to different memory banks, and the performance of the memory is improved fully.
Further, the n bits selected from the binary address of the data in the memory may be n consecutive bits. Alternatively, they may be n non-consecutive bits. More specifically, the row spacing of a plurality of data in which a bank conflict occurs can be represented as 2^i + j, where i and j are natural numbers and 0 ≤ j < 2^i. When j = 0, the shift step may be the value represented by n consecutive bits from the (n + i)-th bit of the binary address of the data in the memory toward the upper bits. When j ≠ 0, the shift step may be the value represented by n consecutive bits from the (n + i)-th or (n + i + 1)-th bit of the binary address toward the upper bits, or the value represented by n non-consecutive bits in the binary address. The choice among these shift steps depends on the specifics of the bank conflict and on the software configuration.
Further, the memory space of the memory may be divided into a plurality of pages, a plurality of sets of registers being provided to configure the pages to determine the shift step size for each page respectively, the number of registers being smaller than or equal to the number of pages. Further, when the number of registers is less than the number of pages, each register may be mapped to multiple pages in a TLB manner.
A second aspect of the present invention provides an apparatus for reducing bank conflicts, comprising:
a bank conflict determination unit configured to determine binary addresses of a plurality of data at which a bank conflict may occur when a memory including N banks is accessed in parallel;
a shift step determining unit configured to determine shift steps such that, after the data in the memory is moved by the respective shift steps, the plurality of data in which a bank conflict occurs are no longer located in the same bank, wherein the shift step represents the number of banks by which the data in the bank is to be moved and is the value represented by n bits selected from the binary address of the data in the memory, where N = 2^n and N and n are both natural numbers; and
a shifting unit configured to shift data in a memory by a shift step, wherein a row address value of the data in the memory inside a bank is unchanged.
Further, the shift step determining unit is further configured to make the shift steps of the plurality of data in which the bank conflict occurs different from each other, and after the data in the memory is moved by the shift steps, the plurality of data in which the bank conflict occurs will be located in different banks, respectively.
Further, the n bits selected from the binary address of the data in the memory may be consecutive n bits. Alternatively, the n bits selected from the binary address of the data in the memory may also be non-consecutive n bits, and the shift step size of the data where a bank conflict occurs may be completely different.
Further, the shift step determining unit may be further configured to divide the storage space of the memory into a plurality of pages, a plurality of sets of registers being provided to configure the pages to select the shift step for each page respectively, the number of registers being smaller than or equal to the number of pages. Further, when the number of registers is less than the number of pages, each register is mapped to a plurality of pages in a TLB manner.
A third aspect of the invention provides a processor comprising a memory having a plurality of memory banks and an apparatus as provided in the second aspect or any implementation of the second aspect.
A fourth aspect of the present invention provides an electronic device, which may be any of various computers, servers, or mobile devices, including a processor as provided in the foregoing third aspect or any implementation of the third aspect.
A fifth aspect of the present invention provides a computer program product comprising program code which, when executed by a controller, performs the method provided by the foregoing first aspect or any implementation manner of the first aspect.
The invention carries out the shift operation by simply extracting a plurality of bits in the address of the data stored in the memory, moves the data which is possibly collided into different memory banks, reduces or eliminates the memory bank collision and improves the performance of the memory.
Drawings
FIG. 1 is a flow diagram of a method of reducing bank conflicts according to an embodiment of the present invention.
FIG. 2 is a schematic diagram of a system capable of reducing bank conflicts, according to an embodiment of the invention.
FIG. 3 is a schematic diagram of a memory having eight memory banks according to an embodiment of the present invention.
FIG. 4 is a schematic diagram of a memory after moving data in the memory using addresses [5:3] according to an embodiment of the invention.
FIG. 5 is a schematic diagram of a memory after moving data in the memory using addresses [6:4] according to an embodiment of the invention.
FIG. 6 is a schematic diagram of a memory after moving data in the memory using the values represented by bits 6, 5, and 3 of the address as a shift step according to an embodiment of the invention.
FIG. 7 is a hardware implementation of memory address reorganization according to an embodiment of the present invention.
Detailed Description
The invention is further illustrated with reference to the following specific embodiments and the accompanying drawings. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not to be construed as limiting the invention. In addition, for convenience of description, only a part of structures or processes related to the present invention, not all of them, is illustrated in the drawings.
Memory devices such as Dynamic Random Access Memory (DRAM) or Static Random Access Memory (SRAM) may include multiple memory banks (banks) that may be independently accessed by a processor or like device. When the processors need to access data stored in the same memory bank in parallel, memory bank conflicts can occur.
According to an embodiment of the present invention, there is provided a method for reducing bank conflicts, as shown in fig. 1, including the steps of:
First, in step S101, the binary addresses of a plurality of data at which a bank conflict occurs when a memory including a plurality of banks is accessed in parallel are determined. For example, as shown in fig. 3, in a memory 300 having eight banks Bank0, Bank 1, …, Bank 7, data in the eight banks can normally be accessed in parallel through 8 addresses without conflict. However, if multiple data stored in the same bank are hit at the same time, for example the four addresses 0, 8, 16, 24 in Bank0 (shown in gray in fig. 3), a bank conflict occurs. When the data at 0, 8, 16, 24 need to be accessed at the same time, 0, 8, 16, 24 are the locations where the bank conflict occurs.
Subsequently, in step S102, a shift step is determined. The shift step represents the number of banks by which the data in a bank is moved, and it is the value represented by n bits selected from the binary address of the data in the memory, where N = 2^n and N and n are both natural numbers. After the data in the memory is moved by the shift step, the data with the bank conflict are no longer located in the same bank. In step S103, the data in the memory is moved according to the shift step, and the row address value of the data inside the bank is unchanged.
For example, for the memory 300 shown in FIG. 3, the conflicting data 0, 8, 16, 24 may be moved by their shift steps so that they are located in different banks. The memory 300 includes 8 banks (i.e., N = 8), so each datum stored in it is moved by at most 7 banks and at least 0 banks. The shift step can therefore be expressed with three bits, and 3 bits can be selected from the address of the stored data to express it (i.e., n = 3). For example, according to an embodiment of the present invention, the value represented by 3 consecutive bits from the 3rd bit of the address toward the upper bits, that is, address [5:3] (bits 5 to 3), may be used as the shift step to move the data in the memory. (In address notation, the rightmost, least significant bit is conventionally bit 0 and bit numbers increase to the left; all bit positions below follow this convention.) Then, in the memory 300, the shift steps of the first-row data 0 to 7 are all 000, i.e., the first row does not move; the shift steps of the second-row data 8 to 15 are all 001, i.e., the row is shifted by one bank; the shift steps of the remaining data are obtained in the same way. In this embodiment, the data in which the bank conflict occurs are moved to different banks, but the row address of the data inside the bank is unchanged, i.e., the data moves within its own row as shown in the figure, because the shift step changes only the decoding of the bank selection, not the decoding of the row address inside the bank.
After shifting in the above manner, the result shown in fig. 4 is obtained, and the data 0,8,16,24 in the memory are shifted to different banks Bank0, Bank 1, Bank2, Bank 3, respectively, so that the processors can access 0,8,16,24 in parallel in one cycle. In fig. 4, only the row in which the data of the bank conflict occurs is schematically shown, and the remaining rows are omitted.
As can be seen from a comparison of fig. 3 and fig. 4, after the data is moved in the above manner, the bank conflict is eliminated, and the data at 0,8,16, and 24 can be accessed in parallel in the same cycle, thereby improving the memory performance.
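The fig. 3 to fig. 4 transformation can be sketched in a few lines of Python. The sketch assumes word addresses whose low 3 bits select the bank, and it assumes that the sum of the bank index and the shift step wraps around modulo N; the patent text does not spell out the wraparound, and the function names are hypothetical.

```python
NUM_BANKS = 8   # N = 8
N_BITS = 3      # n = 3, since N = 2**n


def shifted_bank(addr, low_bit=3):
    """Bank after shifting by the value of address bits [low_bit+2 : low_bit]."""
    bank_sel = addr & (NUM_BANKS - 1)
    shift_sel = (addr >> low_bit) & (NUM_BANKS - 1)
    return (bank_sel + shift_sel) % NUM_BANKS  # wraparound is an assumption


def row_of(addr):
    """Row inside the bank; it does not depend on the shift step."""
    return addr >> N_BITS


conflicting = [0, 8, 16, 24]
print([shifted_bank(a) for a in conflicting])  # [0, 1, 2, 3]
print([row_of(a) for a in conflicting])        # [0, 1, 2, 3]
```

Under these assumptions each row is rotated as a whole, so the eight data of a row still occupy eight distinct banks after the move.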
Note that the shift step can be configured flexibly according to different data access requirements. For example, in another embodiment, for a memory having 8 banks similar to the memory 300 in fig. 3, assume that the data at 0, 16, 32, 48 in Bank0 conflict. To move these data to different banks, the value represented by 3 consecutive bits from the 4th bit of the address toward the upper bits, that is, address [6:4], can be used as the shift step. The shift steps of the data at 0, 16, 32, 48 are then 000, 001, 010, 011, respectively, so the data are moved into Bank0, Bank 1, Bank2, Bank 3, respectively. The result is shown in fig. 5.
In the two examples above, all conflicting data are moved to different banks, but this is only a preferred embodiment of the present invention. In some cases, only part of the conflicting data is moved to different banks, which eliminates part of the conflict. For example, in the memory shown in fig. 3, assume the data at 0, 8, 16, 24 need to be accessed simultaneously. Besides using address [5:3] as described above, address [6:4] can also be used to move the data in the memory. The shift steps of the four conflicting data are then 000, 000, 001, and 001, and the result after shifting is the same as that in fig. 5: the data at 0, 8, 16, 24 are moved into Bank0, Bank0, Bank 1, and Bank 1, respectively. Access to the four data is thus reduced from four cycles to two, so the bank conflict is reduced and memory performance improves to a certain extent.
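The effect of choosing address [5:3] or address [6:4] can be compared by counting access cycles. The following Python sketch assumes one access per bank per cycle and modulo-N wraparound of the shifted bank index; the helper names are hypothetical.

```python
from collections import Counter

NUM_BANKS = 8


def shifted_bank(addr, low_bit):
    """Bank index after adding the 3-bit shift step taken from low_bit upward."""
    mask = NUM_BANKS - 1
    return ((addr & mask) + ((addr >> low_bit) & mask)) % NUM_BANKS


def cycles(addrs, low_bit):
    """Cycles needed for a parallel access, one access per bank per cycle."""
    return max(Counter(shifted_bank(a, low_bit) for a in addrs).values())


print(cycles([0, 16, 32, 48], low_bit=4))  # 1: conflict eliminated
print(cycles([0, 8, 16, 24], low_bit=3))   # 1: conflict eliminated
print(cycles([0, 8, 16, 24], low_bit=4))   # 2: conflict only reduced
```

Without any shift, each of these access patterns would take four cycles.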
The above embodiment shows the case where consecutive bits in the address of the data stored in the memory are selected to represent the shift step size. In addition, several discontinuous bits in the address can be selected as the shift step.
For example, in the memory having 8 banks shown in fig. 3, in one embodiment, if the conflicting data are at 8, 16, 32 in Bank0, the value formed by the 6th, 5th, and 3rd bits of the address can be selected as the shift step in order to move these data to different banks. The data are then moved to Bank 1, Bank0, and Bank2, respectively, giving the result shown in fig. 6, which shows only part of the rows and omits the rest. Similarly, in some embodiments, the value formed by the 5th, 4th, and 2nd bits of the address, or by the 6th, 3rd, and 1st bits, and so on, may be selected as the shift step. These choices depend on the bank conflicts caused by different data access requirements or on the software configuration.
The choice of shift step depends on the bank conflicts caused by different data access requirements or on the software configuration, but it can still follow certain rules. For example, when the data in which a bank conflict occurs are equally spaced, the row spacing of the conflicting data (in the present invention, the row spacing is the difference between the row address values of two data) may be represented as 2^i + j, where i and j are natural numbers and 0 ≤ j < 2^i. The shift step can then be selected as follows for the different cases.
When j = 0, the shift step may be the value represented by n consecutive bits from the (n + i)-th bit of the binary address of the data in the memory toward the upper bits. For example, in the bank conflict example shown in fig. 3, the row spacing of the data at 0, 8, 16, 24 is 1, i.e., i = 0 and j = 0; selecting 3 consecutive bits (n = 3) from bit n + i = 3, i.e., address [5:3], to move the data in the memory effectively eliminates the bank conflict. As another example, in the case shown in fig. 5, where the conflicting data 0, 16, 32, 48 are moved to different banks, the row spacing is 2, i.e., i = 1 and j = 0; selecting 3 consecutive bits (n = 3) from bit n + i = 4, i.e., address [6:4], effectively eliminates the bank conflict. When j ≠ 0, the shift step may be the value represented by n consecutive bits from the (n + i)-th or (n + i + 1)-th bit of the binary address toward the upper bits, or the value represented by n non-consecutive bits in the binary address. For example, again using the memory with 8 banks shown in fig. 3, assume the data in which the bank conflict occurs are at 0, 24, 48, so the row spacing is 3, i.e., i = 1 and j = 1. Then 3 consecutive bits (n = 3) from bit n + i = 4, i.e., address [6:4], can be selected to move the data in the memory; the shift steps of the data at 0, 24, 48 are then 000, 001, 011, and the data are moved to Bank0, Bank 1, Bank 3, respectively, which effectively eliminates the bank conflict. Alternatively, 3 consecutive bits (n = 3) from bit n + i + 1 = 5, i.e., address [7:5], can be selected; the shift steps of the data at 0, 24, 48 are then 000, 000, 001, and the data are moved to Bank0, Bank0, Bank 1, respectively, which reduces the bank conflict. Several non-consecutive bits may also be selected, as long as the data with bank conflicts can be moved out of the same bank.
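A shift step built from non-consecutive address bits can be sketched as follows. The bit positions are listed most significant first, and the modulo-N wraparound and the function names are assumptions made for illustration.

```python
NUM_BANKS = 8


def shift_from_bits(addr, bit_positions):
    """Concatenate the selected address bits, most significant position first."""
    value = 0
    for pos in bit_positions:
        value = (value << 1) | ((addr >> pos) & 1)
    return value


def shifted_bank(addr, bit_positions):
    """Bank index after adding the shift step formed from bit_positions."""
    bank_sel = addr & (NUM_BANKS - 1)
    return (bank_sel + shift_from_bits(addr, bit_positions)) % NUM_BANKS


# Conflicting data at 8, 16, 32 in Bank0, shift step taken from bits 6, 5, 3.
print([shifted_bank(a, (6, 5, 3)) for a in (8, 16, 32)])  # [1, 0, 2]
```

The three data land in Bank 1, Bank0, and Bank2, matching the fig. 6 description.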
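The row-spacing rule can be checked numerically. This Python sketch splits a spacing into 2^i + j, lists the candidate starting bits for a consecutive selection, and computes the resulting banks; it assumes word addresses, modulo-N wraparound, and hypothetical function names.

```python
def split_spacing(spacing):
    """Write spacing as 2**i + j with 0 <= j < 2**i (spacing >= 1)."""
    i = spacing.bit_length() - 1  # largest i with 2**i <= spacing
    return i, spacing - (1 << i)


def candidate_low_bits(spacing, n):
    """Starting bit(s) of the n consecutive bits suggested by the rule."""
    i, j = split_spacing(spacing)
    return [n + i] if j == 0 else [n + i, n + i + 1]


def banks_after_shift(addrs, low_bit, n=3):
    """Banks after adding the n-bit shift step that starts at low_bit."""
    mask = (1 << n) - 1
    return [((a & mask) + ((a >> low_bit) & mask)) & mask for a in addrs]


print(candidate_low_bits(1, 3))               # [3]     -> address [5:3]
print(candidate_low_bits(2, 3))               # [4]     -> address [6:4]
print(candidate_low_bits(3, 3))               # [4, 5]  -> [6:4] or [7:5]
print(banks_after_shift([0, 24, 48], 4))      # [0, 1, 3]
print(banks_after_shift([0, 24, 48], 5))      # [0, 0, 1]
```

For the spacing of 3, starting at bit 4 separates all three data, while starting at bit 5 leaves two of them in Bank0.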
It should be noted that the above selection manner of the shift step when the line spacing of the conflicting data is expressed by 2^ i + j is merely an example, and is intended to illustrate one possible embodiment of the present invention, and does not constitute a limitation to the present invention, and it is obvious for those skilled in the art to select n bits in the address as the shift step in various manners according to the idea of the present invention.
Although figs. 3 to 5 show a memory with 8 banks, those skilled in the art will understand that the number of banks and the memory structure shown here are only for convenience of description and do not limit the present invention. The present invention also applies to memories with other numbers of banks, such as 4, 16, or 32, in which a different number of address bits is selected as the shift step. For example, in a memory having 32 banks, when a bank conflict occurs, 5 bits of the address can be selected as the shift step, and the conflicting data can be moved to different banks using the value represented by those 5 bits, thereby reducing the bank conflict.
A hardware implementation example of memory address reorganization according to an embodiment of the present invention is described below in conjunction with fig. 7.
According to an embodiment of the present invention, assume the memory is divided into N banks. To avoid wasting resources, N is generally a power of 2, e.g., 2, 4, 8, 16, 32, i.e., N = 2^n, although in some extreme cases it may take other values.
Typically, n bits are selected from the memory address to decode the bank selection. As shown in FIG. 7, this signal may be named bank_sel (i.e., bank_sel is the n bits selected from the address). Without the present invention, the bank_sel value directly selects the bank to be accessed, e.g., bank_sel = 0 denotes Bank0, bank_sel = 1 denotes Bank 1, and so on.
In an embodiment according to the invention, a shift step, which may be denoted shift_sel, is added to bank_sel, as shown in fig. 7. For each row of such a memory with N banks there are N possible shifts (0, 1, 2, …, N-1), i.e., the value of shift_sel may be any of 0 to N-1. The bank selection is then determined by bank_sel and shift_sel together. For example, bank_sel + shift_sel = 0 denotes Bank0, bank_sel + shift_sel = 1 denotes Bank 1, and so on.
Thus, as shown in fig. 7, an address can be defined with five bit fields, namely the bank selection (bank_sel), the shift step (shift_sel), the byte offset (offset), and two row-index fields (index_h and index_l), from which the decoded signals are generated. The bank selection (bank_sel) and the shift step (shift_sel) are used to decode the bank selection, as described above. The row indexes (index_h and index_l) are used to index a specific row of the bank, i.e., the row address of the data inside the bank. The byte offset (offset) indexes an offset within one bank; for example, if the bank width is 4 bytes, the byte offset should be 2 bits. The position of each bit field can be configured by software. For example, software can configure whether shift_sel and bank_sel are adjacent; if they are not adjacent, as in fig. 7, the width of index_l can also be configurable.
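The bit-field decode can be sketched as below. The field order (offset, bank_sel, index_l, shift_sel, index_h from the least significant bit upward) and the widths are assumptions chosen for illustration, since FIG. 7 itself is not reproduced in the text; only the 2-bit offset for a 4-byte bank width comes from the description. The modulo-N wraparound is also an assumption.

```python
from typing import NamedTuple

# Assumed field widths, from the least significant bit upward.
OFFSET_W, BANK_W, INDEX_L_W, SHIFT_W = 2, 3, 1, 3


class Fields(NamedTuple):
    offset: int
    bank_sel: int
    index_l: int
    shift_sel: int
    index_h: int


def decode(addr):
    """Split a byte address into the five bit fields of the assumed layout."""
    def take(width):
        nonlocal addr
        value = addr & ((1 << width) - 1)
        addr >>= width
        return value

    offset = take(OFFSET_W)
    bank_sel = take(BANK_W)
    index_l = take(INDEX_L_W)
    shift_sel = take(SHIFT_W)
    return Fields(offset, bank_sel, index_l, shift_sel, addr)  # rest is index_h


def selected_bank(fields):
    """Bank chosen by bank_sel and shift_sel together (wraparound assumed)."""
    return (fields.bank_sel + fields.shift_sel) % (1 << BANK_W)


example = (5 << 9) | (3 << 6) | (1 << 5) | (6 << 2) | 1
print(decode(example))
print(selected_bank(decode(example)))  # (6 + 3) % 8 = 1
```

Making the widths module-level constants mirrors the statement that software can configure the field positions; a hardware design would hold them in configuration registers instead.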
After the hardware is configured, how to obtain the shift step shift _ sel is the key.
According to an embodiment of the invention, a simple method is to select n bits in the address as shift_sel according to the data access requirements. Both the bank_sel bits and the shift_sel bits may be any n bits in the address. For example, they may be the same n bits in the address, or some bits may overlap. Through software configuration, n bits in the address can be selected as shift_sel according to the application scenario. If no movement is desired, a value that means no movement may be configured. As mentioned above, the selected bits may be n consecutive bits or non-consecutive bits in the address.
Another approach, more complex but also more flexible, is to select different bits as shift_sel in different address spaces.
For example, in some embodiments, the memory space of the memory may be divided into multiple pages, for example Page 1, Page 2, Page 3, and so on, each of which may have n KB (or another amount) of space, and multiple sets of registers may be provided to configure the pages so as to define a different shifting mode for each page. For example, Page 1 can be configured to use address [5:3] to move the data in memory, Page 2 to use address [6:4], Page 3 to use bits 5, 4, and 2 of the address, and so on.
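Per-page shift configuration can be sketched with a small lookup table. The page size of 2^10 addresses, the zero-based page index (index 0 standing for Page 1), the treatment of unconfigured pages as "no movement", and the modulo-N wraparound are all assumptions for illustration.

```python
PAGE_BITS = 10   # assumed page size: 2**10 addresses
NUM_BANKS = 8

# Page index -> address bits forming shift_sel, most significant first.
PAGE_SHIFT_BITS = {
    0: (5, 4, 3),  # Page 1: address [5:3]
    1: (6, 5, 4),  # Page 2: address [6:4]
    2: (5, 4, 2),  # Page 3: bits 5, 4, 2
}


def shift_sel(addr):
    """Shift step for addr, using the bit selection configured for its page."""
    bits = PAGE_SHIFT_BITS.get(addr >> PAGE_BITS)
    if bits is None:
        return 0  # unconfigured page: no movement (assumption)
    value = 0
    for pos in bits:
        value = (value << 1) | ((addr >> pos) & 1)
    return value


def shifted_bank(addr):
    """Bank index after applying the page-specific shift step."""
    return ((addr & (NUM_BANKS - 1)) + shift_sel(addr)) % NUM_BANKS


print(shifted_bank(8))          # Page 1: bank 0 shifted by 1
print(shifted_bank(1024 + 16))  # Page 2: bank 0 shifted by 1
print(shifted_bank(2048 + 4))   # Page 3: bank 4 shifted by 1
```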
A set of registers may be provided for each page, respectively, if available. In some cases, however, the number of register sets may be less than the number of address pages, i.e., a set of register sets may be configured to be TLB-mapped to different pages, thereby saving registers. This approach is actually a special TLB (Translation Lookaside Buffer). Traditionally, TLBs are generally used to translate virtual addresses to physical addresses, but here, a TLB may be provided to transfer data in memory from one memory bank to another memory bank in different ways of shifting among different pages.
In a memory space divided into pages, the size of each page may be a power of 2, i.e., M = 2^m, so the address within a page can be represented by m bits. Assuming the entire address has k bits, the other k - m address bits constitute the page address. Multiple sets of configuration registers may be defined and used as a TLB. When the memory is accessed, the page address of the accessed address is compared with the page addresses in all the configuration register sets. If a matching value is found, that set of configurations is used to select shift_sel. If no matching value is found, the hardware either raises an interrupt asking the software to fill or replace the TLB, or automatically replaces the TLB entry using a value pre-configured by software in the second TLB. The software must ensure that no two register sets are configured for the same page. In this way, a large number of page address spaces can be configured with a small number of register sets, and address conversion from one bank to another, with different shifting modes in different page address spaces, is realized.
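The TLB-style lookup can be modeled in software as follows. This is a behavioral sketch only: the `refill` callback stands in for the software interrupt handler, and the first-in-first-out replacement policy is an assumption, since the text does not name one. The class and parameter names are hypothetical.

```python
from collections import OrderedDict


class ShiftTlb:
    """A few register sets mapping page addresses to shift_sel bit selections."""

    def __init__(self, num_entries, refill):
        self.num_entries = num_entries
        self.refill = refill          # stands in for the software miss handler
        self.entries = OrderedDict()  # page address -> shift bit positions
        self.misses = 0

    def lookup(self, addr, page_bits):
        """Return the shift bit selection configured for the page of addr."""
        page = addr >> page_bits      # the upper k - m address bits
        if page not in self.entries:
            self.misses += 1
            if len(self.entries) >= self.num_entries:
                self.entries.popitem(last=False)  # evict oldest entry (FIFO)
            self.entries[page] = self.refill(page)
        return self.entries[page]


# Hypothetical software policy: even pages use [5:3], odd pages use [6:4].
tlb = ShiftTlb(2, lambda page: (5, 4, 3) if page % 2 == 0 else (6, 5, 4))
print(tlb.lookup(0, 10), tlb.misses)     # (5, 4, 3) 1
print(tlb.lookup(8, 10), tlb.misses)     # (5, 4, 3) 1  (hit, same page)
print(tlb.lookup(1024, 10), tlb.misses)  # (6, 5, 4) 2
```

Keying the entries by page address means two entries can never describe the same page, which is the invariant the text requires software to maintain.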
The above hardware implementations are merely illustrative of the concepts of the present invention, and the specific configurations thereof are not to be construed as limiting the invention, which can be embodied in various suitable forms.
According to another embodiment of the present invention, there is also provided a system, which may be implemented in a processor, as shown in fig. 2; i.e., the processor may include a memory 20 having a plurality of memory banks and an apparatus 10 for reducing bank conflicts. The apparatus 10 includes a bank conflict determination unit 101 configured to determine the binary addresses of data at which a bank conflict occurs when a memory including N banks is accessed in parallel; a shift step determining unit 103 configured to determine a shift step such that, after the data in the memory is moved by the shift step, the conflicting data are no longer located in the same bank, where the shift step represents the number of banks by which the data in the bank is moved and is the value represented by n bits selected from the binary address of the data in the memory, where N = 2^n; and a shift unit 105 configured to move the data in the memory by the shift step, wherein the row address value of the data inside the bank is unchanged. The apparatus 10 may perform the method of reducing bank conflicts shown in fig. 1.
It should be noted that, in the embodiments of the present invention, all the units are logic units, and physically, one logic unit may be one physical unit, or may be a part of one physical unit, or may be implemented by a combination of multiple physical units, where the physical implementation manner of the logic units itself is not the most important, and the combination of the functions implemented by the logic units is the key to solve the technical problem provided by the present invention. In addition, in order to highlight the innovative part of the present invention, elements that are not so closely related to solving the technical problem proposed by the present invention are not introduced in the embodiments of the present invention, which does not indicate that other elements are not present in the embodiments.
There is also provided, in accordance with another embodiment of the present invention, an electronic device including a processor as described above, which may be a variety of computing devices, such as laptop computers, desktop computers, workstations, personal digital assistants, servers, blade servers, mainframes, and other suitable computers; or various forms of mobile devices such as personal digital assistants, cellular telephones, smart phones, Portable Digital Assistants (PDAs), portable games, palmtop or tablet computers, and the like; or various smart devices such as various wearable smart devices, smart appliances, and the like.
While the embodiments of the present invention have been described in detail with reference to the accompanying drawings, the use of the technical solution of the present invention is not limited to the applications mentioned in the embodiments of the patent, and various structures and modifications can be easily implemented with reference to the technical solution of the present invention to achieve various advantageous effects mentioned herein. Variations that do not depart from the gist of the invention are intended to be within the scope of the invention as defined by the appended claims.

Claims (14)

1. A method for reducing bank conflicts, comprising:
determining binary addresses of a plurality of data between which a bank conflict can occur when a memory containing N banks is accessed in parallel;
determining a shift step, the shift step representing the number of banks by which data in a bank is to be moved and being a value represented by n bits selected from the binary address of the data in the memory, where N = 2^n, and N and n are both natural numbers, such that, after the data in the memory is moved by the respective shift steps, the plurality of data between which the bank conflict occurs are no longer located in the same bank; and
moving the data in the memory according to the shift step, wherein the row address of the data inside the memory bank is unchanged.
2. The method of reducing bank conflicts of claim 1, wherein
the shift steps of the plurality of data with the bank conflict differ from one another, so that the plurality of data with the bank conflict are located in different banks after the data in the memory is moved by the shift steps.
3. The method of reducing bank conflicts of claim 1, wherein
the row spacing of the data with the bank conflict is 2^i + j, where i and j are natural numbers and 0 ≤ j < 2^i; and
when j = 0, the shift step is the value represented by n consecutive bits of the binary address of the data in the memory, starting at the (n + i)-th bit and extending toward the more significant bits.
4. The method of reducing bank conflicts of claim 1, wherein
the row spacing of the data with the bank conflict is 2^i + j, where i and j are natural numbers and 0 ≤ j < 2^i; and
when j ≠ 0, the shift step is the value represented by n consecutive bits of the binary address of the data in the memory, starting at the (n + i)-th or (n + i + 1)-th bit and extending toward the more significant bits, or the value represented by n non-consecutive bits of the binary address of the data in the memory.
5. The method for reducing bank conflicts according to any of claims 1 to 4, wherein
the storage space of the memory is divided into a plurality of pages, and a plurality of sets of registers are provided to configure the pages so that the shift step is determined separately for each page, the number of registers being less than or equal to the number of pages.
6. The method of reducing bank conflicts of claim 5, wherein
when the number of registers is less than the number of pages, each register is mapped to a plurality of pages in the manner of a translation lookaside buffer (TLB).
7. An apparatus for reducing bank conflicts, comprising:
a bank conflict determination unit configured to determine binary addresses of a plurality of data at which a bank conflict may occur when a memory including N banks is accessed in parallel;
a shift step determining unit configured to determine a shift step such that, after the data in the memory is moved by the respective shift steps, the plurality of data between which the bank conflict occurs are no longer located in the same bank, wherein the shift step represents the number of banks by which data in a bank is to be moved and is a value represented by n bits selected from the binary address of the data in the memory, where N = 2^n, and N and n are both natural numbers; and
a shifting unit configured to shift the data in the memory by the shift step, wherein a row address value of the data in the memory inside the memory bank is unchanged.
8. The apparatus for reducing bank conflicts according to claim 7, wherein the shift step determining unit is further configured to make the shift steps of the plurality of data with bank conflicts different from each other, so that the plurality of data with bank conflicts will be located in different banks after the data in the memory is moved by the shift steps.
9. The apparatus for reducing bank conflicts of claim 7, wherein
the row spacing of the data with the bank conflict is 2^i + j, where i and j are natural numbers and 0 ≤ j < 2^i; and
when j = 0, the shift step is the value represented by n consecutive bits of the binary address of the data in the memory, starting at the (n + i)-th bit and extending toward the more significant bits.
10. The apparatus for reducing bank conflicts of claim 7, wherein
the row spacing of the data with the bank conflict is 2^i + j, where i and j are natural numbers and 0 ≤ j < 2^i; and
when j ≠ 0, the shift step is the value represented by n consecutive bits of the binary address of the data in the memory, starting at the (n + i)-th or (n + i + 1)-th bit and extending toward the more significant bits, or the value represented by n non-consecutive bits of the binary address of the data in the memory.
11. The apparatus for reducing bank conflicts according to any of claims 7 to 10, wherein
the shift step determining unit is further configured to divide the storage space of the memory into a plurality of pages and to provide a plurality of sets of registers to configure the pages so that the shift step is determined separately for each page, the number of registers being less than or equal to the number of pages.
12. The apparatus for reducing bank conflicts of claim 11, wherein
when the number of registers is less than the number of pages, each register is mapped to a plurality of pages in the manner of a translation lookaside buffer (TLB).
13. A processor comprising a memory having a plurality of memory banks and the apparatus for reducing memory bank conflicts of any of claims 7-12.
14. An electronic device comprising the processor of claim 13.
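
As a rough illustration of the address arithmetic recited in claims 1 and 3 (a sketch under stated assumptions, not the patented implementation), the following Python snippet computes a shift step and the resulting bank for the j = 0 case. The function names `shift_step` and `remap` are hypothetical. The sketch assumes word addresses, bits numbered from 0 at the least significant bit, the low n bits selecting the bank, the row spacing counted in rows of N words, and the move realized as adding the shift step to the bank index modulo N; the claims do not fix any of these details.

```python
from typing import Tuple


def shift_step(addr: int, n: int, i: int) -> int:
    """Value of the n address bits starting at bit n + i (assumption: bit 0 is the LSB)."""
    return (addr >> (n + i)) & ((1 << n) - 1)


def remap(addr: int, n: int, i: int) -> Tuple[int, int]:
    """Return (bank, row) after moving the data by its shift step."""
    num_banks = 1 << n                 # N = 2^n banks
    bank = addr & (num_banks - 1)      # assumption: low n bits select the bank
    row = addr >> n                    # remaining bits select the row inside the bank
    # Assumption: the move is a rotation of the bank index by the shift step.
    new_bank = (bank + shift_step(addr, n, i)) % num_banks
    return new_bank, row               # the row address is left unchanged


if __name__ == "__main__":
    n, i = 2, 1                        # N = 4 banks, row spacing 2^i = 2 (j = 0)
    stride = (1 << i) << n             # address distance between conflicting data
    addrs = [3 + k * stride for k in range(1 << n)]
    print("banks before:", [a & ((1 << n) - 1) for a in addrs])
    print("banks after: ", [remap(a, n, i)[0] for a in addrs])
```

With these parameters the four addresses initially share one bank and end up in four different banks, while each keeps its original row. Because the shift step depends only on bits at or above bit n, it is constant within a row, so the remapping permutes the words of each row and no two addresses collide.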
CN201811580102.XA 2018-12-24 2018-12-24 Method for reducing memory bank conflict Active CN109710309B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201811580102.XA CN109710309B (en) 2018-12-24 2018-12-24 Method for reducing memory bank conflict
PCT/CN2019/126552 WO2020135209A1 (en) 2018-12-24 2019-12-19 Method for reducing bank conflicts

Publications (2)

Publication Number Publication Date
CN109710309A CN109710309A (en) 2019-05-03
CN109710309B true CN109710309B (en) 2021-01-26

Family

ID=66256110

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811580102.XA Active CN109710309B (en) 2018-12-24 2018-12-24 Method for reducing memory bank conflict

Country Status (2)

Country Link
CN (1) CN109710309B (en)
WO (1) WO2020135209A1 (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109710309B (en) * 2018-12-24 2021-01-26 安谋科技(中国)有限公司 Method for reducing memory bank conflict
CN111857831B (en) * 2020-06-11 2021-07-20 成都海光微电子技术有限公司 Memory bank conflict optimization method, parallel processor and electronic equipment
CN114827091B (en) * 2022-04-25 2023-06-20 珠海格力电器股份有限公司 Physical address conflict processing method and device and communication equipment

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105701036A (en) * 2016-01-19 2016-06-22 中国人民解放军国防科学技术大学 Address transition unit supporting deformation radix 16FFT algorithm parallel access
CN107748723A (en) * 2017-09-28 2018-03-02 中国人民解放军国防科技大学 Storage method and access device supporting conflict-free stepping block-by-block access

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7797493B2 (en) * 2005-02-15 2010-09-14 Koninklijke Philips Electronics N.V. Enhancing performance of a memory unit of a data processing device by separating reading and fetching functionalities
JP2009520295A (en) * 2005-12-20 2009-05-21 エヌエックスピー ビー ヴィ Multiprocessor circuit with shared memory bank
CN101082906A (en) * 2006-05-31 2007-12-05 中国科学院微电子研究所 Fixed-base FFT processor with low memory spending and method thereof
CN102184092A (en) * 2011-05-04 2011-09-14 西安电子科技大学 Special instruction set processor based on pipeline structure
KR102205899B1 (en) * 2014-02-27 2021-01-21 삼성전자주식회사 Method and apparatus for avoiding bank conflict in memory
US10198369B2 (en) * 2017-03-24 2019-02-05 Advanced Micro Devices, Inc. Dynamic memory remapping to reduce row-buffer conflicts
US10061541B1 (en) * 2017-08-14 2018-08-28 Micron Technology, Inc. Systems and methods for refreshing a memory bank while accessing another memory bank using a shared address path
CN109710309B (en) * 2018-12-24 2021-01-26 安谋科技(中国)有限公司 Method for reducing memory bank conflict

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
GPU编程模型中存储体冲突的研究;原建伟;《河北工业科技》;20130321;第39-46页 *

Also Published As

Publication number Publication date
CN109710309A (en) 2019-05-03
WO2020135209A1 (en) 2020-07-02

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant