CN114741329A - Multi-granularity combined memory data interleaving method and interleaving module - Google Patents


Info

Publication number
CN114741329A
CN114741329A (application number CN202210643436.7A)
Authority
CN
China
Prior art keywords: interleaving; memory; area; sub; address
Prior art date
Legal status
Granted
Application number
CN202210643436.7A
Other languages
Chinese (zh)
Other versions
CN114741329B (en)
Inventor
卢红召
何颖
Current Assignee
Core Microelectronics Technology Zhuhai Co ltd
Original Assignee
Core Microelectronics Technology Zhuhai Co ltd
Priority date
Filing date
Publication date
Application filed by Core Microelectronics Technology Zhuhai Co ltd filed Critical Core Microelectronics Technology Zhuhai Co ltd
Priority to CN202211104810.2A (published as CN115309669A)
Priority to CN202210643436.7A (granted as CN114741329B)
Publication of CN114741329A
Application granted
Publication of CN114741329B
Status: Active

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 12/00 - Accessing, addressing or allocating within memory systems or architectures
    • G06F 12/02 - Addressing or allocation; Relocation
    • G06F 12/0223 - User address space allocation, e.g. contiguous or non-contiguous base addressing
    • G06F 12/023 - Free address space management
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 1/00 - Details not covered by groups G06F 3/00-G06F 13/00 and G06F 21/00
    • G06F 1/26 - Power supply means, e.g. regulation thereof
    • G06F 1/32 - Means for saving power
    • G06F 1/3203 - Power management, i.e. event-based initiation of a power-saving mode
    • G06F 1/3234 - Power saving characterised by the action undertaken
    • G06F 1/325 - Power saving in peripheral device
    • G06F 1/3275 - Power saving in memory, e.g. RAM, cache
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D - CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00 - Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention discloses a multi-granularity combined memory data interleaving method and an interleaving module. The interleaving method comprises: dividing a system address space to obtain a first system interleaving area with a first interleaving granularity, a second system interleaving area with a second interleaving granularity, and a system linear area; and dividing a memory address space to obtain a first memory interleaving area with the first interleaving granularity, a second memory interleaving area with the second interleaving granularity, and a memory linear area. The first system interleaving area and the first memory interleaving area form a mapping relation, the second system interleaving area and the second memory interleaving area form a mapping relation, and the system linear area and the memory linear area form a mapping relation; the first interleaving granularity is smaller than the second interleaving granularity. The invention effectively avoids both the low utilization of storage space caused by insufficient parallelism within a single memory and the unnecessary power consumption caused by all memories working simultaneously.

Description

Multi-granularity combined memory data interleaving method and interleaving module
Technical Field
The invention belongs to the technical field of data processing, and particularly relates to a multi-granularity combined memory data interleaving method and an interleaving module.
Background
Currently, a computing device is generally built around a System on Chip (SoC), and many high-performance computing devices, such as graphics processing units (GPUs) and artificial-intelligence accelerators, require the SoC to provide a high memory read/write rate. To meet this requirement, the SoC can divide its bus into a plurality of memory channels, each memory providing a fixed read/write rate, and schedule the memories to operate simultaneously through a suitable memory-channel interleaving technique, thereby improving the aggregate read/write rate.
Generally, after memory-channel interleaving, the addresses of the memories are mapped symmetrically, uniformly, and alternately into the virtual address space at a certain fixed granularity (hereinafter, uniform interleaving). From the application's point of view, incremental addresses issued from a given base address are mapped into a plurality of memories after uniform interleaving, and the addresses received within any single memory also increase linearly. As a result, within a single memory only one or two memory blocks are active most of the time while most memory blocks are idle, so the degree of parallel processing inside the memory is insufficient. In addition, some high-performance computing devices are not in an extremely high-performance operating state at all times; in operating scenarios with low performance requirements, uniform interleaving still requires all memories to operate simultaneously, which wastes power. These problems are more apparent in mobile computing devices such as smartphones, tablets, and notebook computers.
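For concreteness, uniform interleaving at a fixed granularity can be sketched as follows (the 128-byte granularity, four channels, and all names are illustrative assumptions, not part of the disclosure):

```python
# Sketch of conventional uniform interleaving (the 128-byte granularity
# and 4 channels are illustrative assumptions, not from the patent).
GRANULARITY = 128
NUM_CHANNELS = 4

def uniform_interleave(system_addr):
    """Map a system address to (channel index, address within that memory)."""
    block = system_addr // GRANULARITY       # index of the 128-byte block
    channel = block % NUM_CHANNELS           # blocks rotate over channels
    local_block = block // NUM_CHANNELS      # block index inside one memory
    return channel, local_block * GRANULARITY + system_addr % GRANULARITY

# Incrementing system addresses rotate round-robin across the channels,
# while the addresses seen inside each single memory still grow linearly:
channels = [uniform_interleave(a)[0] for a in range(0, 1024, 128)]
# channels == [0, 1, 2, 3, 0, 1, 2, 3]
```

Because the block index inside each memory simply counts up, incremental traffic touches only a small, advancing window of each memory, which is exactly the behavior that leaves most memory blocks idle.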
Disclosure of Invention
Aiming at the above defects and improvement needs of the prior art, the invention provides a multi-granularity combined memory data interleaving method and an interleaving module. By adjusting the bus configuration and reasonably dividing different storage areas within the memories, single-layer interleaving at various granularities and overlapped interleaving of different granularities can be realized. Such a flexible interleaving scheme effectively avoids both the low utilization of storage space caused by insufficient parallelism within a single memory and the unnecessary power consumption caused by all memories working simultaneously.
To achieve the above object, according to one aspect of the invention, a memory data interleaving method is provided, comprising: dividing a system address space to obtain a first system interleaving area with a first interleaving granularity, a second system interleaving area with a second interleaving granularity, and a system linear area; and dividing a memory address space to obtain a first memory interleaving area with the first interleaving granularity, a second memory interleaving area with the second interleaving granularity, and a memory linear area. The first system interleaving area and the first memory interleaving area form a mapping relation, the second system interleaving area and the second memory interleaving area form a mapping relation, and the system linear area and the memory linear area form a mapping relation; the first interleaving granularity is smaller than the second interleaving granularity.
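The division of the system address space can be pictured with a minimal sketch, assuming a hypothetical contiguous layout with illustrative region boundaries (none of these constants come from the disclosure):

```python
# Hypothetical system-address-space layout (all boundaries are
# illustrative assumptions): classify an address into the first
# interleaving area, the second interleaving area, or the linear area.
REGIONS = [
    ("first_interleave", 0x0000_0000, 0x4000_0000),   # fine granularity
    ("second_interleave", 0x4000_0000, 0x8000_0000),  # coarse granularity
    ("linear", 0x8000_0000, 0xA000_0000),             # no interleaving
]

def classify(addr):
    """Return the name of the system area containing addr."""
    for name, lo, hi in REGIONS:
        if lo <= addr < hi:
            return name
    raise ValueError("address outside the mapped system address space")
```

Each system area then maps to its counterpart in the memory address space, with a different interleaving rule per area.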
In some embodiments, the first memory interleaving area comprises an even number of first sub-memory interleaving areas of equal capacity, respectively arranged in different memories; the second memory interleaving area comprises one second sub-memory interleaving area, or a plurality of second sub-memory interleaving areas of equal capacity respectively arranged in different memories; and the memory linear area comprises a plurality of sub-memory linear areas, respectively arranged in different memories.
In some embodiments, the capacity of the first system interleaving area is the sum of the capacities of the first sub-memory interleaving areas in the respective memories; the capacity of the second system interleaving area is the sum of the capacities of the second sub-memory interleaving areas; and the capacity of the system linear area is the sum of the capacities of the sub-memory linear areas.
In some embodiments, the method further comprises: dividing the system address space to obtain a system overlapping area in which the first and second interleaving granularities overlap; and dividing the memory address space to obtain a memory overlapping area in which the first and second interleaving granularities overlap; the system overlapping area and the memory overlapping area form a mapping relation.
In some embodiments, the memory overlapping area includes an even number of sub-memory overlapping areas of equal capacity, respectively disposed in different memories.
In some embodiments, the capacity of the system overlapping area is the sum of the capacities of the sub-memory overlapping areas in the respective memories.
In some embodiments, in the system address space, the addresses of the first system interleaving area, the second system interleaving area, and the system linear area are arranged in ascending order; or the addresses of the first system interleaving area, the system overlapping area, the second system interleaving area, and the system linear area are arranged in ascending order.
In some embodiments, in each memory of the memory address space, the addresses of the first sub-memory interleaving area, the sub-memory overlapping area, the second sub-memory interleaving area, and the sub-memory linear area are arranged in ascending order.
In some embodiments, one or more memories in the memory address space lack one or more of the first sub-memory interleaving area, the sub-memory overlapping area, the second sub-memory interleaving area, and the sub-memory linear area.
In some embodiments, in each memory of the memory address space, the areas satisfy first sub-memory interleaving area address < sub-memory overlapping area address < second sub-memory interleaving area address < sub-memory linear area address, with the areas arranged consecutively.
In some embodiments, the method further comprises: allocating read-write space in a suitable area to each process of a processing unit according to the read-write performance requirements of the processing unit.
In some embodiments, allocating read-write space in a suitable area to each process of the processing unit according to its read-write performance requirements specifically includes: configuring the open/closed state, number, and capacity of the first sub-memory interleaving areas, second sub-memory interleaving areas, and sub-memory linear areas in the memory address space according to the read-write performance requirements of the processing unit, or configuring the open/closed state, number, and capacity of the first sub-memory interleaving areas, sub-memory overlapping areas, second sub-memory interleaving areas, and sub-memory linear areas; further judging the read-write performance requirements of each process of the processing unit and, in combination with the configuration information, allocating a suitable read-write space to each process; and starting each process of the processing unit and, according to the data read-write instructions sent by the processing unit and the configuration information, interleaving the current address in each data read-write instruction.
According to another aspect of the present invention, an interleaving module is provided, comprising an interleaving preprocessing module, a first interleaving processing module, a second interleaving processing module, and a linear processing module. The interleaving preprocessing module determines, according to an interleaving control signal, which partition of the system address space the current address belongs to, judges how the current address should subsequently be processed, and sends the judgment result to the first interleaving processing module. The first interleaving processing module determines, according to this judgment result, whether to apply interleaving at the second interleaving granularity to the current address. The second interleaving processing module determines, according to the judgment result, whether to apply interleaving at the first interleaving granularity to the address from the first interleaving processing module; the first interleaving granularity is smaller than the second interleaving granularity. The linear processing module determines, according to the judgment result, whether the address from the second interleaving processing module is handled in a first linear processing mode or a second linear processing mode.
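The staged flow can be sketched as follows: a preprocessing decision, an optional coarse pass, an optional fine pass, then linear-mode selection. The granularities, channel count, region names, and the round-robin interleaving step are illustrative assumptions, not the claimed hardware:

```python
# Minimal sketch of the staged flow. Granularities, channel count, and
# region names are illustrative assumptions.
FINE, COARSE, CHANNELS = 128, 4096, 4

def interleave_step(addr, granularity, channels=CHANNELS):
    """Spread granularity-sized blocks round-robin over the channels."""
    block, offset = divmod(addr, granularity)
    return block % channels, (block // channels) * granularity + offset

def pipeline(addr, region):
    # Interleaving preprocessing module: decide which stages apply.
    do_coarse = region in ("second_interleave", "overlap")
    do_fine = region in ("first_interleave", "overlap")
    channel = 0
    if do_coarse:   # first interleaving processing module (coarse pass)
        channel, addr = interleave_step(addr, COARSE)
    if do_fine:     # second interleaving processing module (fine pass)
        channel, addr = interleave_step(addr, FINE)
    # Linear processing module: mode 1 if any interleaving was applied,
    # mode 2 otherwise.
    mode = 1 if (do_coarse or do_fine) else 2
    return channel, addr, mode
```

An overlap-area address passes through both interleaving stages in sequence, which is how the sketch realizes the overlapped interleaving of two granularities.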
In some embodiments, the system address space includes a first system interleave region having a first interleave granularity, a second system interleave region having a second interleave granularity, and a system linear region; when the current address falls in the second system interleaving area, the interleaving preprocessing module judges that the current address needs to be processed by the first interleaving processing module; when the current address is in the first system interleaving area, the interleaving preprocessing module judges that the current address needs to be processed by the second interleaving processing module.
In some embodiments, the system address space further includes a system overlap region where the first granularity of interleaving and the second granularity of interleaving overlap; when the current address is in the system overlapping area, the interleaving preprocessing module judges that the current address needs to be processed by the first interleaving processing module and the second interleaving processing module.
In some embodiments, when the judgment result of the interleaving preprocessing module is that the current address needs to be processed by the first interleaving processing module, the first interleaving processing module performs interleaving processing of a second interleaving granularity on the current address according to the interleaving control signal, and sends the address after the interleaving processing to the second interleaving processing module; and when the judgment result of the interleaving preprocessing module is that the current address does not need to be processed by the first interleaving processing module, the first interleaving processing module sends the current address to the second interleaving processing module.
In some embodiments, when the judgment result of the interleaving preprocessing module is that the current address needs to be processed by the second interleaving processing module, the second interleaving processing module performs interleaving processing with a first interleaving granularity on the address from the first interleaving processing module according to the interleaving control signal, and sends the address after the interleaving processing to the linear processing module; and when the judgment result of the interleaving preprocessing module is that the current address does not need to be processed by the second interleaving processing module, the second interleaving processing module sends the address from the first interleaving processing module to the linear processing module.
In some embodiments, when the judgment result of the interleaving preprocessing module is that the current address needs to be processed by at least one of the first and second interleaving processing modules, the first linear processing mode is adopted; otherwise, the second linear processing mode is adopted.
In some embodiments, the first linear processing mode comprises mapping the addresses from the second interleaving processing module into the corresponding areas of each memory's address space by sequential equal division; the second linear processing mode comprises filling the addresses from the second interleaving processing module into the corresponding areas of the memories' address spaces sequentially.
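The two linear processing modes might be sketched as follows; the exact division semantics are assumptions for illustration:

```python
# Sketches of the two linear processing modes; the exact division
# semantics are assumptions for illustration.
def linear_mode1(offset, total_size, num_memories):
    """Sequential equal division: the region is split into equal slices
    and memory i receives the i-th slice."""
    share = total_size // num_memories
    return offset // share, offset % share   # (memory index, local offset)

def linear_mode2(offset, capacities):
    """Sequential fill: each memory's linear area is exhausted in turn
    before moving to the next memory."""
    for i, cap in enumerate(capacities):
        if offset < cap:
            return i, offset
        offset -= cap
    raise ValueError("offset beyond the total linear capacity")
```

Mode 1 keeps all memories reachable even for purely linear traffic, while mode 2 lets unused memories stay idle, which matches the power-saving motivation of the scheme.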
According to still another aspect of the present invention, there is provided a system on a chip including one or more processing units, one or more of the above interleaving modules in one-to-one correspondence with the one or more processing units, a plurality of memory controllers, and a plurality of memories in one-to-one correspondence with the plurality of memory controllers.
Generally, compared with the prior art, the above technical solution has the following beneficial effects. The storage resources of a plurality of memories are divided into several interleaving areas and linear areas, for example an interleaving area with a first interleaving granularity (e.g. fine granularity), an interleaving area with a second interleaving granularity (e.g. coarse granularity), an overlapping area in which the two granularities overlap, and a linear area, with a different interleaving mode set for each area. When the bus forwards data, the area to which the current address belongs is determined from the address information and the configuration information, and the address is processed with the interleaving method of that area to realize data interleaving. Interleaving proceeds stage by stage, from coarse granularity to fine: if the address hits a given area it is processed according to that area's rule, and if several interleaving areas map to the same block of memory the current address is processed multiple times, realizing the overlap of different interleaving granularities. By adjusting the bus configuration and reasonably dividing the different storage areas within the memories, single-layer interleaving at various granularities and overlapped interleaving of different granularities can both be realized, yielding a flexible data interleaving scheme.
Drawings
FIG. 1 is a schematic diagram of a partial structure of a system-on-chip with multi-granularity combined interleaving capability according to an embodiment of the present invention;
FIG. 2 is a diagram illustrating a mapping relationship between a system address space and address spaces in memories according to an embodiment of the present invention;
FIG. 3 is a diagram illustrating a mapping relationship between a system address space and address spaces in memories according to another embodiment of the present invention;
FIG. 4 is a diagram illustrating a mapping relationship between a system address space and address spaces in memories according to another embodiment of the present invention;
FIG. 5 is a diagram illustrating a mapping relationship between a system address space and address spaces in memories according to another embodiment of the present invention;
FIG. 6A is an example of the mapping relationship between the interleaving area 2A1 shown in FIG. 2 and the memory address space;
FIG. 6B is an example of the mapping relationship between the interleaving area 2A3 shown in FIG. 2 and the memory address space;
FIG. 6C is an example of the mapping relationship between the overlapping area 2A2 shown in FIG. 2 and the memory address space;
FIG. 6D is an example of the mapping relationship between the linear area 2A4 shown in FIG. 2 and the memory address space;
FIG. 7 is a schematic diagram of an interleaving module according to an embodiment of the present invention;
fig. 8 is a flowchart of interleaving policy configuration according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention. As those skilled in the art would realize, the described embodiments may be modified in various different ways, all without departing from the spirit or scope of the present application. Accordingly, the drawings and description are to be regarded as illustrative in nature, and not as restrictive.
The system on a chip may be integrated in any computing device, including but not limited to personal computers, mobile devices, portable computers, servers, graphics cards, artificial-intelligence computing devices, and the like. As shown in fig. 1, in an embodiment of the present invention the SoC includes four processing units: a first processing unit, namely a central processing unit (CPU); a second processing unit, namely a high-speed serial expansion bus (PCIe) interface; a third processing unit, namely a direct memory access (DMA) engine; and a fourth processing unit, namely a graphics processing unit (GPU). The SoC further includes a system register; four interleaving modules in one-to-one correspondence with the four processing units, namely a first interleaving module corresponding to the CPU, a second interleaving module corresponding to PCIe, a third interleaving module corresponding to the DMA, and a fourth interleaving module corresponding to the GPU; four memory controllers, namely a first, a second, a third, and a fourth memory controller; and four memories in one-to-one correspondence with the four memory controllers, namely a first, a second, a third, and a fourth memory.
The CPU is connected to the SoC configuration bus through a configuration-bus interface, and the SoC configuration bus ensures that read-write signals sent by the CPU reach the system register. The CPU is connected to the first interleaving module through a data-bus interface, PCIe to the second interleaving module, the DMA to the third, and the GPU to the fourth; the system register is connected to each of the first, second, third, and fourth interleaving modules. The first memory controller is connected to the first memory, the second to the second memory, the third to the third memory, and the fourth to the fourth memory. The four interleaving modules are connected to the SoC data bus, which ensures that read-write signals sent by any of them can reach the first, second, third, and fourth memory controllers.
The CPU stores configuration information in the system register through the SoC configuration bus; the system register outputs interleaving control signals, and each interleaving module receives its interleaving control signal from the system register. The CPU sends data read-write instructions to the first interleaving module through its data-bus interface, PCIe to the second interleaving module, the DMA to the third, and the GPU to the fourth. Each interleaving module receives the data read-write instructions from its processing unit, splits each instruction into address information and data information, interleaves the address information according to the previously configured interleaving control signal, and sends the processed address and data to the memory controllers through the SoC data bus. Each memory controller receives read-write requests from the SoC data bus, converts them into the data format specified by its memory, and either writes the data information of a write instruction into the memory or reads data from the memory and returns it to the SoC data bus.
It should be understood that the number of processing units is not limited and may be one, two, or more; likewise, the number of memory controllers and memories, the capacity of the memories, and the form of the memories are not limited. Fig. 1 shows four sets of memory controllers and memories only by way of example; there may also be two, three, five, or more sets. The memory may be an on-chip static random access memory (SRAM), an off-chip double data rate synchronous dynamic random access memory (DDR SDRAM), or the like, which the present invention does not limit.
In some embodiments of the present invention, the SoC includes one or more processing units that actively read and write memory, one or more interleaving modules in one-to-one correspondence with the processing units, a plurality of memory controllers, and a plurality of memories in one-to-one correspondence with the memory controllers. Each interleaving module is connected to all of the memory controllers, so the plurality of memories can be shared by the plurality of processing units. In some embodiments, the SoC includes a single processing unit, which is a CPU, PCIe, or another processing unit capable of issuing configuration signals. In other embodiments, the SoC includes a plurality of processing units, which may be CPUs, PCIe, DMA, GPUs, or other processing units, at least one of which is a CPU, PCIe, or another processing unit capable of issuing configuration signals. Fig. 1 shows the interleaving control signal being obtained from the CPU; in fact, it may be obtained from PCIe or another processing unit with configuration capability, in which case that unit replaces the CPU and is connected to the system register through the SoC configuration bus.
The addresses accessed by the processing units lie in the system address space, which is divided into a first system interleaving area with a first interleaving granularity (e.g. 128 bytes), a second system interleaving area with a second interleaving granularity (e.g. 4096 bytes), a system overlapping area in which the two granularities overlap, and a system linear area with no interleaving. The system address space and the memory address space form a mapping relation, and the memories are likewise divided into a first memory interleaving area with the first interleaving granularity, a second memory interleaving area with the second interleaving granularity, a memory overlapping area, and a memory linear area. The first system interleaving area maps to the first memory interleaving area, the second system interleaving area to the second memory interleaving area, the system overlapping area to the memory overlapping area, and the system linear area to the memory linear area; each pair of mapped areas has equal capacity.
In some embodiments, the first interleaving granularity is 2^m bytes and the second interleaving granularity is 2^n bytes. In some embodiments, the first interleaving granularity is smaller than the second, i.e. m < n; the first interleaving granularity is a fine granularity and the second is a coarse granularity. In some embodiments, the first interleaving granularity is 128 bytes. In some embodiments, the second interleaving granularity is 4096 bytes.
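With power-of-two granularities, the channel index can be read directly from address bits; a sketch assuming four channels (the helper name and constants are illustrative):

```python
# With granularities of 2**m and 2**n bytes, the channel index is just a
# bit field of the address (4 channels assumed for illustration).
def channel_bits(addr, gran_log2, num_channels=4):
    """Bits [gran_log2, gran_log2 + log2(num_channels)) select the channel."""
    return (addr >> gran_log2) & (num_channels - 1)

m, n = 7, 12                      # 128-byte fine, 4096-byte coarse
fine = channel_bits(3 * 128, m)   # fine block index 3 -> channel 3
coarse = channel_bits(4096, n)    # coarse block index 1 -> channel 1
```

Restricting granularities to powers of two is what makes the interleaving hardware a matter of bit selection rather than division.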
In some embodiments, because symmetric interleaving is required, the first memory interleaving area includes an even number of first sub-memory interleaving areas of equal capacity, respectively arranged in different memories; that is, each first sub-memory interleaving area occupies one memory, and an even number of memories are occupied in total. Likewise, because symmetric interleaving is required, the memory overlapping area includes an even number of sub-memory overlapping areas of equal capacity, respectively disposed in different memories, each occupying one memory. In some embodiments, the second memory interleaving area comprises one second sub-memory interleaving area, or a plurality of second sub-memory interleaving areas of equal capacity. In some embodiments, with N memories in the SoC in total, the number of second sub-memory interleaving areas may be odd or even, up to N. In some embodiments, the address space remaining in each memory after excluding the first sub-memory interleaving area, the second sub-memory interleaving area, and the sub-memory overlapping area is the sub-memory linear area.
In some embodiments, a linear area exists in each memory. In some embodiments, when a memory's capacity is fully occupied by interleaving and overlapping areas, that memory has no linear area. A user may configure according to actual needs whether to set the first sub-memory interleaving area, the second sub-memory interleaving area, the sub-memory overlapping area, and the sub-memory linear area in each memory; that is, any one or more of these areas may be absent from a given memory.
The capacity of each region in the system address space is the sum of the capacities of the corresponding regions in all the memories. For example, the capacity of the first system interleaving area is the sum of the capacities of the first sub-storage interleaving areas in each memory; the capacity of the second system interleaving area is the sum of the capacities of the second sub-storage interleaving areas in each memory; the capacity of the system overlapping area is the sum of the capacities of the sub-storage overlapping areas in the memories; the capacity of the linear area of the system is the sum of the capacities of the sub-storage linear areas in the memories.
In some embodiments, the first system interleaving region, the system overlap region, the second system interleaving region, and the system linear region are arranged in order of increasing address in the system address space. In some embodiments, within each memory address space, the first sub-memory interleaving area, the sub-memory overlap area, the second sub-memory interleaving area, and the sub-memory linear area are likewise arranged in order of increasing address, and any one or more of these areas may be omitted. For example, a memory without a first sub-memory interleaving area arranges its sub-memory overlap area, second sub-memory interleaving area, and sub-memory linear area in order of increasing address; a memory without a sub-memory overlap area arranges its first sub-memory interleaving area, second sub-memory interleaving area, and sub-memory linear area in that order; a memory without a second sub-memory interleaving area arranges its first sub-memory interleaving area, sub-memory overlap area, and sub-memory linear area in that order; and a memory without either a first sub-memory interleaving area or a sub-memory overlap area arranges its second sub-memory interleaving area and sub-memory linear area in that order.
By way of example, the mapping relationship between the system address space and the address spaces within the memories is given below for four configurations. It should be understood that any configuration satisfying the above relationships falls within the scope of the present invention; the invention is not limited to these four configurations.
FIG. 2 is a diagram illustrating a mapping relationship between the system address space and each in-memory address space according to an embodiment of the present invention. As shown in FIG. 2, the system address space and the address spaces of 4 memories form a mapping relationship. Specifically, the system address space is divided into a first system interleaving region (i.e., interleaving region 2A1) having a first interleaving granularity, a second system interleaving region (i.e., interleaving region 2A3) having a second interleaving granularity, a system overlap region (i.e., overlap region 2A2) in which the first and second interleaving granularities overlap, and a system linear region (i.e., linear region 2A4) in which no interleaving is performed. The address space of the first memory (i.e., memory 2B0) includes a first sub-memory interleaving region (i.e., interleaving region 2B01), a sub-memory overlap region (i.e., overlap region 2B02), a second sub-memory interleaving region (i.e., interleaving region 2B03), and a sub-memory linear region (i.e., linear region 2B04); the address space of the second memory (i.e., memory 2B1) likewise includes interleaving region 2B11, overlap region 2B12, interleaving region 2B13, and linear region 2B14; the address space of the third memory (i.e., memory 2B2) includes interleaving region 2B21, overlap region 2B22, interleaving region 2B23, and linear region 2B24; and the address space of the fourth memory (i.e., memory 2B3) includes interleaving region 2B31, overlap region 2B32, interleaving region 2B33, and linear region 2B34.
Interleaving region 2A1 maps to interleaving regions 2B01, 2B11, 2B21, and 2B31; its capacity is the sum of their capacities, and the capacities of 2B01, 2B11, 2B21, and 2B31 are all equal. Overlap region 2A2 maps to overlap regions 2B02, 2B12, 2B22, and 2B32; its capacity is the sum of their capacities, which are all equal. Interleaving region 2A3 maps to interleaving regions 2B03, 2B13, 2B23, and 2B33; its capacity is the sum of their capacities, which are all equal. Linear region 2A4 maps to linear regions 2B04, 2B14, 2B24, and 2B34, and its capacity is the sum of their capacities.
In this example, the first interleaving granularity is 128 bytes and the second interleaving granularity is 4096 bytes, the latter being a natural page boundary for most buses. In some embodiments, the capacity of interleaving regions 2B01, 2B11, 2B21, and 2B31 may be 128, 256, 512, 768, 1024, 1536, 2048, or 4096 megabytes, or another size. In some embodiments, the capacity of interleaving regions 2B03, 2B13, 2B23, and 2B33 may likewise be 128, 256, 512, 768, 1024, 1536, 2048, or 4096 megabytes, or another size. When a system address falls in overlap region 2A2, the first and second interleaving granularities overlap: taking a first interleaving granularity of 128 bytes and a second of 4096 bytes as an example, a first interleaving at 4096-byte granularity is performed, followed by a second interleaving at 128-byte granularity.
FIG. 3 is a diagram illustrating a mapping relationship between the system address space and each in-memory address space according to another embodiment of the present invention. As shown in FIG. 3, the system address space maps to the address spaces of 3 memories; there is no overlap region in the system address space, and one of the memories contains only one interleaving area. Specifically, the system address space is divided into a first system interleaving region (i.e., interleaving region 3A1) having a first interleaving granularity (e.g., 128 bytes), a second system interleaving region (i.e., interleaving region 3A2) having a second interleaving granularity (e.g., 4096 bytes), and a system linear region (i.e., linear region 3A3) in which no interleaving is performed. The address space of the first memory (i.e., memory 3B0) includes a first sub-memory interleaving region (i.e., interleaving region 3B01), a second sub-memory interleaving region (i.e., interleaving region 3B02), and a sub-memory linear region (i.e., linear region 3B03); the address space of the second memory (i.e., memory 3B1) includes interleaving region 3B11, interleaving region 3B12, and linear region 3B13; the address space of the third memory (i.e., memory 3B2) includes only interleaving region 3B22 and linear region 3B23.
Interleaving region 3A1 maps to interleaving regions 3B01 and 3B11, with address interleaving at 128-byte granularity; its capacity is the sum of the capacities of 3B01 and 3B11, which are equal. Interleaving region 3A2 maps to interleaving regions 3B02, 3B12, and 3B22, with address interleaving at 4096-byte granularity; its capacity is the sum of their capacities, which are all equal. Linear region 3A3 maps to linear regions 3B03, 3B13, and 3B23, and its capacity is the sum of their capacities.
FIG. 4 is a diagram illustrating a mapping relationship between the system address space and each in-memory address space according to another embodiment of the present invention. As shown in FIG. 4, the system address space maps to the address spaces of 4 memories, and two of the memories contain only one interleaving area each. Specifically, the system address space is divided into a first system interleaving region (i.e., interleaving region 4A1) having a first interleaving granularity, a second system interleaving region (i.e., interleaving region 4A3) having a second interleaving granularity, a system overlap region (i.e., overlap region 4A2) in which the two granularities overlap, and a system linear region (i.e., linear region 4A4) in which no interleaving is performed. The address space of the first memory (i.e., memory 4B0) includes a first sub-memory interleaving region (i.e., interleaving region 4B01), a sub-memory overlap region (i.e., overlap region 4B02), a second sub-memory interleaving region (i.e., interleaving region 4B03), and a sub-memory linear region (i.e., linear region 4B04); the address space of the second memory (i.e., memory 4B1) likewise includes interleaving region 4B11, overlap region 4B12, interleaving region 4B13, and linear region 4B14; the address space of the third memory (i.e., memory 4B2) includes interleaving region 4B21, overlap region 4B22, and linear region 4B24; the address space of the fourth memory (i.e., memory 4B3) includes interleaving region 4B31, overlap region 4B32, and linear region 4B34.
Interleaving region 4A1 maps to interleaving regions 4B01, 4B11, 4B21, and 4B31; its capacity is the sum of their capacities, which are all equal. Overlap region 4A2 maps to overlap regions 4B02, 4B12, 4B22, and 4B32; its capacity is the sum of their capacities, which are all equal. Interleaving region 4A3 maps to interleaving regions 4B03 and 4B13; its capacity is the sum of their capacities, which are equal. Linear region 4A4 maps to linear regions 4B04, 4B14, 4B24, and 4B34, and its capacity is the sum of their capacities.
FIG. 5 is a diagram illustrating a mapping relationship between the system address space and each in-memory address space according to another embodiment of the present invention. As shown in FIG. 5, the system address space maps to the address spaces of 4 memories; one of the memories contains only one interleaving area, and another contains no interleaving area at all. Specifically, the system address space is divided into a first system interleaving region (i.e., interleaving region 5A1) having a first interleaving granularity, a second system interleaving region (i.e., interleaving region 5A3) having a second interleaving granularity, a system overlap region (i.e., overlap region 5A2) in which the two granularities overlap, and a system linear region (i.e., linear region 5A4) in which no interleaving is performed. The address space of the first memory (i.e., memory 5B0) includes a first sub-memory interleaving region (i.e., interleaving region 5B01), a sub-memory overlap region (i.e., overlap region 5B02), a second sub-memory interleaving region (i.e., interleaving region 5B03), and a sub-memory linear region (i.e., linear region 5B04); the address space of the second memory (i.e., memory 5B1) likewise includes interleaving region 5B11, overlap region 5B12, interleaving region 5B13, and linear region 5B14; the address space of the third memory (i.e., memory 5B2) includes overlap region 5B22, interleaving region 5B23, and linear region 5B24; the address space of the fourth memory (i.e., memory 5B3) includes only overlap region 5B32 and linear region 5B34.
Interleaving region 5A1 maps to interleaving regions 5B01 and 5B11; its capacity is the sum of their capacities, which are equal. Overlap region 5A2 maps to overlap regions 5B02, 5B12, 5B22, and 5B32; its capacity is the sum of their capacities, which are all equal. Interleaving region 5A3 maps to interleaving regions 5B03, 5B13, and 5B23; its capacity is the sum of their capacities, which are all equal. Linear region 5A4 maps to linear regions 5B04, 5B14, 5B24, and 5B34, and its capacity is the sum of their capacities.
FIG. 6A is an example of the mapping relationship between interleaving region 2A1 of FIG. 2 and the memory address space. Addresses in interleaving region 2A1 of the system address space increase linearly, and interleaving is performed in 128-byte blocks. As shown in FIG. 6A, the offset address of the input address relative to the base address of interleaving region 2A1 is taken, and 128-byte interleaving is applied to that offset address. Since the interleaving granularity is 128 bytes, i.e., 2^7 bytes, and all 4 memories 2B0, 2B1, 2B2, and 2B3 contain interleaving areas of 128-byte granularity, bits 6 to 0 of the offset address are kept unchanged, while bits 8 and 7 are used to select the memory: offset addresses sharing the same bits 8 and 7 are grouped together, and within each group the remaining high-order bits (bit 9 and above) increase linearly to form the offset address inside that memory's interleaving area.
Specifically, as shown in Table 1 below, bits 6 to 0 of the offset address remain unchanged (all 0 in the table). Bits 8 and 7 select the memory: offset addresses with bits 8 and 7 equal to 00 are grouped together, as are those equal to 01, 10, and 11, and these two bits increase linearly through 00, 01, 10, and 11 in turn. Within each group, the remaining high-order bits 15 to 9 increase linearly through 0000000, 0000001, 0000010, and so on, forming the offset address inside each memory's interleaving area.
The interleaved addresses are then divided into four equal parts in sequence: addresses falling in the first quarter are allocated to memory 2B0, the second quarter to memory 2B1, the third quarter to memory 2B2, and the fourth quarter to memory 2B3. Specifically, with reference to Table 1 below, addresses with bits 8 and 7 equal to 00 are allocated to memory 2B0, those equal to 01 to memory 2B1, those equal to 10 to memory 2B2, and those equal to 11 to memory 2B3. The remaining high-order bits 15 to 9 increase linearly, filling the address area of the corresponding memory block by block: addresses with bits 15 to 9 equal to 0000000 occupy the first block address region of each memory, those equal to 0000001 the second block, those equal to 0000010 the third block, and so on.
Table 1

bits 15-9   bits 8-7   bits 6-0   memory
0000000     00         0000000    2B0
0000000     01         0000000    2B1
0000000     10         0000000    2B2
0000000     11         0000000    2B3
0000001     00         0000000    2B0
0000001     01         0000000    2B1
0000001     10         0000000    2B2
0000001     11         0000000    2B3
0000010     00         0000000    2B0
(and so on)
After the interleaving process, the 128 bytes of data starting at address 0x0000 are allocated to the first block address area of interleaving region 2B01, the 128 bytes starting at 0x0080 to the first block of 2B11, the 128 bytes starting at 0x0100 to the first block of 2B21, the 128 bytes starting at 0x0180 to the first block of 2B31, the 128 bytes starting at 0x0200 back to the second block of 2B01, and so on. That is, after interleaving, addresses are spread evenly in 128-byte blocks across interleaving regions 2B01, 2B11, 2B21, and 2B31: the linearly increasing addresses of the system address space are distributed uniformly over the 4 memories 2B0, 2B1, 2B2, and 2B3, up to the upper address limit of interleaving region 2A1. After this fine-grained interleaving, continuous linear read/write tasks are spread evenly over the 4 memory controllers, so their read/write performance can be utilized more effectively.
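The 128-byte interleaving just described can be sketched as follows. This is an illustrative reconstruction of the bit manipulation in the text (not code from the patent); the function name and parameters are chosen for clarity, and the memory count must be a power of two.

```python
# Sketch of the 128-byte interleaving: bits 6..0 of the offset address are
# kept, the next bits select one of the memories (bits 8..7 for 4 memories,
# bit 7 alone for 2), and the remaining high-order bits form the linearly
# increasing block index inside each memory's interleaving area.
def interleave_128(offset_addr, num_memories=4):
    """Map an offset address to (memory index, address inside that memory).

    num_memories must be a power of two.
    """
    granularity_bits = 7                             # 128 bytes = 2**7
    select_bits = num_memories.bit_length() - 1      # 2 bits for 4, 1 bit for 2
    low = offset_addr & ((1 << granularity_bits) - 1)            # bits 6..0
    mem = (offset_addr >> granularity_bits) & (num_memories - 1)  # selector bits
    block = offset_addr >> (granularity_bits + select_bits)       # bits above
    return mem, (block << granularity_bits) | low
```

With the default of 4 memories this reproduces the worked example above: 0x0000, 0x0080, 0x0100, and 0x0180 land in the first block of memories 0 through 3, and 0x0200 wraps back to the second block of memory 0. Calling it with `num_memories=2` gives the two-memory layout described in the following paragraph.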
Using the same method, if only two memories contain interleaving areas of 128-byte granularity, bits 6 to 0 of the offset address are kept unchanged and bit 7 alone is used to select the memory: offset addresses sharing the same bit 7 are grouped together, and within each group the high-order bits (bit 8 and above) increase linearly to form the offset address inside each memory's interleaving area.
FIG. 6B is an example of the mapping relationship between interleaving region 2A3 of FIG. 2 and the memory address space. Addresses in interleaving region 2A3 of the system address space increase linearly, and interleaving is performed in 4096-byte blocks. As shown in FIG. 6B, the offset address of the input address relative to the base address of interleaving region 2A3 is taken, 4096-byte interleaving is applied to it, and the linearly increasing addresses are scrambled using a pseudo-random algorithm. The interleaved addresses are then divided into four equal parts in sequence: addresses falling in the first quarter are allocated to memory 2B0, the second quarter to 2B1, the third quarter to 2B2, and the fourth quarter to 2B3. Specifically, in FIG. 6B, the 1st to 3rd 4096-byte blocks of memory 2B0 come from the 1st to 3rd 4096-byte blocks of the first quarter after interleaving, i.e., the 4096-byte blocks of the system address space starting at 0x000000, 0x3f4000, and 0x187000, respectively; the 1st to 3rd 4096-byte blocks of memory 2B1 come from the 1st to 3rd blocks of the second quarter, i.e., the blocks starting at 0x28e000, 0x364000, and 0x221000, respectively; and so on. After interleaving, the originally linear, continuously increasing read/write addresses are distributed randomly, in 4096-byte blocks, to different locations in different memories.
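The structure of this coarse-grained pass can be sketched as below. Note the scramble used here is a stand-in bijection (an odd multiplier modulo 2^n), chosen only to illustrate the shape of the mapping; it is not the patent's actual one-to-one mapping function, and the block-index width is an assumed parameter.

```python
# Illustrative sketch of the 4096-byte interleaving: the linearly increasing
# 4096-byte block index is scrambled by a one-to-one (bijective) function,
# and the scrambled index is split into four equal parts, one per memory.
def scramble(block_index, n_bits):
    # Stand-in bijection: multiplication by an odd constant is invertible
    # modulo 2**n_bits. NOT the patent's mapping function.
    return (block_index * 0x9E37) & ((1 << n_bits) - 1)

def interleave_4096(offset_addr, n_bits=16, num_memories=4):
    """Map an offset address to (memory index, address inside that memory)."""
    low = offset_addr & 0xFFF                  # bits 11..0 kept (4096 = 2**12)
    idx = scramble(offset_addr >> 12, n_bits)  # scrambled 4096-byte block index
    part = (1 << n_bits) // num_memories       # size of each equal part
    return idx // part, ((idx % part) << 12) | low
```

Any bijection on the block index gives the same overall structure: bits 11..0 pass through unchanged, and consecutive system blocks land at scattered positions in the four memories.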
FIG. 6C is an example of the mapping relationship between overlap region 2A2 of FIG. 2 and the memory address space. Addresses in overlap region 2A2 of the system address space increase linearly; as shown in FIG. 6C, the offset address of the input address relative to the base address of overlap region 2A2 is taken, and interleaving is applied to it. Specifically, a first interleaving in 4096-byte blocks is performed, scrambling the linearly increasing addresses with a pseudo-random algorithm; the scrambled addresses then undergo a second interleaving, in which each 4096-byte block is further partitioned into 32 blocks of 128 bytes, and the data at those addresses are distributed evenly and in sequence to the 4 memories according to the rule of FIG. 6A. After the first interleaving, the 4096 bytes of data starting at address 0x000000 map to a first block address space, the 4096 bytes starting at 0x3f4000 to a second block address space, the 4096 bytes starting at 0x187000 to a third block address space, and so on.
After the second interleaving, the first 128-byte space of memory 2B0 holds the 128 bytes starting at 0x000000 in the system address space; the first 128-byte space of 2B1 holds the 128 bytes starting at 0x000080; the first 128-byte space of 2B2 holds the 128 bytes starting at 0x000100; the first 128-byte space of 2B3 holds the 128 bytes starting at 0x000180; and so on, until the 33rd 128-byte space of 2B0 holds the 128 bytes starting at 0x3f4000, the 33rd of 2B1 the 128 bytes starting at 0x3f4080, the 33rd of 2B2 the 128 bytes starting at 0x3f4100, and the 33rd of 2B3 the 128 bytes starting at 0x3f4180.
It can be seen that after the interleaving of FIG. 6A, the original linear addresses are still distributed to the 4 memories uniformly and in linear form, whereas after the two interleaving passes of FIG. 6C they are distributed to different positions in the 4 memories in an irregular form. For the memory controllers, a long run of continuously increasing read/write accesses is thus scattered, by the two interleaving passes, across every area of the memories; compared with FIG. 6A, where the 4 memory controllers are used uniformly but each performs locally linear continuous reads and writes, this further improves the controllers' read/write access efficiency.
It should be understood that the mappings in FIGS. 6B and 6C are merely examples chosen for ease of presentation and to aid the reader's understanding; the actual address mappings need not be as shown, and these examples should not be construed as limiting the present invention.
FIG. 6D is an example of the mapping relationship between linear region 2A4 of FIG. 2 and the memory address space. Addresses in linear region 2A4 of the system address space increase linearly, and linear region 2A4 maps to the linear regions of the 4 memories 2B0, 2B1, 2B2, and 2B3 in order, with addresses also increasing continuously and linearly within each memory's linear region. Specifically, the addresses of linear region 2A4 are mapped one by one, from low to high, into linear region 2B04 of memory 2B0; when the top of 2B04 is reached, the next address of 2A4 maps to the start of linear region 2B14 of memory 2B1, and mapping continues one by one into 2B14; when the top of 2B14 is reached, mapping continues from the start of linear region 2B24 of memory 2B2; and when the top of 2B24 is reached, mapping continues from the start of linear region 2B34 of memory 2B3, filling the linear regions of the memories in turn. Unlike the interleaving and overlap regions above, if a processing unit uses only a certain segment of the linear region, only the memory corresponding to that segment needs to remain active; the remaining memories need not stay active and may even be powered off.
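The sequential fill of the linear region can be sketched as follows. The function and capacity values are illustrative (the patent does not fix the per-memory linear-region sizes); the logic simply walks the memories in order, consuming each linear region before moving to the next.

```python
# Sketch of the linear-region mapping: addresses fill the sub-memory linear
# regions one memory at a time, in order. `capacities` lists each memory's
# linear-region size in bytes (illustrative values, not from the patent).
def map_linear(offset_addr, capacities):
    """Map an offset into the system linear region to (memory index, local offset)."""
    for mem, cap in enumerate(capacities):
        if offset_addr < cap:
            return mem, offset_addr
        offset_addr -= cap          # skip past this memory's linear region
    raise ValueError("address beyond the system linear region")
```

Because consecutive addresses stay within one memory until its linear region is exhausted, a workload confined to one segment touches only one memory, which is what permits the power-down behavior noted above.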
Similar to FIG. 6A, the linearly increasing addresses of interleaving region 3A1 of the system address space in FIG. 3 are partitioned into blocks at the first interleaving granularity (e.g., 128 bytes) and distributed evenly to interleaving regions 3B01 and 3B11 of the memory address space; read/write tasks in interleaving region 3A1 therefore do not affect memory 3B2. Similar to FIG. 6B, the linearly increasing addresses of interleaving region 3A2 are partitioned at the second interleaving granularity (e.g., 4096 bytes), scrambled using a pseudo-random algorithm, and assigned to interleaving regions 3B02, 3B12, and 3B22. Similar to FIG. 6D, the linearly increasing addresses of linear region 3A3 are mapped sequentially, in linear fashion, to linear regions 3B03, 3B13, and 3B23, filling the linear region of each memory in turn.
Similar to FIG. 6A, the linearly increasing addresses of interleaving region 4A1 of the system address space in FIG. 4 are partitioned at the first interleaving granularity (e.g., 128 bytes) and distributed evenly to interleaving regions 4B01, 4B11, 4B21, and 4B31 of the memory address space. Similar to FIG. 6C, the linearly increasing addresses of overlap region 4A2 undergo two interleaving passes and are distributed randomly to overlap regions 4B02, 4B12, 4B22, and 4B32. Similar to FIG. 6B, the linearly increasing addresses of interleaving region 4A3 are partitioned at the second interleaving granularity (e.g., 4096 bytes), scrambled using a pseudo-random algorithm, and distributed to interleaving regions 4B03 and 4B13. Similar to FIG. 6D, the linearly increasing addresses of linear region 4A4 are mapped sequentially, in linear fashion, to linear regions 4B04, 4B14, 4B24, and 4B34, filling the linear region of each memory in turn.
Similar to FIG. 6A, the linearly increasing addresses of interleaving region 5A1 of the system address space in FIG. 5 are partitioned at the first interleaving granularity (e.g., 128 bytes) and distributed evenly to interleaving regions 5B01 and 5B11 of the memory address space. Similar to FIG. 6C, the linearly increasing addresses of overlap region 5A2 undergo two interleaving passes and are distributed randomly to overlap regions 5B02, 5B12, 5B22, and 5B32. Similar to FIG. 6B, the linearly increasing addresses of interleaving region 5A3 are partitioned at the second interleaving granularity (e.g., 4096 bytes), scrambled using a pseudo-random algorithm, and distributed to interleaving regions 5B03, 5B13, and 5B23. Similar to FIG. 6D, the linearly increasing addresses of linear region 5A4 are mapped sequentially, in linear fashion, to linear regions 5B04, 5B14, 5B24, and 5B34, filling the linear region of each memory in turn.
The following describes, taking 4096 bytes as an example, a specific implementation of the interleaving process at the second interleaving granularity according to an embodiment of the present invention.
Address mapping for both the system overlap region and the second system interleaving region requires interleaving at the second interleaving granularity; the target of the interleaving is the offset address of the input address relative to the base address of the system overlap region or second system interleaving region. Since the interleaving granularity is 4096 bytes, i.e., 2^t bytes with t = 12, bits 11 to 0 of the offset address are kept unchanged, and the high-order bits from bit 12 upward are processed so that they are scrambled nonlinearly. Furthermore, for flexibility, the capacity of the system overlap region or second system interleaving region is not fixed to a single value; instead, a minimum capacity is defined. The address width corresponding to this minimum capacity, minus 12, is used as the width of a one-to-one mapping function, and the remaining higher-order bits achieve nonlinear scrambling by exchanging positions with the mapping result.
Specifically, the method comprises the following steps:
(1) Set the minimum capacity M of the system overlap region or second system interleaving region, with corresponding address width w. Take bits w-1 to t of the offset address of the input address relative to the base address of the region (denoted addr_offset[w-1:t]) as the input of a nonlinear mapping function, obtaining an output hash_out whose bit width equals that of the input.
For example, if the minimum capacity of the system overlap area/second system interleave area is 256MB, the address width corresponding to the minimum capacity is w = 28, and bits 27 to 12 of the offset address (denoted addr_offset[27:12], 16 bits in total, where addr_offset[27] represents bit 27 of the offset address, addr_offset[26] represents bit 26 of the offset address, and so on down to addr_offset[12] representing bit 12 of the offset address) are taken as the input of the following nonlinear mapping function, giving a 16-bit output hash_out.
hash_out[0] = x[15] ^ x[13] ^ x[4] ^ x[0];
hash_out[1] = x[15] ^ x[14] ^ x[13] ^ x[5] ^ x[4] ^ x[1] ^ x[0];
hash_out[2] = x[14] ^ x[13] ^ x[6] ^ x[5] ^ x[4] ^ x[2] ^ x[1] ^ x[0];
hash_out[3] = x[15] ^ x[14] ^ x[7] ^ x[6] ^ x[5] ^ x[3] ^ x[2] ^ x[1];
hash_out[4] = x[13] ^ x[8] ^ x[7] ^ x[6] ^ x[3] ^ x[2] ^ x[0];
hash_out[5] = x[14] ^ x[9] ^ x[8] ^ x[7] ^ x[4] ^ x[3] ^ x[1];
hash_out[6] = x[15] ^ x[10] ^ x[9] ^ x[8] ^ x[5] ^ x[4] ^ x[2];
hash_out[7] = x[15] ^ x[13] ^ x[11] ^ x[10] ^ x[9] ^ x[6] ^ x[5] ^ x[4] ^ x[3] ^ x[0];
hash_out[8] = x[15] ^ x[14] ^ x[13] ^ x[12] ^ x[11] ^ x[10] ^ x[7] ^ x[6] ^ x[5] ^ x[1] ^ x[0];
hash_out[9] = x[14] ^ x[12] ^ x[11] ^ x[8] ^ x[7] ^ x[6] ^ x[4] ^ x[2] ^ x[1] ^ x[0];
hash_out[10] = x[15] ^ x[13] ^ x[12] ^ x[9] ^ x[8] ^ x[7] ^ x[5] ^ x[3] ^ x[2] ^ x[1];
hash_out[11] = x[15] ^ x[14] ^ x[10] ^ x[9] ^ x[8] ^ x[6] ^ x[3] ^ x[2] ^ x[0];
hash_out[12] = x[13] ^ x[11] ^ x[10] ^ x[9] ^ x[7] ^ x[3] ^ x[1] ^ x[0];
hash_out[13] = x[14] ^ x[12] ^ x[11] ^ x[10] ^ x[8] ^ x[4] ^ x[2] ^ x[1];
hash_out[14] = x[15] ^ x[13] ^ x[12] ^ x[11] ^ x[9] ^ x[5] ^ x[3] ^ x[2];
hash_out[15] = x[15] ^ x[14] ^ x[12] ^ x[10] ^ x[6] ^ x[3] ^ x[0];
Where ^ represents exclusive-or logic and x[i] represents the i-th bit of the binary number x, with i = 0, 1, …, 15. Substituting addr_offset[27:12] into the nonlinear mapping function gives x[i] = addr_offset[i+12]; hash_out[0] represents bit 0 of hash_out, hash_out[1] represents bit 1 of hash_out, and so on up to hash_out[15] representing bit 15 of hash_out.
It should be understood that the above function is only an example and is not exclusive; any function satisfying the conditions of 16-bit input, 16-bit output, one-to-one nonlinear mapping, and so on may be used.
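As an illustrative software sketch (not part of the patent text), the nonlinear mapping above can be modeled by tabulating the XOR taps of each output bit; the tap table below is transcribed directly from the hash_out equations.

```python
# Illustrative software model of the 16-bit nonlinear mapping above.
# TAPS[i] lists the input bit positions XOR-ed together to form hash_out[i],
# transcribed from the hash_out[0]..hash_out[15] equations in the text.
TAPS = [
    [15, 13, 4, 0],
    [15, 14, 13, 5, 4, 1, 0],
    [14, 13, 6, 5, 4, 2, 1, 0],
    [15, 14, 7, 6, 5, 3, 2, 1],
    [13, 8, 7, 6, 3, 2, 0],
    [14, 9, 8, 7, 4, 3, 1],
    [15, 10, 9, 8, 5, 4, 2],
    [15, 13, 11, 10, 9, 6, 5, 4, 3, 0],
    [15, 14, 13, 12, 11, 10, 7, 6, 5, 1, 0],
    [14, 12, 11, 8, 7, 6, 4, 2, 1, 0],
    [15, 13, 12, 9, 8, 7, 5, 3, 2, 1],
    [15, 14, 10, 9, 8, 6, 3, 2, 0],
    [13, 11, 10, 9, 7, 3, 1, 0],
    [14, 12, 11, 10, 8, 4, 2, 1],
    [15, 13, 12, 11, 9, 5, 3, 2],
    [15, 14, 12, 10, 6, 3, 0],
]

def hash16(x: int) -> int:
    """Apply the mapping to a 16-bit value such as addr_offset[27:12]."""
    out = 0
    for i, taps in enumerate(TAPS):
        bit = 0
        for t in taps:
            bit ^= (x >> t) & 1   # XOR the tapped input bits together
        out |= bit << i
    return out
```

Because every output bit is an XOR of input bits, the mapping is linear over GF(2) — hash16(a ^ b) == hash16(a) ^ hash16(b) — which makes it cheap to realize in hardware as a fixed XOR network.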
(2) For different total capacities of the system overlap area/second system interleave area, reserve different bits of the offset address.
Specifically, in some embodiments, when the total capacity of the system overlap area/second system interleave area is 2^n × 3 times the minimum capacity, with corresponding address width W, bits W-1 to W-2 of the offset address are reserved, denoted addr_offset[W-1:W-2].

In some embodiments, when the total capacity of the system overlap area/second system interleave area is 2^n × 9 times the minimum capacity, bits W-1 to W-4 of the offset address are reserved, denoted addr_offset[W-1:W-4]; where n ≥ 0 and n is an integer.
For example, when the total capacity of the system overlap area/second system interleave area is 768MB, the address width corresponding to the total capacity is W = 30, and bits 29 to 28 of the offset address are reserved, denoted addr_offset[29:28];

when the total capacity of the system overlap area/second system interleave area is 1.5GB, the address width corresponding to the total capacity is W = 31, and bits 30 to 29 of the offset address are reserved, denoted addr_offset[30:29];

when the total capacity of the system overlap area/second system interleave area is 3GB, the address width corresponding to the total capacity is W = 32, and bits 31 to 30 of the offset address are reserved, denoted addr_offset[31:30];

when the total capacity of the system overlap area/second system interleave area is 6GB, the address width corresponding to the total capacity is W = 33, and bits 32 to 31 of the offset address are reserved, denoted addr_offset[32:31];

when the total capacity of the system overlap area/second system interleave area is 12GB, the address width corresponding to the total capacity is W = 34, and bits 33 to 32 of the offset address are reserved, denoted addr_offset[33:32];

when the total capacity of the system overlap area/second system interleave area is 2.25GB, the address width corresponding to the total capacity is W = 32, and bits 31 to 28 of the offset address are reserved, denoted addr_offset[31:28];

when the total capacity of the system overlap area/second system interleave area is 4.5GB, the address width corresponding to the total capacity is W = 33, and bits 32 to 29 of the offset address are reserved, denoted addr_offset[32:29].
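The correspondence between total capacity, address width W, and the reserved bits can be sketched as follows (function and constant names are illustrative, not from the patent):

```python
# Illustrative helper for step (2): derive the address width W of a total
# capacity (smallest W with 2**W >= capacity) and extract the reserved bits
# of the offset address for the 2^n*3 ("3x") and 2^n*9 ("9x") cases.
MB, GB = 1 << 20, 1 << 30

def addr_width(capacity_bytes: int) -> int:
    """Smallest W such that 2**W covers the given capacity."""
    return (capacity_bytes - 1).bit_length()

def spare_bits(offset: int, capacity_bytes: int, times9: bool) -> int:
    """addr_offset[W-1:W-2] for the 3x case, addr_offset[W-1:W-4] for the 9x case."""
    w_total = addr_width(capacity_bytes)
    n = 4 if times9 else 2                    # 2 reserved bits vs 4
    return (offset >> (w_total - n)) & ((1 << n) - 1)
```

For instance, a 768MB area gives W = 30, so the reserved bits are offset bits 29:28, matching the addr_offset[29:28] example in the text.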
(3) Based on hash_out and the reserved bits of the offset address, perform a mapping to obtain an output hash_div whose bit count equals the number of reserved bits.
specifically, in some embodiments, the total capacity of the system overlap region/second system interleave region is 2 of the minimum capacitynAnd when the number is 3 times, trisecting partial bit data based on the hash _ out, and mapping according to the interval of the partial bit data of the hash _ out in the trisection and the value of the spare bit to obtain the output with the bit number equal to the number of the spare bit.
In some embodiments, the total capacity of the system overlap region/second system interleave region is 2 of the minimum capacitynAnd 9 times, performing nine divisions on all the bit data based on the hash _ out, and mapping according to the interval of all the bit data of the hash _ out in the nine divisions and the values of the spare bits to obtain the output with the number of the bits equal to the number of the spare bits.
For example, when the total capacity is 768MB, 1.5GB, 3GB, 6GB, or 12GB, the mapping result may be as shown in Table 2 below, where hash_out[15:8] is bits 15 to 8 of hash_out and hash_div3_h2b = addr_offset[W-1:W-2].
Table 2

[Table 2 is an image in the original publication and is not reproduced here.]
For example, when the total capacity is 2.25GB or 4.5GB, the values 0 to 65535 of hash_out are divided into nine equal intervals to obtain an interval index 0 to 8, and the mapping result may be as shown in Table 3 below, where hash_div9_h4b = addr_offset[W-1:W-4].
Table 3

[Table 3 is an image in the original publication and is not reproduced here.]
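The concrete entries of Tables 2 and 3 appear only as images in the original, so the sketch below shows one *plausible* mapping of the kind step (3) describes — an equal-interval index combined with the reserved bits, here by modular addition. The combining rule is an assumption for illustration; the patent's tables define the actual mapping.

```python
# Hypothetical illustration of step (3): combine the equal-interval index of
# hash_out with the reserved bits to select one of 3 (or 9) minimum-capacity
# sub-regions.  Modular addition is an assumed stand-in for Tables 2/3.
def hash_div3(hash_out: int, spare2: int) -> int:
    """3x case: trisect hash_out[15:8] (0..255) -> interval 0..2, fold in spare bits."""
    interval = (hash_out >> 8) * 3 // 256     # equal thirds of the 0..255 range
    return (interval + spare2) % 3            # combine with addr_offset[W-1:W-2]

def hash_div9(hash_out: int, spare4: int) -> int:
    """9x case: divide hash_out (0..65535) into nine intervals 0..8, fold in spare bits."""
    interval = hash_out * 9 // 65536
    return (interval + spare4) % 9            # combine with addr_offset[W-1:W-4]
```

Any rule with the same shape works, provided every (interval, spare-bit) pair lands addresses uniformly across the 3 or 9 sub-regions.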
(4) Compute the output address in different ways according to the total capacity of the system overlap area/second system interleave area.
Specifically, in some embodiments, when the total capacity is the minimum capacity, the output address is {L'd0, hash_out, addr_offset[t-1:0]}, representing the concatenation of L binary 0s, hash_out, and addr_offset[t-1:0].
In some embodiments, when the total capacity is 2^m times the minimum capacity, with corresponding address width W, the output address is {L'd0, hash_out, addr_offset[W-1:w], addr_offset[11:0]}, representing the concatenation of L binary 0s, hash_out, addr_offset[W-1:w], and addr_offset[11:0], where m is a positive integer. That is, nonlinear scrambling is realized by interchanging the positions of the remaining high-order bits addr_offset[W-1:w] and the mapping result hash_out.
In some embodiments, when the total capacity is 3 or 9 times the minimum capacity, the output address is {L'd0, hash_div, hash_out, addr_offset[t-1:0]}, representing the concatenation of L binary 0s, hash_div, hash_out, and addr_offset[t-1:0].
In some embodiments, when the total capacity is 2^m × 3 times the minimum capacity, the output address is {L'd0, hash_div, hash_out, addr_offset[W-3:w], addr_offset[t-1:0]}, representing the concatenation of L binary 0s, hash_div, hash_out, addr_offset[W-3:w], and addr_offset[t-1:0].
In some embodiments, when the total capacity is 2^m × 9 times the minimum capacity, the output address is {L'd0, hash_div, hash_out, addr_offset[W-5:w], addr_offset[t-1:0]}, representing the concatenation of L binary 0s, hash_div, hash_out, addr_offset[W-5:w], and addr_offset[t-1:0].
In the above cases, the value of L may differ; L should be chosen so that the total number of bits of the output address meets the requirement. Table 4 below gives examples of the output address addr_out[39:0] at different capacities for a 40-bit output address.
Table 4

[Table 4 is an image in the original publication and is not reproduced here.]
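The simplest case in step (4) — total capacity equal to the minimum capacity (256MB, w = 28, t = 12) — can be sketched as follows; the function name is illustrative, and the hash function is passed in as a parameter:

```python
# Sketch of the output-address layout {L'd0, hash_out, addr_offset[t-1:0]}
# for the minimum-capacity case: the low 12 bits pass through unchanged and
# bits 27:12 are replaced by their hashed value; with a 40-bit output address
# the L = 40 - 28 = 12 leading zeros are implicit in a Python integer.
T = 12                                          # 4096-byte granularity, t = 12

def scramble_offset(offset: int, hash16) -> int:
    """Keep offset[11:0]; replace offset[27:12] with hash16(offset[27:12])."""
    low = offset & ((1 << T) - 1)               # addr_offset[11:0], unchanged
    hashed = hash16((offset >> T) & 0xFFFF)     # nonlinear mapping of [27:12]
    return (hashed << T) | low                  # {hash_out, addr_offset[11:0]}
```

Passing the identity function leaves the address unchanged, which is a handy sanity check that only the upper field is scrambled.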
Fig. 7 is a schematic structural diagram of an interleaving module according to an embodiment of the present invention. As shown in fig. 7, the interleaving module includes an interleaving pre-processing module, a first interleaving processing module, a second interleaving processing module, and a linear processing module. The first input end of the interleaving preprocessing module is used for connecting with the system register and obtaining the interleaving control signal from the system register, and the second input end of the interleaving preprocessing module is used for connecting with the processing unit through an address signal line in the data bus and obtaining the address information in the data read-write command from the processing unit. The output end of the interleaving preprocessing module is connected with the first input end of the first interleaving processing module. The second input end of the first interleaving processing module is used for connecting with the system register and obtaining the interleaving control signal from the system register, and the output end of the first interleaving processing module is connected with the first input end of the second interleaving processing module. The second input end of the second interleaving processing module is used for connecting the system register and obtaining the interleaving control signal from the system register, and the output end of the second interleaving processing module is connected with the first input end of the linear processing module. The second input end of the linear processing module is used for connecting with the system register and obtaining the interleaving control signal from the system register, and the output end of the linear processing module is used for connecting with the SoC data bus and sending the address information after interleaving processing to the memory controller. 
In addition, the interleaving module is used for being connected with the processing unit through a data signal line in the data bus, and is used for acquiring data information in a data read-write instruction from the processing unit and sending the data information to the storage controller through the SoC data bus.
The interleaving preprocessing module analyzes the received interleaving control signal to determine which partition of the system address space the current address belongs to, further judges the subsequent processing mode of the current address, and sends the judgment result to the first interleaving processing module along with the current address.
Specifically, when the current address falls in the second system interleave area or the system overlap area, the interleaving pre-processing module judges that the current address needs to be processed by the first interleaving processing module; otherwise, it judges that the current address does not need to be processed by the first interleaving processing module. When the current address falls in the first system interleave area or the system overlap area, the interleaving pre-processing module judges that the current address needs to be processed by the second interleaving processing module; otherwise, it judges that the current address does not need to be processed by the second interleaving processing module.
The interleaving control signal includes: enabling state of the first system interleaving area, the number of first sub-storage interleaving areas forming a mapping relation with the first system interleaving area, and capacity of the first sub-storage interleaving areas; enabling state of the system overlapping area, number of sub storage overlapping areas forming a mapping relation with the system overlapping area, and capacity of the sub storage overlapping areas; enabling state of the second system interleaving area, the number of second sub-storage interleaving areas forming a mapping relation with the second system interleaving area, and capacity of the second sub-storage interleaving areas; the number of memories and the capacity of each memory.
The first interleaving processing module is responsible for performing interleaving at the second interleaving granularity (e.g., 4096 bytes). Specifically, the first interleaving processing module determines whether to interleave the current address according to the judgment result from the interleaving pre-processing module: when the judgment result is that the current address needs to be processed by the first interleaving processing module, it interleaves the current address at the second interleaving granularity according to the interleaving control signal and sends the interleaved address, together with the judgment result of the interleaving pre-processing module, to the second interleaving processing module; when the judgment result is that the current address does not need to be processed by the first interleaving processing module, it sends the current address and the judgment result of the interleaving pre-processing module to the second interleaving processing module unchanged.
The specific way of performing the interleaving process with the second interleaving granularity on the current address may refer to the embodiments described in fig. 6B and fig. 6C, and is not described herein again because it has been described in detail above.
The second interleaving processing module is responsible for performing interleaving at the first interleaving granularity (e.g., 128 bytes). Specifically, the second interleaving processing module determines whether to interleave the address from the first interleaving processing module according to the judgment result of the interleaving preprocessing module, and when the judgment result of the interleaving preprocessing module is that the current address needs to be processed by the second interleaving processing module, the second interleaving processing module performs interleaving processing with a first interleaving granularity on the address from the first interleaving processing module according to the interleaving control signal, and sends the interleaved address and the judgment result of the interleaving preprocessing module to the linear processing module; and when the judgment result of the interleaving preprocessing module is that the current address does not need to be processed by the second interleaving processing module, the second interleaving processing module sends the address from the first interleaving processing module and the judgment result of the interleaving preprocessing module to the linear processing module.
The specific way of performing the interleaving process with the first interleaving granularity on the current address may refer to the embodiments described in fig. 6A and fig. 6C, and is not described herein again because it has been described in detail above. If the bus allows burst transfers and the address of a burst transfer spans 128 bytes, then a long burst transfer needs to be split into multiple short burst transfers to accommodate 128-byte interleaving.
The linear processing module determines the processing mode of the address from the second interleaving processing module according to the judgment result of the interleaving preprocessing module. Specifically, when the judgment result of the interleaving pre-processing module is that the current address needs to be processed by at least one of the first interleaving processing module and the second interleaving processing module, a first linear processing mode is adopted, which includes mapping the address from the second interleaving processing module to the corresponding region of each memory address space in a sequentially and equally dividing mode, and specifically refer to the embodiments described in fig. 6A to 6C, which is not described herein again because the detailed description is given above; when the result of the interleaving pre-processing module is that the current address does not need to be processed by the first interleaving processing module or the second interleaving processing module, a second linear processing mode is adopted, including sequentially filling the address from the second interleaving processing module into the sub-storage linear regions of the address spaces of the memories, which may be referred to as the embodiment described in fig. 6D.
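The dispatch decisions described above — which pipeline stages act on which partition of the system address space — can be sketched as follows; the region boundaries are example values, not from the patent:

```python
# Illustrative model of the pre-processing decision in Fig. 7: which stages
# (4096B interleave, 128B interleave) apply to an address, per the text:
# the first module handles the second interleave area and the overlap area,
# the second module handles the first interleave area and the overlap area.
REGIONS = [
    ("interleave1", 0x00000000, 0x10000000),   # first system interleave area (128B)
    ("overlap",     0x10000000, 0x50000000),   # system overlap area (both granularities)
    ("interleave2", 0x50000000, 0x60000000),   # second system interleave area (4096B)
    ("linear",      0x60000000, 0x80000000),   # system linear area
]

def classify(addr: int):
    """Return (region, needs_4096B_stage, needs_128B_stage) for an address."""
    for name, lo, hi in REGIONS:
        if lo <= addr < hi:
            stage1 = name in ("interleave2", "overlap")  # first module, 4096B
            stage2 = name in ("interleave1", "overlap")  # second module, 128B
            return name, stage1, stage2
    raise ValueError("address outside system address space")
```

Addresses in the linear area pass both interleaving stages untouched and are handled only by the linear processing module.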
As described above, the CPU stores the configuration information in the system register through the configuration bus, the system register outputs the interleaving control signal to each interleaving module, and each interleaving module receives the interleaving control signal from the system register. Specifically, the configuration information may be reasonably configured by the user according to the read-write performance requirements of each processing unit.
Fig. 8 shows the implementation steps of configuring an interleaving policy in the SoC and allocating read-write space of appropriate areas to the processes of the processing units.
Step 1: before starting a task, a user needs to evaluate the read-write performance requirements of each processing unit over a future period of time and perform a reasonable interleaving configuration accordingly, including configuring the open/closed states, number, capacity, and so on of the interleave areas/overlap area/linear area in each memory.
In some embodiments, the read-write performance requirements of a processing unit include: a high read-write performance requirement (labeled the first read-write performance requirement), a medium read-write performance requirement (labeled the second read-write performance requirement), and a low read-write performance requirement (labeled the third read-write performance requirement).

Under the high read-write performance requirement, every process of every processing unit needs high bus throughput, so all storage space is configured as a storage overlap area (labeled the first configuration type) to provide the highest possible read-write performance; for example, the storage overlap areas of 4 memories are opened, other areas are closed, and the total capacity of the storage overlap areas is 2048MB.

Under the medium read-write performance requirement, some processes in each processing unit need high read-write throughput, some need medium throughput, and some need low throughput; therefore, a first storage interleave area, a storage overlap area, a second storage interleave area, and a storage linear area are all set up in memory, and their number and capacity are allocated according to the memory share required by each type of process (labeled the second configuration type); for example, the first storage interleave areas (interleave area 1) of 2 memories are opened with a total capacity of 256MB, the storage overlap areas of 4 memories are opened with a total capacity of 1024MB, the second storage interleave areas (interleave area 2) of 4 memories are opened with a total capacity of 256MB, and the storage linear areas of 4 memories are opened.

Under the low read-write performance requirement, the read-write throughput of a single memory can meet the performance requirement of most processes, and only a few processes need multi-memory interleaving; therefore, a first storage interleave area, a second storage interleave area, and a storage linear area are set up in memory, with most of the memory capacity used as the storage linear area (labeled the third configuration type); for example, the first storage interleave areas (interleave area 1) of 2 memories are opened with a total capacity of 128MB, the storage overlap area is closed, the second storage interleave areas (interleave area 2) of 2 memories are opened with a total capacity of 128MB, and the storage linear areas of 4 memories are opened.
It should be noted that the three performance requirements are only a part of the support in the present solution, the configuration types supported by the present solution are not limited to the three, and there are many configuration types for the user to select, and the user can select a suitable configuration type according to the actual requirement.
Step 2: the user allocates an appropriate read-write space to each process according to the read-write performance requirement of each process.
Under the first configuration type, all processes have high-performance read-write requirements, and space in the storage overlap area is allocated to all processes. Under the second configuration type, space in the storage overlap area is allocated to the processes with the highest performance requirements, space in the first storage interleave area to processes with higher performance requirements, space in the second storage interleave area to processes with lower performance requirements, and space in the storage linear area to the processes with the lowest performance requirements. Under the third configuration type, space in the storage linear area is allocated to the majority of processes, which have low performance requirements, and space in the first or second storage interleave area is allocated to the minority of processes with higher performance requirements. The interleaving granularity of the first storage interleave area is smaller than that of the second storage interleave area.
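Under the second configuration type, the allocation rule above amounts to a simple lookup from performance tier to region; the tier names below are assumptions for illustration, while the region ordering follows the text:

```python
# Illustrative sketch of Step 2 under the second configuration type: each
# process's performance tier selects the region its space is allocated from.
# Ordering per the text: overlap > interleave area 1 (finer, 128B granularity)
# > interleave area 2 (coarser, 4096B granularity) > linear area.
ALLOCATION = {
    "highest": "storage overlap area",
    "higher":  "first storage interleave area",
    "lower":   "second storage interleave area",
    "lowest":  "storage linear area",
}

def allocate(tier: str) -> str:
    """Return the region a process of the given tier draws its space from."""
    return ALLOCATION[tier]
```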
Step 3: after obtaining its allocated memory space, each process starts to work; the processing unit of each process sends a data read-write request (i.e., a data read-write instruction) to the corresponding interleaving module, and after receiving the request, each interleaving module interleaves the current address in the request according to the configuration type (i.e., the configuration information).
Step 4: if the read-write performance requirement of a process changes, judge whether the current interleaving configuration can still meet the process's read-write requirement; if it can, return to Step 2 and reallocate the memory space; if it cannot, return to Step 1, re-evaluate the read-write performance requirements of each processing unit, and perform a reasonable interleaving configuration accordingly. This solves the problems that the data interleaving provided by an SoC is not flexible enough, that the interleaving processing mode is limited, and that the SoC cannot provide channel interleaving with multiple granularities and modes according to the needs of the working scenario.
In the description herein, references to the description of the term "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the application. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. Moreover, various embodiments or examples and features of various embodiments or examples described in this specification can be combined and combined by one skilled in the art without being mutually inconsistent.
Furthermore, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present application, "a plurality" means two or more unless specifically limited otherwise.
Any process or method descriptions in flow charts or otherwise described herein may be understood as representing modules, segments, or portions of code which include one or more (two or more) executable instructions for implementing specific logical functions or steps in the process. And the scope of the preferred embodiments of the present application includes other implementations in which functions may be performed out of the order shown or discussed, including substantially concurrently or in reverse order, depending on the functionality involved.
The logic and/or steps represented in the flowcharts or otherwise described herein, e.g., an ordered listing of executable instructions that can be considered to implement logical functions, can be embodied in any computer-readable medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, processor-containing system, or other system that can fetch the instructions from the instruction execution system, apparatus, or device and execute the instructions.
It should be understood that portions of the present application may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, the various steps or methods may be implemented in software or firmware stored in memory and executed by a suitable instruction execution system. All or part of the steps of the method of the above embodiments may be implemented by hardware that is configured to be instructed to perform the relevant steps by a program, which may be stored in a computer-readable storage medium, and which, when executed, includes one or a combination of the steps of the method embodiments.
In addition, functional units in the embodiments of the present application may be integrated into one processing module, or each unit may exist alone physically, or two or more units are integrated into one module. The integrated module can be realized in a hardware mode, and can also be realized in a software functional module mode. The integrated module may also be stored in a computer-readable storage medium if it is implemented in the form of a software functional module and sold or used as a separate product. The storage medium may be a read-only memory, a magnetic or optical disk, or the like.
The above description is only for the specific embodiments of the present application, but the scope of the present application is not limited thereto, and any person skilled in the art can easily conceive various changes or substitutions within the technical scope of the present application, and these should be covered by the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (20)

1. A method for interleaving memory data, comprising:
dividing a system address space to obtain a first system interleaving area with a first interleaving granularity, a second system interleaving area with a second interleaving granularity and a system linear area;
dividing a memory address space to obtain a first memory interleaving area with a first interleaving granularity, a second memory interleaving area with a second interleaving granularity and a memory linear area;
the first system interleaving area and the first storage interleaving area form a mapping relation, the second system interleaving area and the second storage interleaving area form a mapping relation, the system linear area and the storage linear area form a mapping relation, and the first interleaving granularity is smaller than the second interleaving granularity.
2. The memory data interleaving method according to claim 1, wherein the first memory interleaving areas comprise an even number of first sub memory interleaving areas with equal capacity, and the even number of first sub memory interleaving areas are respectively arranged in different memories; the second memory interleaving area comprises a second sub memory interleaving area or a plurality of second sub memory interleaving areas with the same capacity, and the plurality of second sub memory interleaving areas are respectively arranged in different memories; the storage linear region includes a plurality of sub-storage linear regions, which are respectively disposed in different memories.
3. The memory data interleaving method according to claim 2, wherein the capacity of the first system interleaving area is the sum of the capacities of the first sub-memory interleaving areas in each memory; the capacity of the second system interleaving area is the sum of the capacities of the second sub-storage interleaving areas in each memory; the capacity of the linear area of the system is the sum of the capacities of the sub-storage linear areas in the memories.
4. The memory data interleaving method of claim 3, further comprising:
dividing a system address space to obtain a system overlapping area with a first interleaving granularity and a second interleaving granularity;
dividing a memory address space to obtain a memory overlapping area with a first interleaving granularity and a second interleaving granularity;
the system overlapping area and the storage overlapping area form a mapping relation.
5. The memory data interleaving method as claimed in claim 4, wherein the memory overlap area includes an even number of sub memory overlap areas having equal capacity, and the even number of sub memory overlap areas are respectively disposed in different memories.
6. The method of memory data interleaving as claimed in claim 5, wherein the capacity of the system overlap region is the sum of the capacities of the sub-memory overlap regions in the respective memories.
7. The memory data interleaving method according to any one of claims 1 to 6, wherein, in the system address space, the addresses of the first system interleaving area, the second system interleaving area, and the system linear area are arranged in ascending order; or the addresses of the first system interleaving area, the system overlapping area, the second system interleaving area, and the system linear area are arranged in ascending order.
8. The memory data interleaving method according to claim 7, wherein, in each memory of the memory address space, the addresses of the first sub-memory interleaving area, the sub-memory overlapping area, the second sub-memory interleaving area, and the sub-memory linear area are arranged in ascending order.
9. The memory data interleaving method according to claim 7, wherein one or more memories in the memory address space lack one or more of the first sub-memory interleaving area, the sub-memory overlapping area, the second sub-memory interleaving area, and the sub-memory linear area.
10. The memory data interleaving method according to claim 9, wherein, in each memory of the memory address space, the areas that are present are arranged so that first sub-memory interleaving area address < sub-memory overlapping area address < second sub-memory interleaving area address < sub-memory linear area address.
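As an illustration of the ascending per-memory layout required by claims 8 and 10, the sketch below classifies a memory-local address into one of the four areas. All area names and sizes are invented for this example; the patent does not prescribe concrete values.

```python
# Hypothetical per-memory area sizes (invented for illustration). Areas are
# laid out in ascending address order, as in claims 8 and 10:
# first interleaving area < overlapping area < second interleaving area < linear area.
AREAS = [
    ("first_interleave",  0x1000),  # first sub-memory interleaving area (fine granularity)
    ("overlap",           0x1000),  # sub-memory overlapping area (both granularities)
    ("second_interleave", 0x2000),  # second sub-memory interleaving area (coarse granularity)
    ("linear",            0x4000),  # sub-memory linear area (no interleaving)
]

def classify(addr):
    """Return (area name, offset within the area) for a memory-local address."""
    base = 0
    for name, size in AREAS:
        if base <= addr < base + size:
            return name, addr - base
        base += size
    raise ValueError("address outside the memory address space")
```

A memory that lacks one of the areas (claim 9) would simply configure that area's size to zero; the ascending order of the remaining areas is preserved.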
11. The memory data interleaving method according to any one of claims 1 to 6, further comprising: allocating a read-write space in an appropriate area to each process of a processing unit according to the read-write performance requirements of the processing unit.
12. The memory data interleaving method according to claim 11, wherein allocating a read-write space in an appropriate area to each process of the processing unit according to the read-write performance requirements of the processing unit specifically comprises:
configuring, according to the read-write performance requirements of the processing unit, the enabled/disabled states, the number, and the capacity of the first sub-memory interleaving areas, second sub-memory interleaving areas, and sub-memory linear areas in the memory address space; or configuring the enabled/disabled states, the number, and the capacity of the first sub-memory interleaving areas, sub-memory overlapping areas, second sub-memory interleaving areas, and sub-memory linear areas in the memory address space;
determining the read-write performance requirements of each process of the processing unit, and allocating an appropriate read-write space to each process in combination with the configuration information;
starting each process of the processing unit, and interleaving the current address in each data read-write instruction according to the data read-write instruction and the configuration information sent by the processing unit.
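The configuration-then-allocation steps of claim 12 might be sketched as follows. The requirement labels, the preference order, and the capacities are assumptions made for illustration; the claims require only that "an appropriate" read-write space be chosen from the configured areas.

```python
# Per-area configuration as in claim 12: enabled/disabled state and capacity
# (all values invented for illustration).
CONFIG = {
    "first_interleave":  {"enabled": True, "capacity": 0x1000},  # fine granularity: highest parallelism
    "second_interleave": {"enabled": True, "capacity": 0x2000},  # coarse granularity
    "linear":            {"enabled": True, "capacity": 0x4000},  # no interleaving: predictable locality
}

# Hypothetical mapping from a process's read-write performance requirement to
# an ordered list of acceptable areas.
PREFERENCE = {
    "high_bandwidth": ["first_interleave", "second_interleave", "linear"],
    "bulk_transfer":  ["second_interleave", "linear"],
    "isolated":       ["linear"],
}

def allocate(requirement):
    """Pick the first enabled area that satisfies the process's requirement."""
    for area in PREFERENCE[requirement]:
        if CONFIG[area]["enabled"]:
            return area
    raise RuntimeError("no suitable area enabled for requirement " + requirement)
```

If the preferred area is disabled in the configuration, the allocator falls through to the next acceptable area, which matches the claim's combination of per-process requirements with the configuration information.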
13. An interleaving module, comprising an interleaving preprocessing module, a first interleaving processing module, a second interleaving processing module, and a linear processing module, wherein:
the interleaving preprocessing module is configured to determine, according to an interleaving control signal, which partition of the system address space the current address belongs to, thereby determine how the current address is to be processed subsequently, and send the determination result to the first interleaving processing module;
the first interleaving processing module is configured to determine, according to the determination result of the interleaving preprocessing module, whether to interleave the current address at a second interleaving granularity;
the second interleaving processing module is configured to determine, according to the determination result of the interleaving preprocessing module, whether to interleave the address from the first interleaving processing module at a first interleaving granularity, the first interleaving granularity being smaller than the second interleaving granularity;
and the linear processing module is configured to determine, according to the determination result of the interleaving preprocessing module, whether to process the address from the second interleaving processing module in a first linear processing manner or a second linear processing manner.
14. The interleaving module according to claim 13, wherein the system address space comprises a first system interleaving area having the first interleaving granularity, a second system interleaving area having the second interleaving granularity, and a system linear area; when the current address falls in the second system interleaving area, the interleaving preprocessing module determines that the current address needs to be processed by the first interleaving processing module; and when the current address falls in the first system interleaving area, the interleaving preprocessing module determines that the current address needs to be processed by the second interleaving processing module.
15. The interleaving module according to claim 14, wherein the system address space further comprises a system overlapping area in which the first interleaving granularity and the second interleaving granularity coexist; and when the current address falls in the system overlapping area, the interleaving preprocessing module determines that the current address needs to be processed by both the first interleaving processing module and the second interleaving processing module.
16. The interleaving module according to claim 15, wherein, when the interleaving preprocessing module determines that the current address needs to be processed by the first interleaving processing module, the first interleaving processing module interleaves the current address at the second interleaving granularity according to the interleaving control signal and sends the interleaved address to the second interleaving processing module; and when the interleaving preprocessing module determines that the current address does not need to be processed by the first interleaving processing module, the first interleaving processing module sends the current address to the second interleaving processing module unchanged.
17. The interleaving module according to claim 15, wherein, when the interleaving preprocessing module determines that the current address needs to be processed by the second interleaving processing module, the second interleaving processing module interleaves the address from the first interleaving processing module at the first interleaving granularity according to the interleaving control signal and sends the interleaved address to the linear processing module; and when the interleaving preprocessing module determines that the current address does not need to be processed by the second interleaving processing module, the second interleaving processing module sends the address from the first interleaving processing module to the linear processing module unchanged.
18. The interleaving module according to claim 15, wherein, when the interleaving preprocessing module determines that the current address needs to be processed by at least one of the first interleaving processing module and the second interleaving processing module, the linear processing module adopts the first linear processing manner; otherwise, it adopts the second linear processing manner.
19. The interleaving module according to claim 18, wherein the first linear processing manner comprises mapping the addresses from the second interleaving processing module to the corresponding areas of the memory address spaces by sequential equal division; and the second linear processing manner comprises sequentially filling the addresses from the second interleaving processing module into the corresponding areas of the memory address spaces.
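A minimal sketch of the two linear processing manners of claim 19, assuming four memories (the count and the area sizes are invented): "sequential equal division" splits the incoming address range into equal consecutive slices, one per memory, while "sequential fill" exhausts one memory's linear area before moving to the next.

```python
NUM_MEMS = 4  # hypothetical memory count

def linear_equal_division(addr, total_capacity):
    """First linear processing manner: divide the address range into NUM_MEMS
    equal slices and map slice i sequentially into memory i's linear area."""
    slice_size = total_capacity // NUM_MEMS
    return addr // slice_size, addr % slice_size  # (memory index, local offset)

def linear_sequential_fill(addr, area_sizes):
    """Second linear processing manner: fill each memory's linear area
    completely, in memory order, before starting on the next (the areas may
    differ in size)."""
    for mem, size in enumerate(area_sizes):
        if addr < size:
            return mem, addr
        addr -= size
    raise ValueError("address beyond the total linear capacity")
```

With equal-sized areas the two manners coincide; they differ only when the per-memory linear areas have unequal capacities, where sequential fill tracks the actual area boundaries.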
20. A system on a chip, comprising: one or more processing units; one or more interleaving modules according to any one of claims 13 to 19, in one-to-one correspondence with the one or more processing units; a plurality of memory controllers; and a plurality of memories in one-to-one correspondence with the plurality of memory controllers.
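The staged decision flow of claims 13 to 18 (preprocessing decides which stages apply; the first stage interleaves at the coarse second granularity, the second stage at the fine first granularity) can be sketched as below. The granularities, the channel count, and the way the two stages compose in the overlapping area are all assumptions made for illustration, not taken from the patent.

```python
NUM_CHANNELS = 4   # hypothetical number of memory channels
FINE_G   = 0x40    # first (fine) interleaving granularity, e.g. 64 B
COARSE_G = 0x1000  # second (coarse) interleaving granularity, e.g. 4 KiB

def interleave(addr, granularity, channels=NUM_CHANNELS):
    """Map a region-relative address to (channel, channel-local address):
    consecutive granularity-sized blocks rotate across the channels."""
    block = addr // granularity
    local = (block // channels) * granularity + addr % granularity
    return block % channels, local

def pipeline(addr, region):
    """Preprocessing step: the region the address falls in decides which
    interleaving stage(s) run (cf. claims 14 and 15)."""
    if region == "second_interleave":      # coarse stage only
        return interleave(addr, COARSE_G)
    if region == "first_interleave":       # fine stage only
        return interleave(addr, FINE_G)
    if region == "overlap":                # both stages; one possible composition:
        group, local = interleave(addr, COARSE_G, channels=2)  # coarse picks a channel group,
        chan, local = interleave(local, FINE_G, channels=2)    # fine picks within the group
        return group * 2 + chan, local
    return 0, addr                         # linear region: left to the linear stage
```

Fine granularity spreads even small bursts across all channels (maximizing bandwidth for a single stream), while coarse granularity keeps page-sized runs on one channel (reducing contention between independent streams), which is the trade-off the multi-granularity layout exposes.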
CN202210643436.7A 2022-06-09 2022-06-09 Multi-granularity combined memory data interleaving method and interleaving module Active CN114741329B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202211104810.2A CN115309669A (en) 2022-06-09 2022-06-09 Multi-granularity combined memory data interleaving method and interleaving module
CN202210643436.7A CN114741329B (en) 2022-06-09 2022-06-09 Multi-granularity combined memory data interleaving method and interleaving module

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210643436.7A CN114741329B (en) 2022-06-09 2022-06-09 Multi-granularity combined memory data interleaving method and interleaving module

Related Child Applications (1)

Application Number Title Priority Date Filing Date
CN202211104810.2A Division CN115309669A (en) 2022-06-09 2022-06-09 Multi-granularity combined memory data interleaving method and interleaving module

Publications (2)

Publication Number Publication Date
CN114741329A (en) 2022-07-12
CN114741329B (en) 2022-09-06

Family

ID=82287775

Family Applications (2)

Application Number Title Priority Date Filing Date
CN202210643436.7A Active CN114741329B (en) 2022-06-09 2022-06-09 Multi-granularity combined memory data interleaving method and interleaving module
CN202211104810.2A Pending CN115309669A (en) 2022-06-09 2022-06-09 Multi-granularity combined memory data interleaving method and interleaving module

Family Applications After (1)

Application Number Title Priority Date Filing Date
CN202211104810.2A Pending CN115309669A (en) 2022-06-09 2022-06-09 Multi-granularity combined memory data interleaving method and interleaving module

Country Status (1)

Country Link
CN (2) CN114741329B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115964310A (en) * 2023-03-16 2023-04-14 芯动微电子科技(珠海)有限公司 Nonlinear multi-storage channel data interleaving method and interleaving module

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6480943B1 (en) * 2000-04-29 2002-11-12 Hewlett-Packard Company Memory address interleaving and offset bits for cell interleaving of memory
US20090307434A1 (en) * 2008-06-04 2009-12-10 Sun Microsystems, Inc. Method for memory interleave support with a ceiling mask
CN107180001A (en) * 2016-03-10 2017-09-19 华为技术有限公司 Access dynamic RAM DRAM method and bus
US20170286291A1 (en) * 2016-03-29 2017-10-05 Sandisk Technologies Inc. Method and system for compacting data in non-volatile memory
CN107241163A (en) * 2017-04-28 2017-10-10 华为技术有限公司 A kind of interleaving treatment method and device
CN108292270A (en) * 2015-12-02 2018-07-17 高通股份有限公司 System and method for the storage management for using dynamic local channel interlacing
CN109313609A (en) * 2016-06-27 2019-02-05 高通股份有限公司 The system and method to interweave for odd mode storage channel


Also Published As

Publication number Publication date
CN115309669A (en) 2022-11-08
CN114741329B (en) 2022-09-06

Similar Documents

Publication Publication Date Title
JP6817273B2 (en) Devices and methods for providing cache transfer by a non-volatile high capacity memory system
CN102893266B (en) Memory controller mapping on-the-fly
CN107466418B (en) Cost optimized single level cell mode non-volatile memory for multi-level cell mode non-volatile memory
TWI475561B (en) Memory system
US10884630B2 (en) Storage system
KR101269366B1 (en) Memory system and method of controlling memory system
US20180004659A1 (en) Cribbing cache implementing highly compressible data indication
US20140181412A1 (en) Mechanisms to bound the presence of cache blocks with specific properties in caches
EP3485383A1 (en) Memory controller with flexible address decoding
CN105934747B (en) Hybrid memory module and system and method for operating the same
US10020036B2 (en) Address bit remapping scheme to reduce access granularity of DRAM accesses
US20140052900A1 (en) Memory controller for memory with mixed cell array and method of controlling the memory
CN114741329B (en) Multi-granularity combined memory data interleaving method and interleaving module
CN113360093A (en) Memory system and device
CN101930407B (en) Flash memory control circuit and memory system and data transmission method thereof
US20120278538A1 (en) Data storage apparatus, memory control device, and method for controlling flash memories
US20050232060A1 (en) Memory controller controlling cashed dram
US8812767B2 (en) Method of controlling memory, memory control circuit, storage device and electronic device
JP2017068804A (en) Information processing apparatus, access controller, and information processing method
CN115964310B (en) Nonlinear multi-storage channel data interleaving method and interleaving module
US20190095122A1 (en) Memory management system, computing system, and methods thereof
CN117762835A (en) Memory, memory control device, system on chip, chip and terminal
US20230076365A1 (en) Fast lba/pba table rebuild
CN113539322B (en) Memory device including a plurality of regions, memory controller, and memory system
TWI721660B (en) Device and method for controlling data reading and writing

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant