CN110704362A - Processor array local storage hybrid management technology - Google Patents

Processor array local storage hybrid management technology

Info

Publication number
CN110704362A
CN110704362A (application CN201910864444.2A; granted as CN110704362B)
Authority
CN
China
Prior art keywords
cores
shares
core
processor
array
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910864444.2A
Other languages
Chinese (zh)
Other versions
CN110704362B (en)
Inventor
高剑刚
施晶晶
李宏亮
过锋
唐勇
吴铁彬
郑方
许勇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuxi Jiangnan Computing Technology Institute
Original Assignee
Wuxi Jiangnan Computing Technology Institute
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuxi Jiangnan Computing Technology Institute filed Critical Wuxi Jiangnan Computing Technology Institute
Priority to CN201910864444.2A priority Critical patent/CN110704362B/en
Publication of CN110704362A publication Critical patent/CN110704362A/en
Application granted granted Critical
Publication of CN110704362B publication Critical patent/CN110704362B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F15/00 - Digital computers in general; Data processing equipment in general
    • G06F15/16 - Combinations of two or more digital computers each having at least an arithmetic unit, a program unit and a register, e.g. for a simultaneous processing of several programs
    • G06F15/163 - Interprocessor communication
    • G06F15/173 - Interprocessor communication using an interconnection network, e.g. matrix, shuffle, pyramid, star, snowflake
    • G06F15/17306 - Intercommunication techniques
    • G06F15/17331 - Distributed shared memory [DSM], e.g. remote direct memory access [RDMA]

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computer Hardware Design (AREA)
  • Theoretical Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Memory System Of A Hierarchy Structure (AREA)

Abstract

The invention provides a processor array local storage hybrid management technology, belonging to the technical field of computer architecture and processor microarchitecture. The technology comprises the following steps: S1: dividing the on-chip local data memory (LDM) of each core in the array processor into a first-type area, a second-type area and a third-type area; S2: setting the first-type area as a private storage space for holding local private data, the specific addressing of which is visible only to the application program of the owning core; S3: setting the second-type area as a shared storage space for holding data shared by a plurality of cores, the specific addressing of which is visible to the application programs of the plurality of cores; S4: setting the third-type area as a Cache storage space, which is mapped onto the whole main memory space and managed in a Cache manner, through which the application program of the core accesses main memory. The invention can be flexibly configured to match application characteristics and efficiently realizes the actual running performance of the application.

Description

Processor array local storage hybrid management technology
Technical Field
The invention belongs to the technical field of computer architecture and processor microarchitecture, and relates to a processor array local storage hybrid management technology.
Background
As the number of cores in many-core processors keeps increasing and their computing capability improves dramatically, the memory-access capability of the chip improves far more slowly than the computing capability, and the "memory wall" has become an important factor restricting chip performance. An on-chip storage hierarchy designed to match application characteristics in depth is an important technical approach to alleviating the memory-wall problem.
Efficiently realizing data sharing among the cores of a many-core processor is the key to improving the on-chip data reuse rate. However, different applications differ greatly in their demands on on-chip storage, and differences in shared working-set size and data-access mechanism have a great influence on application performance. A single, fixed mode of managing processor-core data therefore lacks adaptability.
Disclosure of Invention
In view of the above problems in the prior art, the present invention provides a processor array local storage hybrid management technology; the technical problem to be solved by the invention is how to provide a processor array local storage hybrid management technique.
The purpose of the invention can be realized by the following technical scheme:
a processor array local storage hybrid management technique, comprising the steps of:
S1: dividing the on-chip local data memory (LDM) of each core in the array processor into a first-type area, a second-type area and a third-type area;
S2: setting the first-type area as a private storage space for holding local private data, the specific addressing of which is visible only to the application program of the owning core;
S3: setting the second-type area as a shared storage space for holding data shared by a plurality of cores, the specific addressing of which is visible to the application programs of the plurality of cores;
S4: setting the third-type area as a Cache storage space, which is mapped onto the whole main memory space and managed in a Cache manner, through which the application program of the core accesses main memory.
Preferably, the capacities of the private storage space, the shared storage space and the Cache storage space are all variable.
Preferably, the shared storage space supports, through mapping, sharing of multiple granularities and shapes.
Preferably, the array processor is an array processor with 8 × 8 cores, 8 per row and per column, and the shared storage space includes 16 four-core neighborhood shares, a four-core neighborhood share being one of 16 equal partitions of the array processor, each containing four adjacent cores.
Preferably, the array processor is an array processor with 8 × 8 cores, 8 per row and per column, and the shared storage space includes 4 sixteen-core neighborhood shares, a sixteen-core neighborhood share being one of 4 equal partitions of the array processor, each containing sixteen adjacent cores.
Preferably, the array processor is an array processor with 8 × 8 cores, 8 per row and per column, and the shared storage space includes 1 sixty-four-core neighborhood share containing all sixty-four cores.
Preferably, the array processor is an array processor with 8 × 8 cores, 8 per row and per column, and the shared storage space includes 8 row shares, a row share setting one row of the array processor as a shared storage space.
Preferably, the array processor is an array processor with 8 × 8 cores, 8 per row and per column, and the shared storage space includes 8 column shares, a column share setting one column of the array processor as a shared storage space.
Preferably, the array processor is an array processor with 8 × 8 cores, 8 per row and per column, and the shared space includes a plurality of irregular shares, an irregular share being any sharing pattern other than the 16 four-core neighborhood shares, the 4 sixteen-core neighborhood shares, the 1 sixty-four-core neighborhood share, the 8 row shares and the 8 column shares defined above.
Preferably, after the shared storage space is configured, the cores within an irregular share are addressed in a unified manner and their memory accesses cannot exceed the range of the irregular share; if a given core within the irregular share accesses beyond that range, an exception is generated, but the execution correctness of the other cores in the share is not affected.
In the present invention, the on-chip local data memory (LDM) of each core in the array processor is first divided into a first-type area, a second-type area and a third-type area. The first-type area is set as a private storage space for holding local private data, whose specific addressing is visible only to the application program of the owning core. The second-type area is set as a shared storage space for holding data shared by a plurality of cores, whose specific addressing is visible to the application programs of those cores. The third-type area is set as a Cache storage space mapped onto the whole main memory space and managed in a Cache manner, through which the core's application program accesses main memory. The local storage can therefore hold both local private data and data shared with other cores, can be flexibly configured to match application characteristics, and efficiently realizes the actual running performance of the application.
Drawings
FIG. 1 is a schematic flow diagram of the present invention;
FIG. 2 is a schematic diagram of the LDM of the present invention;
FIG. 3 is a schematic diagram of a four-core neighborhood sharing architecture according to the present invention;
FIG. 4 is a schematic diagram of a sixteen core neighborhood sharing architecture according to the present invention;
FIG. 5 is a schematic diagram of the architecture of the sixty-four core neighborhood sharing of the present invention;
FIG. 6 is a schematic diagram of the row sharing structure of the present invention;
FIG. 7 is a schematic diagram of the structure of column sharing in the present invention;
FIG. 8 is a schematic diagram of the structure of irregular sharing in the present invention.
Detailed Description
The following are specific embodiments of the present invention, further described with reference to the drawings; the present invention is not, however, limited to these embodiments.
Referring to fig. 1 and 2, the processor array local storage hybrid management technique of the present embodiment includes the following steps:
S1: dividing the on-chip local data memory (LDM) of each core in the array processor into a first-type area, a second-type area and a third-type area;
S2: setting the first-type area as a private storage space for holding local private data, the specific addressing of which is visible only to the application program of the owning core;
S3: setting the second-type area as a shared storage space for holding data shared by a plurality of cores, the specific addressing of which is visible to the application programs of the plurality of cores;
S4: setting the third-type area as a Cache storage space, which is mapped onto the whole main memory space and managed in a Cache manner, through which the application program of the core accesses main memory.
Here, the on-chip local data memory (LDM) of each core in the array processor is first divided into a first-type area, a second-type area and a third-type area. The first-type area is then set as a private storage space for holding local private data, whose specific addressing is visible only to the application program of the owning core. The second-type area is set as a shared storage space for holding data shared by a plurality of cores, whose specific addressing is visible to the application programs of those cores. The third-type area is set as a Cache storage space mapped onto the whole main memory space and managed in a Cache manner, through which the core's application program accesses main memory. The local storage can therefore hold both local private data and data shared with other cores, can be flexibly configured to match application characteristics, and efficiently realizes the actual running performance of the application.
The capacities of the private storage space, the shared storage space and the Cache storage space may all be variable. These capacities can be flexibly adjusted according to application requirements, realizing dynamic configuration of the LDM space by the application and dynamic configuration and protection of the sharing range within the array, so that the type and sharing range of the storage space suit the application characteristics and the storage structure matches the application to the greatest extent.
The shared storage space may support sharing of multiple granularities and shapes through mapping. For the shared space, sharing of various granularities and shapes is completed through mapping in order to adapt to the working sets and affinity characteristics of different applications.
Referring to fig. 3, the array processor may be an array processor with 8 × 8 cores, 8 per row and per column, and the shared storage space includes 16 four-core neighborhood shares, each obtained by dividing the array processor equally into 16 groups of four adjacent cores. After configuration, the cores within one four-core neighborhood share (a black frame in the figure) see a uniformly addressed shared space, and their memory accesses cannot exceed the range of the frame; that is, the cores within one four-core neighborhood share use the same addresses and may only access memory within their own share.
Referring to fig. 4, the array processor may be an array processor with 8 × 8 cores, 8 per row and per column, and the shared storage space includes 4 sixteen-core neighborhood shares, each obtained by dividing the array processor equally into 4 groups of sixteen adjacent cores. After configuration, the cores within one sixteen-core neighborhood share (a black frame in the figure) see a uniformly addressed shared space, and their memory accesses cannot exceed the range of the frame; that is, the cores within one sixteen-core neighborhood share use the same addresses and may only access memory within their own share.
Referring to fig. 5, the array processor may be an array processor with 8 × 8 cores, 8 per row and per column, and the shared storage space includes 1 sixty-four-core neighborhood share containing all sixty-four cores. After configuration, the sixty-four cores (the black frame in the figure) see a uniformly addressed shared space, and their memory accesses cannot exceed the range of the frame; that is, all cores in the sixty-four-core neighborhood share use the same addresses and may only access memory within the share.
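The three neighborhood granularities of figs. 3 to 5 follow one pattern: square blocks of side 2, 4 or 8 tiled over the 8 × 8 array. A hypothetical sketch of the resulting core-to-share mapping (the function name and row-major indexing convention are assumptions, not from the patent):

```python
def neighborhood_share_id(row: int, col: int, block: int) -> int:
    """Index of the neighborhood share containing core (row, col) on an
    8 x 8 array. block is the side length of the square neighborhood:
    block=2 gives the 16 four-core shares, block=4 the 4 sixteen-core
    shares, and block=8 the single sixty-four-core share."""
    assert 0 <= row < 8 and 0 <= col < 8 and block in (2, 4, 8)
    blocks_per_row = 8 // block
    # Row-major numbering of the square blocks covering the array.
    return (row // block) * blocks_per_row + (col // block)
```

Cores mapped to the same index see one uniformly addressed shared space; cores mapped to different indices cannot reach each other's share. For example, cores (0, 0) and (1, 1) land in the same four-core share, while (0, 0) and (0, 2) do not.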
Referring to fig. 6, the array processor may be an array processor with 8 × 8 cores, 8 per row and per column, and the shared storage space includes 8 row shares, a row share setting one row of the array processor as a shared storage space. After configuration, the cores within one row share (a black frame in the figure) see a uniformly addressed shared space, and their memory accesses cannot exceed the range of the frame; that is, the cores within one row share use the same addresses and may only access memory within their own row share.
Referring to fig. 7, the array processor may be an array processor with 8 × 8 cores, 8 per row and per column, and the shared storage space includes 8 column shares, a column share setting one column of the array processor as a shared storage space. After configuration, the cores within one column share (a black frame in the figure) see a uniformly addressed shared space, and their memory accesses cannot exceed the range of the frame; that is, the cores within one column share use the same addresses and may only access memory within their own column share.
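Unified addressing within a row (or column) share can be pictured as concatenating the shared LDM areas of the 8 member cores into one address space. The layout below, with equal-sized member areas concatenated in column order, is purely an assumption for illustration; the patent does not fix any particular layout.

```python
def locate_in_row_share(row: int, offset: int, per_core_shared: int):
    """Resolve a unified shared-space offset within one row share to
    the (row, col) core that physically holds the byte and the local
    offset inside that core's shared LDM area. Assumes the 8 member
    cores each contribute per_core_shared bytes, concatenated in
    column order (a hypothetical layout)."""
    assert 0 <= offset < 8 * per_core_shared, "access exceeds the share range"
    col = offset // per_core_shared
    return (row, col), offset % per_core_shared

# A column share would be symmetric, iterating over rows instead.
```

With 4 KiB of shared area per core, offset 5000 in row 3's shared space would resolve to core (3, 1) at local offset 904 under this assumed layout.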
Referring to fig. 8, the array processor may be an array processor with 8 × 8 cores, 8 per row and per column, and the shared space includes a plurality of irregular shares, an irregular share being any sharing pattern other than the 16 four-core neighborhood shares, the 4 sixteen-core neighborhood shares, the 1 sixty-four-core neighborhood share, the 8 row shares and the 8 column shares described above. For example, the shared storage space may include an eight-core share with 8 cores, a ten-core share with 10 cores, a two-core share with 2 cores, a three-core share with 3 cores, a four-core share with 4 cores, and a five-core share with 5 cores. After configuration, the cores within one irregular share (a black frame in the figure) see a uniformly addressed shared space, and their memory accesses cannot exceed the range of the frame; that is, the cores within one irregular share use the same addresses and may only access memory within the share. The size and shape of the storage space can be adjusted according to the capacity of the stored content, realizing dynamic configuration of the LDM space by the application and dynamic configuration and protection of the sharing range within the array, and improving the wide applicability of the LDM space.
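An irregular share has no closed-form index, so it can be represented simply as an explicit set of member cores. The concrete member sets below are invented for illustration; only the share sizes (two-core, three-core, five-core and so on) come from the text.

```python
# Hypothetical configuration: one three-core share and one five-core share,
# each given as a set of (row, col) core coordinates on the 8 x 8 array.
IRREGULAR_SHARES = [
    {(0, 0), (0, 1), (1, 0)},                   # a three-core share
    {(2, 2), (2, 3), (3, 2), (3, 3), (4, 3)},   # a five-core share
]

def share_of(core):
    """Return the index of the irregular share containing core,
    or None if the core belongs to no irregular share."""
    for i, members in enumerate(IRREGULAR_SHARES):
        if core in members:
            return i
    return None
```

Uniform addressing and range checking then operate on these member sets rather than on rectangular block boundaries.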
After the shared storage space is configured, the cores within an irregular share are addressed in a unified manner, and a core's memory accesses cannot exceed the range of the share. If a given core within the irregular share accesses beyond that range, an exception is generated, but the execution correctness of the other cores in the share is not affected, so that when one core goes wrong the others continue to work normally and greater loss is avoided. For the multiple types and capacities of shared space, the hardware on the source core checks each access against the configuration and generates an exception on an out-of-range access without affecting the execution correctness of the other cores.
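The source-side hardware check described above can be sketched as follows: each shared-space access is validated against the configured member set, and an out-of-range access raises an exception for the offending core only. The names and the exception type are hypothetical; this models the behavior, not the hardware.

```python
class ShareRangeError(Exception):
    """Raised when a core's shared-space access leaves its share."""

def check_access(source_core, target_core, share_members):
    """Source-side check against the configuration: an access that
    targets a core outside the configured share generates an
    exception, without touching the state of any other core."""
    if target_core not in share_members:
        raise ShareRangeError(
            f"core {source_core} accessed {target_core} outside its share")

# A three-core irregular share (hypothetical members).
share = {(0, 0), (0, 1), (1, 0)}
check_access((0, 0), (0, 1), share)  # in range: no exception
# An out-of-range access by (0, 0) raises ShareRangeError, while
# (0, 1) and (1, 0) keep executing correctly.
```

Raising per offending access is what isolates the fault: only the core whose access fails the check takes the exception path.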
The specific embodiments described herein are merely illustrative of the spirit of the present invention. Those skilled in the art may make various modifications or additions to the described embodiments, or substitute similar alternatives, without departing from the spirit of the invention or exceeding the scope defined by the appended claims.

Claims (10)

1. A processor array local storage hybrid management technique, comprising the steps of:
S1: dividing the on-chip local data memory (LDM) of each core in the array processor into a first-type area, a second-type area and a third-type area;
S2: setting the first-type area as a private storage space for holding local private data, the specific addressing of which is visible only to the application program of the owning core;
S3: setting the second-type area as a shared storage space for holding data shared by a plurality of cores, the specific addressing of which is visible to the application programs of the plurality of cores;
S4: setting the third-type area as a Cache storage space, which is mapped onto the whole main memory space and managed in a Cache manner, through which the application program of the core accesses main memory.
2. The processor array local storage hybrid management technique as recited in claim 1, wherein: the capacities of the private storage space, the shared storage space and the Cache storage space are all variable.
3. The processor array local storage hybrid management technique as recited in claim 1 or 2, wherein: the shared storage space supports, through mapping, sharing of multiple granularities and shapes.
4. The processor array local storage hybrid management technique as recited in claim 3, wherein: the array processor is an array processor with 8 × 8 cores, 8 per row and per column, and the shared storage space includes 16 four-core neighborhood shares, a four-core neighborhood share being one of 16 equal partitions of the array processor, each containing four adjacent cores.
5. The processor array local storage hybrid management technique as recited in claim 3, wherein: the array processor is an array processor with 8 × 8 cores, 8 per row and per column, and the shared storage space includes 4 sixteen-core neighborhood shares, a sixteen-core neighborhood share being one of 4 equal partitions of the array processor, each containing sixteen adjacent cores.
6. The processor array local storage hybrid management technique as recited in claim 3, wherein: the array processor is an array processor with 8 × 8 cores, 8 per row and per column, and the shared storage space includes 1 sixty-four-core neighborhood share containing all sixty-four cores.
7. The processor array local storage hybrid management technique as recited in claim 3, wherein: the array processor is an array processor with 8 × 8 cores, 8 per row and per column, and the shared storage space includes 8 row shares, a row share setting one row of the array processor as a shared storage space.
8. The processor array local storage hybrid management technique as recited in claim 3, wherein: the array processor is an array processor with 8 × 8 cores, 8 per row and per column, and the shared storage space includes 8 column shares, a column share setting one column of the array processor as a shared storage space.
9. The processor array local storage hybrid management technique as recited in claim 3, wherein: the array processor is an array processor with 8 × 8 cores, 8 per row and per column, and the shared space includes a plurality of irregular shares, an irregular share being any sharing pattern other than 16 four-core neighborhood shares, 4 sixteen-core neighborhood shares, 1 sixty-four-core neighborhood share, 8 row shares or 8 column shares, wherein a column share sets one column of the array processor as a shared storage space, a row share sets one row of the array processor as a shared storage space, a sixteen-core neighborhood share is one of 4 equal partitions of the array processor each containing sixteen adjacent cores, a four-core neighborhood share is one of 16 equal partitions of the array processor each containing four adjacent cores, and a sixty-four-core neighborhood share contains all sixty-four cores.
10. The processor array local storage hybrid management technique as recited in claim 9, wherein: after the shared storage space is configured, the cores within an irregular share are addressed in a unified manner and their memory accesses cannot exceed the range of the irregular share; if a given core within the irregular share accesses beyond that range, an exception is generated, but the execution correctness of the other cores in the share is not affected.
CN201910864444.2A 2019-09-12 2019-09-12 Processor array local storage hybrid management method Active CN110704362B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910864444.2A CN110704362B (en) 2019-09-12 2019-09-12 Processor array local storage hybrid management method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910864444.2A CN110704362B (en) 2019-09-12 2019-09-12 Processor array local storage hybrid management method

Publications (2)

Publication Number Publication Date
CN110704362A (en) 2020-01-17
CN110704362B (en) 2021-03-12

Family

ID=69195264

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910864444.2A Active CN110704362B (en) 2019-09-12 2019-09-12 Processor array local storage hybrid management method

Country Status (1)

Country Link
CN (1) CN110704362B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115114192A (en) * 2021-03-23 2022-09-27 北京灵汐科技有限公司 Memory interface, functional core, many-core system and memory data access method
WO2022199357A1 (en) * 2021-03-23 2022-09-29 北京灵汐科技有限公司 Data processing method and apparatus, electronic device, and computer-readable storage medium

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050021944A1 (en) * 2003-06-23 2005-01-27 International Business Machines Corporation Security architecture for system on chip
CN1758229A (en) * 2005-10-28 2006-04-12 中国人民解放军国防科学技术大学 Local space shared memory method of heterogeneous multi-kernel microprocessor
CN101551761A (en) * 2009-04-30 2009-10-07 浪潮电子信息产业股份有限公司 Method for sharing stream memory of heterogeneous multi-processor
CN101601017A (en) * 2007-02-28 2009-12-09 学校法人早稻田大学 The generation method and the program of storage management method, signal conditioning package, program
CN101706755A (en) * 2009-11-24 2010-05-12 中国科学技术大学苏州研究院 Caching collaboration system of on-chip multi-core processor and cooperative processing method thereof
CN102073533A (en) * 2011-01-14 2011-05-25 中国人民解放军国防科学技术大学 Multicore architecture supporting dynamic binary translation
CN105183662A (en) * 2015-07-30 2015-12-23 复旦大学 Cache consistency protocol-free distributed sharing on-chip storage framework
CN107168683A (en) * 2017-05-05 2017-09-15 中国科学院软件研究所 GEMM dense matrix multiply high-performance implementation method on the domestic many-core CPU of Shen prestige 26010


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
ZHOU Yongbin (周永彬) et al., "Research on Optimization of the 1-D FFT Algorithm on Many-core Processors Based on Hardware/Software Co-support", Chinese Journal of Computers (《计算机学报》) *
LI Hongliang (李宏亮) et al., "Research on the Architecture of Domestic Many-core Processors for Intelligent Computing", Scientia Sinica Informationis (《中国科学:信息科学》) *


Also Published As

Publication number Publication date
CN110704362B (en) 2021-03-12

Similar Documents

Publication Publication Date Title
EP2645259B1 (en) Method, device and system for caching data in multi-node system
JP5241838B2 (en) System and method for allocating cache sectors (cache sector allocation)
CN105103144B (en) For the device and method of the self adaptive control of memory
US8656397B2 (en) Migrating groups of threads across NUMA nodes based on remote page access frequency
US20150067269A1 (en) Method for building multi-processor system with nodes having multiple cache coherency domains
CN110704362B (en) Processor array local storage hybrid management method
US20040117594A1 (en) Memory management method
CN112506823B (en) FPGA data reading and writing method, device, equipment and readable storage medium
CN111488114A (en) Reconfigurable processor architecture and computing device
JP2004038807A (en) Cache memory device and memory allocation method
US20230251903A1 (en) High bandwidth memory system with dynamically programmable distribution scheme
EP4012569A1 (en) Ai accelerator, cache memory and method of operating cache memory using the same
Huang et al. Vulnerability-aware energy optimization using reconfigurable caches in multicore systems
EP3839717B1 (en) High bandwidth memory system with crossbar switch for dynamically programmable distribution scheme
US10540286B2 (en) Systems and methods for dynamically modifying coherence domains
JP6059360B2 (en) Buffer processing method and apparatus
KR101967857B1 (en) Processing in memory device with multiple cache and memory accessing method thereof
US20200249852A1 (en) Methods for Aligned, MPU Region, and Very Small Heap Block Allocations
US11314656B2 (en) Restartable, lock-free concurrent shared memory state with pointers
CN104932990B (en) The replacement method and device of data block in a kind of cache memory
EP4120087B1 (en) Systems, methods, and devices for utilization aware memory allocation
US20240220409A1 (en) Unified flexible cache
US11847328B1 (en) Method and system for memory pool management
US11023319B2 (en) Maintaining a consistent logical data size with variable protection stripe size in an array of independent disks system
Lee et al. T-CAT: Dynamic Cache Allocation for Tiered Memory Systems with Memory Interleaving

Legal Events

Code Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant