CN114218148A - Dynamic configuration method for on-chip storage space - Google Patents

Dynamic configuration method for on-chip storage space

Info

Publication number
CN114218148A
Authority
CN
China
Prior art keywords
cache
space
size
new
ldm
Prior art date
Legal status
Pending
Application number
CN202110398334.9A
Other languages
Chinese (zh)
Inventor
管茂林
钱宏
朱琪
吴伟
杨涛
王飞
樊行健
Current Assignee
Wuxi Jiangnan Computing Technology Institute
Original Assignee
Wuxi Jiangnan Computing Technology Institute
Priority date
Filing date
Publication date
Application filed by Wuxi Jiangnan Computing Technology Institute
Priority to CN202110398334.9A
Publication of CN114218148A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F15/00 Digital computers in general; Data processing equipment in general
    • G06F15/16 Combinations of two or more digital computers each having at least an arithmetic unit, a program unit and a register, e.g. for a simultaneous processing of several programs
    • G06F15/163 Interprocessor communication
    • G06F15/167 Interprocessor communication using a common memory, e.g. mailbox
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F15/00 Digital computers in general; Data processing equipment in general
    • G06F15/16 Combinations of two or more digital computers each having at least an arithmetic unit, a program unit and a register, e.g. for a simultaneous processing of several programs
    • G06F15/177 Initialisation or configuration control

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Hardware Design (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Memory System Of A Hierarchy Structure (AREA)

Abstract

The invention discloses a dynamic configuration method for on-chip storage space, which comprises the following steps: S1, reading the input parameter; S2, reading the hardware LDM configuration register; S3, acquiring the size of the computing core stack space and the computing core stack pointer; S4, comparing new_cache_size with old_cache_size; S5, transferring the computing core stack space to the newly allocated local memory space; S6, confirming that all DMA operations related to the computing core have completed; S7, flushing the computing core Cache; S8, comparing new_cache_size with old_cache_size; S9, releasing the mem_a space and going to S11. The invention avoids the performance loss caused by insufficient LDM or Cache capacity under a fixed configuration and exploits their performance advantages to the greatest extent.

Description

Dynamic configuration method for on-chip storage space
Technical Field
The invention relates to a dynamic configuration method of on-chip storage space, belonging to the technical field of Cache space configuration.
Background
The control cores in a many-core processor architecture are mainly responsible for functions such as control, task distribution and scheduling, while the computing cores mainly complete computation acceleration tasks. A small number of control cores combined with a large number of computing cores is a classic structure in many-core processor architectures. Because the performance cost of a computing core directly accessing main memory is too high, computing cores are generally equipped with a multi-level memory hierarchy, of which a main memory plus an on-chip memory is a typical configuration. Under this configuration, each computing core has a high-speed local data storage space. This space can be configured entirely as a conventional on-chip Local Data Memory (LDM) fully managed by software, or partially configured as a data Cache automatically managed by hardware, and the capacity ratio between the two management modes can be adjusted in steps. For different application scenarios, the Cache and the LDM each have their own advantages and disadvantages, and in the traditional approach their capacities are configured when the program starts.
When the local data storage space of a computing core in a many-core processor architecture is configured as a software-managed LDM, the computing core can access the LDM space with load/store instructions, and can also exchange data in batches between the LDM and the outside (main memory or the LDM of other computing cores) by initiating asynchronous DMA operations. In this usage mode, the data accessed by DMA is generally expected to be large in volume and well contiguous, and load/store accesses to the LDM space perform well. When part of the space is configured as a data Cache, the computing core accesses it with load/store instructions targeting the main memory space; this access mode places no requirements on the contiguity or volume of the accessed data and is flexible to use, but the performance of main-memory accesses is poor. Moreover, because a many-core processor has a large number of computing cores, it is difficult to guarantee Cache consistency in hardware, so software is needed to guarantee Cache consistency between the control core and the computing cores.
Many-core processors have by now matured in their use of software-managed local data storage, but the use of the Cache in the computing cores has not yet been unified across applications. In the traditional LDM/Cache configuration mode, the capacities of the Cache and the LDM are fixed when the program starts; however, because the LDM and the Cache are used in different ways, the application scenarios and program characteristics they suit also differ. Since the total capacity of the computing core's local data storage space is fixed, a fixed LDM/Cache configuration makes it impossible to borrow space from one part to improve program performance when the other part is insufficient.
Disclosure of Invention
The invention aims to provide a dynamic configuration method for on-chip storage space that avoids the performance loss caused by insufficient LDM or Cache capacity under a fixed configuration and gives full play to their performance advantages.
In order to achieve this aim, the invention adopts the following technical scheme: a method for dynamically configuring on-chip storage space, comprising the following steps:
S1, reading the input parameter and judging, against the supported Cache capacity configurations provided by the hardware, whether it is one of the capacities supported by the hardware; if so, recording the input parameter as new_cache_size and proceeding to the next step; otherwise, reporting an error, exiting, and reminding the user that the input parameter is invalid;
S2, reading the hardware LDM configuration register, acquiring the Cache capacity under the current configuration, and recording it as old_cache_size;
S3, acquiring the size of the computing core stack space and the computing core stack pointer;
S4, comparing new_cache_size with old_cache_size; if new_cache_size is larger than old_cache_size, going to S5; otherwise, going to S6;
S5, allocating a local memory space mem_a of the same size as the computing core stack space in the LDM space, and transferring the computing core stack space into the newly allocated local memory space;
S6, confirming that all DMA operations related to the computing core have completed by checking the DMA reply words, and confirming that all memory access operations previously issued by the computing core have completed by means of the hardware MEMB instruction;
S7, flushing the computing core Cache to ensure Cache consistency, setting the value of the LDM configuration register to new_cache_size to reconfigure the Cache capacity, and using the hardware MEMB instruction to ensure that subsequent memory access operations use the new configuration;
S8, comparing new_cache_size with old_cache_size; if new_cache_size is larger than old_cache_size, going to S9; otherwise, going to S10;
S9, transferring the computing core stack space in mem_a to the LDM space immediately adjacent to the new Cache space, releasing the mem_a space, and going to S11;
S10, transferring the computing core stack space to the LDM space immediately adjacent to the new Cache space;
S11, pointing the computing core stack pointer to the tail of the new computing core stack space; the configuration is complete.
Further improvements of the above technical scheme are as follows:
1. In the above scheme, the configuration method is packaged as a function interface for users, and the input parameter is the Cache capacity that the user wants to configure (a sketch of such an interface follows).
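For illustration only, such a packaged function interface could take the shape of the following C sketch; the function name, the return convention, and the list of supported capacities are assumptions, not the patent's actual API.

```c
#include <stddef.h>

#define KB(x) ((size_t)(x) * 1024)

/* Cache capacities accepted by the hardware (assumed values for illustration). */
static const size_t supported_cache_sizes[] = { 0, KB(32), KB(128) };

/* Step S1 check: is the requested capacity one of the supported sizes? */
static int cache_size_supported(size_t size)
{
    for (size_t i = 0; i < sizeof supported_cache_sizes / sizeof supported_cache_sizes[0]; i++)
        if (supported_cache_sizes[i] == size)
            return 1;
    return 0;
}

/* Hypothetical user-facing interface: the single input parameter is the Cache
 * capacity the user wants to configure.  Returns 0 on success, -1 if the
 * requested capacity is not supported by the hardware. */
int ldm_cache_reconfigure(size_t new_cache_size);
```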
Owing to the application of the above technical scheme, the invention has the following advantages over the prior art:
The invention provides a method for dynamically configuring the Cache and the LDM (Local Data Memory) at run time. The capacities of the Cache and the LDM can be configured flexibly according to the needs of different program phases, avoiding the performance loss caused by insufficient LDM or Cache capacity under a fixed configuration and exploiting their performance advantages to the greatest extent.
Drawings
FIG. 1 is a configuration structure diagram of a computing core LDM/Cache according to the present invention;
FIG. 2 is a schematic diagram of a computational core stack space allocation proposed by the present invention;
FIG. 3 is a flow chart of the method of the present invention.
Detailed Description
Example: the invention provides a dynamic configuration method for on-chip storage space, which specifically comprises the following steps:
S1, reading the input parameter and judging, against the supported Cache capacity configurations provided by the hardware, whether it is one of the capacities supported by the hardware; if so, recording the input parameter as new_cache_size and proceeding to the next step; otherwise, reporting an error, exiting, and reminding the user that the input parameter is invalid;
S2, reading the hardware LDM configuration register, acquiring the Cache capacity under the current configuration, and recording it as old_cache_size;
S3, acquiring the size of the computing core stack space and the computing core stack pointer;
S4, comparing new_cache_size with old_cache_size; if new_cache_size is larger than old_cache_size, going to S5; otherwise, going to S6;
S5, allocating a local memory space mem_a of the same size as the computing core stack space in the LDM space, and transferring the computing core stack space into the newly allocated local memory space;
S6, confirming that all DMA operations related to the computing core have completed by checking the DMA reply words, and confirming that all memory access operations previously issued by the computing core have completed by means of the hardware MEMB instruction;
S7, flushing the computing core Cache to ensure Cache consistency, setting the value of the LDM configuration register to new_cache_size to reconfigure the Cache capacity, and using the hardware MEMB instruction to ensure that subsequent memory access operations use the new configuration;
S8, comparing new_cache_size with old_cache_size; if new_cache_size is larger than old_cache_size, going to S9; otherwise, going to S10;
S9, transferring the computing core stack space in mem_a to the LDM space immediately adjacent to the new Cache space, releasing the mem_a space, and going to S11;
S10, transferring the computing core stack space to the LDM space immediately adjacent to the new Cache space;
S11, pointing the computing core stack pointer to the tail of the new computing core stack space; the configuration is complete.
The configuration method is packaged as a function interface for users, and the input parameter is the Cache capacity that the user wants to configure.
The above embodiments are further explained as follows:
the local data storage space in the compute core is shown in fig. 1: its overall size is 256KB, with hardware support configured into three forms:
(a) all configured as a conventional on-chip local data memory, LDM; (b) part of the Cache is configured into a Cache (the size is 32 KB) automatically managed by hardware, and the other part of the Cache is LDM; (c) part of the data Cache is configured into a data Cache (the size is 128 KB) automatically managed by hardware, and the other part of the data Cache is LDM; when the Cache is configured, the hardware Cache adopts a 4-path set-associative strategy to support hardware elimination and filling.
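As a minimal sketch, the three forms could be modelled in code as follows; the numeric register encodings are assumptions, since the patent does not state the concrete values written to the LDM configuration register.

```c
/* The three local data storage configurations described above (FIG. 1).
 * The register encodings (0/1/2) are illustrative assumptions only; the
 * real values are defined by the hardware's LDM configuration register. */
enum ldm_config {
    LDM_CFG_ALL_LDM     = 0,  /* (a) 256 KB LDM, no Cache       */
    LDM_CFG_CACHE_32KB  = 1,  /* (b) 32 KB Cache + 224 KB LDM   */
    LDM_CFG_CACHE_128KB = 2,  /* (c) 128 KB Cache + 128 KB LDM  */
};
```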
Because the number of computing cores in a many-core processor is very large, implementing Cache consistency for the computing cores in hardware is costly and difficult, so Cache consistency for the computing cores must be ensured by software.
The configuration of the local data storage space is controlled by the hardware LDM configuration register: writing different values to this register selects different local storage configurations. In addition, since the hardware does not support Cache consistency for the computing cores, the consistency of the data in the Cache must be ensured when the local storage space is dynamically reconfigured.
In addition, to improve performance, the computing core stack space is generally kept in the local memory space, and the correctness of the stack data must also be preserved during reconfiguration. The location of the computing core stack space in the local memory is shown in FIG. 2 (taking a 32 KB Cache as an example): the stack space resides in the LDM, but to distinguish it from other computing core data and to avoid space-overlap conflicts as much as possible, it is placed immediately adjacent to the partitioned Cache space.
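Given the layout in FIG. 2, the stack location follows directly from the configured Cache size. A small sketch of that arithmetic, assuming LDM offsets start at 0 and the Cache occupies the lowest offsets (both assumptions for illustration):

```c
#include <stddef.h>

/* Stack placement implied by FIG. 2: the stack sits immediately after the
 * partitioned Cache space.  Offsets are LDM-relative, assuming offset 0 is
 * the start of the local data storage space. */
static inline size_t stack_base_offset(size_t cache_size)
{
    return cache_size;                 /* e.g. 32 KB Cache -> stack at 32 KB */
}

static inline size_t stack_pointer_offset(size_t cache_size, size_t stack_size)
{
    return cache_size + stack_size;    /* e.g. 32 KB + 8 KB = 40 KB */
}
```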
Based on the different usage modes of the Cache and the LDM in the computing core's local data storage space and their different effects on program performance, this patent provides a packaged function interface that allows the user to dynamically configure the LDM/Cache capacity of the computing core during program execution according to the application scenario, the program characteristics, and the varying capacity demands on the LDM or the Cache. This avoids the performance loss caused by insufficient LDM or Cache capacity under a fixed configuration and improves program performance, while the configuration process remains transparent to the user and convenient to use.
Based on the configurable nature of the local memory space, a packaged function interface is provided for programmers to dynamically configure the LDM and Cache capacities; the configuration process is transparent to users, and the performance loss caused by insufficient LDM or Cache capacity is avoided;
By comparing the Cache capacities before and after configuration, different ways of transferring the computing core stack space are selected, so that the stack space is not corrupted and the configuration completes as quickly as possible;
Before and after the computing core Cache capacity is reconfigured, the computing core is kept in a stable, consistent internal state by checking the DMA reply words, checking whether memory access operations have completed, inserting MEMB instructions, and flushing the computing core Cache, which guarantees Cache consistency and program correctness, as sketched below.
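A minimal sketch of that quiesce sequence (steps S6 and S7); the helper functions stand in for platform-specific operations (polling the DMA reply words, the MEMB memory barrier, and the Cache flush) and are assumptions, not real library calls.

```c
/* Placeholder declarations -- assumptions standing in for platform-specific
 * operations; they are not real library calls. */
extern void dma_wait_all_replies(void);      /* poll DMA reply words           */
extern void memory_barrier(void);            /* MEMB: prior accesses complete  */
extern void flush_compute_core_cache(void);  /* write back / invalidate Cache  */

/* Bring the computing core into a stable, consistent state before the LDM
 * configuration register is rewritten (steps S6 and S7 above). */
static void quiesce_compute_core(void)
{
    dma_wait_all_replies();       /* S6: all outstanding DMAs have finished      */
    memory_barrier();             /* S6: all issued loads/stores have finished   */
    flush_compute_core_cache();   /* S7: keep the Cache contents consistent      */
}
```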
The dynamic configuration method for on-chip storage space provided by the invention is exposed to users through a packaged function interface whose input parameter is the Cache capacity the user wants to configure. The flow chart is shown in FIG. 3, and the method is briefly described as follows (a code sketch follows the list):
1) read the input parameter and judge whether it is one of the configurable Cache capacities supported by the hardware; if so, record it as new_cache_size and proceed to the next step; otherwise, report an error, exit, and remind the user that the input parameter is invalid;
2) read the hardware LDM configuration register, obtain the Cache capacity under the current configuration, and record it as old_cache_size;
3) obtain the size of the computing core stack space and the computing core stack pointer;
4) compare new_cache_size with old_cache_size; if new_cache_size is larger than old_cache_size, go to step 5; otherwise, go to step 6;
5) allocate a local memory space mem_a of the same size as the computing core stack space in the LDM space, and transfer the computing core stack space into the newly allocated local memory space;
6) confirm that all DMA operations related to the computing core have completed by checking the DMA reply words, and confirm that all memory access operations previously issued by the computing core have completed by means of the hardware MEMB instruction;
7) flush the computing core Cache to ensure Cache consistency; set the value of the LDM configuration register to new_cache_size to reconfigure the Cache capacity, and use the hardware MEMB instruction to ensure that subsequent memory access operations use the new configuration;
8) compare new_cache_size with old_cache_size; if new_cache_size is larger than old_cache_size, go to step 9; otherwise, go to step 10;
9) transfer the computing core stack space in mem_a back to the LDM space immediately adjacent to the new Cache space; release the mem_a space; go to step 11;
10) transfer the computing core stack space to the LDM space immediately adjacent to the new Cache space;
11) point the computing core stack pointer to the tail of the new computing core stack space; the configuration is complete.
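Putting the eleven steps together, a compact C sketch might look as follows. Every platform-specific operation (register access, DMA reply-word check, MEMB barrier, Cache flush, LDM allocation, stack relocation) is represented by a placeholder helper; the names and the assumption that the LDM has a known base address are illustrative, not the patent's actual implementation.

```c
#include <stddef.h>
#include <string.h>

/* Placeholder helpers -- assumptions for platform-specific operations. */
extern int    cache_size_supported(size_t size);       /* step 1 validity check */
extern size_t read_ldm_config_register(void);          /* current Cache size    */
extern void   write_ldm_config_register(size_t size);  /* set new Cache size    */
extern void   dma_wait_all_replies(void);              /* DMA reply words       */
extern void   memory_barrier(void);                    /* MEMB instruction      */
extern void   flush_compute_core_cache(void);          /* Cache flush           */
extern void  *ldm_alloc(size_t size);                  /* allocate in LDM space */
extern void   ldm_free(void *p);                       /* release LDM space     */
extern void  *ldm_base(void);                          /* base of the LDM space */
extern void  *get_stack_base(size_t *stack_size);      /* stack base and size   */
extern void   set_stack_pointer(void *new_top);        /* repoint the stack     */

int ldm_cache_reconfigure(size_t new_cache_size)
{
    /* 1) validate the input parameter against the supported capacities */
    if (!cache_size_supported(new_cache_size))
        return -1;

    /* 2) read the current Cache capacity from the LDM configuration register */
    size_t old_cache_size = read_ldm_config_register();

    /* 3) obtain the computing core stack size and base */
    size_t stack_size;
    void  *old_stack = get_stack_base(&stack_size);

    /* 4)-5) if the Cache grows, park the stack in a temporary LDM buffer */
    void *mem_a = NULL;
    if (new_cache_size > old_cache_size) {
        mem_a = ldm_alloc(stack_size);
        memcpy(mem_a, old_stack, stack_size);
    }

    /* 6) wait for outstanding DMAs and previously issued accesses to finish */
    dma_wait_all_replies();
    memory_barrier();

    /* 7) flush the Cache, rewrite the configuration register, fence again */
    flush_compute_core_cache();
    write_ldm_config_register(new_cache_size);
    memory_barrier();

    /* 8)-10) place the stack immediately after the new Cache space */
    void *new_stack = (char *)ldm_base() + new_cache_size;
    if (new_cache_size > old_cache_size) {
        memcpy(new_stack, mem_a, stack_size);      /* 9) restore from mem_a     */
        ldm_free(mem_a);
    } else {
        memmove(new_stack, old_stack, stack_size); /* 10) regions may overlap   */
    }

    /* 11) point the stack pointer at the tail of the relocated stack */
    set_stack_pointer((char *)new_stack + stack_size);
    return 0;
}
```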
Assuming the computing core Cache capacity under the current configuration is 32 KB and the user reconfigures the Cache with an input parameter of 128 KB, the process is roughly as follows (a usage sketch follows the list):
1) read the input parameter and the value of the hardware LDM configuration register, giving new_cache_size = 128 KB and old_cache_size = 32 KB; at the same time obtain the computing core stack size and the stack pointer value. If the current computing core stack space is 8 KB, the computing core stack pointer value is 40 KB (32 + 8);
2) because new_cache_size > old_cache_size, allocate an 8 KB block mem_a in the conventional LDM space and transfer the 8 KB of data in the computing core stack space into mem_a;
3) after confirming that all DMA and memory access operations related to the computing core have completed, flush the computing core Cache, set the value of the LDM configuration register to 128 KB to reconfigure the Cache capacity, and insert an MEMB instruction to ensure that all subsequent operations use the new configuration;
4) transfer the 8 KB of data (the computing core stack space) in mem_a to the conventional LDM space immediately adjacent to the Cache space under the new configuration, and release the mem_a space; update the computing core stack pointer, which now has the value 136 KB (128 + 8).
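Assuming the interface sketched earlier, the 32 KB to 128 KB scenario above could be driven roughly as follows; the call is illustrative, and the offsets in the comments simply restate the worked example.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical interface from the earlier sketch (an assumption). */
extern int ldm_cache_reconfigure(size_t new_cache_size);

int main(void)
{
    /* Before: 32 KB Cache, 8 KB stack, stack pointer at offset 40 KB. */
    if (ldm_cache_reconfigure(128 * 1024) != 0) {
        fprintf(stderr, "requested Cache capacity is not supported\n");
        return 1;
    }
    /* After: 128 KB Cache, stack relocated to offsets 128..136 KB,
     * stack pointer at offset 136 KB, as in the worked example above. */
    printf("Cache reconfigured to 128 KB\n");
    return 0;
}
```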
With this dynamic configuration method, the capacity of the on-chip storage space can be flexibly configured according to the Cache and LDM requirements of different program phases, avoiding the performance loss caused by insufficient LDM or Cache capacity under a fixed configuration and exploiting their performance advantages to the greatest extent.
To facilitate a better understanding of the invention, the terms used herein will be briefly explained as follows:
Cache: cache memory, a special memory subsystem; a high-speed, small-capacity memory located between the central processing unit (CPU) and the main memory.
LDM: Local Data Memory, the local data store of a computing core.
DMA: Direct Memory Access, a data exchange mode in which data is accessed directly from memory without going through the CPU.
The above embodiments are merely illustrative of the technical ideas and features of the present invention, and the purpose thereof is to enable those skilled in the art to understand the contents of the present invention and implement the present invention, and not to limit the protection scope of the present invention. All equivalent changes and modifications made according to the spirit of the present invention should be covered within the protection scope of the present invention.

Claims (2)

1. A method for dynamically configuring on-chip memory space, comprising the steps of:
S1, reading the input parameter and judging, against the supported Cache capacity configurations provided by the hardware, whether it is one of the capacities supported by the hardware; if so, recording the input parameter as new_cache_size and proceeding to the next step; otherwise, reporting an error, exiting, and reminding the user that the input parameter is invalid;
S2, reading the hardware LDM configuration register, acquiring the Cache capacity under the current configuration, and recording it as old_cache_size;
S3, acquiring the size of the computing core stack space and the computing core stack pointer;
S4, comparing new_cache_size with old_cache_size; if new_cache_size is larger than old_cache_size, going to S5; otherwise, going to S6;
S5, allocating a local memory space mem_a of the same size as the computing core stack space in the LDM space, and transferring the computing core stack space into the newly allocated local memory space;
S6, confirming that all DMA operations related to the computing core have completed by checking the DMA reply words, and confirming that all memory access operations previously issued by the computing core have completed by means of the hardware MEMB instruction;
S7, flushing the computing core Cache to ensure Cache consistency, setting the value of the LDM configuration register to new_cache_size to reconfigure the Cache capacity, and using the hardware MEMB instruction to ensure that subsequent memory access operations use the new configuration;
S8, comparing new_cache_size with old_cache_size; if new_cache_size is larger than old_cache_size, going to S9; otherwise, going to S10;
S9, transferring the computing core stack space in mem_a to the LDM space immediately adjacent to the new Cache space, releasing the mem_a space, and going to S11;
S10, transferring the computing core stack space to the LDM space immediately adjacent to the new Cache space;
S11, pointing the computing core stack pointer to the tail of the new computing core stack space; the configuration is complete.
2. The method of claim 1, wherein the configuration method is packaged as a function interface for users, and the input parameter is the Cache capacity that the user wants to configure.
CN202110398334.9A 2021-04-14 2021-04-14 Dynamic configuration method for on-chip storage space Pending CN114218148A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110398334.9A CN114218148A (en) 2021-04-14 2021-04-14 Dynamic configuration method for on-chip storage space

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110398334.9A CN114218148A (en) 2021-04-14 2021-04-14 Dynamic configuration method for on-chip storage space

Publications (1)

Publication Number Publication Date
CN114218148A (en) 2022-03-22

Family

ID=80695807

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110398334.9A Pending CN114218148A (en) 2021-04-14 2021-04-14 Dynamic configuration method for on-chip storage space

Country Status (1)

Country Link
CN (1) CN114218148A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination