WO2022247189A1 - Core control method and apparatus for many-core system, and many-core system - Google Patents

Info

Publication number
WO2022247189A1
Authority
WO
WIPO (PCT)
Prior art keywords
core
cluster
core cluster
processing
duration
Prior art date
Application number
PCT/CN2021/133963
Other languages
English (en)
Chinese (zh)
Inventor
吴臻志
丁瑞强
祝夭龙
Original Assignee
北京灵汐科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京灵汐科技有限公司

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005 Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
    • G06F9/505 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals considering the load
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Definitions

  • The present disclosure relates to the field of computer technology, and in particular to a core control method and device for a many-core system, a many-core system, an electronic device, a computer-readable medium, and a computer program product.
  • A many-core system usually has many cores (also called processing cores).
  • A core is the smallest unit of computation in the many-core system that can be scheduled independently and has complete computing capability.
  • Each core has its own resources, such as storage and computing resources.
  • The cores of the many-core system can run program instructions independently, using parallel computing to speed up program execution and provide multi-tasking capability.
  • the present disclosure provides a core control method and device for a many-core system, a processing core, a many-core system, electronic equipment, a computer readable medium, and a computer program product.
  • The present disclosure provides a core control method for a many-core system. The many-core system includes at least one core cluster, and each core cluster includes at least one second processing core. The core control method includes: for any one of the core clusters, performing load detection on the core cluster to obtain load data corresponding to the core cluster; determining the load state of the core cluster according to the load data corresponding to the core cluster; and performing regulation processing on the core cluster according to the load state of the core cluster; wherein the regulation processing includes one of the following regulation methods: regulating the number of second processing cores in the core cluster that can currently perform jobs; regulating the operating voltage and operating frequency of the second processing cores in the core cluster that can currently perform jobs; and inserting a blank frame into the buffer corresponding to the core cluster.
  • The present disclosure provides a core control device applied to a many-core system. The many-core system includes at least one core cluster, and each core cluster includes at least one second processing core. The core control device includes: a load data detection module configured to perform load detection on a corresponding core cluster and obtain load data corresponding to the core cluster; a load state detection module configured to determine the load state of the core cluster according to the load data corresponding to the core cluster; and a core regulation module configured to perform regulation processing on the core cluster according to the load state of the core cluster; wherein the regulation processing includes one of the following regulation methods: regulating the number of second processing cores in the core cluster that can currently perform jobs; regulating the operating voltage and operating frequency of the second processing cores in the core cluster that can currently perform jobs; and inserting a blank frame into the buffer corresponding to the core cluster.
  • The present disclosure provides a many-core system. The many-core system includes a plurality of processing cores, the plurality of processing cores include a first processing core and a plurality of second processing cores, and some or all of the plurality of second processing cores are divided into at least one core cluster. Each core cluster includes at least one second processing core and has a main processing core, which is a designated second processing core in that core cluster; wherein the first processing core includes the above-mentioned core control device, and/or the main processing cores of at least some of the core clusters include the above-mentioned core control device.
  • The present disclosure provides an electronic device, which includes: a plurality of processing cores; and an on-chip network configured to exchange data among the plurality of processing cores and to exchange external data; wherein one or more instructions are stored in one or more of the processing cores, and the one or more instructions are executed by the one or more processing cores, so that the one or more processing cores can execute the above-mentioned core control method.
  • the present disclosure provides a computer-readable medium on which a computer program is stored, wherein the computer program implements the above-mentioned core control method when executed by a processing core of a many-core system.
  • the present disclosure provides a computer program product, which includes a computer program, and when the computer program is executed by a processing core of a many-core system, the above-mentioned core control method is implemented.
  • FIG. 1 is a flowchart of a core control method for a many-core system provided by an embodiment of the present disclosure
  • FIG. 2 is a block diagram of a many-core system provided by an embodiment of the present disclosure
  • FIG. 3 is a flowchart of a core control method for a many-core system provided by an embodiment of the present disclosure
  • FIG. 4 is a flow chart of a core control method for a many-core system provided by an embodiment of the present disclosure
  • FIG. 5 is a flow chart of the control process of the core control method of the embodiment of the present disclosure.
  • FIG. 6 is a flow chart of the control process of the core control method of the embodiment of the present disclosure.
  • FIG. 7 is a schematic diagram of an application scenario of a many-core system according to an embodiment of the present disclosure.
  • FIG. 8 is a block diagram of a core cluster of a many-core system according to an embodiment of the present disclosure.
  • FIG. 9 is a block diagram of a core cluster after a new sub-cluster is formed according to an embodiment of the present disclosure.
  • FIG. 10 is a block diagram of a core control device provided by an embodiment of the present disclosure.
  • FIG. 11 is a block diagram of an electronic device provided by an embodiment of the present disclosure.
  • Each core of the many-core system stores the input data from an input device; its arithmetic unit performs calculations on the input data, stores the calculation result in memory, and finally notifies an output device to receive the output result.
  • The input device and the output device may be peripherals, or may be cores in the many-core system.
  • the many-core system includes at least one core cluster, each core cluster includes at least one second processing core, and each core cluster is used to execute corresponding computing tasks.
  • When each core cluster executes its corresponding tasks, and especially when the core clusters execute the tasks of a task pipeline, the task execution efficiency of each core cluster matters greatly. How to effectively improve the efficiency with which core clusters execute tasks has therefore become an urgent technical problem in the core cluster scenario of many-core systems.
  • FIG. 1 is a flow chart of a core control method for a many-core system provided by an embodiment of the present disclosure.
  • An embodiment of the present disclosure provides a core control method for a many-core system, wherein the many-core system includes at least one core cluster and each core cluster includes at least one second processing core. The method can be executed by a core control device, which can be implemented in software and/or hardware. The core control method includes:
  • Step S1 for any core cluster, perform load detection on the core cluster, and obtain load data corresponding to the core cluster.
  • Step S2 according to the load data corresponding to the core cluster, determine the load status of the core cluster.
  • Step S3, according to the load state of the core cluster, perform regulation processing on the core cluster.
  • The regulation processing includes one of the following regulation methods: regulating the number of second processing cores in the core cluster that can currently perform jobs; regulating the operating voltage and operating frequency of the second processing cores in the core cluster that can currently perform jobs; and inserting a blank frame into the buffer corresponding to the core cluster.
  • In this way, the load state of the core cluster can be obtained in real time and the core cluster can be regulated in real time, so that the core cluster can process tasks flexibly, the efficiency of task processing improves, and the power consumption of the many-core system falls.
  • By detecting the load state of each core cluster, the core control method for the many-core system can flexibly control and manage each core cluster of the many-core system, which effectively improves the efficiency with which each core cluster performs tasks and also makes task processing in the many-core system more flexible.
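  • Steps S1 to S3 can be read as a detect, classify, regulate loop. The following Python sketch is only an illustration of that flow under simplifying assumptions: the function names, the threshold values, and the choice to regulate by adding or removing one active core are hypothetical. In particular, reducing the number of active cores in the low load state is an assumption made for this sketch.

```python
from enum import Enum


class LoadState(Enum):
    LOW = "low"
    INTERMEDIATE = "intermediate"
    BUSY = "busy"


def detect_load(buffer_used: int, buffer_capacity: int) -> float:
    """Step S1 (sketch): load data is the buffer's memory space usage rate."""
    return buffer_used / buffer_capacity


def determine_state(usage_rate: float, high: float = 0.7, low: float = 0.1) -> LoadState:
    """Step S2 (sketch): map load data to a load state; thresholds are illustrative."""
    if usage_rate >= high:
        return LoadState.BUSY
    if usage_rate <= low:
        return LoadState.LOW
    return LoadState.INTERMEDIATE


def regulate(state: LoadState, active_cores: int, total_cores: int) -> int:
    """Step S3 (sketch): regulate the number of cores that can perform jobs."""
    if state is LoadState.BUSY:
        return min(active_cores + 1, total_cores)  # bring one more core online
    if state is LoadState.LOW:
        return max(active_cores - 1, 1)  # assumption: release one core, keep at least one
    return active_cores  # intermediate state: leave the cluster unchanged
```

  • For example, `regulate(determine_state(detect_load(80, 100)), 2, 4)` classifies the cluster as busy and returns 3 active cores. Regulating voltage and frequency or inserting blank frames would replace the body of `regulate` in the same loop.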
  • FIG. 2 is a block diagram of a many-core system provided by an embodiment of the present disclosure.
  • The many-core system includes a plurality of processing cores, and the plurality of processing cores include a first processing core and second processing cores. Some or all of the second processing cores are pre-divided into at least one core cluster. Each core cluster has a main processing core, which is a pre-designated second processing core among the at least one second processing core of that core cluster.
  • The first processing core can process the tasks of the many-core system and can also perform task allocation and management for the many-core system. The main processing core of each core cluster can process the tasks of the core cluster where it is located and can also perform task allocation and management within the cluster.
  • The core control method of the embodiment of the present disclosure can be applied to the main processing core of any core cluster in the many-core system. That is, the core control method is implemented by the main processing core of a core cluster, which uses the method to control and manage the second processing cores of the core cluster where it is located.
  • The core control method of the embodiment of the present disclosure can also be applied to the first processing core of the many-core system. That is, the core control method is implemented by the first processing core, which uses the method to control and manage all core clusters of the many-core system.
  • Each core cluster is provided with a corresponding buffer, which is used to cache the task data of the tasks to be processed by that core cluster, so the memory status of the buffer can represent the load condition of the core cluster.
  • For example, the buffer may be a FIFO (First Input First Output, first-in-first-out) buffer; the present disclosure does not limit the specific type of the buffer.
  • FIG. 3 is a flowchart of a core control method for a many-core system provided by an embodiment of the present disclosure. As shown in FIG. 3 , step S1 may further include steps S11a to S13a.
  • Step S11a, detecting the real-time memory space usage rate of the buffer corresponding to the core cluster.
  • The real-time memory space usage rate refers to the ratio of the memory space used in real time to the total memory space.
  • Step S12a detecting a comparison result between the real-time memory space usage rate of the buffer corresponding to the core cluster and the first preset threshold.
  • the first preset threshold can be set according to actual needs.
  • the first preset threshold can be set to a value greater than or equal to 60% but less than 100%, for example, can be set to 70%, which is not limited in the present disclosure.
  • Step S13a, recording the length of time for which the real-time memory space usage rate is continuously greater than or equal to the first preset threshold as the first duration; the load data corresponding to the core cluster includes the first duration.
  • The real-time memory space usage rate of the buffer changes with time. The first duration is therefore the length of time for which the usage rate has remained continuously greater than or equal to the first preset threshold. It can represent the current load state of the buffer, and hence the current load state of the core cluster.
  • step S2 may further include step S21a and step S22a.
  • Step S21a judging whether the first duration is greater than or equal to the first preset duration, if yes, execute step S22a, if not, do not perform further processing.
  • Step S22a when the first duration is greater than or equal to the first preset duration, determine that the load state of the core cluster is a busy state.
  • If the first duration is greater than or equal to the first preset duration, it indicates that the core cluster is in an overloaded state, that is, a busy state. If the first duration is less than the first preset duration, it indicates that the core cluster is not in a busy state, so no further processing may be performed.
  • The first preset duration can be set according to the actual situation, for example to 15 minutes, half an hour, or 1 hour, which is not limited in the present disclosure. In this way, the load state of the core cluster can be determined from the real-time memory space usage rate, which improves the efficiency of load state judgment.
  • FIG. 4 is a flow chart of a core control method for a many-core system provided by an embodiment of the present disclosure. As shown in FIG. 4 , step S1 may further include steps S11b to S13b.
  • Step S11b detecting the real-time memory space usage rate of the buffer corresponding to the core cluster.
  • Step S12b detecting a comparison result between the real-time memory space usage rate of the buffer corresponding to the core cluster and a second preset threshold.
  • the second preset threshold is greater than 0 and less than the first preset threshold, and the second preset threshold can be set according to actual needs.
  • The second preset threshold can be set to a value less than or equal to 40%, for example 10% or 5%, which is not limited in the present disclosure.
  • Step S13b, recording the length of time for which the real-time memory space usage rate is continuously less than or equal to the second preset threshold as the second duration; the load data corresponding to the core cluster includes the second duration.
  • The real-time memory space usage rate of the buffer changes with time. The second duration is therefore the length of time for which the usage rate has remained continuously less than or equal to the second preset threshold. It can represent the current load state of the buffer, and hence the current load state of the core cluster.
  • step S2 may further include step S21b and step S22b.
  • Step S21b judging whether the second duration is greater than or equal to the second preset duration, if yes, execute step S22b, if not, do no further processing.
  • Step S22b when the second duration is greater than or equal to the second preset duration, determine that the load state of the core cluster is a low load state.
  • If the second duration is greater than or equal to the second preset duration, it indicates that the core cluster is in a state of excess resources, that is, a low load state. If the second duration is less than the second preset duration, it indicates that the core cluster is not in a low load state, so no further processing may be performed.
  • The second preset duration can be set according to the actual situation and can be equal to the first preset duration, for example 15 minutes, half an hour, or 1 hour, which is not limited in the present disclosure. In this way, the load state of the core cluster can be determined from the real-time memory space usage rate, which improves the efficiency of load state judgment.
  • To summarize, the real-time memory space usage rate of the buffer corresponding to the core cluster is detected, and the load data of the core cluster includes that usage rate. If the usage rate stays greater than or equal to the first preset threshold and the duration of that state (the first duration) is greater than or equal to the first preset duration, the load state of the core cluster is determined to be a busy state. If the usage rate stays less than or equal to the second preset threshold and the duration of that state (the second duration) is greater than or equal to the second preset duration, the load state of the core cluster is determined to be a low load state.
  • If the usage rate stays between the second preset threshold and the first preset threshold, or if the time spent at or above the first preset threshold is shorter than the first preset duration and the time spent at or below the second preset threshold is shorter than the second preset duration, the core cluster is neither busy nor idle. Its load state is an intermediate state between the low load state and the busy state, so further processing may not be performed.
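  • The duration-based judgment can be sketched as a small state tracker over time-stamped usage samples. This is a minimal illustration, not the disclosed implementation: the function name `classify_by_usage`, the sample format, and the use of seconds as the time unit are assumptions made for the example.

```python
from typing import Iterable, Optional, Tuple


def classify_by_usage(samples: Iterable[Tuple[float, float]],
                      first_threshold: float, second_threshold: float,
                      first_preset: float, second_preset: float) -> str:
    """Return the load state after the last (time, usage_rate) sample.

    The first/second durations restart whenever the usage rate leaves the
    corresponding range, so only a continuous run counts.
    """
    high_since: Optional[float] = None  # start of the current run at/above first_threshold
    low_since: Optional[float] = None   # start of the current run at/below second_threshold
    state = "intermediate"
    for now, usage in samples:
        if usage >= first_threshold:
            high_since = now if high_since is None else high_since
        else:
            high_since = None
        if usage <= second_threshold:
            low_since = now if low_since is None else low_since
        else:
            low_since = None
        if high_since is not None and now - high_since >= first_preset:
            state = "busy"          # first duration reached the first preset duration
        elif low_since is not None and now - low_since >= second_preset:
            state = "low"           # second duration reached the second preset duration
        else:
            state = "intermediate"
    return state
```

  • With a 70% first threshold, a 10% second threshold, and 900-second preset durations, the samples `[(0, 0.8), (900, 0.9)]` yield a busy state, while `[(0, 0.8), (300, 0.5), (900, 0.9)]` yield an intermediate state because the dip at 300 seconds interrupts the run.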
  • the load data of the core cluster may be acquired by detecting the growth rate of memory space usage of the buffer corresponding to the core cluster.
  • step S1 may further include: acquiring the memory space usage growth rate of the buffer corresponding to the core cluster, and the load data corresponding to the core cluster includes the memory space usage growth rate of the corresponding buffer.
  • The memory space usage growth rate refers to the growth rate of the memory space usage rate of the buffer within a preset time period (such as 5 minutes, 10 minutes, or 15 minutes); that is, the growth of the usage rate at the current time relative to the usage rate at a historical time, where the time period from the historical time to the current time is the preset time period.
  • In the case of acquiring the load data of the core cluster by detecting the memory space usage growth rate of the buffer corresponding to the core cluster, step S2 may further include:
  • Judging whether the memory space usage growth rate is greater than or equal to the first preset growth rate value.
  • The first preset growth rate value is a positive value that can be set according to actual needs. For example, it can be a value between 60% and 90%, such as 70%, which is not limited in the present disclosure.
  • In the case that the memory space usage growth rate is greater than or equal to the first preset growth rate value, it is determined that the load state of the core cluster is a busy state, and the method jumps to step S3. That is to say, if the memory space usage growth rate of the buffer corresponding to the core cluster is greater than or equal to the first preset growth rate value, the buffer is in an overloaded state, which means the core cluster is in an overloaded state, that is, a busy state. In this way, the load state can be determined from the memory space usage growth rate, which improves the efficiency of load state judgment.
  • If the memory space usage growth rate of the buffer corresponding to the core cluster is less than the first preset growth rate value, the buffer is not in an overloaded state, which means the core cluster is not in a busy state. In that case further processing may not be performed, or it may be further judged whether the core cluster is in a low load state.
  • Step S2 may further include:
  • Judging whether the real-time memory space usage rate of the buffer corresponding to the core cluster is less than or equal to a preset usage rate.
  • The preset usage rate can be set according to actual needs, for example to 10%, 20%, or 30%, which is not limited in the present disclosure.
  • The second preset growth rate value is a negative value greater than minus 1 and less than 0, and its specific value can be set according to actual needs. For example, it can be a value between minus 90% and minus 50%, such as minus 60%.
  • In the case that the real-time memory space usage rate of the buffer corresponding to the core cluster is less than or equal to the preset usage rate, it is further judged whether the memory space usage growth rate is less than or equal to the second preset growth rate value, thereby determining the load state of the core cluster.
  • In the case that the memory space usage growth rate is less than or equal to the second preset growth rate value, it is determined that the load state of the core cluster is a low load state, and the method jumps to step S3.
  • If the real-time memory space usage rate of the buffer corresponding to the core cluster is less than or equal to the preset usage rate, and the memory space usage growth rate of that buffer is less than or equal to the second preset growth rate value, then the usage rate of the buffer is small and its memory space usage is shrinking quickly. The buffer is in a state of excess resources, which means the core cluster is in a state of excess resources, that is, a low load state.
  • the load status can be determined jointly by the memory space usage growth rate and the real-time memory space usage rate, thereby improving the accuracy of load status judgment.
  • If the memory space usage growth rate of the buffer corresponding to the core cluster is less than the first preset growth rate value and greater than the second preset growth rate value, the core cluster is neither busy nor idle. Its load state is an intermediate state between the low load state and the busy state, so further processing may not be performed.
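  • The growth-rate judgment can be sketched as follows. The text does not give a formula for the growth rate, so this example assumes a relative growth, (current - historical) / historical, which is consistent with the second preset growth rate value lying between minus 1 and 0. The function names, the default values, and the handling of an empty historical buffer are likewise assumptions.

```python
def usage_growth_rate(historical_usage: float, current_usage: float) -> float:
    """Relative growth of the usage rate over the preset period (assumed definition)."""
    if historical_usage == 0:
        # Assumption: growth from an empty buffer is unbounded if anything was added.
        return float("inf") if current_usage > 0 else 0.0
    return (current_usage - historical_usage) / historical_usage


def classify_by_growth(historical_usage: float, current_usage: float,
                       first_growth: float = 0.7, second_growth: float = -0.6,
                       preset_usage: float = 0.2) -> str:
    """Busy on fast growth; low load only if usage is small AND shrinking fast."""
    growth = usage_growth_rate(historical_usage, current_usage)
    if growth >= first_growth:
        return "busy"
    if current_usage <= preset_usage and growth <= second_growth:
        return "low"
    return "intermediate"
```

  • Note that a sharp drop alone is not enough for the low load state: a buffer that falls from 90% to 30% usage is shrinking quickly but still exceeds the 20% preset usage rate used here, so `classify_by_growth(0.9, 0.3)` returns the intermediate state.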
  • the load data of the core cluster can also be acquired by detecting the task processing status of the core cluster.
  • step S1 may further include: detecting in real time the task processing time required by the core cluster to process the task, and the load data corresponding to the core cluster includes the task processing time. It can be understood that the task processing time refers to the time spent by the core cluster to process the task.
  • Step S2 may further include:
  • Judging whether the task processing duration required by the core cluster to process the task is greater than or equal to a first preset processing duration.
  • The first preset processing duration may be set according to actual needs, which is not limited in the present disclosure.
  • In the case that the task processing duration required by the core cluster to process the task is greater than or equal to the first preset processing duration, it is determined that the load state of the core cluster is a busy state, and the method jumps to step S3.
  • If the task processing duration corresponding to the core cluster is greater than or equal to the first preset processing duration, the core cluster spends a long time processing the task, so it can be determined that the core cluster is in an overloaded state, that is, a busy state. In this way, the load state can be determined from the task processing duration, which improves the efficiency of load state judgment.
  • If the task processing duration corresponding to the core cluster is less than the first preset processing duration, the core cluster is not in an overloaded state, that is, not in a busy state. In that case further processing may not be performed, or it may be further judged whether the core cluster is in a low load state.
  • Step S2 may further include:
  • Judging whether the task processing duration required by the core cluster to process the task is less than or equal to a second preset processing duration.
  • The second preset processing duration is shorter than the first preset processing duration and can be set according to actual needs, which is not limited in the present disclosure.
  • In the case that the task processing duration required by the core cluster to process the task is less than or equal to the second preset processing duration, it is determined that the load state of the core cluster is a low load state, and the method jumps to step S3.
  • If the task processing duration corresponding to the core cluster is less than or equal to the second preset processing duration, the core cluster takes a relatively short time to process the task, so it can be determined that the core cluster is in a state of excess resources, that is, a low load state. In this way, the load state can be determined from the task processing duration, which improves the efficiency of load state judgment.
  • If the task processing duration corresponding to the core cluster is greater than the second preset processing duration and less than the first preset processing duration, the core cluster is neither busy nor idle. Its load state is an intermediate state between the low load state and the busy state, so further processing may not be performed.
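  • A minimal sketch of the task-duration judgment follows. The helper `measure_task_duration` and the use of wall-clock seconds are assumptions for illustration; on real hardware the duration would more likely come from a cycle counter or timer of the core cluster.

```python
import time
from typing import Callable


def measure_task_duration(task: Callable[[], object]) -> float:
    """Run one task and return the elapsed wall-clock time in seconds."""
    start = time.perf_counter()
    task()
    return time.perf_counter() - start


def classify_by_task_duration(duration: float,
                              first_preset: float,
                              second_preset: float) -> str:
    """Compare a task processing duration with the two preset processing durations."""
    if not second_preset < first_preset:
        raise ValueError("second preset duration must be shorter than the first")
    if duration >= first_preset:
        return "busy"          # the task took a long time: overloaded
    if duration <= second_preset:
        return "low"           # the task finished quickly: excess resources
    return "intermediate"
```

  • With a first preset processing duration of 10 seconds and a second of 2 seconds, a task that takes 12 seconds marks the cluster as busy, one that takes 1.5 seconds marks it as low load, and one that takes 5 seconds leaves it in the intermediate state.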
  • the many-core system includes a plurality of core clusters, and the multiple core clusters perform task processing based on a synchronization period, and the synchronization period is the maximum task processing duration among the task processing durations required by each core cluster to process the task.
  • For example, the current task is a face recognition task for a video to be synthesized.
  • The face recognition task includes multiple subtasks.
  • The subtasks are video stream decoding, face detection, face feature recognition, feature extraction, and feature matching.
  • Each core cluster is responsible for its corresponding subtask.
  • The subtasks constitute a task pipeline; that is, the result of the subtask processed by one core cluster needs to be sent to the next core cluster for processing.
  • When performing task pipeline processing, the multiple core clusters share a unified synchronization period, which is the maximum among the task processing durations required by the core clusters to process their corresponding subtasks. After the synchronization period ends, the multiple core clusters can process the next task, such as voice recognition and video synthesis for the video to be synthesized.
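  • The synchronization period of a pipeline can be sketched as the maximum over the per-cluster task processing durations. The stage names below follow the face recognition example; the durations, the function names, and the idle-time calculation are hypothetical additions for illustration.

```python
from typing import Dict, Tuple


def synchronization_period(durations: Dict[str, float]) -> Tuple[str, float]:
    """Return the slowest core cluster and its duration, which sets the period."""
    bottleneck = max(durations, key=lambda cluster: durations[cluster])
    return bottleneck, durations[bottleneck]


def idle_time(durations: Dict[str, float]) -> Dict[str, float]:
    """Time each cluster waits for the synchronization period to end."""
    _, period = synchronization_period(durations)
    return {cluster: period - d for cluster, d in durations.items()}


# Hypothetical per-subtask durations (milliseconds) for one synchronization period.
stage_durations = {
    "video stream decoding": 4.0,
    "face detection": 9.0,
    "face feature recognition": 6.0,
    "feature extraction": 5.0,
    "feature matching": 3.0,
}
```

  • Here `synchronization_period(stage_durations)` identifies face detection as the cluster that sets the period, and `idle_time(stage_durations)` shows how long every other cluster waits, which is why the slowest cluster is the natural candidate for regulation.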
  • When the detected load data of each core cluster includes the task processing duration of that core cluster, step S2 may further include:
  • Counting the number of times, within a preset detection time period, that the task processing duration corresponding to the core cluster is the synchronization period.
  • The preset detection time period may be any preset time period. In this step, the number of times that the task processing duration corresponding to the core cluster is the synchronization period within the preset detection time period is counted; this number is the frequency.
  • Judging whether the frequency is greater than or equal to a first preset number of times. The first preset number of times may be set according to actual needs, which is not limited in the present disclosure.
  • If the frequency with which the task processing duration corresponding to the core cluster is the synchronization period is greater than or equal to the first preset number of times, the task processing duration of the core cluster is often the maximum among all core clusters, so it can be determined that the core cluster is in an overloaded state, that is, a busy state.
  • If the frequency is less than the first preset number of times, the core cluster is not in an overloaded state, that is, not in a busy state, so further processing may not be performed.
  • In this way, the load state can be determined from how often the task processing duration of the core cluster is the synchronization period, which improves the accuracy of load state judgment.
  • When the detected load data of each core cluster includes the task processing duration of that core cluster, step S2 may further include:
  • Count the number of times, within the preset detection time period, that the task processing duration of the core cluster serves as the synchronization cycle; this count is the frequency.
  • The method of counting the frequency is not repeated here.
  • Calculate the ratio of that frequency to the number of synchronization cycles in the preset detection time period. It can be understood that the number of synchronization cycles in the preset detection time period is the number of tasks processed by the multiple core clusters in that period.
  • Determine whether the ratio is greater than or equal to the first preset ratio; if so, execute the next step; otherwise, perform no further processing.
  • the first preset ratio can be set according to actual needs, which is not limited in the present disclosure.
  • If the ratio is greater than or equal to the first preset ratio, the task processing duration of the core cluster is often the longest among all core clusters, so it can be determined that the core cluster is overloaded, that is, busy. If the ratio is smaller than the first preset ratio, the core cluster is not overloaded, that is, not busy, so no further processing is needed.
  • In this way, the load state is determined from how often the task processing duration serves as the synchronization cycle, which improves the accuracy of the load state judgment.
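  • The ratio-based judgment can be sketched in a few lines. The function name `is_busy_by_ratio` and the handling of an empty detection period (treated as not busy) are assumptions of this example; the threshold value is arbitrary.

```python
def is_busy_by_ratio(times_as_sync_cycle: int,
                     total_sync_cycles: int,
                     first_preset_ratio: float) -> bool:
    """Compare frequency / number of synchronization cycles with the first preset ratio."""
    if total_sync_cycles <= 0:
        # Assumption: with no completed cycle there is nothing to judge.
        return False
    ratio = times_as_sync_cycle / total_sync_cycles
    return ratio >= first_preset_ratio


# A cluster that set the synchronization cycle in 9 of 10 cycles,
# judged against a hypothetical first preset ratio of 0.8.
print(is_busy_by_ratio(9, 10, 0.8))
```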
  • step S3 may further include: increasing the number of second processing cores in the core cluster that can currently perform jobs.
  • Fig. 5 is a flow chart of the regulation and control process of the core control method of the embodiment of the present disclosure.
  • Step S3 of performing regulation processing on the core cluster may further include steps S31a to S33a.
  • Step S31a: if the load state of the core cluster is busy, determine whether the core cluster has an adjustable voltage domain and frequency domain; if yes, execute step S32a; otherwise, execute step S33a.
  • When the load state of the core cluster is busy, it is checked whether, among all the second processing cores in the core cluster that can currently perform operations, there are multiple second processing cores that correspond to the same operating voltage and operating frequency and whose operating voltage and operating frequency can be adjusted. If such second processing cores exist, it is determined that the core cluster has an adjustable voltage domain and frequency domain; otherwise, it is determined that the core cluster does not.
  • That the core cluster has an adjustable voltage domain means that multiple second processing cores of the core cluster correspond to one operating voltage and that operating voltage is adjustable; within the same adjustable voltage domain, all corresponding second processing cores share the same operating voltage setting.
  • That the core cluster has an adjustable frequency domain means that multiple second processing cores of the core cluster correspond to one operating frequency and that operating frequency is adjustable; within the same adjustable frequency domain, all corresponding second processing cores share the same operating frequency setting.
  • The core cluster has a voltage domain; further, when the operating voltage is adjustable, the voltage domain is an adjustable voltage domain. Correspondingly, because the operating voltage has a linear relationship with the operating frequency, the core cluster also has an adjustable frequency domain.
  • Step S32a: among the second processing cores of the core cluster that can currently perform operations, increase the operating voltage and operating frequency of the second processing cores corresponding to the adjustable voltage domain and frequency domain, and end the process.
  • The operating voltage and operating frequency of some or all of the second processing cores corresponding to the adjustable voltage domain and frequency domain in the core cluster can be increased. This raises the computing efficiency of those second processing cores and hence their task processing efficiency, which improves the overall task processing efficiency of the core cluster.
  • When the load state of the core cluster is determined to be busy from the comparison between the first duration and the first preset duration, the voltage adjustment range corresponding to the first duration is determined according to the preset correspondence between duration in the busy state and voltage adjustment range, and the frequency adjustment range corresponding to the first duration is determined according to the preset correspondence between duration in the busy state and frequency adjustment range.
  • According to the voltage adjustment range corresponding to the first duration, the operating voltage of the second processing cores corresponding to the adjustable voltage domain and frequency domain is raised to the corresponding voltage, so that the second processing cores run at the adjusted operating voltage; and, according to the frequency adjustment range corresponding to the first duration, the operating frequency of those second processing cores is raised to the corresponding frequency, so that the second processing cores run at the adjusted operating frequency.
  • The correspondence between duration and voltage adjustment range and the correspondence between duration and frequency adjustment range can be set according to actual needs. For example, assuming that the first preset duration is 10 minutes, the voltage adjustment range can be set to 10% for durations from 10 to 20 minutes, 15% for durations from 20 to 40 minutes, 20% for durations from 40 to 50 minutes, and so on. The correspondence between duration and frequency adjustment range can be set similarly and is not repeated here.
  • For example, if the first duration is 15 minutes and the first preset duration is 10 minutes, the voltage adjustment range corresponding to the first duration is found to be 10%, so the operating voltage of the second processing cores corresponding to the adjustable voltage domain and frequency domain is increased by 10%.
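  • The lookup in this example can be sketched as a small table. The table values are the ones given above (10%, 15%, 20%); treating each range as closed at its lower bound and open at its upper bound, leaving durations outside the table unadjusted, and the names `VOLTAGE_STEPS`, `voltage_adjustment` and `raised_voltage` are assumptions of the sketch.

```python
from typing import Optional

# (lower bound in minutes, upper bound in minutes, adjustment as a fraction).
# Boundary handling (low <= duration < high) is an assumption of this sketch.
VOLTAGE_STEPS = [(10, 20, 0.10), (20, 40, 0.15), (40, 50, 0.20)]


def voltage_adjustment(first_duration_min: float) -> Optional[float]:
    """Look up the voltage adjustment range for a busy-state duration."""
    for low, high, step in VOLTAGE_STEPS:
        if low <= first_duration_min < high:
            return step
    return None  # duration not covered by the table


def raised_voltage(current_voltage: float, first_duration_min: float) -> float:
    """Raise the operating voltage by the looked-up range, if any."""
    step = voltage_adjustment(first_duration_min)
    return current_voltage if step is None else current_voltage * (1 + step)


# First duration of 15 minutes falls in the 10 to 20 minute range.
print(voltage_adjustment(15))
```

  • A frequency table could be handled the same way with its own set of ranges.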
  • The voltage adjustment range corresponding to the memory space usage growth rate of the core cluster may be determined according to the preset correspondence between memory space usage growth rate in the busy state and voltage adjustment range, and the frequency adjustment range corresponding to that growth rate may be determined according to the preset correspondence between memory space usage growth rate in the busy state and frequency adjustment range.
  • According to the voltage adjustment range corresponding to the memory space usage growth rate of the core cluster, the operating voltage of the second processing cores corresponding to the adjustable voltage domain and frequency domain is raised to the corresponding voltage, so that the second processing cores run at the adjusted operating voltage; and, according to the frequency adjustment range corresponding to that growth rate, the operating frequency of those second processing cores is raised to the corresponding frequency, so that the second processing cores run at the adjusted operating frequency.
  • The correspondence between memory space usage growth rate and voltage adjustment range and the correspondence between memory space usage growth rate and frequency adjustment range can be set according to actual needs.
  • These correspondences are analogous to the correspondences between duration and voltage adjustment range and between duration and frequency adjustment range described above, and are not repeated here.
  • The voltage adjustment range corresponding to the task processing duration of the core cluster may be determined according to the preset correspondence between task processing duration in the busy state and voltage adjustment range, and the frequency adjustment range corresponding to that task processing duration may be determined according to the preset correspondence between task processing duration in the busy state and frequency adjustment range.
  • According to the voltage adjustment range corresponding to the task processing duration of the core cluster, the operating voltage of the second processing cores corresponding to the adjustable voltage domain and frequency domain is raised to the corresponding voltage, so that the second processing cores run at the adjusted operating voltage; and, according to the frequency adjustment range corresponding to that task processing duration, the operating frequency of those second processing cores is raised to the corresponding frequency, so that the second processing cores run at the adjusted operating frequency.
  • The correspondence between task processing duration and voltage adjustment range and the correspondence between task processing duration and frequency adjustment range can be set according to actual needs.
  • These correspondences are analogous to the correspondences between duration and voltage adjustment range and between duration and frequency adjustment range described above, and are not repeated here.
  • The voltage adjustment range corresponding to the above frequency of the core cluster (the number of times its task processing duration serves as the synchronization cycle) may be determined according to the preset correspondence between frequency in the busy state and voltage adjustment range, and the frequency adjustment range corresponding to the above frequency may be determined according to the preset correspondence between frequency in the busy state and frequency adjustment range.
  • According to the voltage adjustment range corresponding to the above frequency of the core cluster, the operating voltage of the second processing cores corresponding to the adjustable voltage domain and frequency domain is raised to the corresponding voltage, so that the second processing cores run at the adjusted operating voltage; and, according to the frequency adjustment range corresponding to the above frequency, the operating frequency of those second processing cores is raised to the corresponding frequency, so that the second processing cores run at the adjusted operating frequency.
  • The correspondence between the above frequency and voltage adjustment range and the correspondence between the above frequency and frequency adjustment range can be set according to actual needs.
  • These correspondences are analogous to the correspondences between duration and voltage adjustment range and between duration and frequency adjustment range described above, and are not repeated here.
  • Step S33a: increase the number of second processing cores in the core cluster that can currently perform jobs, and end the process.
  • Increasing the number of second processing cores in the core cluster that can currently perform jobs improves the overall task processing efficiency of the core cluster.
  • The number of second processing cores to add can be determined according to how busy the core cluster is, which can be represented by, for example, the above-mentioned first duration, memory space usage growth rate, task processing duration, or frequency.
  • A correspondence between duration in the busy state and the number of additional cores required can be preset. When the load state of the core cluster is determined to be busy from the comparison between the first duration and the first preset duration, the number of additional cores corresponding to the first duration can be determined from that correspondence, and the corresponding number of second processing cores that can currently perform jobs is added to the core cluster.
  • The step of increasing the number of second processing cores in the core cluster that can currently perform operations may further include: adding one or more idle second processing cores located outside the core cluster in the many-core system to the core cluster, as second processing cores in the core cluster that can currently perform operations; and/or,
  • waking up one or more second processing cores in the core cluster that are in the closed state, as second processing cores in the core cluster that can currently perform jobs.
  • Each second processing core contains a controller, which is used to shut down or wake up (turn on) that second processing core. Sending a wake-up instruction to the controller of a second processing core wakes the core up, and sending a shutdown instruction to the controller shuts the core down.
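  • The busy-state branch (steps S31a to S33a) can be sketched as below. This is a simplified model under stated assumptions: the `Cluster` dataclass, the integer core IDs, the relative voltage and frequency values, and the policy of waking closed cores before borrowing idle cores from outside the cluster are all hypothetical; the text above allows either or both ways of adding cores.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Cluster:
    active: List[int] = field(default_factory=list)  # IDs of cores that can run jobs
    closed: List[int] = field(default_factory=list)  # IDs of shut-down cores in the cluster
    adjustable_domain: bool = False                  # adjustable voltage and frequency domain?
    voltage: float = 1.0                             # relative operating voltage
    frequency: float = 1.0                           # relative operating frequency


def regulate_busy(cluster: Cluster, idle_pool: List[int],
                  voltage_step: float, frequency_step: float,
                  extra_cores: int) -> str:
    """Apply S32a if an adjustable domain exists, otherwise S33a."""
    if cluster.adjustable_domain:                    # S31a -> S32a
        cluster.voltage *= 1 + voltage_step
        cluster.frequency *= 1 + frequency_step
        return "S32a"
    for _ in range(extra_cores):                     # S31a -> S33a
        if cluster.closed:
            core = cluster.closed.pop(0)             # wake a closed core in the cluster
        elif idle_pool:
            core = idle_pool.pop(0)                  # borrow an idle core from outside
        else:
            break                                    # no more cores available
        cluster.active.append(core)
    return "S33a"


cluster = Cluster(active=[0], closed=[1])
pool = [7, 8]
print(regulate_busy(cluster, pool, 0.10, 0.10, extra_cores=2), cluster.active)
```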
  • step S3 may further include: reducing the number of second processing cores in the core cluster that can currently perform jobs.
  • Fig. 6 is a flowchart of the regulation and control process of the core control method of the embodiment of the present disclosure.
  • Step S3 of performing regulation processing on the core cluster may further include steps S31b to S33b.
  • Step S31b: if the load state of the core cluster is the low-load state, determine whether the core cluster has an adjustable voltage domain and frequency domain; if yes, execute step S32b; otherwise, execute step S33b. For example, when the load state of the core cluster is the low-load state, it is checked whether, among all the second processing cores in the core cluster that can currently perform operations, there are multiple second processing cores that have the same operating voltage and operating frequency and whose voltage and frequency are adjustable. If such second processing cores exist, it is determined that the core cluster has an adjustable voltage domain and frequency domain; otherwise, it is determined that the core cluster does not.
  • Step S32b: among the second processing cores in the core cluster that can currently perform operations, lower the operating voltage and/or operating frequency of the second processing cores corresponding to the adjustable voltage domain and frequency domain, and end the process.
  • The operating voltage and operating frequency of some or all of the second processing cores corresponding to the adjustable voltage domain and frequency domain in the core cluster can be lowered. This reduces the computing performance of those second processing cores, which effectively saves the power consumption of the core cluster, reduces the power consumption of the many-core system, and conserves resources.
  • When the load state of the core cluster is determined to be the low-load state from the comparison between the above-mentioned second duration and the second preset duration, the voltage adjustment range corresponding to the second duration may be determined according to the preset correspondence between duration in the low-load state and voltage adjustment range, and the frequency adjustment range corresponding to the second duration may be determined according to the preset correspondence between duration in the low-load state and frequency adjustment range.
  • According to the voltage adjustment range corresponding to the second duration, the operating voltage of the second processing cores corresponding to the adjustable voltage domain and frequency domain is lowered to the corresponding voltage, so that the second processing cores run at the adjusted operating voltage; and, according to the frequency adjustment range corresponding to the second duration, the operating frequency of those second processing cores is lowered to the corresponding frequency, so that the second processing cores run at the adjusted operating frequency.
  • The correspondence between duration and voltage adjustment range and the correspondence between duration and frequency adjustment range can be set according to actual needs. For example, assuming that the second preset duration is 10 minutes, the voltage adjustment range can be set to 10% for durations from 10 to 20 minutes, 15% for durations from 20 to 40 minutes, 20% for durations from 40 to 50 minutes, and so on. The correspondence between duration in the low-load state and frequency adjustment range can be set similarly and is not repeated here.
  • For example, if the second duration is 15 minutes and the second preset duration is 10 minutes, and the voltage adjustment range corresponding to the second duration is found to be 10% according to the preset correspondence between duration in the low-load state and voltage adjustment range, then the operating voltage of the second processing cores corresponding to the adjustable voltage domain and frequency domain is lowered by 10%.
  • The voltage adjustment range corresponding to the memory space usage growth rate of the core cluster may be determined according to the preset correspondence between memory space usage growth rate in the low-load state and voltage adjustment range, and the frequency adjustment range corresponding to that growth rate may be determined according to the preset correspondence between memory space usage growth rate in the low-load state and frequency adjustment range.
  • According to the voltage adjustment range corresponding to the memory space usage growth rate of the core cluster, the operating voltage of the second processing cores corresponding to the adjustable voltage domain and frequency domain is lowered to the corresponding voltage, so that the second processing cores run at the adjusted operating voltage; and, according to the frequency adjustment range corresponding to that growth rate, the operating frequency of those second processing cores is lowered to the corresponding frequency, so that the second processing cores run at the adjusted operating frequency.
  • The correspondence between memory space usage growth rate and voltage adjustment range and the correspondence between memory space usage growth rate and frequency adjustment range can be set according to actual needs.
  • These correspondences are analogous to the correspondences between duration and voltage adjustment range and between duration and frequency adjustment range described above, and are not repeated here.
  • The voltage adjustment range corresponding to the task processing duration of the core cluster may be determined according to the preset correspondence between task processing duration in the low-load state and voltage adjustment range, and the frequency adjustment range corresponding to that task processing duration may be determined according to the preset correspondence between task processing duration in the low-load state and frequency adjustment range.
  • According to the voltage adjustment range corresponding to the task processing duration of the core cluster, the operating voltage of the second processing cores corresponding to the adjustable voltage domain and frequency domain is lowered to the corresponding voltage, so that the second processing cores run at the adjusted operating voltage; and, according to the frequency adjustment range corresponding to that task processing duration, the operating frequency of those second processing cores is lowered to the corresponding frequency, so that the second processing cores run at the adjusted operating frequency.
  • The correspondence between task processing duration and voltage adjustment range and the correspondence between task processing duration and frequency adjustment range can be set according to actual needs.
  • These correspondences are analogous to the correspondences between duration and voltage adjustment range and between duration and frequency adjustment range described above, and are not repeated here.
  • Step S33b: reduce the number of second processing cores in the core cluster that can currently perform jobs, and end the process.
  • Reducing the number of second processing cores in the core cluster that can currently perform operations saves the power consumption of the core cluster, reduces the power consumption of the many-core system, and conserves resources.
  • The number of second processing cores to remove can be determined according to how lightly loaded the core cluster is, which can be represented by, for example, the above-mentioned second duration, task processing duration, or frequency.
  • A correspondence between duration in the low-load state and the number of cores to remove can be preset. When the load state of the core cluster is determined to be the low-load state from the comparison between the second duration and the second preset duration, the number of cores to remove corresponding to the second duration can be determined from that correspondence, and the number of second processing cores in the core cluster that can currently perform jobs is reduced accordingly.
  • the step of reducing the number of second processing cores currently capable of operating in the core cluster may further include: removing at least one second processing core currently capable of operating in the core cluster from the core cluster ;
  • At least one second processing core that is currently capable of operating in the core cluster is controlled to be in a closed state.
  • FIG. 7 is a schematic diagram of an application scenario of a many-core system according to an embodiment of the present disclosure.
  • The tasks in the task pipeline of a face recognition service can include a video stream decoding task, a face detection task, a face feature recognition task, a face feature extraction task, a face feature matching task, and so on, which need to be executed in sequence.
  • each core cluster of the many-core system can respectively process a task in the task pipeline, and each core cluster of the many-core system sequentially processes its corresponding tasks according to the operation sequence of the pipeline.
  • After a core cluster has processed its task, the resulting task data can be sent to the buffer of the core cluster that follows it on the task pipeline and cached there, so that the following core cluster can read the data as required and start running its own task. The buffer can also cache data transferred by other external devices.
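  • The hand-off between adjacent core clusters through a buffer can be sketched as follows. This is a deliberately simplified, sequential model: each stage drains its input buffer before the next stage starts, whereas the core clusters described here work concurrently. The function name `run_pipeline` and the use of plain callables to stand in for core clusters are assumptions of the sketch.

```python
from collections import deque
from typing import Callable, Deque, List, Sequence


def run_pipeline(stages: Sequence[Callable], inputs: Sequence) -> List:
    """Pass items through stages, caching each stage's output in the next buffer."""
    # buffers[i] feeds stage i; buffers[-1] collects the final results.
    buffers: List[Deque] = [deque(inputs)] + [deque() for _ in stages]
    for i, stage in enumerate(stages):
        while buffers[i]:
            item = buffers[i].popleft()
            buffers[i + 1].append(stage(item))  # cache for the following stage
    return list(buffers[-1])


# Two stand-in stages: "decode" adds 1, "detect" doubles.
print(run_pipeline([lambda x: x + 1, lambda x: x * 2], [1, 2]))
```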
  • Fig. 8 is a composition block diagram of a core cluster of a many-core system according to an embodiment of the present disclosure. As shown in Fig. 8, each core cluster includes a plurality of sub-clusters, each sub-cluster includes at least one second processing core that can currently perform operations, and the plurality of sub-clusters process the task corresponding to the core cluster in parallel. For example, if the task corresponding to the core cluster is face recognition, then after multiple frames of image data are obtained, each sub-cluster of the core cluster can be responsible for face recognition on one or more frames. With three sub-clusters and three frames of images, each sub-cluster can process one frame.
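  • One way to spread frames over sub-clusters is round-robin assignment, sketched below. Round-robin is an assumption chosen for illustration; the text only states that each sub-cluster handles one or more frames. The function name `assign_frames` is hypothetical.

```python
from typing import List, Sequence


def assign_frames(frames: Sequence, num_subclusters: int) -> List[List]:
    """Distribute frames over sub-clusters in round-robin order."""
    if num_subclusters < 1:
        raise ValueError("a core cluster needs at least one sub-cluster")
    assignment: List[List] = [[] for _ in range(num_subclusters)]
    for index, frame in enumerate(frames):
        assignment[index % num_subclusters].append(frame)
    return assignment


# Three sub-clusters and three frames: one frame each.
print(assign_frames(["frame0", "frame1", "frame2"], 3))
```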
  • The core control method may further include steps S4a to S6a.
  • Step S4a: from the second processing cores newly added to the core cluster that can currently perform operations, build a new sub-cluster of the core cluster and obtain the configuration information of the new sub-cluster.
  • FIG. 9 is a block diagram of a core cluster after a new sub-cluster is established according to an embodiment of the present disclosure.
  • One or more newly added second processing cores that can currently perform operations can be used as a new sub-cluster of the core cluster, and the configuration information of the new sub-cluster is obtained.
  • The new sub-cluster can process the task corresponding to the core cluster in parallel with the other sub-clusters. It includes one or more second processing cores that can currently perform operations, and its configuration information includes, but is not limited to, the number of second processing cores in the new sub-cluster and the address information of each second processing core.
  • Step S5a: send the configuration information of the new sub-cluster to the target processing core in the predecessor core cluster of the core cluster, so that the target processing core in the predecessor core cluster can establish the input shunt of the new sub-cluster.
  • In step S5a, the main processing core of the core cluster sends the configuration information of the new sub-cluster to the target processing core in the predecessor core cluster of the core cluster.
  • The predecessor core cluster is the core cluster immediately preceding the core cluster on the task pipeline. The target processing core in the predecessor core cluster establishes the input shunt of the new sub-cluster according to the configuration information of the new sub-cluster; the input shunt is the path over which the predecessor core cluster outputs data to the new sub-cluster.
  • the target processing core in the predecessor core cluster may be the main processing core in the predecessor core cluster, or the second processing core responsible for data output in the predecessor core cluster.
  • The predecessor core cluster may include a task scheduler, which may be configured in the second processing core in charge of data output in the predecessor core cluster or in the main processing core of the predecessor core cluster.
  • The task scheduler maintains a predecessor task list, which records the number of sub-clusters of the next core cluster on the task pipeline on which the predecessor core cluster is located, the number of second processing cores included in each sub-cluster, the address of each sub-cluster, and other information.
  • Each sub-cluster of the core cluster recorded in the predecessor task list has a corresponding flag bit, whose value represents the state of that sub-cluster. For example, a valid flag value means that the corresponding sub-cluster is currently available, and an invalid flag value means that the corresponding sub-cluster is not available.
  • the predecessor core cluster can allocate tasks to each subcluster of the core cluster in the predecessor task list according to the predecessor task list maintained by it.
  • the predecessor core cluster may update the predecessor task list maintained by it according to the update information transmitted by a core cluster located behind and adjacent to it on the task pipeline. For example, after a core cluster adds cores to form a new sub-cluster, the main processing core of the core cluster can send the configuration information of the new sub-cluster to its predecessor core cluster, so that the target processing core of the predecessor core cluster can configure the new sub-cluster The information is written into the predecessor task list, and the flag bit corresponding to the added new subcluster is set as a valid value.
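  • A predecessor task list with per-sub-cluster flag bits can be sketched as a small data structure. The class and field names (`PredecessorTaskList`, `SubclusterEntry`, `valid`), the string sub-cluster names, and the integer addresses are assumptions made for illustration; the actual list layout is not specified in the text.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class SubclusterEntry:
    address: int        # address of the sub-cluster (hypothetical encoding)
    core_count: int     # number of second processing cores in the sub-cluster
    valid: bool = True  # flag bit: True means the sub-cluster is available


class PredecessorTaskList:
    """List kept by the predecessor core cluster about the next cluster's sub-clusters."""

    def __init__(self) -> None:
        self.entries: Dict[str, SubclusterEntry] = {}

    def add_subcluster(self, name: str, address: int, core_count: int) -> None:
        # Write the new sub-cluster's configuration and set its flag bit to valid.
        self.entries[name] = SubclusterEntry(address, core_count, valid=True)

    def mark_unavailable(self, name: str) -> None:
        # Set the flag bit to an invalid value.
        self.entries[name].valid = False

    def available(self) -> List[str]:
        # Sub-clusters that tasks may currently be allocated to.
        return [name for name, entry in self.entries.items() if entry.valid]


task_list = PredecessorTaskList()
task_list.add_subcluster("sub0", address=0x10, core_count=2)
task_list.add_subcluster("sub1", address=0x20, core_count=1)
task_list.mark_unavailable("sub1")
print(task_list.available())
```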
  • Step S6a: send the configuration information of the new sub-cluster to the target processing core in the successor core cluster of the core cluster, so that the target processing core in the successor core cluster can establish the output shunt of the new sub-cluster according to the configuration information of the new sub-cluster.
  • In step S6a, the main processing core of the core cluster sends the configuration information of the new sub-cluster to the target processing core in the successor core cluster of the core cluster.
  • The successor core cluster is the core cluster immediately following the core cluster on the task pipeline. The target processing core in the successor core cluster establishes the output shunt of the new sub-cluster according to the configuration information of the new sub-cluster; the output shunt is the path over which the new sub-cluster outputs data to the successor core cluster.
  • the target processing core in the successor core cluster may be the main processing core of the successor core cluster, or the second processing core responsible for data output in the successor core cluster.
  • The target processing core of the successor core cluster can maintain a successor task list as required. The successor task list records the number of sub-clusters of the previous core cluster on the task pipeline on which the successor core cluster is located, the number of second processing cores included in each sub-cluster, the address of each sub-cluster, and other information.
  • The successor core cluster can update the successor task list it maintains according to the update information delivered by the core cluster located before and adjacent to it on the task pipeline. For example, after a core cluster adds cores to form a new sub-cluster, the main processing core of the core cluster can send the configuration information of the new sub-cluster to its successor core cluster, so that the target processing core of the successor core cluster can write the configuration information into the successor task list.
  • The step of reducing the number of second processing cores in the core cluster that can currently perform operations may include: reducing the number of sub-clusters in the core cluster that can currently perform operations, or reducing the number of second processing cores in any one or more sub-clusters of the core cluster. After either reduction, the main processing core of the core cluster can send the update information of the core cluster to the predecessor core cluster and the successor core cluster, so that the predecessor core cluster updates the predecessor task list it maintains, updates the flag bit of the corresponding sub-cluster, and deletes the corresponding input shunt, and the successor core cluster updates the successor task list it maintains and deletes the corresponding output shunt.
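  • The step of determining the load status of the core cluster may further include: if the real-time memory space usage rate is 0, and the second duration for which it remains 0 is greater than or equal to the second preset duration and less than a third preset duration, determining that the load state of the core cluster is an idle state, and the idle state level is the first level.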
  • The idle state can be understood as an underload state or a zero load state, and is a special case of the low load state.
  • the third preset duration is longer than the second preset duration, and the third preset duration can be set according to actual needs, which is not limited in the present disclosure.
  • The step of determining the load status of the core cluster may further include: if the real-time memory space usage rate is 0, and the second duration for which it remains 0 is greater than or equal to the third preset duration, determining that the load state of the core cluster is an idle state, and the idle state level is the second level.
  • In this way, the load state of the core cluster can be determined as the idle state, together with the level of the idle state, so that the corresponding regulation processing can be performed on the core cluster, thereby reducing the power consumption of the core cluster.
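The two idle levels can be expressed as a single classification over how long the buffer usage has stayed at zero. The sketch below assumes durations are plain numbers in the same time unit; the function name and parameter names are hypothetical.

```python
from typing import Optional


def classify_idle(usage_rate: float, zero_duration: float,
                  second_preset: float, third_preset: float) -> Optional[int]:
    """Return the idle level (1 or 2), or None if the cluster is not idle.

    usage_rate:    real-time memory space usage rate of the buffer
    zero_duration: how long the usage rate has continuously been 0
    second_preset: second preset duration
    third_preset:  third preset duration (must be longer than the second)
    """
    if third_preset <= second_preset:
        raise ValueError("third preset duration must exceed the second")
    if usage_rate != 0:
        return None
    if zero_duration >= third_preset:
        return 2          # second-level idle state
    if zero_duration >= second_preset:
        return 1          # first-level idle state
    return None
```

Checking the third preset duration first keeps the two levels mutually exclusive without repeating the upper bound in the first-level test.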
  • The step of regulating and processing the core cluster according to the load state of the core cluster includes: when the load state of the core cluster is the first-level idle state, inserting a blank frame into the buffer corresponding to the core cluster, where the blank frame can be a preset frame image, thereby maintaining the working state of the core cluster and ensuring that the core cluster outputs the data it has already processed.
  • A corresponding gated clock can be set to control whether the corresponding second processing core works or not.
  • The gated clock is used to output clock signals to the multiple core clusters, to drive the multiple core clusters to work or not to work based on the clock signals.
  • The many-core system includes multiple core clusters, and the core control method is implemented by the first processing core, which uniformly performs load detection and management on each core cluster.
  • The core control method also includes: suspending the sending of synchronization signals to the multiple core clusters, so that the multiple core clusters suspend synchronous updates, thereby saving resources of the many-core system and reducing its power consumption.
  • the synchronization signal is used to control the multiple core clusters to perform task processing based on the synchronization cycle.
  • the same task correspondingly processed by multiple core clusters may be, for example, a face recognition task of a video to be synthesized.
  • A blank frame is inserted in the buffer of each core cluster in the plurality of core clusters to maintain the working state of the plurality of core clusters, and the system waits for the plurality of core clusters to output all the processed data. Once no data is being input and no data is being output, the sending of synchronization signals to the multiple core clusters is suspended, or all gated clocks corresponding to the multiple core clusters are turned off at the same time.
  • The step of regulating and processing the core cluster according to the load state of the core cluster includes: when the load state of the core cluster is the second-level idle state, lowering the operating voltage and operating frequency of those second processing cores, among the second processing cores in the core cluster that can currently perform operations, that correspond to an adjustable voltage domain and frequency domain, or reducing the number of second processing cores in the core cluster that can currently perform operations.
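The idle-state handling above can be summarised as a small dispatch. This is a sketch under stated assumptions: the action names are hypothetical labels, and choosing between the two second-level options by whether an adjustable voltage and frequency domain exists is borrowed from the low-load handling, since the text itself only says "or".

```python
from typing import Iterable, Optional


def idle_action(idle_level: int, has_adjustable_domain: bool) -> str:
    """Pick a regulation action for an idle core cluster."""
    if idle_level == 1:
        # Keep the cluster working so it can flush already processed data.
        return "insert_blank_frame"
    if idle_level == 2:
        # Assumption: prefer voltage/frequency scaling when it is available.
        if has_adjustable_domain:
            return "lower_voltage_and_frequency"
        return "reduce_core_count"
    raise ValueError("idle_level must be 1 or 2")


def may_suspend_sync(idle_levels: Iterable[Optional[int]]) -> bool:
    """True when every cluster processing the same task is in first-level idle.

    Each entry is a cluster's idle level, or None if that cluster is not idle.
    """
    levels = list(idle_levels)
    return len(levels) > 0 and all(level == 1 for level in levels)
```

When `may_suspend_sync` holds, the text still requires blank frames to be inserted and the processed data to be drained before the synchronization signal is actually suspended; that sequencing is not modelled here.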
  • Fig. 10 is a block diagram of a core control device provided by an embodiment of the present disclosure.
  • An embodiment of the present disclosure provides a core control device 300 applied to a many-core system, where the many-core system includes at least one core cluster and each core cluster includes at least one second processing core. The core control device 300 includes: a load data detection module 301, a load state detection module 302, and a core control module 303.
  • The load data detection module 301 is configured to perform load detection on the corresponding core cluster and obtain the load data corresponding to the core cluster. The load state detection module 302 is configured to determine the load state of the core cluster according to the load data corresponding to the core cluster. The core control module 303 is configured to perform control processing on the core cluster according to the load state of the core cluster, where the control processing includes one of the following control methods: regulating the number of second processing cores in the core cluster that can currently perform operations; regulating the operating voltage and operating frequency of the second processing cores in the core cluster that can currently perform operations; inserting blank frames into the buffer corresponding to the core cluster.
  • Each of the core clusters is correspondingly provided with a buffer, and the buffer is used for caching the task data of the tasks to be processed by the corresponding core cluster.
  • The load data detection module is configured to: detect the real-time memory space usage rate of the buffer corresponding to the core cluster; obtain the result of comparing that real-time memory space usage rate with a first preset threshold; and record the duration for which the real-time memory space usage rate is continuously greater than or equal to the first preset threshold as the first duration. The load data corresponding to the core cluster includes the first duration.
  • The load state detection module is used to: judge whether the first duration is greater than or equal to a first preset duration; and if the first duration is greater than or equal to the first preset duration, determine that the load state of the core cluster is a busy state.
  • The load data detection module is configured to: obtain the result of comparing the real-time memory space usage rate of the buffer corresponding to the core cluster with a second preset threshold, where the second preset threshold is greater than 0 and less than the first preset threshold; and record the duration for which the real-time memory space usage rate is continuously less than or equal to the second preset threshold as the second duration. The load data corresponding to the core cluster includes the second duration. The load state detection module is used to: determine whether the second duration is greater than or equal to a second preset duration; and if the second duration is greater than or equal to the second preset duration, determine that the load state of the core cluster is a low load state.
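The threshold-and-duration test performed by the two modules can be sketched as a small monitor that is fed periodic usage samples. The class name, the fixed sampling interval `dt`, and the "normal" label for the state that is neither busy nor low are assumptions made for illustration.

```python
class BufferLoadMonitor:
    """Tracks how long buffer usage stays above or below preset thresholds."""

    def __init__(self, first_threshold: float, second_threshold: float,
                 first_preset_duration: float, second_preset_duration: float):
        if not 0 < second_threshold < first_threshold:
            raise ValueError("need 0 < second threshold < first threshold")
        self.first_threshold = first_threshold
        self.second_threshold = second_threshold
        self.first_preset_duration = first_preset_duration
        self.second_preset_duration = second_preset_duration
        self.first_duration = 0.0    # time continuously >= first threshold
        self.second_duration = 0.0   # time continuously <= second threshold

    def update(self, usage_rate: float, dt: float) -> str:
        """Feed one usage sample covering dt time units; return the load state."""
        # A sample on the wrong side of a threshold resets that duration.
        if usage_rate >= self.first_threshold:
            self.first_duration += dt
        else:
            self.first_duration = 0.0
        if usage_rate <= self.second_threshold:
            self.second_duration += dt
        else:
            self.second_duration = 0.0

        if self.first_duration >= self.first_preset_duration:
            return "busy"
        if self.second_duration >= self.second_preset_duration:
            return "low"
        return "normal"
```

Because the second threshold is strictly below the first, at most one of the two durations can be non-zero at any time, so the order of the final checks does not affect the result.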
  • Each of the core clusters is correspondingly provided with a buffer, and the buffer is used for caching the task data of the tasks to be processed by the corresponding core cluster. The load data detection module is used to: obtain the memory space usage growth rate of the buffer corresponding to the core cluster, where the load data corresponding to the core cluster includes that memory space usage growth rate. The load state detection module is used to: judge whether the memory space usage growth rate is greater than or equal to a first preset growth rate value; and if the memory space usage growth rate is greater than or equal to the first preset growth rate value, determine that the load state of the core cluster is a busy state.
  • The load data corresponding to the core cluster also includes the real-time memory space usage rate of the corresponding buffer, and the load state detection module is configured to: when the real-time memory space usage rate is less than or equal to a preset usage rate, determine whether the memory space usage growth rate is less than or equal to a second preset growth rate value, where the second preset growth rate value is negative; and if the memory space usage growth rate is less than or equal to the second preset growth rate value, determine that the load state of the core cluster is a low load state.
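The growth-rate criterion can be illustrated with two consecutive usage samples. How the growth rate is actually measured is not specified in the text, so the finite difference below, like the function name, parameter names, and the "normal" label, is an assumption.

```python
def classify_by_growth(usage_now: float, usage_prev: float, dt: float,
                       first_growth: float, second_growth: float,
                       preset_usage: float) -> str:
    """Classify load from the buffer usage growth rate.

    first_growth:  first preset growth rate value (busy threshold)
    second_growth: second preset growth rate value (negative)
    preset_usage:  preset usage rate below which low load is considered
    """
    if dt <= 0:
        raise ValueError("dt must be positive")
    if second_growth >= 0:
        raise ValueError("second preset growth rate value must be negative")
    growth = (usage_now - usage_prev) / dt   # assumed: finite difference
    if growth >= first_growth:
        return "busy"
    if usage_now <= preset_usage and growth <= second_growth:
        return "low"
    return "normal"
```

For example, a buffer that falls from 40% to 20% usage in one time unit, with a preset usage rate of 30% and a second preset growth rate value of -0.1, is classified as low load.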
  • The load data detection module is configured to: detect in real time the task processing duration required by the core cluster to process its task, where the load data corresponding to the core cluster includes the task processing duration. The load state detection module is used to: determine whether the task processing duration required by the core cluster to process its task is greater than or equal to a first preset processing duration; and if the task processing duration is greater than or equal to the first preset processing duration, determine that the load state of the core cluster is a busy state.
  • The load state detection module is configured to: determine whether the task processing duration required by the core cluster to process its task is less than or equal to a second preset processing duration; and if the task processing duration is less than or equal to the second preset processing duration, determine that the load state of the core cluster is a low load state.
  • The many-core system includes a plurality of core clusters that perform task processing based on a synchronization period, where the synchronization period is the maximum of the task processing durations required by the core clusters to process their tasks. The load state detection module is used to: count, within a preset detection time period, the number of times the task processing duration corresponding to the core cluster is taken as the synchronization period; and when that number is greater than or equal to a first preset number of times, determine that the load state of the core cluster is a busy state.
  • The many-core system includes a plurality of core clusters that perform task processing based on a synchronization period, where the synchronization period is the maximum of the task processing durations required by the core clusters to process their tasks. The load state detection module is used to: count, within a preset detection time period, the number of times the task processing duration corresponding to the core cluster is taken as the synchronization period; calculate the ratio of that number to the number of synchronization periods within the preset detection time period; and when the ratio is greater than or equal to a first preset ratio, determine that the load state of the core cluster is a busy state.
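The ratio test can be sketched by recording, for each synchronization period in the detection window, the task processing duration of every core cluster and counting how often a given cluster is the slowest one. The data layout (a list of dictionaries keyed by a cluster identifier) and the function name are illustrative assumptions.

```python
from typing import Dict, List, Tuple


def busy_by_sync_ratio(durations_per_period: List[Dict[str, float]],
                       cluster: str,
                       first_preset_ratio: float) -> Tuple[int, float, bool]:
    """Return (hit count, ratio, is_busy) for one core cluster.

    durations_per_period holds one dict per synchronization period in the
    preset detection time period, mapping cluster id -> task processing
    duration. The synchronization period equals the longest duration, so a
    cluster "sets" the period whenever its duration is the maximum.
    """
    if not durations_per_period:
        raise ValueError("detection window contains no synchronization period")
    hits = 0
    for durations in durations_per_period:
        if durations[cluster] == max(durations.values()):
            hits += 1
    ratio = hits / len(durations_per_period)
    return hits, ratio, ratio >= first_preset_ratio
```

In this sketch a tie counts as a hit for every cluster that shares the maximum duration; the text does not say how ties are treated.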
  • The core control module is configured to: when the load state of the core cluster is a busy state, determine whether the core cluster has an adjustable voltage domain and frequency domain; if it is determined that the core cluster has an adjustable voltage domain and frequency domain, raise the operating voltage and operating frequency of those second processing cores, among the second processing cores in the core cluster that can currently perform operations, that correspond to the adjustable voltage domain and frequency domain; and if it is determined that the core cluster does not have an adjustable voltage domain and frequency domain, increase the number of second processing cores in the core cluster that can currently perform operations.
  • The core control module is configured to: when the load state of the core cluster is a low load state, determine whether the core cluster has an adjustable voltage domain and frequency domain; if it is determined that the core cluster has an adjustable voltage domain and frequency domain, lower the operating voltage and operating frequency of those second processing cores, among the second processing cores in the core cluster that can currently perform operations, that correspond to the adjustable voltage domain and frequency domain; and if it is determined that the core cluster does not have an adjustable voltage domain and frequency domain, reduce the number of second processing cores in the core cluster that can currently perform operations.
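The busy and low-load branches share one decision: scale voltage and frequency if the cluster has an adjustable domain, otherwise change the number of working cores. The sketch below models this with a hypothetical `CoreCluster` record; the integer `freq_level`, the step size of one, and the lower bounds are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class CoreCluster:
    active_cores: int        # second processing cores currently able to work
    adjustable_domain: bool  # has an adjustable voltage/frequency domain
    freq_level: int = 0      # hypothetical index into a voltage/frequency table


def regulate(cluster: CoreCluster, load_state: str) -> CoreCluster:
    """Apply one regulation step for a 'busy' or 'low' load state."""
    if load_state == "busy":
        if cluster.adjustable_domain:
            cluster.freq_level += 1            # raise voltage and frequency
        else:
            cluster.active_cores += 1          # add or wake one core
    elif load_state == "low":
        if cluster.adjustable_domain:
            # Assumed floor: level 0 is the lowest operating point.
            cluster.freq_level = max(0, cluster.freq_level - 1)
        else:
            # Assumed floor: keep at least one working core.
            cluster.active_cores = max(1, cluster.active_cores - 1)
    return cluster
```

No upper bound is enforced on `freq_level` or `active_cores`; a real controller would be limited by the hardware's operating points and by the idle cores available in the many-core system.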
  • The core control module increasing the number of second processing cores that can currently perform operations in the core cluster includes: adding one or more idle second processing cores that are outside the core cluster in the many-core system to the core cluster, as second processing cores that can currently perform operations in the core cluster; and/or waking up one or more second processing cores in the core cluster that are in a closed state, as second processing cores that can currently perform operations in the core cluster.
  • The core control module reducing the number of second processing cores that can currently perform operations in the core cluster includes: removing from the core cluster at least one second processing core that can currently perform operations; and/or controlling at least one second processing core in the core cluster that can currently perform operations to enter a closed state.
  • Each of the core clusters of the many-core system correspondingly processes one task in the task pipeline. Each of the core clusters includes a plurality of sub-clusters, each of the sub-clusters includes at least one second processing core that can currently perform operations, and the plurality of sub-clusters are used to process the task corresponding to the core cluster in parallel; the device also includes:
  • a sub-cluster building module, configured to form a new sub-cluster of the core cluster from the newly added second processing cores in the core cluster that can currently perform operations, and to obtain the configuration information of the new sub-cluster, where the new sub-cluster includes the one or more newly added second processing cores that can currently perform operations, and the configuration information includes the number of second processing cores in the new sub-cluster and the address information of each second processing core;
  • a first sending module configured to send the configuration information of the new sub-cluster to the target processing core in the predecessor core cluster of the core cluster;
  • a second sending module configured to send the configuration information of the new sub-cluster to a target processing core in a successor core cluster of the core cluster;
  • where the predecessor core cluster is the core cluster immediately before the core cluster on the task pipeline, and the target processing core in the predecessor core cluster is used to establish an input shunt of the new sub-cluster according to the configuration information of the new sub-cluster of the core cluster, the input shunt being the path along which the predecessor core cluster outputs data to the new sub-cluster;
  • the successor core cluster is the core cluster immediately after the core cluster on the task pipeline, and the target processing core in the successor core cluster is used to establish an output shunt of the new sub-cluster according to the configuration information of the new sub-cluster of the core cluster, the output shunt being the path along which the new sub-cluster outputs data to the successor core cluster.
  • The load state detection module is configured to: if the real-time memory space usage rate is 0, and the second duration for which it remains 0 is greater than or equal to the second preset duration and less than a third preset duration, determine that the load state of the core cluster is an idle state, and the idle state level is the first level; and if the real-time memory space usage rate is 0, and the second duration for which it remains 0 is greater than or equal to the third preset duration, determine that the load state of the core cluster is an idle state, and the idle state level is the second level.
  • The core control module is configured to: insert a blank frame into the buffer corresponding to the core cluster when the load state of the core cluster is the first-level idle state; and when the load state of the core cluster is the second-level idle state, lower the operating voltage and operating frequency of those second processing cores, among the second processing cores in the core cluster that can currently perform operations, that correspond to an adjustable voltage domain and frequency domain, or reduce the number of second processing cores in the core cluster that can currently perform operations.
  • The core control module is further configured to: turn off the gated clock corresponding to the second processing cores in the core cluster, where the gated clock is used to output a clock signal to the corresponding second processing core in the core cluster, to drive the corresponding second processing core to work or not to work based on the clock signal.
  • The many-core system includes a plurality of core clusters. When multiple core clusters correspondingly process the same task, and the load states of these core clusters are all the first-level idle state, the core control device further includes: a signal stop sending module, configured to suspend sending synchronization signals to the multiple core clusters, where the synchronization signals are used to control the multiple core clusters to perform task processing based on a synchronization period.
  • The core control device 300 provided by the embodiment of the present disclosure is used to implement the above-mentioned core control method. For details, please refer to the description of the above-mentioned core control method, which will not be repeated here.
  • An embodiment of the present disclosure also provides a processing core, where the processing core includes the above-mentioned core control device.
  • An embodiment of the present disclosure also provides a many-core system, which includes a plurality of processing cores. The plurality of processing cores include a first processing core and a plurality of second processing cores, and some or all of the plurality of second processing cores are divided into at least one core cluster. Each core cluster includes at least one second processing core and has a main processing core, which is a designated second processing core in the core cluster.
  • The first processing core includes the above-mentioned core control device, and/or the main processing cores of at least some of the core clusters include the above-mentioned core control device.
  • Fig. 11 is a composition block diagram of an electronic device provided by an embodiment of the present disclosure.
  • An embodiment of the present disclosure provides an electronic device. The electronic device includes a plurality of processing cores 701 and an on-chip network 702, where the plurality of processing cores 701 are all connected to the on-chip network 702, and the on-chip network 702 is used to exchange data among the plurality of processing cores and with the outside.
  • One or more processing cores 701 store one or more instructions, and the one or more instructions are executed by the one or more processing cores 701, so that the one or more processing cores 701 can execute the above core control method.
  • an embodiment of the present disclosure also provides a computer-readable medium on which a computer program is stored, wherein the computer program implements the above-mentioned core control method when executed by a processing core of a many-core system.
  • An embodiment of the present disclosure also provides a computer program product, which includes a computer program, and when the computer program is executed by a processing core of a many-core system, the above-mentioned core control method is implemented.
  • The functional modules/units in the system and the device can be implemented as software, firmware, hardware, or an appropriate combination thereof.
  • The division between functional modules/units mentioned in the above description does not necessarily correspond to the division of physical components; for example, one physical component may have multiple functions, or one function or step may be executed cooperatively by several physical components.
  • Some or all of the physical components may be implemented as software executed by a processor, such as a central processing unit, digital signal processor, or microprocessor, or as hardware, or as an integrated circuit, such as an application-specific integrated circuit.
  • Such software may be distributed on computer readable media, which may include computer storage media (or non-transitory media) and communication media (or transitory media).
  • Computer storage media includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information, such as computer readable instructions, data structures, program modules, or other data.
  • Computer storage media include, but are not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disk (DVD) or other optical disk storage, magnetic cartridges, tape, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to store desired information and that can be accessed by a computer.
  • Communication media typically embodies computer readable instructions, data structures, program modules, or other data in a modulated data signal such as a carrier wave or other transport mechanism, and may include any information delivery media.

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Power Sources (AREA)

Abstract

The present invention relates to a core control method and apparatus for a many-core system, and to a many-core system. The many-core system comprises at least one core cluster, each core cluster comprising at least one second processing core. The core control method comprises: for any core cluster, performing load detection on the core cluster to acquire load data corresponding to the core cluster; determining a load state of the core cluster according to the load data corresponding to the core cluster; and performing regulation processing on the core cluster according to the load state of the core cluster, the regulation processing comprising one of the following regulation methods: regulating the number of second processing cores in the core cluster that can currently perform operations; regulating the operating voltages and operating frequencies of the second processing cores in the core cluster that can currently perform operations; and inserting a blank frame into a buffer corresponding to the core cluster.
PCT/CN2021/133963 2021-05-24 2021-11-29 Procédé et appareil de commande de cœur pour système à grand nombre de cœurs, et système à grand nombre de cœurs WO2022247189A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110566468.7A CN115391021A (zh) 2021-05-24 2021-05-24 核心控制方法及装置、处理核心、系统、电子设备、介质
CN202110566468.7 2021-05-24

Publications (1)

Publication Number Publication Date
WO2022247189A1 true WO2022247189A1 (fr) 2022-12-01

Family

ID=84114532

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/133963 WO2022247189A1 (fr) 2021-05-24 2021-11-29 Procédé et appareil de commande de cœur pour système à grand nombre de cœurs, et système à grand nombre de cœurs

Country Status (2)

Country Link
CN (1) CN115391021A (fr)
WO (1) WO2022247189A1 (fr)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115686871A (zh) * 2022-12-30 2023-02-03 摩尔线程智能科技(北京)有限责任公司 用于多核系统的核心调度方法和装置

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102298580A (zh) * 2010-06-22 2011-12-28 Sap股份公司 使用异步缓冲器的多核查询处理
US20120216064A1 (en) * 2011-02-21 2012-08-23 Samsung Electronics Co., Ltd. Hot-plugging of multi-core processor
US20140317380A1 (en) * 2013-04-18 2014-10-23 Denso Corporation Multi-core processor
CN105608049A (zh) * 2015-12-23 2016-05-25 魅族科技(中国)有限公司 智能终端的cpu控制方法及控制装置
CN107844152A (zh) * 2016-09-20 2018-03-27 华为技术有限公司 负载监控器、基于多核心架构的供电系统和电压调整方法
CN109144658A (zh) * 2017-06-27 2019-01-04 阿里巴巴集团控股有限公司 有限资源的负载均衡方法、装置及电子设备
CN111198757A (zh) * 2020-01-06 2020-05-26 北京小米移动软件有限公司 Cpu内核调度方法、cpu内核调度装置及存储介质
CN112463367A (zh) * 2020-11-19 2021-03-09 苏州浪潮智能科技有限公司 一种存储系统性能优化方法、系统及电子设备和存储介质

Also Published As

Publication number Publication date
CN115391021A (zh) 2022-11-25

Similar Documents

Publication Publication Date Title
US9760158B2 (en) Forcing a processor into a low power state
TWI260543B (en) Performance scheduling method and system, and computer readable medium
US20190107874A1 (en) Method and Apparatus for Managing Global Chip Power on a Multicore System on Chip
WO2005106623A1 (fr) Dispositif de contrôle d’horloge d’unité centrale, procédé de contrôle d’horloge d’unité centrale, programme de contrôle d’horloge d’unité centrale, support d’enregistrement et support de transmission
WO2022247189A1 (fr) Procédé et appareil de commande de cœur pour système à grand nombre de cœurs, et système à grand nombre de cœurs
KR20020087388A (ko) 연산처리시스템 및 연산처리 제어방법, 업무관리시스템 및업무관리방법과 기억매체
JP2002099433A (ja) 演算処理システム及び演算処理制御方法、タスク管理システム及びタスク管理方法、並びに記憶媒体
US11650650B2 (en) Modifying an operating state of a processing unit based on waiting statuses of blocks
US20200301860A1 (en) Dispatching interrupts in a multi-processor system based on power and performance factors
CN109144680A (zh) 一种时钟滴答中断设置方法及装置
US20170116037A1 (en) Resource-aware backfill job scheduling
US9182797B2 (en) Decoupled power and performance allocation in a multiprocessing system
CN104598311A (zh) 一种面向Hadoop的实时作业公平调度的方法和装置
WO2017148253A1 (fr) Procédé et appareil de mise en œuvre de gestion d'économie d'énergie et dispositif de réseau
US9632566B2 (en) Dynamically controlling power based on work-loop performance
Yao et al. A dual delay timer strategy for optimizing server farm energy
US11243603B2 (en) Power management of an event-based processing system
Wang et al. Power saving design for servers under response time constraint
JP2008217628A (ja) Cpuの省電力システム及び省電力方法
CN109144693B (zh) 一种功率自适应任务调度方法及系统
US11803224B2 (en) Power management method, multi-processing unit system and power management module
CN116303132A (zh) 一种数据缓存方法、装置、设备以及存储介质
WO2016058149A1 (fr) Procédé pour prédire un taux d'utilisation d'un processeur, appareil de traitement et dispositif de terminal
KR20160105209A (ko) 전자 장치 및 이의 전력 제어 방법
Tsai et al. Prevent vm migration in virtualized clusters via deadline driven placement policy

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21942752

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE