WO2022247189A1 - Core control method and apparatus for many-core system, and many-core system - Google Patents

Publication number
WO2022247189A1
Authority
WO
WIPO (PCT)
Prior art keywords
core
cluster
core cluster
processing
duration
Prior art date
Application number
PCT/CN2021/133963
Other languages
French (fr)
Chinese (zh)
Inventor
吴臻志
丁瑞强
祝夭龙
Original Assignee
北京灵汐科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京灵汐科技有限公司
Publication of WO2022247189A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005 Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
    • G06F9/505 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals considering the load
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Definitions

  • the present disclosure relates to the field of computer technology, and in particular to a core control method and device for many-core systems, many-core systems, electronic equipment, computer-readable media, and computer program products.
  • a many-core system usually has many cores (also called processing cores).
  • the core is the smallest computing unit in the many-core system that can be independently scheduled and has complete computing capabilities.
  • the core has certain resources such as storage and computing.
  • the cores of the many-core system can run program instructions independently, using parallel computing to speed up program execution and to provide multi-tasking capabilities.
  • the present disclosure provides a core control method and device for a many-core system, a processing core, a many-core system, electronic equipment, a computer readable medium, and a computer program product.
  • the present disclosure provides a core control method for a many-core system, the many-core system including at least one core cluster, and each core cluster including at least one second processing core, the core control method including: for any one of the core clusters, performing load detection on the core cluster to obtain load data corresponding to the core cluster; determining the load state of the core cluster according to the load data corresponding to the core cluster; and performing regulation processing on the core cluster according to the load state of the core cluster; wherein the regulation processing includes one of the following regulation methods: regulating the number of second processing cores in the core cluster that can currently perform jobs; regulating the operating voltage and operating frequency of the second processing cores in the core cluster that can currently perform jobs; and inserting a blank frame into the buffer corresponding to the core cluster.
  • the present disclosure provides a core control device applied to a many-core system, the many-core system including at least one core cluster, and each core cluster including at least one second processing core. The core control device includes: a load data detection module configured to perform load detection on a corresponding core cluster and obtain load data corresponding to the core cluster; a load state detection module configured to determine the load state of the core cluster according to the load data corresponding to the core cluster; and a core regulation module configured to perform regulation processing on the core cluster according to the load state of the core cluster; wherein the regulation processing includes one of the following regulation methods: regulating the number of second processing cores in the core cluster that can currently perform jobs; regulating the operating voltage and operating frequency of the second processing cores in the core cluster that can currently perform jobs; and inserting a blank frame into the buffer corresponding to the core cluster.
  • the present disclosure provides a many-core system including a plurality of processing cores, the plurality of processing cores including a first processing core and a plurality of second processing cores, wherein some or all of the second processing cores are divided into at least one core cluster, each core cluster includes at least one of the second processing cores, each core cluster has a main processing core, and the main processing core of a core cluster is a designated second processing core in that core cluster; wherein the first processing core includes the above-mentioned core control device, and/or the main processing cores of at least some of the core clusters include the above-mentioned core control device.
  • the present disclosure provides an electronic device, which includes: a plurality of processing cores; and an on-chip network configured to exchange data among the plurality of processing cores and to exchange external data; wherein one or more instructions are stored in one or more of the processing cores, and the one or more instructions are executed by the one or more processing cores, so that the one or more processing cores can execute the above-mentioned core control method.
  • the present disclosure provides a computer-readable medium on which a computer program is stored, wherein the computer program implements the above-mentioned core control method when executed by a processing core of a many-core system.
  • the present disclosure provides a computer program product, which includes a computer program, and when the computer program is executed by a processing core of a many-core system, the above-mentioned core control method is implemented.
  • FIG. 1 is a flowchart of a core control method for a many-core system provided by an embodiment of the present disclosure
  • FIG. 2 is a block diagram of a many-core system provided by an embodiment of the present disclosure
  • FIG. 3 is a flowchart of a core control method for a many-core system provided by an embodiment of the present disclosure
  • FIG. 4 is a flow chart of a core control method for a many-core system provided by an embodiment of the present disclosure
  • FIG. 5 is a flow chart of the control process of the core control method of the embodiment of the present disclosure.
  • FIG. 6 is a flow chart of the control process of the core control method of the embodiment of the present disclosure.
  • FIG. 7 is a schematic diagram of an application scenario of a many-core system according to an embodiment of the present disclosure.
  • FIG. 8 is a block diagram of a core cluster of a many-core system according to an embodiment of the present disclosure.
  • FIG. 9 is a block diagram of a core cluster after a new sub-cluster is formed according to an embodiment of the present disclosure.
  • FIG. 10 is a block diagram of a core control device provided by an embodiment of the present disclosure.
  • Fig. 11 is a composition block diagram of an electronic device provided by an embodiment of the present disclosure.
  • each core of the many-core system stores the input data of the input device, and the arithmetic unit performs calculation according to the input data, stores the calculation result in the memory, and finally notifies the output device to receive the output result.
  • the input device and the output device may be peripherals, or may be cores in the many-core system.
  • the many-core system includes at least one core cluster, each core cluster includes at least one second processing core, and each core cluster is used to execute corresponding computing tasks.
  • when each core cluster executes its corresponding task, and especially when the core clusters execute the tasks of a task pipeline, high demands are placed on the task execution efficiency of each core cluster. How to effectively improve the efficiency with which core clusters execute tasks has therefore become an urgent technical problem in the core cluster scenario of many-core systems.
  • FIG. 1 is a flow chart of a core control method for a many-core system provided by an embodiment of the present disclosure.
  • an embodiment of the present disclosure provides a core control method for a many-core system, wherein the many-core system includes at least one core cluster, and each core cluster includes at least one second processing core. The method can be executed by a core control device, which can be implemented by means of software and/or hardware. The core control method includes:
  • Step S1 for any core cluster, perform load detection on the core cluster, and obtain load data corresponding to the core cluster.
  • Step S2 according to the load data corresponding to the core cluster, determine the load status of the core cluster.
  • Step S3 according to the load status of the core cluster, the core cluster is regulated.
  • the regulation processing includes one of the following regulation methods: regulating the number of second processing cores in the core cluster that can currently perform jobs; regulating the operating voltage and operating frequency of the second processing cores in the core cluster that can currently perform jobs; inserting a blank frame into the buffer corresponding to the core cluster.
  • the load status of the core cluster can be obtained in real time, and the core cluster can be regulated and processed in real time, so that the core cluster can process tasks flexibly, improve the efficiency of task processing, and reduce the power consumption of the many-core system.
  • by detecting the load state of each core cluster, the core control method for the many-core system can regulate and manage each core cluster of the many-core system flexibly and effectively, which improves the efficiency with which each core cluster performs tasks and also improves the flexibility of the many-core system in task processing.
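  • steps S1 to S3 can be pictured as a small control step applied to one core cluster. The following Python sketch is illustrative only and is not the implementation of the disclosure: the names `CoreCluster`, `detect_load`, `determine_state`, `regulate` and `control_step`, the 70% and 10% thresholds, and the choice to reduce the number of active cores under low load are all assumptions made for the example. Only one of the three regulation methods (adjusting the number of cores that can perform jobs) is shown.

```python
from dataclasses import dataclass
from enum import Enum


class LoadState(Enum):
    BUSY = "busy"
    LOW = "low"
    INTERMEDIATE = "intermediate"


@dataclass
class CoreCluster:
    # Hypothetical model of a core cluster, for illustration only.
    name: str
    active_cores: int    # second processing cores currently able to perform jobs
    total_cores: int     # all second processing cores in the cluster
    buffer_usage: float  # fraction of the cluster's buffer in use, 0.0 to 1.0


def detect_load(cluster: CoreCluster) -> float:
    """Step S1: obtain load data (here simply the buffer usage rate)."""
    return cluster.buffer_usage


def determine_state(load: float, busy_at: float = 0.70, low_at: float = 0.10) -> LoadState:
    """Step S2: map load data to a load state using assumed thresholds."""
    if load >= busy_at:
        return LoadState.BUSY
    if load <= low_at:
        return LoadState.LOW
    return LoadState.INTERMEDIATE


def regulate(cluster: CoreCluster, state: LoadState) -> None:
    """Step S3: regulate the number of cores that can perform jobs."""
    if state is LoadState.BUSY and cluster.active_cores < cluster.total_cores:
        cluster.active_cores += 1  # busy: bring one more core into service
    elif state is LoadState.LOW and cluster.active_cores > 1:
        cluster.active_cores -= 1  # assumption: release one core under low load


def control_step(cluster: CoreCluster) -> LoadState:
    """Run S1, S2 and S3 once for a single core cluster."""
    state = determine_state(detect_load(cluster))
    regulate(cluster, state)
    return state
```

  • in this sketch an intermediate load state leaves the cluster unchanged, which mirrors the "no further processing" branches described in the embodiments.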
  • Fig. 2 is a composition block diagram of a many-core system provided by an embodiment of the present disclosure.
  • the many-core system includes a plurality of processing cores, and the plurality of processing cores include a first processing core and second processing cores. Some or all of the second processing cores are pre-divided into at least one core cluster, each core cluster has a main processing core, and the main processing core is a pre-designated one of the at least one second processing core of the core cluster.
  • the first processing core can process the tasks of the many-core system, and can also perform task allocation and management for the many-core system; the main processing core of each core cluster can process the tasks of the core cluster where it is located, and can also perform task allocation and management within the cluster.
  • the core control method of the embodiment of the present disclosure can be applied to the main processing core of any core cluster in the many-core system, that is, the core control method is implemented based on the main processing core of any core cluster, and the main processing core of any core cluster can regulate and manage the second processing cores of the core cluster where it is located through the core control method of the embodiment of the present disclosure.
  • the core control method of the embodiment of the present disclosure can also be applied to the first processing core of the many-core system, that is, the core control method is implemented based on the first processing core of the many-core system, and the first processing core can regulate and manage all core clusters of the many-core system through the core control method of the embodiment of the present disclosure.
  • each core cluster is correspondingly provided with a buffer, and the buffer is used to cache the task data of the tasks to be processed by the corresponding core cluster, so the memory status of the buffer can represent the load condition of the core cluster.
  • the buffer may be, for example, a FIFO (First Input First Output, first-in-first-out) buffer; the present disclosure does not limit the specific type of the buffer.
  • FIG. 3 is a flowchart of a core control method for a many-core system provided by an embodiment of the present disclosure. As shown in FIG. 3 , step S1 may further include steps S11a to S13a.
  • Step S11a detecting the real-time memory space usage rate of the buffer corresponding to the core cluster.
  • the real-time memory space usage rate refers to the ratio of the memory space used in real time to the total memory space.
  • Step S12a detecting a comparison result between the real-time memory space usage rate of the buffer corresponding to the core cluster and the first preset threshold.
  • the first preset threshold can be set according to actual needs.
  • the first preset threshold can be set to a value greater than or equal to 60% but less than 100%, for example, can be set to 70%, which is not limited in the present disclosure.
  • Step S13a recording the duration for which the real-time memory space usage rate is continuously greater than or equal to the first preset threshold as the first duration, the load data corresponding to the core cluster including the first duration.
  • the real-time memory space usage rate of the buffer changes with time. The first duration is therefore the length of time for which the real-time memory space usage rate stays continuously greater than or equal to the first preset threshold. This duration can represent the current load state of the buffer, that is, the current load state of the core cluster.
  • step S2 may further include step S21a and step S22a.
  • Step S21a judging whether the first duration is greater than or equal to the first preset duration, if yes, execute step S22a, if not, do not perform further processing.
  • Step S22a when the first duration is greater than or equal to the first preset duration, determine that the load state of the core cluster is a busy state.
  • if the duration (the first duration) is greater than or equal to the first preset duration, it indicates that the core cluster is in an overloaded state, that is, a busy state. If the duration (the first duration) is less than the first preset duration, it indicates that the core cluster is not in a busy state, so no further processing may be performed.
  • the first preset duration can be set according to the actual situation, for example, it can be set to 15 minutes, half an hour or 1 hour, which is not limited in the present disclosure. In this way, the load state of the core cluster can be determined through the real-time memory space usage rate, and the efficiency of load state judgment can be improved.
  • FIG. 4 is a flow chart of a core control method for a many-core system provided by an embodiment of the present disclosure. As shown in FIG. 4 , step S1 may further include steps S11b to S13b.
  • Step S11b detecting the real-time memory space usage rate of the buffer corresponding to the core cluster.
  • Step S12b detecting a comparison result between the real-time memory space usage rate of the buffer corresponding to the core cluster and a second preset threshold.
  • the second preset threshold is greater than 0 and less than the first preset threshold, and the second preset threshold can be set according to actual needs.
  • the second preset threshold can be set to a value less than or equal to 40%, for example, it can be set to 10% or 5%, which is not limited in the present disclosure.
  • Step S13b record the time duration during which the real-time memory space usage rate is continuously less than or equal to the second preset threshold, and record it as the second duration, and the load data corresponding to the core cluster includes the second duration.
  • the real-time memory space usage rate of the buffer changes with time. The second duration is therefore the length of time for which the real-time memory space usage rate stays continuously less than or equal to the second preset threshold. This duration can represent the current load state of the buffer, that is, the current load state of the core cluster.
  • step S2 may further include step S21b and step S22b.
  • Step S21b judging whether the second duration is greater than or equal to the second preset duration, if yes, execute step S22b, if not, do no further processing.
  • Step S22b when the second duration is greater than or equal to the second preset duration, determine that the load state of the core cluster is a low load state.
  • if the duration (the second duration) is greater than or equal to the second preset duration, it indicates that the core cluster is in a state of excess resources, that is, a low load state. If the duration (the second duration) is less than the second preset duration, it indicates that the core cluster is not in a low-load state, so no further processing may be performed.
  • the second preset duration can be set according to the actual situation, and the second preset duration can be equal to the first preset duration, for example, can be set to 15 minutes, half an hour or 1 hour, which is not limited in the present disclosure. In this way, the load state of the core cluster can be determined through the real-time memory space usage rate, and the efficiency of load state judgment can be improved.
  • the real-time memory space usage rate of the buffer corresponding to the core cluster is detected, and the load data of the core cluster includes this real-time memory space usage rate. If the real-time memory space usage rate of the buffer is continuously greater than or equal to the first preset threshold and the duration of this state (the first duration) is greater than or equal to the first preset duration, it is determined that the load state of the core cluster is a busy state. If the real-time memory space usage rate of the buffer is continuously less than or equal to the second preset threshold and the duration of this state (the second duration) is greater than or equal to the second preset duration, it is determined that the load state of the core cluster is a low load state.
  • if the real-time memory space usage rate of the buffer stays between the second preset threshold and the first preset threshold, or the duration of the state greater than or equal to the first preset threshold is shorter than the first preset duration, or the duration of the state less than or equal to the second preset threshold is shorter than the second preset duration, the core cluster is neither busy nor idle. The load state of the core cluster is then an intermediate state between the low load state and the busy state, so no further processing may be performed.
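  • the threshold-plus-duration test of steps S11a to S22a and S11b to S22b can be sketched as a small monitor that is fed periodic buffer samples. This is a minimal sketch under stated assumptions, not the implementation of the disclosure: the class name `BufferLoadMonitor`, the `update` signature, the use of seconds for time, and the default values (70%, 10%, 15 minutes) are chosen for illustration from the example ranges given above.

```python
from typing import Optional


class BufferLoadMonitor:
    """Tracks how long buffer usage stays beyond the preset thresholds."""

    def __init__(self, first_threshold: float = 0.70, second_threshold: float = 0.10,
                 first_preset_duration: float = 900.0,
                 second_preset_duration: float = 900.0) -> None:
        # The second threshold must be greater than 0 and below the first one.
        if not 0.0 < second_threshold < first_threshold < 1.0:
            raise ValueError("require 0 < second_threshold < first_threshold < 1")
        self.first_threshold = first_threshold
        self.second_threshold = second_threshold
        self.first_preset_duration = first_preset_duration
        self.second_preset_duration = second_preset_duration
        self._high_since: Optional[float] = None  # start of the current high-usage run
        self._low_since: Optional[float] = None   # start of the current low-usage run

    def update(self, used: int, total: int, now: float) -> str:
        """Feed one sample; return 'busy', 'low' or 'intermediate'."""
        usage = used / total  # real-time usage rate: used space over total space

        # S12a/S13a: track the run during which usage >= first threshold.
        if usage >= self.first_threshold:
            if self._high_since is None:
                self._high_since = now
        else:
            self._high_since = None

        # S12b/S13b: track the run during which usage <= second threshold.
        if usage <= self.second_threshold:
            if self._low_since is None:
                self._low_since = now
        else:
            self._low_since = None

        # S21a/S22a: busy once the first duration reaches the first preset duration.
        if self._high_since is not None and now - self._high_since >= self.first_preset_duration:
            return "busy"
        # S21b/S22b: low load once the second duration reaches the second preset duration.
        if self._low_since is not None and now - self._low_since >= self.second_preset_duration:
            return "low"
        return "intermediate"
```

  • a single sample that falls back inside the two thresholds resets the corresponding run, which is one way to read "continuously" in steps S13a and S13b; a real system might tolerate short dips, but that would be a further assumption.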
  • the load data of the core cluster may be acquired by detecting the growth rate of memory space usage of the buffer corresponding to the core cluster.
  • step S1 may further include: acquiring the memory space usage growth rate of the buffer corresponding to the core cluster, and the load data corresponding to the core cluster includes the memory space usage growth rate of the corresponding buffer.
  • the memory space usage growth rate refers to the growth rate of the memory space usage rate of the buffer within a preset time period (such as 5 minutes, 10 minutes, or 15 minutes), that is, the growth of the memory space usage rate at the current time relative to the memory space usage rate at a historical time, where the time period from the historical time to the current time is the preset time period.
  • in the case of acquiring the load data of the core cluster by detecting the memory space usage growth rate of the buffer corresponding to the core cluster, step S2 may further include judging whether the memory space usage growth rate is greater than or equal to a first preset growth rate value.
  • the first preset growth rate value is a positive value, which can be set according to actual needs.
  • the first preset growth rate value can be a value between 60% and 90%, for example, it can be set to 70%, which is not limited in the present disclosure.
  • in the case that the memory space usage growth rate is greater than or equal to the first preset growth rate value, it is determined that the load state of the core cluster is a busy state, and the method jumps to step S3. That is to say, if the memory space usage growth rate of the buffer corresponding to the core cluster is greater than or equal to the first preset growth rate value, it indicates that the buffer is in an overloaded state, which means that the core cluster is in an overloaded state, that is, a busy state. In this way, the load state can be determined through the memory space usage growth rate, thereby improving the efficiency of load state judgment.
  • if the memory space usage growth rate of the buffer corresponding to the core cluster is less than the first preset growth rate value, it indicates that the buffer is not in an overloaded state, which means that the core cluster is not in a busy state, so further processing may not be performed, or it may be further judged whether the core cluster is in a low-load state.
  • step S2 may further include:
  • the preset usage rate can be set according to actual needs, for example, it can be set to 10%, 20% or 30%, which is not limited in the present disclosure.
  • the second preset growth rate value is a negative value greater than minus 1 and less than 0, and its specific value can be set according to actual needs. The second preset growth rate value can be a value between minus 90% and minus 50%, for example, it can be set to minus 60%.
  • if the real-time memory space usage rate of the buffer corresponding to the core cluster is less than or equal to the preset usage rate, it is further judged whether the memory space usage growth rate is less than or equal to the second preset growth rate value, thereby determining the load state of the core cluster.
  • in the case that the memory space usage growth rate is less than or equal to the second preset growth rate value, it is determined that the load state of the core cluster is a low load state, and the method jumps to step S3.
  • if the real-time memory space usage rate of the buffer corresponding to the core cluster is less than or equal to the preset usage rate, and the memory space usage growth rate of that buffer is less than or equal to the second preset growth rate value, it indicates that the real-time memory space usage rate of the buffer is small and that the memory space usage of the buffer shows a relatively large negative growth. The buffer is thus in a state of excess resources, which means that the core cluster is in a state of excess resources, that is, a low load state.
  • the load status can be determined jointly by the memory space usage growth rate and the real-time memory space usage rate, thereby improving the accuracy of load status judgment.
  • if the memory space usage growth rate of the buffer corresponding to the core cluster is less than the first preset growth rate value and greater than the second preset growth rate value, it indicates that the core cluster is neither busy nor idle. The load state of the core cluster is then an intermediate state between the low load state and the busy state, so further processing may not be performed.
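  • the growth-rate variant of step S2 can be sketched as follows. The text does not give a formula for the growth rate; the sketch assumes a relative change, (current usage minus historical usage) divided by historical usage, which is consistent with the second preset growth rate value lying between minus 1 and 0. The function names and default values (70%, 20%, minus 60%) are illustrative assumptions taken from the example ranges above.

```python
import math


def usage_growth_rate(past_usage: float, current_usage: float) -> float:
    """Relative change of the usage rate over the preset time period (assumed formula)."""
    if past_usage == 0.0:
        # No baseline: treat any increase as unbounded growth, otherwise as no growth.
        return math.inf if current_usage > 0.0 else 0.0
    return (current_usage - past_usage) / past_usage


def classify_by_growth(past_usage: float, current_usage: float,
                       first_growth_value: float = 0.70,
                       preset_usage_rate: float = 0.20,
                       second_growth_value: float = -0.60) -> str:
    """Return 'busy', 'low' or 'intermediate' from the usage growth rate."""
    growth = usage_growth_rate(past_usage, current_usage)
    if growth >= first_growth_value:
        return "busy"
    # Low load needs both a small current usage rate and a large negative growth.
    if current_usage <= preset_usage_rate and growth <= second_growth_value:
        return "low"
    return "intermediate"
```

  • for example, a buffer that falls from 90% to 30% usage has a strongly negative growth rate but is still above the assumed 20% preset usage rate, so this sketch reports it as intermediate rather than low load.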
  • the load data of the core cluster can also be acquired by detecting the task processing status of the core cluster.
  • step S1 may further include: detecting in real time the task processing time required by the core cluster to process the task, and the load data corresponding to the core cluster includes the task processing time. It can be understood that the task processing time refers to the time spent by the core cluster to process the task.
  • step S2 may further include:
  • the first preset processing duration may be set according to actual needs, which is not limited in the present disclosure.
  • in the case that the task processing duration required by the core cluster to process the task is greater than or equal to the first preset processing duration, it is determined that the load state of the core cluster is a busy state, and the method jumps to step S3.
  • if the task processing duration corresponding to the core cluster is greater than or equal to the first preset processing duration, it indicates that the core cluster spends a long time processing the task, so it can be determined that the core cluster is in an overloaded state, that is, a busy state. In this way, the load state can be determined by the task processing duration, thereby improving the efficiency of load state judgment.
  • if the task processing duration corresponding to the core cluster is less than the first preset processing duration, it indicates that the core cluster is not in an overloaded state, that is, not in a busy state, so no further processing may be performed, or it may be further judged whether the core cluster is in a low-load state.
  • step S2 may further include:
  • the second preset processing duration is shorter than the first preset processing duration, and the second preset processing duration can be set according to actual needs, which is not limited in the present disclosure.
  • in the case that the task processing duration required by the core cluster to process the task is less than or equal to the second preset processing duration, it is determined that the load state of the core cluster is a low load state, and the method jumps to step S3.
  • if the task processing duration corresponding to the core cluster is less than or equal to the second preset processing duration, it indicates that the core cluster takes a relatively short time to process the task, so it can be determined that the core cluster is in a state of excess resources, that is, a low load state. In this way, the load state can be determined by the task processing duration, thereby improving the efficiency of load state judgment.
  • if the task processing duration corresponding to the core cluster is greater than the second preset processing duration and less than the first preset processing duration, it indicates that the core cluster is neither busy nor idle. The load state of the core cluster is then an intermediate state between the low load state and the busy state, so further processing may not be performed.
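  • the task-duration variant of step S2 reduces to two comparisons. The sketch below is illustrative only; the function name `classify_by_task_duration` and the unit of the durations are assumptions, and the two preset processing durations are left as parameters because the text does not suggest values for them.

```python
def classify_by_task_duration(task_duration: float,
                              first_preset_duration: float,
                              second_preset_duration: float) -> str:
    """Return 'busy', 'low' or 'intermediate' from a measured task processing duration."""
    # The second preset processing duration must be shorter than the first.
    if not second_preset_duration < first_preset_duration:
        raise ValueError("second preset duration must be shorter than the first")
    if task_duration >= first_preset_duration:
        return "busy"          # the cluster takes a long time to process the task
    if task_duration <= second_preset_duration:
        return "low"           # the cluster finishes quickly: excess resources
    return "intermediate"      # neither busy nor idle
```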
  • the many-core system includes a plurality of core clusters, and the multiple core clusters perform task processing based on a synchronization period, and the synchronization period is the maximum task processing duration among the task processing durations required by each core cluster to process the task.
  • for example, the current task is the face recognition task of the video to be synthesized.
  • the face recognition task includes multiple subtasks.
  • the multiple subtasks are video stream decoding, face detection, face feature recognition, feature extraction, and feature matching.
  • the core clusters are responsible for their corresponding subtasks.
  • the multiple subtasks constitute a task pipeline, that is, the results of the corresponding subtasks processed by the previous core cluster need to be sent to the next core cluster for processing.
  • when performing task pipeline processing, the multiple core clusters share a unified synchronization period, which is the maximum task processing duration among the task processing durations required by the core clusters to process their corresponding subtasks. After the synchronization period ends, the multiple core clusters can process the next task, such as voice recognition and video synthesis of the video to be synthesized.
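  • the synchronization period of a task pipeline is simply the longest per-cluster task processing duration. The sketch below illustrates this with the face recognition subtasks named above; the function name `synchronization_period` and the duration values are hypothetical and chosen only for the example.

```python
from typing import Dict, Tuple


def synchronization_period(durations: Dict[str, float]) -> Tuple[str, float]:
    """Return the slowest core cluster and its duration, which sets the period."""
    if not durations:
        raise ValueError("at least one core cluster is required")
    slowest = max(durations, key=durations.get)
    return slowest, durations[slowest]


# Hypothetical per-cluster task processing durations (arbitrary time units).
example_durations = {
    "video stream decoding": 4.0,
    "face detection": 9.5,
    "face feature recognition": 6.0,
    "feature extraction": 5.5,
    "feature matching": 3.0,
}
```

  • with these made-up values the cluster handling face detection determines the synchronization period, and every other cluster waits for it before the next task starts.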
  • step S2 when it is detected that the load data of each core cluster includes the task processing duration of each core cluster, step S2 may further include:
  • counting the number of times, within a preset detection time period, that the task processing duration corresponding to the core cluster is the synchronization period, that is, the frequency.
  • the preset detection time period may be any preset time period. In this step, the number of times that the task processing duration of the core cluster is the synchronization period within the preset detection time period is counted; this count is the frequency.
  • the first preset number of times may be set according to actual needs, which is not limited in the present disclosure.
  • if the frequency with which the task processing duration of the core cluster is the synchronization period is greater than or equal to the first preset number of times, it indicates that the task processing duration of the core cluster is often the longest among all core clusters, so it can be determined that the core cluster is in an overloaded state, that is, a busy state.
  • if the frequency with which the task processing duration of the core cluster is the synchronization period is less than the first preset number of times, it indicates that the core cluster is not in an overloaded state, that is, not in a busy state, so further processing may not be performed.
  • in this way, the load state can be determined from the frequency with which the task processing duration is the synchronization period, thereby improving the accuracy of load state judgment.
  • step S2 when it is detected that the load data of each core cluster includes the task processing duration of each core cluster, step S2 may further include:
  • the task processing duration corresponding to the core cluster within the preset detection time period is counted as the frequency of the synchronization cycle.
  • the statistics method of the frequency will not be repeated here.
  • Calculating the ratio of the frequency with which the task processing duration of the core cluster serves as the synchronization cycle to the number of synchronization cycles in the preset detection time period. It can be understood that the number of synchronization cycles in the preset detection time period is the number of tasks processed by the multiple core clusters in the preset detection time period.
  • Determining whether the ratio is greater than or equal to the first preset ratio; if so, the next step is executed; otherwise, no further processing is performed.
  • the first preset ratio can be set according to actual needs, which is not limited in the present disclosure.
  • If the ratio is greater than or equal to the first preset ratio, the task processing duration of the core cluster is often the maximum among all core clusters, so it can be determined that the core cluster is in an overload state, that is, a busy state. If the ratio is smaller than the first preset ratio, the core cluster is not in an overloaded state, that is, not in a busy state, and therefore no further processing needs to be performed.
  • The load status can thus be determined from the ratio of the frequency with which the task processing duration serves as the synchronization cycle to the number of synchronization cycles, thereby improving the accuracy of load status judgment.
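  • The ratio-based variant can be sketched in a few lines. The function name and the sample values are hypothetical, and treating a window with no synchronization cycles as "not busy" is an assumption made only so that the sketch is total.

```python
def is_busy_by_ratio(hit_count, sync_cycle_count, first_preset_ratio):
    """Compare (frequency / number of synchronization cycles) with the first preset ratio.

    hit_count: times the cluster's task processing duration was the synchronization cycle.
    sync_cycle_count: number of synchronization cycles in the preset detection time period.
    """
    if sync_cycle_count <= 0:
        return False  # assumption: no cycles in the window means no evidence of overload
    return hit_count / sync_cycle_count >= first_preset_ratio


print(is_busy_by_ratio(3, 4, 0.6))  # 3/4 = 0.75 >= 0.6, so busy
print(is_busy_by_ratio(1, 4, 0.6))  # 1/4 = 0.25 < 0.6, so not busy
```

  • Compared with the absolute count, the ratio does not depend on how many tasks happen to fall into the detection window, which is presumably why the disclosure offers it as an alternative.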
  • step S3 may further include: increasing the number of second processing cores in the core cluster that can currently perform jobs.
  • Fig. 5 is a flow chart of the regulation and control process of the core control method of the embodiment of the present disclosure.
  • Step S3 of performing regulation processing on the core cluster may further include: step S31a to step S33a.
  • Step S31a if the load state of the core cluster is busy, determine whether the core cluster has adjustable voltage domain and frequency domain, if yes, execute step S32a, otherwise execute step S33a.
  • For example, when the load state of the core cluster is busy, it is checked whether, among all the second processing cores in the core cluster that can currently perform operations, there are second processing cores that correspond to the same operating voltage and operating frequency and whose operating voltage and operating frequency can be adjusted. If there are a plurality of such second processing cores, it is determined that the core cluster has an adjustable voltage domain and frequency domain; otherwise, it is determined that the core cluster does not have an adjustable voltage domain and frequency domain.
  • That the core cluster has an adjustable voltage domain means that multiple second processing cores of the core cluster correspond to one operating voltage and the operating voltage is adjustable; in the same adjustable voltage domain, all corresponding second processing cores share the same operating voltage setting.
  • That the core cluster has an adjustable frequency domain means that multiple second processing cores of the core cluster correspond to one operating frequency and the operating frequency is adjustable; in the same adjustable frequency domain, all corresponding second processing cores share the same operating frequency setting.
  • If the core cluster has a voltage domain and the operating voltage is adjustable, the voltage domain is an adjustable voltage domain. Correspondingly, because the operating voltage has a linear relationship with the operating frequency, the core cluster also has an adjustable frequency domain.
  • Step S32a among the second processing cores of the core cluster that can currently perform operations, the operating voltage and operating frequency of the second processing core corresponding to the adjustable voltage domain and frequency domain are increased, and the process ends.
  • The operating voltage and operating frequency of some or all of the second processing cores corresponding to the adjustable voltage domain and frequency domain in the core cluster can be increased, thereby improving the computing efficiency of those second processing cores, which improves their task processing efficiency and in turn the overall task processing efficiency of the core cluster.
  • When the load state of the core cluster is determined to be a busy state according to the comparison result between the first duration and the first preset duration, the voltage adjustment range corresponding to the first duration may be determined according to the preset correspondence between the duration in the busy state and the voltage adjustment range, and the frequency adjustment range corresponding to the first duration may be determined according to the preset correspondence between the duration in the busy state and the frequency adjustment range.
  • According to the voltage adjustment range corresponding to the first duration, the operating voltage of the second processing core corresponding to the adjustable voltage domain and frequency domain is increased to the corresponding voltage, so that the second processing core runs based on the adjusted operating voltage; and, according to the frequency adjustment range corresponding to the first duration, the operating frequency of the second processing core corresponding to the adjustable voltage domain and frequency domain is increased to the corresponding frequency, so that the second processing core runs based on the adjusted operating frequency.
  • The correspondence between the duration and the voltage adjustment range, and the correspondence between the duration and the frequency adjustment range, can be set according to actual needs. For example, assuming that the first preset duration is 10 minutes, the voltage adjustment range corresponding to a duration of 10 minutes to 20 minutes can be set to 10%, the voltage adjustment range corresponding to a duration of 20 minutes to 40 minutes to 15%, the voltage adjustment range corresponding to a duration of 40 minutes to 50 minutes to 20%, and so on. The correspondence between the duration and the frequency adjustment range can be set similarly, which will not be repeated here.
  • For example, if the first duration is 15 minutes and the first preset duration is 10 minutes, the voltage adjustment range corresponding to the first duration is found to be 10%, so the operating voltage of the second processing core corresponding to the adjustable voltage domain and frequency domain is increased by 10%.
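  • The duration-to-adjustment lookup in the example above can be sketched as a small table. The percentages are the ones given in the text; the half-open interval boundaries, the function names and the 1.0 starting voltage are assumptions for illustration only.

```python
# (lower bound in minutes, inclusive; upper bound in minutes, exclusive; adjustment in percent)
# Values from the example in the text; the boundary handling is an assumption.
BUSY_VOLTAGE_TABLE = [
    (10, 20, 10),
    (20, 40, 15),
    (40, 50, 20),
]


def voltage_adjustment_percent(first_duration_min, table=BUSY_VOLTAGE_TABLE):
    """Return the voltage adjustment range (percent) for a busy duration, or None if unmapped."""
    for lower, upper, percent in table:
        if lower <= first_duration_min < upper:
            return percent
    return None


def raised_voltage(current_voltage, first_duration_min):
    """Apply the looked-up adjustment to the current operating voltage."""
    percent = voltage_adjustment_percent(first_duration_min)
    if percent is None:
        return current_voltage  # duration outside the table: leave the voltage unchanged
    return current_voltage * (1 + percent / 100)


print(voltage_adjustment_percent(15))  # prints 10
```

  • A frequency table of the same shape could be looked up in the same way; the disclosure leaves both tables to be set according to actual needs.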
  • The voltage adjustment range corresponding to the memory space usage growth rate of the core cluster may be determined according to the preset correspondence between the memory space usage growth rate in the busy state and the voltage adjustment range, and the frequency adjustment range corresponding to the memory space usage growth rate of the core cluster may be determined according to the preset correspondence between the memory space usage growth rate in the busy state and the frequency adjustment range.
  • According to the voltage adjustment range corresponding to the memory space usage growth rate of the core cluster, the operating voltage of the second processing core corresponding to the adjustable voltage domain and frequency domain is increased to the corresponding voltage, so that the second processing core runs based on the adjusted operating voltage; and, according to the frequency adjustment range corresponding to the memory space usage growth rate of the core cluster, the operating frequency of the second processing core corresponding to the adjustable voltage domain and frequency domain is increased to the corresponding frequency, so that the second processing core runs based on the adjusted operating frequency.
  • The correspondence between the memory space usage growth rate and the voltage adjustment range, and the correspondence between the memory space usage growth rate and the frequency adjustment range, can be set according to actual needs. For details, refer to the above description of the correspondence between the duration and the voltage adjustment range and between the duration and the frequency adjustment range, which will not be repeated here.
  • The voltage adjustment range corresponding to the task processing duration of the core cluster may be determined according to the preset correspondence between the task processing duration in the busy state and the voltage adjustment range, and the frequency adjustment range corresponding to the task processing duration of the core cluster may be determined according to the preset correspondence between the task processing duration in the busy state and the frequency adjustment range.
  • According to the voltage adjustment range corresponding to the task processing duration of the core cluster, the operating voltage of the second processing core corresponding to the adjustable voltage domain and frequency domain is increased to the corresponding voltage, so that the second processing core runs based on the adjusted operating voltage; and, according to the frequency adjustment range corresponding to the task processing duration of the core cluster, the operating frequency of the second processing core corresponding to the adjustable voltage domain and frequency domain is increased to the corresponding frequency, so that the second processing core runs based on the adjusted operating frequency.
  • The correspondence between the task processing duration and the voltage adjustment range, and the correspondence between the task processing duration and the frequency adjustment range, can be set according to actual needs. For details, refer to the above description of the correspondence between the duration and the voltage adjustment range and between the duration and the frequency adjustment range, which will not be repeated here.
  • The voltage adjustment range corresponding to the above-mentioned frequency of the core cluster (the frequency with which its task processing duration serves as the synchronization cycle) may be determined according to the preset correspondence between the frequency in the busy state and the voltage adjustment range, and the frequency adjustment range corresponding to the above-mentioned frequency of the core cluster may be determined according to the preset correspondence between the frequency in the busy state and the frequency adjustment range.
  • According to the voltage adjustment range corresponding to the above-mentioned frequency of the core cluster, the operating voltage of the second processing core corresponding to the adjustable voltage domain and frequency domain is increased to the corresponding voltage, so that the second processing core runs based on the adjusted operating voltage; and, according to the frequency adjustment range corresponding to the above-mentioned frequency of the core cluster, the operating frequency of the second processing core corresponding to the adjustable voltage domain and frequency domain is increased to the corresponding frequency, so that the second processing core runs based on the adjusted operating frequency.
  • The correspondence between the above-mentioned frequency and the voltage adjustment range, and the correspondence between the above-mentioned frequency and the frequency adjustment range, can be set according to actual needs. For details, refer to the above description of the correspondence between the duration and the voltage adjustment range and between the duration and the frequency adjustment range, which will not be repeated here.
  • Step S33a increase the number of second processing cores in the core cluster that can currently perform jobs, and end the process.
  • the overall task processing efficiency of the core cluster can be improved by increasing the number of second processing cores in the core cluster that can currently perform jobs.
  • The number of second processing cores that need to be added can be determined according to the busyness of the core cluster, and the busyness of the core cluster can be represented by, for example, the above-mentioned first duration, the above-mentioned memory space usage growth rate, the above-mentioned task processing duration, or the above-mentioned frequency.
  • The correspondence between the duration in the busy state and the number of additional cores required can be preset. When the load state of the core cluster is determined to be busy according to the comparison result between the first duration and the first preset duration, the number of additional cores required corresponding to the first duration can be determined according to the preset correspondence between the duration in the busy state and the number of additional cores required, so as to add the corresponding number of second processing cores that can currently perform jobs to the core cluster.
  • The step of increasing the number of second processing cores that can currently perform operations in the core cluster may further include: adding one or more idle second processing cores outside the core cluster in the many-core system into the core cluster as second processing cores that can currently perform operations in the core cluster; and/or,
  • Waking up one or more second processing cores in the closed state in the core cluster to serve as second processing cores in the core cluster that can currently perform jobs.
  • Each second processing core contains a controller, which is used to shut down or wake up (turn on) the second processing core. The second processing core can be woken up by sending a wake-up instruction to its controller, and can be shut down by sending a shutdown instruction to its controller.
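  • The busy-state flow of steps S31a to S33a can be sketched as below. The class and function names are hypothetical, the per-core controller is modelled as a simple flag, applying one percentage to both voltage and frequency is a simplification (the text allows separate tables), and waking closed cores before borrowing idle outside cores is an assumed ordering that the disclosure does not prescribe.

```python
from dataclasses import dataclass, field


@dataclass
class Core:
    core_id: int
    powered_on: bool = True  # stands in for the state set by the core's controller


@dataclass
class Cluster:
    active: list                                  # second processing cores that can currently perform jobs
    closed: list = field(default_factory=list)    # cores of this cluster in the closed state
    has_adjustable_domain: bool = False           # adjustable voltage domain and frequency domain
    voltage: float = 1.0
    frequency: float = 1.0


def regulate_busy(cluster, idle_pool, percent, extra_cores):
    """Return the step taken: 'S32a' (raise voltage/frequency) or 'S33a' (add cores)."""
    if cluster.has_adjustable_domain:             # S31a: adjustable domains exist, go to S32a
        cluster.voltage *= 1 + percent / 100
        cluster.frequency *= 1 + percent / 100
        return "S32a"
    for _ in range(extra_cores):                  # S31a: no adjustable domains, go to S33a
        if cluster.closed:
            core = cluster.closed.pop()
            core.powered_on = True                # wake-up instruction to the core's controller
        elif idle_pool:
            core = idle_pool.pop()                # idle core from outside the cluster
        else:
            break                                 # nothing left to add
        cluster.active.append(core)
    return "S33a"
```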
  • step S3 may further include: reducing the number of second processing cores in the core cluster that can currently perform jobs.
  • Fig. 6 is a flowchart of the regulation and control process of the core control method of the embodiment of the present disclosure.
  • Step S3 of performing regulation processing on the core cluster may further include: step S31b to step S33b.
  • Step S31b if the load state of the core cluster is low load state, determine whether the core cluster has adjustable voltage domain and frequency domain, if yes, execute step S32b, otherwise execute step S33b. For example, in the case that the load state of the core cluster is a low load state, it is checked whether all the second processing cores in the core cluster that can currently perform operations have the same operating voltage and operating frequency and the voltage and frequency are adjustable. If there are a plurality of second processing cores, it is determined that the core cluster has an adjustable voltage domain and a frequency domain; otherwise, it is determined that the core cluster does not have an adjustable voltage domain and a frequency domain.
  • Step S32b, among the second processing cores in the core cluster that can currently perform operations, the operating voltage and/or operating frequency of the second processing cores corresponding to the adjustable voltage domain and frequency domain are lowered, and the process ends.
  • The operating voltage and operating frequency of some or all of the second processing cores corresponding to the adjustable voltage domain and frequency domain in the core cluster can be lowered, thereby reducing the computing efficiency of those second processing cores, so as to effectively save the power consumption of the core cluster, reduce the power consumption of the many-core system, and save resources.
  • When the load state of the core cluster is determined to be a low-load state according to the comparison result between the above-mentioned second duration and the second preset duration, the voltage adjustment range corresponding to the second duration may be determined according to the preset correspondence between the duration in the low-load state and the voltage adjustment range, and the frequency adjustment range corresponding to the second duration may be determined according to the preset correspondence between the duration in the low-load state and the frequency adjustment range.
  • According to the voltage adjustment range corresponding to the second duration, the operating voltage of the second processing core corresponding to the adjustable voltage domain and frequency domain is lowered to the corresponding voltage, so that the second processing core runs based on the adjusted operating voltage; and, according to the frequency adjustment range corresponding to the second duration, the operating frequency of the second processing core corresponding to the adjustable voltage domain and frequency domain is lowered to the corresponding frequency, so that the second processing core runs based on the adjusted operating frequency.
  • The correspondence between the duration and the voltage adjustment range, and the correspondence between the duration and the frequency adjustment range, can be set according to actual needs. For example, assuming that the second preset duration is 10 minutes, the voltage adjustment range corresponding to a duration of 10 minutes to 20 minutes can be set to 10%, the voltage adjustment range corresponding to a duration of 20 minutes to 40 minutes to 15%, the voltage adjustment range corresponding to a duration of 40 minutes to 50 minutes to 20%, and so on. The correspondence between the duration in the low-load state and the frequency adjustment range can be set similarly, which will not be repeated here.
  • For example, if the second duration is 15 minutes and the second preset duration is 10 minutes, and the voltage adjustment range corresponding to the second duration, looked up according to the preset correspondence between the duration in the low-load state and the voltage adjustment range, is 10%, then the operating voltage of the second processing core corresponding to the adjustable voltage domain and frequency domain is lowered by 10%.
  • The voltage adjustment range corresponding to the memory space usage growth rate of the core cluster may be determined according to the preset correspondence between the memory space usage growth rate in the low-load state and the voltage adjustment range, and the frequency adjustment range corresponding to the memory space usage growth rate of the core cluster may be determined according to the preset correspondence between the memory space usage growth rate in the low-load state and the frequency adjustment range.
  • According to the voltage adjustment range corresponding to the memory space usage growth rate of the core cluster, the operating voltage of the second processing core corresponding to the adjustable voltage domain and frequency domain is lowered to the corresponding voltage, so that the second processing core runs based on the adjusted operating voltage; and, according to the frequency adjustment range corresponding to the memory space usage growth rate of the core cluster, the operating frequency of the second processing core corresponding to the adjustable voltage domain and frequency domain is lowered to the corresponding frequency, so that the second processing core runs based on the adjusted operating frequency.
  • The correspondence between the memory space usage growth rate and the voltage adjustment range, and the correspondence between the memory space usage growth rate and the frequency adjustment range, can be set according to actual needs. For details, refer to the above description of the correspondence between the duration and the voltage adjustment range and between the duration and the frequency adjustment range, which will not be repeated here.
  • The voltage adjustment range corresponding to the task processing duration of the core cluster may be determined according to the preset correspondence between the task processing duration in the low-load state and the voltage adjustment range, and the frequency adjustment range corresponding to the task processing duration of the core cluster may be determined according to the preset correspondence between the task processing duration in the low-load state and the frequency adjustment range.
  • According to the voltage adjustment range corresponding to the task processing duration of the core cluster, the operating voltage of the second processing core corresponding to the adjustable voltage domain and frequency domain is lowered to the corresponding voltage, so that the second processing core runs based on the adjusted operating voltage; and, according to the frequency adjustment range corresponding to the task processing duration of the core cluster, the operating frequency of the second processing core corresponding to the adjustable voltage domain and frequency domain is lowered to the corresponding frequency, so that the second processing core runs based on the adjusted operating frequency.
  • The correspondence between the task processing duration and the voltage adjustment range, and the correspondence between the task processing duration and the frequency adjustment range, can be set according to actual needs. For details, refer to the above description of the correspondence between the duration and the voltage adjustment range and between the duration and the frequency adjustment range, which will not be repeated here.
  • Step S33b reducing the number of second processing cores in the core cluster that can currently perform jobs, and ending the process.
  • By reducing the number of second processing cores that can currently perform operations in the core cluster, the power consumption of the core cluster can be saved, the power consumption of the many-core system can be reduced, and resources can be saved.
  • The number of second processing cores that can currently perform jobs and that need to be removed can be determined according to the low-load level of the core cluster, and the low-load level of the core cluster can be represented by, for example, the above-mentioned second duration, the above-mentioned task processing duration, or the above-mentioned frequency.
  • The correspondence between the duration in the low-load state and the number of cores to be reduced can be preset. When the load state of the core cluster is determined to be the low-load state according to the comparison result between the second duration and the second preset duration, the number of cores to be reduced corresponding to the second duration can be determined according to the preset correspondence between the duration in the low-load state and the number of cores to be reduced, so as to reduce the second processing cores that can currently perform jobs in the core cluster by the corresponding number.
  • The step of reducing the number of second processing cores that can currently perform operations in the core cluster may further include: removing at least one second processing core that can currently perform operations in the core cluster from the core cluster; and/or,
  • Controlling at least one second processing core that can currently perform operations in the core cluster to be in a closed state.
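  • The low-load core reduction can be sketched in the same spirit. The table values, the function names and the `keep_at_least` floor are all hypothetical: the disclosure only says that the correspondence between low-load duration and the number of cores to reduce is preset, and it does not state a minimum number of cores to keep.

```python
def cores_to_reduce(second_duration_min, table):
    """Look up how many cores to reduce for a low-load duration.

    table: list of (lower_inclusive_min, upper_exclusive_min, core_count) rows,
    standing in for the preset correspondence described in the text.
    """
    for lower, upper, count in table:
        if lower <= second_duration_min < upper:
            return count
    return 0


def reduce_active_cores(active, closed, count, keep_at_least=1):
    """Move up to `count` cores from `active` to `closed`; return how many were moved.

    Moving a core to `closed` models controlling it to be in the closed state.
    keep_at_least is an assumed safety floor, not taken from the disclosure.
    """
    count = max(0, min(count, len(active) - keep_at_least))
    for _ in range(count):
        closed.append(active.pop())
    return count


# Hypothetical preset correspondence: 10-20 min -> 1 core, 20-40 min -> 2 cores.
LOW_LOAD_TABLE = [(10, 20, 1), (20, 40, 2)]
active, closed = [0, 1, 2], []
moved = reduce_active_cores(active, closed, cores_to_reduce(25, LOW_LOAD_TABLE))
print(moved, active, closed)  # prints 2 [0] [2, 1]
```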
  • FIG. 7 is a schematic diagram of an application scenario of a many-core system according to an embodiment of the present disclosure.
  • The tasks in the task pipeline of the face recognition service, which need to be executed in sequence, can include video stream decoding tasks, face detection tasks, face feature recognition tasks, face feature extraction tasks, face feature matching tasks, and so on.
  • each core cluster of the many-core system can respectively process a task in the task pipeline, and each core cluster of the many-core system sequentially processes its corresponding tasks according to the operation sequence of the pipeline.
  • The task data obtained after a core cluster processes its corresponding task can be sent to the buffer corresponding to the core cluster that follows it on the task pipeline for caching, so that the following core cluster can read the data as required and, at the same time, start running its own corresponding task, wherein the cache memory can also cache data transferred by other external devices.
  • Fig. 8 is a composition block diagram of a core cluster of a many-core system according to an embodiment of the present disclosure. As shown in Fig. 8, each core cluster includes a plurality of sub-clusters, each sub-cluster includes at least one second processing core that can currently perform operations, and the plurality of sub-clusters are used to process the task corresponding to the core cluster in parallel. For example, suppose the task corresponding to the core cluster is face recognition. After multiple frames of image data are obtained, each of the sub-clusters of the core cluster can be responsible for face recognition on one or more frames of image data. If the core cluster has three sub-clusters and there are three frames of images, the three sub-clusters can each process one frame.
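  • The parallel split of frames across sub-clusters can be sketched as follows. Round-robin assignment is an assumption chosen for simplicity; the disclosure only states that each sub-cluster can be responsible for one or more frames. The function name and the frame and sub-cluster labels are hypothetical.

```python
def assign_frames(frames, subclusters):
    """Distribute frames over sub-clusters in round-robin order (assumed policy)."""
    if not subclusters:
        raise ValueError("at least one sub-cluster is required")
    assignment = {name: [] for name in subclusters}
    for index, frame in enumerate(frames):
        assignment[subclusters[index % len(subclusters)]].append(frame)
    return assignment


# Three sub-clusters and three frames: each sub-cluster handles one frame.
print(assign_frames(["f0", "f1", "f2"], ["sub0", "sub1", "sub2"]))
# prints {'sub0': ['f0'], 'sub1': ['f1'], 'sub2': ['f2']}
```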
  • the core control method may further include: Step S4a-Step S6a.
  • Step S4a according to the newly added second processing core in the core cluster that can currently perform operations, build a new sub-cluster of the core cluster and obtain the configuration information of the new sub-cluster.
  • FIG. 9 is a block diagram of a core cluster after a new sub-cluster is established according to an embodiment of the present disclosure.
  • One or more newly added second processing cores that can currently perform operations can be used as a new sub-cluster of the core cluster, and the configuration information of the new sub-cluster is obtained. The new sub-cluster can process the task corresponding to the core cluster in parallel with the other sub-clusters; the new sub-cluster includes one or more second processing cores that can currently perform operations; and the configuration information includes, but is not limited to, the number of second processing cores in the new sub-cluster and the address information of each second processing core.
  • Step S5a, sending the configuration information of the new sub-cluster to the target processing core in the predecessor core cluster of the core cluster, so that the target processing core in the predecessor core cluster can establish the input shunt of the new sub-cluster according to the configuration information.
  • step S5a the main processing core of the core cluster sends the configuration information of the new sub-cluster to the target processing core in the predecessor core cluster of the core cluster.
  • The predecessor core cluster is the core cluster immediately before the core cluster on the task pipeline. The target processing core in the predecessor core cluster is used to establish the input shunt of the new sub-cluster according to the configuration information of the new sub-cluster of the core cluster. The input shunt is the path for outputting data from the predecessor core cluster to the new sub-cluster.
  • the target processing core in the predecessor core cluster may be the main processing core in the predecessor core cluster, or the second processing core responsible for data output in the predecessor core cluster.
  • The predecessor core cluster may include a task scheduler, and the task scheduler may be configured in the second processing core in charge of data output in the predecessor core cluster, or may be configured in the main processing core of the predecessor core cluster.
  • The task scheduler maintains a predecessor task list, which records the number of sub-clusters of the next core cluster on the task pipeline where the predecessor core cluster is located, the number of second processing cores included in each sub-cluster, the address of each sub-cluster, and other information.
  • Each sub-cluster of the core cluster recorded in the predecessor task list is provided with a corresponding flag bit, and the value of the flag bit represents the state of the corresponding sub-cluster. For example, when the flag bit is a valid value, the corresponding sub-cluster is currently available, and when the flag bit is an invalid value, the corresponding sub-cluster is not available.
  • the predecessor core cluster can allocate tasks to each subcluster of the core cluster in the predecessor task list according to the predecessor task list maintained by it.
  • the predecessor core cluster may update the predecessor task list maintained by it according to the update information transmitted by a core cluster located behind and adjacent to it on the task pipeline. For example, after a core cluster adds cores to form a new sub-cluster, the main processing core of the core cluster can send the configuration information of the new sub-cluster to its predecessor core cluster, so that the target processing core of the predecessor core cluster can configure the new sub-cluster The information is written into the predecessor task list, and the flag bit corresponding to the added new subcluster is set as a valid value.
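  • One possible shape for the predecessor task list and its flag bits is sketched below. The class names, field names and the round-robin allocation are hypothetical; the disclosure specifies only what the list records (sub-cluster count, cores per sub-cluster, sub-cluster addresses) and that a valid flag bit marks an available sub-cluster.

```python
from dataclasses import dataclass


@dataclass
class SubclusterEntry:
    address: int          # address of the sub-cluster
    core_count: int       # number of second processing cores in the sub-cluster
    valid: bool = True    # flag bit: True means the sub-cluster is currently available


class PredecessorTaskList:
    """Kept by the task scheduler of the predecessor core cluster (illustrative only)."""

    def __init__(self):
        self.entries = {}

    def add_subcluster(self, sub_id, address, core_count):
        # Write the configuration information of a new sub-cluster and set its flag bit valid.
        self.entries[sub_id] = SubclusterEntry(address, core_count, valid=True)

    def invalidate(self, sub_id):
        # Mark a sub-cluster as unavailable without discarding its record.
        self.entries[sub_id].valid = False

    def available(self):
        return [sub_id for sub_id, entry in self.entries.items() if entry.valid]

    def allocate(self, tasks):
        # Assumed policy: hand tasks to available sub-clusters in round-robin order.
        targets = self.available()
        if not targets:
            raise RuntimeError("no available sub-cluster in the next core cluster")
        return [(task, targets[i % len(targets)]) for i, task in enumerate(tasks)]
```

  • With two recorded sub-clusters, invalidating one of them makes `allocate` route every task to the remaining one, which mirrors how the flag bit gates task distribution in the text.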
  • Step S6a, sending the configuration information of the new sub-cluster to the target processing core in the successor core cluster of the core cluster, so that the target processing core in the successor core cluster can establish the output shunt of the new sub-cluster according to the configuration information of the new sub-cluster of the core cluster.
  • step S6a the main processing core of the core cluster sends the configuration information of the new sub-cluster to the target processing core in the successor core cluster of the core cluster.
  • The successor core cluster is the core cluster immediately after the core cluster on the task pipeline. The target processing core in the successor core cluster is used to establish the output shunt of the new sub-cluster according to the configuration information of the new sub-cluster of the core cluster. The output shunt is the path for the new sub-cluster to output data to the successor core cluster.
  • the target processing core in the successor core cluster may be the main processing core of the successor core cluster, or the second processing core responsible for data output in the successor core cluster.
  • The target processing core of the successor core cluster can maintain a successor task list as required. The successor task list records the number of sub-clusters of the previous core cluster on the task pipeline where the successor core cluster is located, the number of second processing cores included in each sub-cluster, the address of each sub-cluster, and other information.
  • the successor core cluster can update the successor task list maintained by it according to the update information delivered by a core cluster located before and adjacent to it on the task pipeline. For example, after a core cluster adds cores to form a new subcluster, the main processing core of the core cluster can send the configuration information of the new subcluster to its successor core cluster, so that the target processing core of the successor core cluster can write the configuration information to into the successor task list.
  • The step of reducing the number of second processing cores in the core cluster that can currently perform operations may include: reducing the number of sub-clusters in the core cluster that can currently perform operations, or reducing the number of second processing cores in any one or more sub-clusters of the core cluster. After doing so, the main processing core of the core cluster can send update information of the core cluster to the predecessor core cluster and the successor core cluster of the core cluster, so that the predecessor core cluster updates the predecessor task list it maintains, updates the flag bit of the corresponding sub-cluster, and deletes the corresponding input shunt, and the successor core cluster updates the successor task list it maintains and deletes the corresponding output shunt.
  • the step of determining the load status of the core cluster may further include:
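  • if the real-time memory space usage rate is 0, and the second duration for which it remains 0 is greater than or equal to the second preset duration and less than a third preset duration, it is determined that the load state of the core cluster is an idle state, and the idle state level is the first level.
  • the idle state can be understood as an underload state or a zero load state, which is a special case of the low load state.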
  • the third preset duration is longer than the second preset duration, and the third preset duration can be set according to actual needs, which is not limited in the present disclosure.
  • the step of determining the load status of the core cluster may further include: if the real-time memory space usage rate is 0, and the second duration for which it remains 0 is greater than or equal to the third preset duration, it is determined that the load state of the core cluster is an idle state, and the idle state level is the second level.
  • in this way, the load state of the core cluster can be determined to be the idle state, together with the level of the idle state, so that the corresponding regulation processing can be performed on the core cluster, thereby reducing the power consumption of the core cluster.
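The two idle levels described above can be sketched as a small classifier. This is only an illustrative sketch: the names `IdleLevel` and `classify_idle` are hypothetical, and the durations are assumed to be plain numbers in the same time unit.

```python
from enum import Enum
from typing import Optional


class IdleLevel(Enum):
    FIRST = 1
    SECOND = 2


def classify_idle(usage_rate: float, zero_duration: float,
                  second_preset: float, third_preset: float) -> Optional[IdleLevel]:
    """Return the idle level of a core cluster, or None if it is not idle.

    usage_rate    -- real-time memory space usage rate of the cluster's buffer
    zero_duration -- how long the usage rate has continuously been 0
    """
    if third_preset <= second_preset:
        # The text requires the third preset duration to exceed the second.
        raise ValueError("third preset duration must be longer than the second")
    if usage_rate != 0:
        return None
    if zero_duration >= third_preset:
        return IdleLevel.SECOND
    if zero_duration >= second_preset:
        return IdleLevel.FIRST
    return None
```

For example, with a second preset duration of 2 and a third preset duration of 10, a buffer that has been empty for 5 time units yields the first level, and one that has been empty for 10 or more yields the second level.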
  • the step of regulating and processing the core cluster according to the load state of the core cluster includes: when the load state of the core cluster is the first-level idle state, inserting a blank frame into the buffer corresponding to the core cluster, wherein the blank frame can be a preset frame image, thereby maintaining the working state of the core cluster and ensuring that the core cluster outputs the data it has already processed.
  • a corresponding gating clock can be set to control whether the corresponding second processing core works or not.
  • the clock gating is used to output clock signals to the multiple core clusters to drive the multiple core clusters to work or not to work based on the clock signals.
  • the many-core system includes multiple core clusters
  • the core control method is implemented by the first processing core, and the first processing core uniformly performs load detection and management on each core cluster.
  • the core control method also includes: suspending the sending of synchronization signals to the multiple core clusters, so that the multiple core clusters suspend synchronous updates, thereby saving resources of the many-core system and reducing its power consumption.
  • the synchronization signal is used to control the multiple core clusters to perform task processing based on the synchronization cycle.
  • the same task correspondingly processed by multiple core clusters may be, for example, a face recognition task of a video to be synthesized.
  • a blank frame is inserted into the cache memory of each of the plurality of core clusters to maintain the working state of the plurality of core clusters while waiting for them to output all of the processed data. After there is no data input and no data output, the sending of synchronization signals to the multiple core clusters is suspended, or all gated clocks corresponding to the multiple core clusters are turned off at the same time.
  • the step of regulating and processing the core cluster according to the load state of the core cluster includes: when the load state of the core cluster is the second-level idle state, lowering the operating voltage and operating frequency of those second processing cores, among the second processing cores in the core cluster that can currently perform operations, that correspond to an adjustable voltage domain and frequency domain, or reducing the number of second processing cores in the core cluster that can currently perform operations.
  • Fig. 10 is a block diagram of a core control device provided by an embodiment of the present disclosure.
  • an embodiment of the present disclosure provides a core control device 300, the core control device 300 is applied to a many-core system, the many-core system includes at least one core cluster, and each core cluster includes at least one second processing core,
  • the core control device 300 includes: a load data detection module 301 , a load state detection module 302 and a core control module 303 .
  • the load data detection module 301 is configured to detect the load of the corresponding core cluster and obtain the load data corresponding to the core cluster; the load state detection module 302 is configured to determine the load state of the core cluster according to the load data corresponding to the core cluster; the core control module 303 is configured to perform regulation processing on the core cluster according to the load state of the core cluster; wherein the regulation processing includes one of the following regulation modes: regulating the number of second processing cores in the core cluster that can currently perform operations; regulating the operating voltage and operating frequency of the second processing cores in the core cluster that can currently perform operations; inserting blank frames into the buffer corresponding to the core cluster.
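The division of work among the three modules can be pictured as one control step per core cluster. The sketch below is an assumption-laden illustration, not the device's implementation: `control_step` and its callable parameters are hypothetical stand-ins for modules 301, 302, and 303.

```python
from typing import Callable, Dict


def control_step(
    cluster_id: str,
    detect_load: Callable[[str], float],            # stands in for module 301
    determine_state: Callable[[float], str],        # stands in for module 302
    regulators: Dict[str, Callable[[str], None]],   # stands in for module 303
) -> str:
    """Run one detect -> classify -> regulate pass for a single core cluster."""
    load_data = detect_load(cluster_id)
    state = determine_state(load_data)
    action = regulators.get(state)
    if action is not None:
        # Exactly one regulation mode is applied for the detected state.
        action(cluster_id)
    return state
```

A caller would supply one regulator per load state, for example a function that adds a core for the busy state and one that inserts a blank frame for the first-level idle state.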
  • each of the core clusters is correspondingly provided with a buffer, and the buffer is used for caching the task data of the task to be processed by the corresponding core cluster;
  • the load data detection module is configured to: detect the real-time memory space usage rate of the buffer corresponding to the core cluster; detect a comparison result between the real-time memory space usage rate of the buffer corresponding to the core cluster and a first preset threshold; record the duration for which the real-time memory space usage rate is continuously greater than or equal to the first preset threshold as the first duration, the load data corresponding to the core cluster including the first duration,
  • the load state detection module is used to: judge whether the first duration is greater than or equal to the first preset duration; if the first duration is greater than or equal to the first preset duration, determine that the load state of the core cluster is a busy state.
  • the load data detection module is configured to: detect a comparison result between the real-time memory space usage rate of the buffer corresponding to the core cluster and a second preset threshold, the second preset threshold being greater than 0 and less than the first preset threshold; record the duration for which the real-time memory space usage rate is continuously less than or equal to the second preset threshold as the second duration, the load data corresponding to the core cluster including the second duration, wherein the load state detection module is used to: determine whether the second duration is greater than or equal to a second preset duration; if the second duration is greater than or equal to the second preset duration, determine that the load state of the core cluster is a low load state.
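The threshold-plus-duration test above can be sketched as a monitor that is fed periodic usage samples. The class name `UsageMonitor`, the sampling interval `dt`, and the `"normal"` label for a cluster that is neither busy nor low-loaded are assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass
class UsageMonitor:
    first_threshold: float    # usage at or above this counts toward "busy"
    second_threshold: float   # usage at or below this counts toward "low"
    first_preset: float       # required duration above the first threshold
    second_preset: float      # required duration below the second threshold
    first_duration: float = 0.0
    second_duration: float = 0.0

    def sample(self, usage: float, dt: float) -> str:
        """Feed one usage sample covering dt time units and return the load state."""
        # A duration only accumulates while the condition holds continuously.
        if usage >= self.first_threshold:
            self.first_duration += dt
        else:
            self.first_duration = 0.0
        if usage <= self.second_threshold:
            self.second_duration += dt
        else:
            self.second_duration = 0.0

        if self.first_duration >= self.first_preset:
            return "busy"
        if self.second_duration >= self.second_preset:
            return "low"
        return "normal"  # assumed label; the text names only busy, low, and idle
```

Requiring the condition to hold for a preset duration, rather than reacting to a single sample, keeps short bursts from triggering regulation.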
  • each of the core clusters is correspondingly provided with a buffer, and the buffer is used for caching the task data of the tasks to be processed by the corresponding core cluster; the load data detection module is used to: obtain the memory space usage growth rate of the buffer corresponding to the core cluster, the load data corresponding to the core cluster including the memory space usage growth rate of the corresponding buffer, wherein the load state detection module is used to: judge whether the memory space usage growth rate is greater than or equal to a first preset growth rate value; if the memory space usage growth rate is greater than or equal to the first preset growth rate value, determine that the load state of the core cluster is a busy state.
  • the load data corresponding to the core cluster also includes the real-time memory space usage rate of the corresponding buffer, and the load state detection module is configured to: when the real-time memory space usage rate is less than or equal to a preset usage rate, determine whether the memory space usage growth rate is less than or equal to a second preset growth rate value, the second preset growth rate value being a negative value; if the memory space usage growth rate is less than or equal to the second preset growth rate value, determine that the load state of the core cluster is a low load state.
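A minimal sketch of the growth-rate test follows. The function names are hypothetical, the growth rate is assumed to be a simple difference quotient between two usage samples, and `"normal"` is an assumed label for the case where neither condition is met.

```python
def growth_rate(prev_usage: float, curr_usage: float, dt: float) -> float:
    """Assumed definition: change in buffer usage rate per unit time."""
    if dt <= 0:
        raise ValueError("dt must be positive")
    return (curr_usage - prev_usage) / dt


def classify_by_growth(rate: float, usage: float, first_growth: float,
                       second_growth: float, preset_usage: float) -> str:
    """Classify load from the buffer's usage growth rate and current usage."""
    if second_growth >= 0:
        # The text requires the second preset growth rate value to be negative.
        raise ValueError("second preset growth rate value must be negative")
    if rate >= first_growth:
        return "busy"
    # The low-load test applies only when the buffer is already lightly used.
    if usage <= preset_usage and rate <= second_growth:
        return "low"
    return "normal"
```

Gating the low-load test on the current usage rate avoids declaring low load while a nearly full buffer is merely starting to drain.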
  • the load data detection module is configured to: detect in real time the task processing duration required by the core cluster to process the task, the load data corresponding to the core cluster including the task processing duration, wherein the load state detection module is used to: determine whether the task processing duration required by the core cluster to process the task is greater than or equal to a first preset processing duration; if the task processing duration is greater than or equal to the first preset processing duration, determine that the load state of the core cluster is a busy state.
  • the load state detection module is configured to: determine whether the task processing duration required by the core cluster to process the task is less than or equal to a second preset processing duration; if the task processing duration is less than or equal to the second preset processing duration, determine that the load state of the core cluster is a low load state.
  • the many-core system includes a plurality of core clusters, the multiple core clusters perform task processing based on a synchronization period, and the synchronization period is the largest of the task processing durations required by the core clusters to process their tasks; the load state detection module is used to: count, within a preset detection time period, the number of times the task processing duration corresponding to the core cluster serves as the synchronization period; when that number of times is greater than or equal to a first preset number of times, determine that the load state of the core cluster is a busy state.
  • the many-core system includes a plurality of core clusters, the multiple core clusters perform task processing based on a synchronization period, and the synchronization period is the largest of the task processing durations required by the core clusters to process their tasks;
  • the load state detection module is used to: count, within a preset detection time period, the number of times the task processing duration corresponding to the core cluster serves as the synchronization period; calculate the ratio of that number of times to the number of synchronization periods within the preset detection time period; when the ratio is greater than or equal to a first preset ratio, determine that the load state of the core cluster is a busy state.
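The counting and ratio tests can be sketched as follows, assuming each synchronization period in the detection window is represented by a mapping from a cluster name to its task processing duration. The function names and this data layout are illustrative assumptions.

```python
from typing import Dict, List, Set


def bottleneck_counts(cycles: List[Dict[str, float]]) -> Dict[str, int]:
    """Count, per cluster, how often its duration set the synchronization period."""
    counts: Dict[str, int] = {}
    for durations in cycles:
        # The synchronization period is the largest duration in this cycle.
        sync_period = max(durations.values())
        for cluster, duration in durations.items():
            counts.setdefault(cluster, 0)
            if duration == sync_period:
                counts[cluster] += 1
    return counts


def busy_clusters(cycles: List[Dict[str, float]], min_ratio: float) -> Set[str]:
    """Clusters whose bottleneck ratio reaches the first preset ratio."""
    total = len(cycles)
    if total == 0:
        return set()
    counts = bottleneck_counts(cycles)
    return {cluster for cluster, n in counts.items() if n / total >= min_ratio}
```

A cluster that repeatedly determines the synchronization period is the one holding the pipeline back, which is why it is the one marked busy.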
  • the core control module is configured to: determine whether the core cluster has an adjustable voltage domain and frequency domain when the load state of the core cluster is a busy state; if it is determined that the core cluster has an adjustable voltage domain and frequency domain, raise the operating voltage and operating frequency of those second processing cores, among the second processing cores in the core cluster that can currently perform operations, that correspond to the adjustable voltage domain and frequency domain; if it is determined that the core cluster does not have an adjustable voltage domain and frequency domain, increase the number of second processing cores in the core cluster that can currently perform operations.
  • the core control module is configured to: determine whether the core cluster has an adjustable voltage domain and frequency domain when the load state of the core cluster is a low load state; if it is determined that the core cluster has an adjustable voltage domain and frequency domain, lower the operating voltage and operating frequency of those second processing cores, among the second processing cores in the core cluster that can currently perform operations, that correspond to the adjustable voltage domain and frequency domain; if it is determined that the core cluster does not have an adjustable voltage domain and frequency domain, reduce the number of second processing cores in the core cluster that can currently perform operations.
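The decision between voltage/frequency scaling and core-count scaling can be sketched as below. `ClusterState`, the integer `vf_level` index into a table of voltage/frequency pairs, and the returned action names are all hypothetical; real hardware would expose this differently.

```python
from dataclasses import dataclass


@dataclass
class ClusterState:
    has_adjustable_vf_domain: bool
    vf_level: int        # assumed index into a table of voltage/frequency pairs
    max_vf_level: int
    active_cores: int    # second processing cores that can currently perform operations


def regulate(cluster: ClusterState, load_state: str) -> str:
    """Apply one regulation step and return the name of the action chosen."""
    if load_state not in ("busy", "low"):
        return "no_change"
    step = 1 if load_state == "busy" else -1
    if cluster.has_adjustable_vf_domain:
        # Prefer voltage/frequency scaling when the cluster supports it.
        cluster.vf_level = min(cluster.max_vf_level, max(0, cluster.vf_level + step))
        return "raise_vf" if step > 0 else "lower_vf"
    # Otherwise change the number of cores, keeping at least one in the cluster.
    cluster.active_cores = max(1, cluster.active_cores + step)
    return "add_core" if step > 0 else "remove_core"
```

The level is clamped to its valid range, so the function can report an action name even when the cluster is already at the limit.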
  • the core control module increasing the number of second processing cores in the core cluster that can currently perform operations includes: adding one or more idle second processing cores located outside the core cluster in the many-core system to the core cluster, as second processing cores in the core cluster that can currently perform operations; and/or waking up one or more second processing cores in the core cluster that are in a closed state, as second processing cores in the core cluster that can currently perform operations.
  • the core control module reducing the number of second processing cores in the core cluster that can currently perform operations includes: removing at least one second processing core that can currently perform operations from the core cluster; and/or controlling at least one second processing core in the core cluster that can currently perform operations to enter a closed state.
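The two ways of growing and shrinking a cluster can be sketched with simple lists of core identifiers. The names here are hypothetical, and the preference for waking a closed core before borrowing an idle one from outside the cluster is an assumption of this sketch; the text allows either or both.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class CoreCluster:
    active: List[int] = field(default_factory=list)  # cores that can currently perform operations
    closed: List[int] = field(default_factory=list)  # cores in the cluster that are in a closed state


def add_core(cluster: CoreCluster, idle_pool: List[int]) -> Optional[int]:
    """Wake a closed core if there is one, else take an idle core from outside the cluster."""
    if cluster.closed:
        core = cluster.closed.pop()
    elif idle_pool:
        core = idle_pool.pop()
    else:
        return None
    cluster.active.append(core)
    return core


def remove_core(cluster: CoreCluster, idle_pool: List[int],
                power_down: bool) -> Optional[int]:
    """Close a core in place, or release it from the cluster to the idle pool."""
    if len(cluster.active) <= 1:
        return None  # a core cluster keeps at least one second processing core
    core = cluster.active.pop()
    (cluster.closed if power_down else idle_pool).append(core)
    return core
```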
  • each of the core clusters of the many-core system corresponds to one task in the task pipeline being processed, each of the core clusters includes a plurality of sub-clusters, each of the sub-clusters includes at least one second processing core that can currently perform operations, and the plurality of sub-clusters are used to process the task corresponding to the core cluster in parallel; the device also includes:
  • a sub-cluster building module, used to form a new sub-cluster of the core cluster from the newly added second processing cores in the core cluster that can currently perform operations, and to obtain the configuration information of the new sub-cluster. The new sub-cluster includes the one or more newly added second processing cores that can currently perform operations, and the configuration information includes the number of second processing cores in the new sub-cluster and the address information of each second processing core;
  • a first sending module configured to send the configuration information of the new sub-cluster to the target processing core in the predecessor core cluster of the core cluster;
  • a second sending module configured to send the configuration information of the new sub-cluster to a target processing core in a successor core cluster of the core cluster;
  • the predecessor core cluster is the core cluster immediately before the core cluster on the task pipeline, and the target processing core in the predecessor core cluster is used to establish an input shunt of the new sub-cluster according to the configuration information of the new sub-cluster of the core cluster, the input shunt being the path over which the predecessor core cluster outputs data to the new sub-cluster;
  • the successor core cluster is the core cluster immediately after the core cluster on the task pipeline, and the target processing core in the successor core cluster is used to establish an output shunt of the new sub-cluster according to the configuration information of the new sub-cluster of the core cluster, the output shunt being the path over which the new sub-cluster outputs data to the successor core cluster.
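The exchange of sub-cluster configuration information with the neighbouring core clusters can be sketched as follows. All names are hypothetical, and a dictionary keyed by a sub-cluster identifier stands in for the task list that a target processing core maintains.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass(frozen=True)
class SubClusterConfig:
    sub_cluster_id: str
    core_addresses: Tuple[int, ...]  # address of each second processing core

    @property
    def core_count(self) -> int:
        return len(self.core_addresses)


@dataclass
class TargetCore:
    """Target processing core of a neighbouring cluster, holding its task list."""
    task_list: Dict[str, SubClusterConfig] = field(default_factory=dict)

    def add_shunt(self, cfg: SubClusterConfig) -> None:
        self.task_list[cfg.sub_cluster_id] = cfg

    def remove_shunt(self, sub_cluster_id: str) -> None:
        self.task_list.pop(sub_cluster_id, None)


def announce_new_sub_cluster(cfg: SubClusterConfig,
                             predecessor: TargetCore,
                             successor: TargetCore) -> None:
    predecessor.add_shunt(cfg)  # input shunt: predecessor -> new sub-cluster
    successor.add_shunt(cfg)    # output shunt: new sub-cluster -> successor


def withdraw_sub_cluster(sub_cluster_id: str,
                         predecessor: TargetCore,
                         successor: TargetCore) -> None:
    predecessor.remove_shunt(sub_cluster_id)
    successor.remove_shunt(sub_cluster_id)
```

Keeping both neighbours informed means the predecessor knows where it may send data and the successor knows which sub-clusters it should expect data from.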
  • the load state detection module is configured to: if the real-time memory space usage rate is 0, and the second duration for which it remains 0 is greater than or equal to the second preset duration and less than a third preset duration, determine that the load state of the core cluster is an idle state, and the idle state level is the first level; if the real-time memory space usage rate is 0, and the second duration for which it remains 0 is greater than or equal to the third preset duration, determine that the load state of the core cluster is an idle state, and the idle state level is the second level;
  • the core control module is configured to: insert a blank frame into the buffer corresponding to the core cluster when the load state of the core cluster is the first-level idle state; when the load state of the core cluster is the second-level idle state, lower the operating voltage and operating frequency of those second processing cores, among the second processing cores in the core cluster that can currently perform operations, that correspond to an adjustable voltage domain and frequency domain, or reduce the number of second processing cores in the core cluster that can currently perform operations.
  • the core control module is further configured to: turn off the gated clock corresponding to each second processing core in the core cluster, the gated clock being used to output a clock signal to the corresponding second processing core in the core cluster, so as to drive the corresponding second processing core to work or not to work based on the clock signal.
  • the many-core system includes a plurality of core clusters.
  • when multiple core clusters process the same task and the load states of those core clusters are all the first-level idle state, the core control device further includes: a signal suspension module, configured to suspend sending synchronization signals to the multiple core clusters, the synchronization signals being used to control the multiple core clusters to perform task processing based on a synchronization period.
  • the core control device 300 provided by the embodiment of the present disclosure is used to implement the above-mentioned core control method.
  • for a detailed description of the core control device 300, please refer to the description of the above-mentioned core control method, which will not be repeated here.
  • An embodiment of the present disclosure also provides a processing core, where the processing core includes the above-mentioned core control device.
  • An embodiment of the present disclosure also provides a many-core system, which includes a plurality of processing cores. The plurality of processing cores include a first processing core and a plurality of second processing cores. Some or all of the plurality of second processing cores are divided into at least one core cluster, each core cluster includes at least one second processing core, each core cluster has a main processing core, and the main processing core of a core cluster is a designated second processing core in that core cluster.
  • the first processing core includes the above-mentioned core control device, that is, the first processing core adopts the processing core including the above-mentioned core control device, and/or, at least part of the main processing cores of the core cluster include the above-mentioned The core control device, that is, at least part of the main processing cores of the core cluster adopts the processing core including the above-mentioned core control device.
  • Fig. 11 is a composition block diagram of an electronic device provided by an embodiment of the present disclosure.
  • an embodiment of the present disclosure provides an electronic device. The electronic device includes a plurality of processing cores 701 and an on-chip network 702, wherein the plurality of processing cores 701 are all connected to the on-chip network 702, and the on-chip network 702 is used to exchange data among the plurality of processing cores and external data.
  • one or more processing cores 701 store one or more instructions, and the one or more instructions are executed by the one or more processing cores 701, so that the one or more processing cores 701 can execute the above core control method.
  • an embodiment of the present disclosure also provides a computer-readable medium on which a computer program is stored, wherein the computer program implements the above-mentioned core control method when executed by a processing core of a many-core system.
  • An embodiment of the present disclosure also provides a computer program product, which includes a computer program, and when the computer program is executed by a processing core of a many-core system, the above-mentioned core control method is implemented.
  • the functional modules/units in the systems and devices can be implemented as software, firmware, hardware, or an appropriate combination thereof.
  • the division between functional modules/units mentioned in the above description does not necessarily correspond to the division of physical components; for example, one physical component may have multiple functions, or one function or step may be performed by several physical components in cooperation.
  • Some or all of the physical components may be implemented as software executed by a processor, such as a central processing unit, digital signal processor, or microprocessor, or as hardware, or as an integrated circuit, such as an application-specific integrated circuit.
  • Such software may be distributed on computer readable media, which may include computer storage media (or non-transitory media) and communication media (or transitory media).
  • computer storage media includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information, such as computer readable instructions, data structures, program modules, or other data.
  • Computer storage media include, but are not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disk (DVD) or other optical disk storage, magnetic cartridges, tape, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to store desired information and that can be accessed by a computer.
  • communication media typically embodies computer readable instructions, data structures, program modules, or other data in a modulated data signal such as a carrier wave or other transport mechanism, and may include any information delivery media.

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Power Sources (AREA)

Abstract

Provided in the present disclosure are a core control method and apparatus for a many-core system, and a many-core system. The many-core system comprises at least one core cluster, wherein each core cluster comprises at least one second processing core. The core control method comprises: for any core cluster, performing load detection on the core cluster to acquire load data corresponding to the core cluster; determining a load state of the core cluster according to the load data corresponding to the core cluster; performing regulation processing on the core cluster according to the load state of the core cluster, wherein the regulation processing comprises one of the following regulation methods: performing regulation on the number of second processing cores, that can currently perform an operation, in the core cluster, and performing regulation on operating voltages and operating frequencies of the second processing cores, that can currently perform an operation, in the core cluster; and inserting a blank frame into a buffer corresponding to the core cluster.

Description

Core control method and apparatus for many-core system, and many-core system
Technical Field
The present disclosure relates to the field of computer technology, and in particular to a core control method and device for a many-core system, a many-core system, an electronic device, a computer-readable medium, and a computer program product.
Background
With the development of artificial intelligence technology, the demand for data processing speed keeps increasing, and many-core systems are being applied more and more widely. A many-core system usually has many cores (also called processing cores). A core is the smallest computing unit in the many-core system that can be independently scheduled and has complete computing capability, and each core has certain storage, computing, and other resources. The cores of a many-core system can each run program instructions independently; by exploiting parallel computing, they can speed up program execution and provide multitasking capability.
Summary
The present disclosure provides a core control method and device for a many-core system, a processing core, a many-core system, an electronic device, a computer-readable medium, and a computer program product.
In a first aspect, the present disclosure provides a core control method for a many-core system, where the many-core system includes at least one core cluster and each core cluster includes at least one second processing core. The core control method includes: for any one of the core clusters, performing load detection on the core cluster to obtain load data corresponding to the core cluster; determining a load state of the core cluster according to the load data corresponding to the core cluster; and performing regulation processing on the core cluster according to the load state of the core cluster; wherein the regulation processing includes one of the following regulation modes: regulating the number of second processing cores in the core cluster that can currently perform operations; regulating the operating voltage and operating frequency of the second processing cores in the core cluster that can currently perform operations; inserting a blank frame into the buffer corresponding to the core cluster.
In a second aspect, the present disclosure provides a core control device applied to a many-core system, where the many-core system includes at least one core cluster and each core cluster includes at least one second processing core. The core control device includes: a load data detection module, configured to perform load detection on the corresponding core cluster and obtain load data corresponding to the core cluster; a load state detection module, configured to determine a load state of the core cluster according to the load data corresponding to the core cluster; and a core control module, configured to perform regulation processing on the core cluster according to the load state of the core cluster; wherein the regulation processing includes one of the following regulation modes: regulating the number of second processing cores in the core cluster that can currently perform operations; regulating the operating voltage and operating frequency of the second processing cores in the core cluster that can currently perform operations; inserting a blank frame into the buffer corresponding to the core cluster.
In a third aspect, the present disclosure provides a many-core system that includes a plurality of processing cores. The plurality of processing cores include a first processing core and a plurality of second processing cores. Some or all of the plurality of second processing cores are divided into at least one core cluster, each core cluster includes at least one second processing core, each core cluster has a main processing core, and the main processing core of a core cluster is a designated second processing core in that core cluster; wherein the first processing core includes the above-mentioned core control device, and/or the main processing cores of at least some of the core clusters include the above-mentioned core control device.
In a fourth aspect, the present disclosure provides an electronic device that includes: a plurality of processing cores; and an on-chip network configured to exchange data among the plurality of processing cores and external data; wherein one or more of the processing cores store one or more instructions, and the one or more instructions are executed by one or more of the processing cores, so that the one or more processing cores can execute the above-mentioned core control method.
In a fifth aspect, the present disclosure provides a computer-readable medium on which a computer program is stored, wherein the computer program implements the above-mentioned core control method when executed by a processing core of a many-core system.
In a sixth aspect, the present disclosure provides a computer program product that includes a computer program, wherein the computer program implements the above-mentioned core control method when executed by a processing core of a many-core system.
It should be understood that what is described in this section is not intended to identify key or important features of the embodiments of the present disclosure, nor is it intended to limit the scope of the present disclosure. Other features of the present disclosure will become easy to understand from the following description.
Brief Description of the Drawings
The accompanying drawings are provided for a further understanding of the present disclosure and constitute a part of the specification. Together with the embodiments of the present disclosure, they serve to explain the present disclosure and do not limit it. The above and other features and advantages will become more apparent to those skilled in the art from the description of detailed example embodiments with reference to the accompanying drawings, in which:
Fig. 1 is a flowchart of a core control method for a many-core system provided by an embodiment of the present disclosure;
Fig. 2 is a block diagram of a many-core system provided by an embodiment of the present disclosure;
Fig. 3 is a flowchart of a core control method for a many-core system provided by an embodiment of the present disclosure;
Fig. 4 is a flowchart of a core control method for a many-core system provided by an embodiment of the present disclosure;
Fig. 5 is a flowchart of the regulation processing of the core control method according to an embodiment of the present disclosure;
Fig. 6 is a flowchart of the regulation processing of the core control method according to an embodiment of the present disclosure;
Fig. 7 is a schematic diagram of an application scenario of a many-core system according to an embodiment of the present disclosure;
Fig. 8 is a block diagram of a core cluster of a many-core system according to an embodiment of the present disclosure;
Fig. 9 is a block diagram of a core cluster after a new sub-cluster is formed according to an embodiment of the present disclosure;
Fig. 10 is a block diagram of a core control device provided by an embodiment of the present disclosure;
Fig. 11 is a block diagram of an electronic device provided by an embodiment of the present disclosure.
Detailed Description
To help those skilled in the art better understand the technical solutions of the present disclosure, exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings. Various details of the embodiments are included to aid understanding and should be regarded as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications can be made to the embodiments described herein without departing from the scope and spirit of the present disclosure. Likewise, descriptions of well-known functions and structures are omitted from the following description for clarity and conciseness.
Where no conflict arises, the embodiments of the present disclosure and the features in the embodiments may be combined with one another. As used herein, the term "and/or" includes any and all combinations of one or more of the associated listed items. The terminology used herein is for describing particular embodiments only and is not intended to limit the present disclosure. As used herein, the singular forms "a" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will also be understood that the terms "comprising" and/or "made of", when used in this specification, specify the presence of the stated features, integers, steps, operations, elements and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components and/or groups thereof. Words such as "connected" or "coupled" are not limited to physical or mechanical connections and may include electrical connections, whether direct or indirect.
Unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art. It will also be understood that terms such as those defined in commonly used dictionaries should be interpreted as having a meaning consistent with their meaning in the context of the related art and the present disclosure, and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.
In the related art, each core of a many-core system stores input data from an input device; an arithmetic unit performs operations on the input data and stores the results in memory; finally, an output device is notified to receive the output. The input device and the output device may be external devices or may be cores within the many-core system.
In the embodiments of the present disclosure, the many-core system includes at least one core cluster, each core cluster includes at least one second processing core, and each core cluster executes its corresponding computing task. While the core clusters execute their tasks, and especially when they execute the tasks of a task pipeline, high task execution efficiency is required of each core cluster. How to effectively improve the efficiency with which core clusters execute tasks has therefore become a technical problem that urgently needs to be solved for core clusters in many-core systems.
FIG. 1 is a flowchart of a core control method for a many-core system provided by an embodiment of the present disclosure.
Referring to FIG. 1, an embodiment of the present disclosure provides a core control method for a many-core system, wherein the many-core system includes at least one core cluster and each core cluster includes at least one second processing core. The method may be performed by a core control apparatus, which may be implemented in software and/or hardware. The core control method includes:
Step S1: for any core cluster, perform load detection on the core cluster to obtain load data corresponding to the core cluster.
Step S2: determine the load state of the core cluster according to the load data corresponding to the core cluster.
Step S3: perform regulation processing on the core cluster according to the load state of the core cluster.
The regulation processing includes one of the following regulation modes: regulating the number of second processing cores in the core cluster that are currently available for work; regulating the operating voltage and operating frequency of the second processing cores in the core cluster that are currently available for work; or inserting a blank frame into the buffer corresponding to the core cluster.
In the embodiments of the present disclosure, the load state of a core cluster can be obtained in real time and the core cluster can be regulated in real time, so that the core cluster can process tasks flexibly, which improves task processing efficiency and reduces the power consumption of the many-core system.
The core control method for a many-core system provided by the embodiments of the present disclosure detects the load state of each core cluster and regulates each core cluster accordingly. The core clusters of the many-core system can thus be controlled and managed flexibly, which effectively improves the efficiency with which each core cluster executes tasks and increases the flexibility of task processing in the many-core system.
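The flow of steps S1 to S3 can be sketched as a small driver loop. This is a minimal illustration only: the names `Regulation`, `control_step`, `detect_load`, `determine_state`, and `choose_regulation` are hypothetical, and the mapping from a load state to a particular regulation mode is deliberately left to the caller because the steps above do not fix it.

```python
from enum import Enum
from typing import Callable, Dict, Hashable, Iterable, Optional


class Regulation(Enum):
    """The three regulation modes listed for step S3."""
    ADJUST_ACTIVE_CORE_COUNT = 1       # number of second processing cores available for work
    ADJUST_VOLTAGE_AND_FREQUENCY = 2   # operating voltage and frequency of those cores
    INSERT_BLANK_FRAME = 3             # blank frame inserted into the cluster's buffer


def control_step(
    cluster_ids: Iterable[Hashable],
    detect_load: Callable[[Hashable], object],
    determine_state: Callable[[object], str],
    choose_regulation: Callable[[str], Optional[Regulation]],
) -> Dict[Hashable, Regulation]:
    """Run S1 -> S2 -> S3 once for every cluster (hypothetical helper).

    Returns the regulation chosen per cluster; clusters for which the
    caller-supplied policy returns None are left untouched.
    """
    applied: Dict[Hashable, Regulation] = {}
    for cluster_id in cluster_ids:
        load_data = detect_load(cluster_id)       # S1: load detection
        state = determine_state(load_data)        # S2: load state
        regulation = choose_regulation(state)     # S3: policy supplied by the caller
        if regulation is not None:
            applied[cluster_id] = regulation
    return applied
```

Keeping detection, classification, and regulation as separate callables mirrors the three steps and lets each of the detection variants described below be plugged in without changing the loop.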
FIG. 2 is a block diagram of a many-core system provided by an embodiment of the present disclosure. In the embodiments of the present disclosure, referring to FIG. 2, the many-core system includes a plurality of processing cores, namely one first processing core and a plurality of second processing cores. Some or all of the second processing cores are divided in advance into at least one core cluster, and each core cluster has a main processing core, which is a second processing core designated in advance from among the at least one second processing core of that core cluster. The first processing core can process tasks of the many-core system and can also allocate and manage the tasks of the many-core system. The main processing core of each core cluster can process the tasks of its own core cluster and can allocate and manage tasks within the cluster.
The core control method of the embodiments of the present disclosure may be applied to the main processing core of any core cluster in the many-core system. That is, the method is implemented on the main processing core of a core cluster, and that main processing core can use the method to control and manage the second processing cores of its own core cluster.
The core control method of the embodiments of the present disclosure may also be applied to the first processing core of the many-core system. That is, the method is implemented on the first processing core, and the first processing core can use the method to control and manage all core clusters of the many-core system.
In some embodiments, each core cluster is provided with a corresponding buffer that caches the task data of the tasks the core cluster needs to process, so the memory state of the buffer can characterize the load of the core cluster. Here, the buffer is a FIFO (first in, first out) buffer, although the present disclosure does not limit the specific type of buffer.
In some embodiments, the load data of a core cluster can be obtained by detecting the real-time memory space usage rate of the buffer corresponding to the core cluster. FIG. 3 is a flowchart of a core control method for a many-core system provided by an embodiment of the present disclosure. As shown in FIG. 3, step S1 may further include steps S11a to S13a.
Step S11a: detect the real-time memory space usage rate of the buffer corresponding to the core cluster. The real-time memory space usage rate (real-time memory space usage ratio) is the ratio of the memory space currently in use to the total memory space.
Step S12a: compare the real-time memory space usage rate of the buffer corresponding to the core cluster with a first preset threshold. The first preset threshold can be set according to actual needs. As an example, it may be set to a value greater than or equal to 60% and less than 100%, for example 70%; the present disclosure does not limit this.
Step S13a: record the length of time for which the real-time memory space usage rate remains continuously greater than or equal to the first preset threshold, and denote it as the first duration. The load data corresponding to the core cluster includes the first duration.
It can be understood that the real-time memory space usage rate of the buffer varies over time. The first duration is therefore the length of time for which the usage rate has stayed in the state of being greater than or equal to the first preset threshold. It characterizes the current load state of the buffer and hence the current load state of the core cluster.
In some embodiments, where the load data corresponding to the core cluster includes the first duration, as shown in FIG. 3, step S2 may further include steps S21a and S22a.
Step S21a: determine whether the first duration is greater than or equal to a first preset duration; if so, perform step S22a; if not, perform no further processing.
Step S22a: where the first duration is greater than or equal to the first preset duration, determine that the load state of the core cluster is the busy state.
In some embodiments, when the real-time memory space usage rate of the buffer corresponding to the core cluster is detected to remain greater than or equal to the first preset threshold, and the duration of this condition (the first duration) is greater than or equal to the first preset duration, the core cluster is overloaded, that is, in the busy state. If the first duration is less than the first preset duration, the core cluster is not in the busy state, so no further processing is needed. The first preset duration can be set according to the actual situation, for example to 15 minutes, half an hour, or 1 hour; the present disclosure does not limit this. In this way, the load state of the core cluster can be determined from the real-time memory space usage rate, which makes the load state determination more efficient.
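Steps S11a to S13a and S21a to S22a can be sketched as a small stateful detector that is fed one buffer sample at a time. This is a sketch under assumptions: the class name `BusyDetector`, the use of seconds as the time unit, and the default values (70% and 15 minutes, taken from the examples above) are illustrative choices, not part of the disclosure.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BusyDetector:
    """Tracks how long buffer usage has stayed at or above a threshold (hypothetical helper)."""
    first_threshold: float = 0.70          # first preset threshold, as a ratio in [0.6, 1.0)
    first_preset_duration: float = 900.0   # first preset duration in seconds (15 minutes)
    _since: Optional[float] = None         # time at which usage first reached the threshold

    def update(self, now: float, used: int, total: int) -> bool:
        """Feed one sample; return True if the cluster is judged busy."""
        usage = used / total                       # S11a: used space / total space
        if usage < self.first_threshold:           # S12a: comparison with the threshold
            self._since = None                     # the continuous run is broken
            return False
        if self._since is None:
            self._since = now                      # start of a new continuous run
        first_duration = now - self._since         # S13a: the first duration
        return first_duration >= self.first_preset_duration  # S21a / S22a
```

For instance, with a threshold of 0.7 and a preset duration of 10 seconds, samples of 80%, 75%, and 70% usage at t = 0, 5, and 10 seconds yield a busy verdict only on the third sample; a later sample below 70% resets the run.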
FIG. 4 is a flowchart of a core control method for a many-core system provided by an embodiment of the present disclosure. As shown in FIG. 4, step S1 may alternatively include steps S11b to S13b.
Step S11b: detect the real-time memory space usage rate of the buffer corresponding to the core cluster.
Step S12b: compare the real-time memory space usage rate of the buffer corresponding to the core cluster with a second preset threshold. The second preset threshold is greater than 0 and less than the first preset threshold and can be set according to actual needs. As an example, it may be set to a value less than or equal to 40%, for example 10% or 5%; the present disclosure does not limit this.
Step S13b: record the length of time for which the real-time memory space usage rate remains continuously less than or equal to the second preset threshold, and denote it as the second duration. The load data corresponding to the core cluster includes the second duration.
It can be understood that the real-time memory space usage rate of the buffer varies over time. The second duration is therefore the length of time for which the usage rate has stayed in the state of being less than or equal to the second preset threshold. It characterizes the current load state of the buffer and hence the current load state of the core cluster.
In some embodiments, where the load data corresponding to the core cluster includes the second duration, as shown in FIG. 4, step S2 may further include steps S21b and S22b.
Step S21b: determine whether the second duration is greater than or equal to a second preset duration; if so, perform step S22b; if not, perform no further processing.
Step S22b: where the second duration is greater than or equal to the second preset duration, determine that the load state of the core cluster is the low-load state.
In some embodiments, when the real-time memory space usage rate of the buffer corresponding to the core cluster is detected to remain less than or equal to the second preset threshold, and the duration of this condition (the second duration) is greater than or equal to the second preset duration, the core cluster has surplus resources, that is, it is in the low-load state. If the second duration is less than the second preset duration, the core cluster is not in the low-load state, so no further processing is needed. The second preset duration can be set according to the actual situation and may equal the first preset duration, for example 15 minutes, half an hour, or 1 hour; the present disclosure does not limit this. In this way, the load state of the core cluster can be determined from the real-time memory space usage rate, which makes the load state determination more efficient.
In some embodiments, the real-time memory space usage rate of the buffer corresponding to the core cluster is detected, and the load data of the core cluster includes this usage rate. If the usage rate remains greater than or equal to the first preset threshold and the duration of this condition (the first duration) is greater than or equal to the first preset duration, the load state of the core cluster is determined to be the busy state. If the usage rate remains less than or equal to the second preset threshold and the duration of this condition (the second duration) is greater than or equal to the second preset duration, the load state of the core cluster is determined to be the low-load state. If the usage rate remains between the second preset threshold and the first preset threshold, or if the time spent at or above the first preset threshold is less than the first preset duration and the time spent at or below the second preset threshold is less than the second preset duration, the core cluster is neither busy nor idle. Its load state is then an intermediate state between the low-load state and the busy state, so no further processing is needed.
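The three-way decision described above can be illustrated by classifying a history of timestamped usage samples. The sketch below is an assumption-laden illustration: the function names, the `(timestamp, usage)` sample format, the state labels, and the default thresholds are hypothetical, and both preset durations are assumed to be positive.

```python
from typing import Callable, Sequence, Tuple

Sample = Tuple[float, float]  # (timestamp in seconds, usage ratio in [0, 1])


def trailing_duration(samples: Sequence[Sample], predicate: Callable[[float], bool]) -> float:
    """Length of the most recent unbroken run of samples that satisfy predicate."""
    if not samples:
        return 0.0
    now = samples[-1][0]
    start = None
    for timestamp, usage in reversed(samples):
        if not predicate(usage):
            break
        start = timestamp
    return 0.0 if start is None else now - start


def classify_by_usage(
    samples: Sequence[Sample],
    first_threshold: float = 0.70,   # first preset threshold
    second_threshold: float = 0.10,  # second preset threshold, 0 < second < first
    first_preset_duration: float = 900.0,
    second_preset_duration: float = 900.0,
) -> str:
    """Return 'busy', 'low_load', or 'intermediate' (labels are illustrative)."""
    first_duration = trailing_duration(samples, lambda u: u >= first_threshold)
    second_duration = trailing_duration(samples, lambda u: u <= second_threshold)
    if first_duration >= first_preset_duration:
        return "busy"
    if second_duration >= second_preset_duration:
        return "low_load"
    return "intermediate"  # neither busy nor idle: no further processing
```

Because the two thresholds do not overlap, at most one of the two durations can be non-zero for a given history, so the order of the two checks does not affect the result.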
In some embodiments, the load data of a core cluster can be obtained by detecting the memory space usage growth rate of the buffer corresponding to the core cluster. For example, step S1 may further include: obtaining the memory space usage growth rate of the buffer corresponding to the core cluster, wherein the load data corresponding to the core cluster includes this growth rate.
The memory space usage growth rate is the rate at which the buffer's memory space usage rate grows within a preset time period (for example, 5 minutes, 10 minutes, or 15 minutes). That is, it is the difference between the memory space usage rate at the current time and the memory space usage rate at a historical time, divided by the memory space usage rate at the historical time, where the interval from the historical time to the current time is the preset time period.
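The definition above amounts to a relative change, which can be written as a one-line function. The name `usage_growth_rate` and the handling of a zero historical usage rate (raising an error, since the ratio is undefined) are assumptions made for this sketch.

```python
def usage_growth_rate(current_usage: float, past_usage: float) -> float:
    """Relative growth of the buffer usage rate over the preset time period.

    growth = (usage_now - usage_then) / usage_then
    """
    if past_usage <= 0:
        # The ratio is undefined for an empty historical buffer; how to treat
        # this case is not specified in the text, so the sketch refuses it.
        raise ValueError("past_usage must be positive")
    return (current_usage - past_usage) / past_usage
```

As a worked example, a buffer that was 40% full at the historical time and is 60% full now has a growth rate of (0.6 - 0.4) / 0.4 = 0.5, that is, 50%. A buffer that drops from 50% to 10% has a growth rate of -0.8. Since usage cannot fall below zero, the growth rate is never below -1.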
In some embodiments, where the load data of the core cluster is obtained by detecting the memory space usage growth rate of the buffer corresponding to the core cluster, step S2 may further include:
Determining whether the memory space usage growth rate of the buffer corresponding to the core cluster is greater than or equal to a first preset growth rate value; if so, proceeding to the next step; if not, performing no further processing. The first preset growth rate value is positive and can be set according to actual needs. As an example, it may be a value between 60% and 90%, for example 70%; the present disclosure does not limit this.
Where the memory space usage growth rate is greater than or equal to the first preset growth rate value, determining that the load state of the core cluster is the busy state, and jumping to step S3. In other words, if the memory space usage growth rate of the buffer corresponding to the core cluster is greater than or equal to the first preset growth rate value, the buffer is overloaded, which means that the core cluster is overloaded, that is, in the busy state. In this way, the load state can be determined from the memory space usage growth rate, which makes the load state determination more efficient.
In some embodiments, if the memory space usage growth rate of the buffer corresponding to the core cluster is less than the first preset growth rate value, the buffer is not overloaded, which means that the core cluster is not overloaded, that is, not in the busy state. In this case, either no further processing is performed, or it is further determined whether the core cluster is in the low-load state.
In some embodiments, where the load data of the core cluster is obtained by detecting both the memory space usage growth rate and the real-time memory space usage rate of the corresponding buffer, that is, where the load data includes both quantities, step S2 may further include:
Determining whether the real-time memory space usage rate of the buffer corresponding to the core cluster is less than or equal to a preset usage rate; if so, proceeding to the next step; otherwise, performing no further processing. The preset usage rate can be set according to actual needs, for example to 10%, 20%, or 30%; the present disclosure does not limit this.
Determining whether the memory space usage growth rate is less than or equal to a second preset growth rate value; if so, proceeding to the next step; otherwise, performing no further processing. The second preset growth rate value is a negative value greater than -1 and less than 0, and its specific value can be set according to actual needs. As an example, it may be a value between -90% and -50%, for example -60%.
Where the real-time memory space usage rate of the buffer corresponding to the core cluster is less than or equal to the preset usage rate, it is further determined whether the memory space usage growth rate is less than or equal to the second preset growth rate value, so as to determine the load state of the core cluster.
Where the memory space usage growth rate is less than or equal to the second preset growth rate value, determining that the load state of the core cluster is the low-load state, and jumping to step S3.
If the real-time memory space usage rate of the buffer corresponding to the core cluster is less than or equal to the preset usage rate, and the memory space usage growth rate of that buffer is less than or equal to the second preset growth rate value, then the buffer's usage rate is small and its memory space usage has dropped substantially. The buffer therefore has surplus resources, which means that the core cluster has surplus resources, that is, it is in the low-load state. In this way, the load state is determined jointly from the memory space usage growth rate and the real-time memory space usage rate, which makes the load state determination more accurate.
In some embodiments, if the memory space usage growth rate of the buffer corresponding to the core cluster is less than the first preset growth rate value and greater than the second preset growth rate value, the core cluster is neither busy nor idle. Its load state is then an intermediate state between the low-load state and the busy state, so no further processing is needed.
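The growth-rate checks above can be combined into a single decision function. This is a hedged sketch: the function name, the state labels, and the default values (70%, -60%, and 20%, taken from the examples in the text) are illustrative. One case is not assigned a state by the text, namely a growth rate at or below the second preset value while the usage rate is still above the preset usage rate; the sketch labels it "undetermined", which is an assumption.

```python
def classify_by_growth(
    usage: float,
    growth: float,
    first_growth: float = 0.70,    # first preset growth rate value (positive)
    second_growth: float = -0.60,  # second preset growth rate value, in (-1, 0)
    preset_usage: float = 0.20,    # preset usage rate
) -> str:
    """Classify a cluster from its buffer usage rate and usage growth rate."""
    if growth >= first_growth:
        return "busy"              # usage is rising sharply: overloaded
    if usage <= preset_usage and growth <= second_growth:
        return "low_load"          # low usage and a sharp drop: surplus resources
    if second_growth < growth < first_growth:
        return "intermediate"      # neither busy nor idle: no further processing
    # growth <= second_growth but usage > preset_usage: the text assigns no
    # state here; "undetermined" is a placeholder chosen for this sketch.
    return "undetermined"
```

Requiring both a low usage rate and a strongly negative growth rate before declaring low load avoids treating a buffer that is draining quickly but still well filled as having surplus resources.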
In some embodiments, the load data of a core cluster can also be obtained by detecting how the core cluster is processing its tasks. For example, step S1 may further include: detecting in real time the task processing duration the core cluster requires to process a task, wherein the load data corresponding to the core cluster includes the task processing duration. It can be understood that the task processing duration is the time the core cluster spends processing the task.
In some embodiments, where the load data corresponding to the core cluster includes the task processing duration, step S2 may further include:
Determining whether the task processing duration required by the core cluster to process the task is greater than or equal to a first preset processing duration; if so, proceeding to the next step; if not, performing no further processing. The first preset processing duration can be set according to actual needs; the present disclosure does not limit this.
Where the task processing duration required by the core cluster to process the task is greater than or equal to the first preset processing duration, determining that the load state of the core cluster is the busy state, and jumping to step S3.
If the task processing duration corresponding to the core cluster is greater than or equal to the first preset processing duration, the core cluster is taking a long time to process the task, so it can be determined that the core cluster is overloaded, that is, in the busy state. In this way, the load state can be determined from the task processing duration, which makes the load state determination more efficient.
In some embodiments, if the task processing duration corresponding to the core cluster is less than the first preset processing duration, the core cluster is not overloaded, that is, not in the busy state. In this case, either no further processing is performed, or it is further determined whether the core cluster is in the low-load state.
在一些实施例中,在该核心簇对应的负载数据包括任务处理时长的情况下,步骤S2可以进一步包括:In some embodiments, when the load data corresponding to the core cluster includes task processing duration, step S2 may further include:
判断该核心簇处理任务所需的任务处理时长是否小于或等于第二预设处理时长,若是,则执行下一步骤,否则不作进一步处理。其中,第二预设处理时长小于第一预设处理时长,第二预设处理 时长可以根据实际需要设置,本公开对此不作限制。It is judged whether the task processing duration required by the core cluster to process the task is less than or equal to the second preset processing duration, if yes, the next step is executed, otherwise no further processing is performed. Wherein, the second preset processing duration is shorter than the first preset processing duration, and the second preset processing duration can be set according to actual needs, which is not limited in the present disclosure.
在该核心簇处理任务所需的任务处理时长小于或等于第二预设处理时长的情况下,确定该核心簇的负载状态为低负载状态,并跳转至步骤S3。In the case that the task processing duration required by the core cluster to process the task is less than or equal to the second preset processing duration, determine that the load state of the core cluster is a low load state, and jump to step S3.
If the task processing duration corresponding to the core cluster is less than or equal to the second preset processing duration, the core cluster is taking a short time to process the task, so it can be determined that the core cluster has surplus resources, that is, it is in the low-load state. In this way, the load state can be determined from the task processing duration, which improves the efficiency of load state determination.
In some embodiments, if the task processing duration corresponding to the core cluster is greater than the second preset processing duration and less than the first preset processing duration, the core cluster is neither busy nor idle. The load state of the core cluster is then an intermediate state, that is, a state between the low-load state and the busy state, so no further processing is needed.
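The two-threshold classification described above can be sketched as follows. This is only an illustrative sketch: the names `LoadState` and `classify_by_duration` are hypothetical, and the disclosure does not prescribe any particular implementation or time unit.

```python
from enum import Enum


class LoadState(Enum):
    BUSY = "busy"
    LOW = "low"
    INTERMEDIATE = "intermediate"


def classify_by_duration(task_duration, first_preset, second_preset):
    """Classify a core cluster from its task processing duration.

    first_preset and second_preset stand for the first and second preset
    processing durations; the second must be less than the first.
    """
    if second_preset >= first_preset:
        raise ValueError("second preset duration must be less than the first")
    if task_duration >= first_preset:
        return LoadState.BUSY          # overloaded: proceed to step S3
    if task_duration <= second_preset:
        return LoadState.LOW           # surplus resources: proceed to step S3
    return LoadState.INTERMEDIATE      # no further processing
```

With a first preset of 10 and a second preset of 4 (arbitrary units chosen for illustration), a duration of 7 falls in the intermediate state and triggers no regulation.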
In some embodiments, the many-core system includes multiple core clusters that process tasks based on a synchronization period, where the synchronization period is the largest of the task processing durations required by the core clusters to process the task. For example, the current task is a face recognition task for a video to be synthesized, and the face recognition task includes multiple subtasks, such as video stream decoding, face detection, face feature recognition, feature extraction, and feature matching. Each core cluster is responsible for its corresponding subtask, and the subtasks form a task pipeline, that is, the result produced by one core cluster for its subtask must be sent to the next core cluster for processing. During pipeline processing, the core clusters share one unified synchronization period, which is the largest of the task processing durations required by the core clusters to process their corresponding subtasks. After the synchronization period ends, the core clusters can process the next task, for example voice recognition or video synthesis for the video to be synthesized.
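As a small illustration of this definition, the synchronization period is simply the maximum of the per-cluster durations. The function name, cluster labels, and numeric values below are hypothetical and chosen only for the example.

```python
def sync_period(durations):
    """Return (cluster, duration) for the cluster whose task processing
    duration is the largest; that duration is the synchronization period.

    durations maps a cluster label to its task processing duration.
    """
    cluster = max(durations, key=durations.get)
    return cluster, durations[cluster]


# Hypothetical per-subtask durations for one pass of the pipeline.
durations = {"decode": 3.0, "detect": 7.5, "match": 4.0}
slowest_cluster, period = sync_period(durations)
```

Here the cluster handling face detection sets the pace of the whole pipeline, which is why it is the natural candidate for regulation.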
In some embodiments, where it is detected that the load data of each core cluster includes the task processing duration of that core cluster, step S2 may further include:
Counting, within a preset detection time period, the frequency with which the task processing duration corresponding to the core cluster serves as the synchronization period. The preset detection time period may be any preset time period. In this step, the number of times that the task processing duration corresponding to the core cluster is the synchronization period within the preset detection time period is counted; this number is the frequency.
Determining whether the frequency with which the task processing duration corresponding to the core cluster serves as the synchronization period within the preset detection time period is greater than or equal to a first preset number of times; if so, the next step is performed; otherwise, no further processing is performed. The first preset number of times may be set according to actual needs, which is not limited in the present disclosure.
Determining that the load state of the core cluster is the busy state, and proceeding to step S3.
Where the frequency with which the task processing duration corresponding to the core cluster serves as the synchronization period is greater than or equal to the first preset number of times, the task processing duration of the core cluster is often the largest among all core clusters, so it can be determined that the core cluster is in an overloaded state, that is, the busy state. Where this frequency is less than the first preset number of times, the core cluster is not in an overloaded state, that is, not in the busy state, so no further processing is needed.
In this way, the load state can be determined from the frequency with which the task processing duration serves as the synchronization period, which improves the accuracy of load state determination.
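The frequency-based check above can be sketched as follows. The data layout (one dictionary of per-cluster durations per synchronization period) and the function names are assumptions made for illustration; a tie for the largest duration is counted for every tied cluster, which the disclosure does not specify.

```python
def times_as_sync_period(history, cluster):
    """Count how often `cluster` had the largest task processing duration.

    history is a list with one entry per synchronization period in the
    preset detection time period; each entry maps cluster -> duration.
    """
    return sum(1 for durations in history
               if durations[cluster] == max(durations.values()))


def busy_by_count(history, cluster, first_preset_count):
    """Busy if the frequency reaches the first preset number of times."""
    return times_as_sync_period(history, cluster) >= first_preset_count


# Hypothetical durations over three synchronization periods.
history = [{"a": 5, "b": 3}, {"a": 2, "b": 4}, {"a": 6, "b": 1}]
```

In this hypothetical history, cluster "a" determines the synchronization period twice and cluster "b" once, so with a first preset number of 2 only "a" is judged busy.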
In some embodiments, where it is detected that the load data of each core cluster includes the task processing duration of that core cluster, step S2 may further include:
Counting, within the preset detection time period, the frequency with which the task processing duration corresponding to the core cluster serves as the synchronization period. The way the frequency is counted is as described above and is not repeated here.
Calculating the ratio of the frequency with which the task processing duration corresponding to the core cluster serves as the synchronization period to the number of synchronization periods within the preset detection time period. It can be understood that the number of synchronization periods within the preset detection time period equals the number of tasks processed by the core clusters within that period.
Determining whether the ratio is greater than or equal to a first preset ratio; if so, the next step is performed; otherwise, no further processing is performed. The first preset ratio may be set according to actual needs, which is not limited in the present disclosure.
Determining that the load state of the core cluster is the busy state, and proceeding to step S3.
Where the ratio is greater than or equal to the first preset ratio, the task processing duration of the core cluster is often the largest among all core clusters, so it can be determined that the core cluster is in an overloaded state, that is, the busy state. Where the ratio is less than the first preset ratio, the core cluster is not in an overloaded state, that is, not in the busy state, so no further processing is needed.
In this way, the load state can be determined from the frequency with which the task processing duration serves as the synchronization period, which improves the accuracy of load state determination.
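The ratio-based variant normalizes the frequency by the number of synchronization periods in the detection window. A minimal sketch, with a hypothetical function name and an assumed handling of an empty window (treated as not busy):

```python
def busy_by_ratio(times_as_sync_period, total_sync_periods, first_preset_ratio):
    """Busy if frequency / number of synchronization periods reaches the
    first preset ratio. An empty detection window is treated as not busy,
    which is an assumption of this sketch.
    """
    if total_sync_periods == 0:
        return False
    ratio = times_as_sync_period / total_sync_periods
    return ratio >= first_preset_ratio
```

Compared with an absolute count, the ratio keeps the decision comparable across detection time periods that contain different numbers of tasks.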
In some embodiments, where the load state of the core cluster is determined to be the busy state, step S3 may further include: increasing the number of second processing cores in the core cluster that are currently available for operation.
Fig. 5 is a flowchart of the regulation processing of the core control method according to an embodiment of the present disclosure. In some embodiments, where the load state of the core cluster is determined to be the busy state, as shown in Fig. 5, step S3 of performing regulation processing on the core cluster may further include steps S31a to S33a.
Step S31a: where the load state of the core cluster is the busy state, determine whether the core cluster has an adjustable voltage domain and frequency domain; if so, perform step S32a; otherwise, perform step S33a.
For example, where the load state of the core cluster is the busy state, it is checked whether, among all second processing cores of the core cluster that are currently available for operation, there are multiple second processing cores that correspond to the same operating voltage and operating frequency, with both the operating voltage and the operating frequency adjustable. If so, the core cluster is determined to have an adjustable voltage domain and frequency domain; otherwise, it is determined not to have an adjustable voltage domain and frequency domain. Here, the core cluster having an adjustable voltage domain means that multiple second processing cores of the core cluster correspond to one operating voltage and that operating voltage is adjustable; within the same adjustable voltage domain, all corresponding second processing cores share the same operating voltage setting. The core cluster having an adjustable frequency domain means that multiple second processing cores of the core cluster correspond to one operating frequency and that operating frequency is adjustable; within the same adjustable frequency domain, all corresponding second processing cores share the same operating frequency setting. In other words, if multiple second processing cores in the core cluster operate at the same operating voltage, the core cluster has a voltage domain, and if that operating voltage is adjustable, the voltage domain is an adjustable voltage domain. Correspondingly, because the operating voltage has a linear relationship with the operating frequency, the core cluster also has an adjustable frequency domain.
Step S32a: among the second processing cores of the core cluster that are currently available for operation, raise the operating voltage and operating frequency of the second processing cores corresponding to the adjustable voltage domain and frequency domain, and end the process.
Where the load state of the core cluster is the busy state, the operating voltage and operating frequency of some or all of the second processing cores corresponding to the adjustable voltage domain and frequency domain in the core cluster can be raised. This increases the computing performance of those second processing cores and the efficiency with which they process tasks, which in turn improves the overall task processing efficiency of the core cluster.
As an example, where the load state of the core cluster is determined to be the busy state from the comparison between the above first duration and the first preset duration, the voltage adjustment range corresponding to the first duration may be determined from a preset correspondence between duration and voltage adjustment range in the busy state, and the frequency adjustment range corresponding to the first duration may be determined from a preset correspondence between duration and frequency adjustment range in the busy state. Further, according to the voltage adjustment range corresponding to the first duration, the operating voltage of the second processing cores corresponding to the adjustable voltage domain and frequency domain is raised to the corresponding voltage, so that these second processing cores run at the adjusted operating voltage; and, according to the frequency adjustment range corresponding to the first duration, the operating frequency of these second processing cores is raised to the corresponding frequency, so that they run at the adjusted operating frequency.
In the busy state, the correspondence between duration and voltage adjustment range and the correspondence between duration and frequency adjustment range may be set according to actual needs. For example, assuming the first preset duration is 10 minutes, the voltage adjustment range may be set to 10% for durations from 10 to 20 minutes, 15% for durations from 20 to 40 minutes, 20% for durations from 40 to 50 minutes, and so on. The correspondence between duration and frequency adjustment range may be set in the same way and is not described again here.
As an example, if the first duration is 15 minutes and the first preset duration is 10 minutes, and the voltage adjustment range found for the first duration from the preset correspondence between duration and voltage adjustment range in the busy state is 10%, the operating voltage of the second processing cores corresponding to the adjustable voltage domain and frequency domain is raised by 10%. The operating frequency is adjusted in the same way, which is not described again here.
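The lookup in this example can be sketched as a small table. The table values come from the example above; treating each range as half-open (lower bound included, upper bound excluded), the variable and function names, and the starting voltage of 1.0 V are assumptions of this sketch, since the text does not specify them.

```python
# (lower bound in minutes, upper bound in minutes, voltage adjustment range)
BUSY_VOLTAGE_TABLE = [(10, 20, 0.10), (20, 40, 0.15), (40, 50, 0.20)]


def busy_voltage_adjustment(first_duration_min):
    """Look up the voltage adjustment range for a duration in the busy state.

    Returns None when the duration falls outside the configured ranges.
    """
    for lower, upper, adjustment in BUSY_VOLTAGE_TABLE:
        if lower <= first_duration_min < upper:
            return adjustment
    return None


def raised_voltage(current_voltage, first_duration_min):
    """Raise the operating voltage by the looked-up adjustment range."""
    adjustment = busy_voltage_adjustment(first_duration_min)
    if adjustment is None:
        return current_voltage
    return current_voltage * (1 + adjustment)
```

For a first duration of 15 minutes the lookup yields 10%, so a hypothetical 1.0 V operating voltage would be raised to about 1.1 V. The same structure would apply to a frequency table.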
As an example, where the load state of the core cluster is determined to be the busy state from the comparison between the above memory space usage growth rate and the first preset growth rate value, the voltage adjustment range corresponding to the memory space usage growth rate of the core cluster may be determined from a preset correspondence between memory space usage growth rate and voltage adjustment range in the busy state, and the corresponding frequency adjustment range may be determined from a preset correspondence between memory space usage growth rate and frequency adjustment range in the busy state. Further, according to that voltage adjustment range, the operating voltage of the second processing cores corresponding to the adjustable voltage domain and frequency domain is raised to the corresponding voltage, so that these second processing cores run at the adjusted operating voltage; and, according to that frequency adjustment range, the operating frequency of these second processing cores is raised to the corresponding frequency, so that they run at the adjusted operating frequency.
In the busy state, the correspondence between memory space usage growth rate and voltage adjustment range and the correspondence between memory space usage growth rate and frequency adjustment range may be set according to actual needs. For details, refer to the above description of the correspondences between duration and voltage adjustment range and between duration and frequency adjustment range in the busy state, which is not repeated here.
As an example, where the load state of the core cluster is determined to be the busy state from the comparison between the task processing duration of the core cluster and the first preset processing duration, the voltage adjustment range corresponding to the task processing duration of the core cluster may be determined from a preset correspondence between task processing duration and voltage adjustment range in the busy state, and the corresponding frequency adjustment range may be determined from a preset correspondence between task processing duration and frequency adjustment range in the busy state. Further, according to that voltage adjustment range, the operating voltage of the second processing cores corresponding to the adjustable voltage domain and frequency domain is raised to the corresponding voltage, so that these second processing cores run at the adjusted operating voltage; and, according to that frequency adjustment range, the operating frequency of these second processing cores is raised to the corresponding frequency, so that they run at the adjusted operating frequency.
In the busy state, the correspondence between task processing duration and voltage adjustment range and the correspondence between task processing duration and frequency adjustment range may be set according to actual needs. For details, refer to the above description of the correspondences between duration and voltage adjustment range and between duration and frequency adjustment range in the busy state, which is not repeated here.
As an example, where the load state of the core cluster is determined to be the busy state from the comparison between the above frequency corresponding to the core cluster (the number of times its task processing duration serves as the synchronization period) and the first preset number of times, the voltage adjustment range corresponding to that frequency may be determined from a preset correspondence between frequency and voltage adjustment range in the busy state, and the corresponding frequency adjustment range may be determined from a preset correspondence between frequency and frequency adjustment range in the busy state. Further, according to that voltage adjustment range, the operating voltage of the second processing cores corresponding to the adjustable voltage domain and frequency domain is raised to the corresponding voltage, so that these second processing cores run at the adjusted operating voltage; and, according to that frequency adjustment range, the operating frequency of these second processing cores is raised to the corresponding operating frequency, so that they run at the adjusted operating frequency.
In the busy state, the correspondence between the above frequency (number of times) and voltage adjustment range and the correspondence between that frequency and frequency adjustment range may be set according to actual needs. For details, refer to the above description of the correspondences between duration and voltage adjustment range and between duration and frequency adjustment range in the busy state, which is not repeated here.
Step S33a: increase the number of second processing cores in the core cluster that are currently available for operation, and end the process.
Where the load state of the core cluster is the busy state, the overall task processing efficiency of the core cluster can be improved by increasing the number of second processing cores in the core cluster that are currently available for operation.
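The branch structure of steps S31a to S33a can be sketched as follows. The `CoreCluster` fields and the function `regulate_busy` are hypothetical stand-ins: the adjustment ranges and the number of cores to add are assumed to have been determined beforehand from the preset correspondences.

```python
from dataclasses import dataclass


@dataclass
class CoreCluster:
    has_adjustable_domain: bool  # outcome of the check in step S31a
    voltage: float               # shared operating voltage of the domain
    frequency: float             # shared operating frequency of the domain
    active_cores: int            # second processing cores available for operation


def regulate_busy(cluster, voltage_step, frequency_step, extra_cores):
    """Regulate a busy cluster and return the label of the step taken."""
    if cluster.has_adjustable_domain:             # step S31a
        cluster.voltage *= 1 + voltage_step       # step S32a: raise voltage
        cluster.frequency *= 1 + frequency_step   # and frequency
        return "S32a"
    cluster.active_cores += extra_cores           # step S33a: add cores
    return "S33a"
```

In this sketch a cluster with an adjustable domain is sped up in place, while a cluster without one is widened by adding available cores.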
In some embodiments, the number of currently available second processing cores to be added may be determined according to how busy the core cluster is, and the degree of busyness may be characterized by, for example, the above first duration, the above memory space usage growth rate, the above task processing duration, or the above frequency.
As an example, a correspondence between duration in the busy state and the number of cores to be added may be preset. Where the load state of the core cluster is determined to be the busy state from the comparison between the first duration and the first preset duration, the number of cores to be added for the first duration may be determined from this preset correspondence, and that number of currently available second processing cores is then added to the core cluster.
Similarly, the number of currently available second processing cores to be added may be determined from a preset correspondence between memory space usage growth rate in the busy state and the number of cores to be added, a preset correspondence between task processing duration in the busy state and the number of cores to be added, or a preset correspondence between the frequency in the busy state and the number of cores to be added.
In some embodiments, the step of increasing the number of second processing cores in the core cluster that are currently available for operation may further include: adding one or more idle second processing cores that are in the many-core system but outside the core cluster to the core cluster, to serve as second processing cores of the core cluster that are currently available for operation; and/or,
waking up one or more second processing cores in the core cluster that are in a shut-down state, to serve as second processing cores of the core cluster that are currently available for operation.
For example, each second processing core has a controller that is used to shut down or wake up (turn on) that second processing core. Sending a wake-up instruction to the controller of a second processing core wakes up that core, and sending a shut-down instruction to the controller shuts it down.
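The two ways of adding cores, together with the per-core controller, can be sketched as below. The class `Core`, the instruction strings, and the order in which the two sources are used (shut-down cores in the cluster first, then idle cores from outside) are assumptions of this sketch; the text allows either source or both.

```python
from dataclasses import dataclass


@dataclass
class Core:
    core_id: int
    awake: bool = False

    def handle(self, instruction):
        """Stand-in for the per-core controller receiving an instruction."""
        if instruction == "wake":
            self.awake = True
        elif instruction == "shutdown":
            self.awake = False
        else:
            raise ValueError(f"unknown instruction: {instruction}")


def add_active_cores(cluster, idle_pool, needed):
    """Make up to `needed` more cores available; return how many were added."""
    added = 0
    for core in cluster:                      # wake shut-down cores in the cluster
        if added == needed:
            break
        if not core.awake:
            core.handle("wake")
            added += 1
    while added < needed and idle_pool:       # then take idle cores from outside
        core = idle_pool.pop()
        core.handle("wake")
        cluster.append(core)
        added += 1
    return added
```

Waking cores already in the cluster first avoids changing cluster membership when it is not necessary, which is a design choice of this sketch rather than a requirement of the disclosure.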
In some embodiments, where the load state of the core cluster is determined to be the low-load state, step S3 may further include: reducing the number of second processing cores in the core cluster that are currently available for operation.
Fig. 6 is a flowchart of the regulation processing of the core control method according to an embodiment of the present disclosure. In some embodiments, where the load state of the core cluster is determined to be the low-load state, as shown in Fig. 6, step S3 of performing regulation processing on the core cluster may further include steps S31b to S33b.
Step S31b: where the load state of the core cluster is the low-load state, determine whether the core cluster has an adjustable voltage domain and frequency domain; if so, perform step S32b; otherwise, perform step S33b. For example, where the load state of the core cluster is the low-load state, it is checked whether, among all second processing cores of the core cluster that are currently available for operation, there are multiple second processing cores that correspond to the same operating voltage and operating frequency, with both the voltage and the frequency adjustable. If so, the core cluster is determined to have an adjustable voltage domain and frequency domain; otherwise, it is determined not to have an adjustable voltage domain and frequency domain.
Step S32b: among the second processing cores of the core cluster that are currently available for operation, lower the operating voltage and/or operating frequency of the second processing cores corresponding to the adjustable voltage domain and frequency domain, and end the process.
Where the load state of the core cluster is the low-load state, the operating voltage and operating frequency of some or all of the second processing cores corresponding to the adjustable voltage domain and frequency domain in the core cluster can be lowered. This reduces the computing performance of those second processing cores, which effectively saves power in the core cluster, lowers the power consumption of the many-core system, and conserves resources.
As an example, where the load state of the core cluster is determined to be the low-load state from the comparison between the above second duration and the second preset duration, the voltage adjustment range corresponding to the second duration may be determined from a preset correspondence between duration and voltage adjustment range in the low-load state, and the frequency adjustment range corresponding to the second duration may be determined from a preset correspondence between duration and frequency adjustment range in the low-load state. Further, according to the voltage adjustment range corresponding to the second duration, the operating voltage of the second processing cores corresponding to the adjustable voltage domain and frequency domain is lowered to the corresponding voltage, so that these second processing cores run at the adjusted operating voltage; and, according to the frequency adjustment range corresponding to the second duration, the operating frequency of these second processing cores is lowered to the corresponding frequency, so that they run at the adjusted operating frequency.
In the low-load state, the correspondence between duration and voltage adjustment range and the correspondence between duration and frequency adjustment range may be set according to actual needs. For example, assuming the second preset duration is 10 minutes, the voltage adjustment range may be set to 10% for durations from 10 to 20 minutes, 15% for durations from 20 to 40 minutes, 20% for durations from 40 to 50 minutes, and so on. The correspondence between duration and frequency adjustment range in the low-load state may be set in the same way and is not described again here.
As an example, if the second duration is 15 minutes and the second preset duration is 10 minutes, and the voltage adjustment range found for the second duration from the preset correspondence between duration and voltage adjustment range in the low-load state is 10%, the operating voltage of the second processing cores that have the voltage domain and frequency domain is lowered by 10%. The operating frequency is adjusted in the same way, which is not described again here.
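The low-load example mirrors the busy-state lookup, with the adjustment applied downward. As before, the half-open range boundaries, the names, and the 1.0 V starting voltage are assumptions of this sketch; only the table values come from the example above.

```python
# (lower bound in minutes, upper bound in minutes, voltage adjustment range)
LOW_LOAD_VOLTAGE_TABLE = [(10, 20, 0.10), (20, 40, 0.15), (40, 50, 0.20)]


def lowered_voltage(current_voltage, second_duration_min):
    """Lower the operating voltage by the range matching the second duration.

    The voltage is returned unchanged if no configured range matches.
    """
    for lower, upper, adjustment in LOW_LOAD_VOLTAGE_TABLE:
        if lower <= second_duration_min < upper:
            return current_voltage * (1 - adjustment)
    return current_voltage
```

For a second duration of 15 minutes, a hypothetical 1.0 V operating voltage would be lowered by 10% to about 0.9 V.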
As an example, where the load state of the core cluster is determined to be the low-load state from the comparison between the above memory space usage growth rate and the second preset growth rate value, the voltage adjustment range corresponding to the memory space usage growth rate of the core cluster may be determined from a preset correspondence between memory space usage growth rate and voltage adjustment range in the low-load state, and the corresponding frequency adjustment range may be determined from a preset correspondence between memory space usage growth rate and frequency adjustment range in the low-load state. Further, according to that voltage adjustment range, the operating voltage of the second processing cores corresponding to the adjustable voltage domain and frequency domain is lowered to the corresponding voltage, so that these second processing cores run at the adjusted operating voltage; and, according to that frequency adjustment range, the operating frequency of these second processing cores is lowered to the corresponding frequency, so that they run at the adjusted operating frequency.
In the low-load state, the correspondence between memory space usage growth rate and voltage adjustment range, and the correspondence between memory space usage growth rate and frequency adjustment range, may be set according to actual needs. For details, see the above description of the correspondences between duration and voltage adjustment range and between duration and frequency adjustment range in the low-load state, which is not repeated here.
As an example, when the load state of the core cluster is determined to be the low-load state from the comparison between the task processing duration of the core cluster and the second preset processing duration, the voltage adjustment range for the core cluster's task processing duration may be determined from the preset correspondence, in the low-load state, between task processing duration and voltage adjustment range, and the frequency adjustment range may be determined from the preset correspondence, in the low-load state, between task processing duration and frequency adjustment range. Further, according to the voltage adjustment range so determined, the operating voltage of the second processing core corresponding to the adjustable voltage domain and frequency domain is lowered to the corresponding voltage, so that the second processing core runs at the adjusted operating voltage; and, according to the frequency adjustment range so determined, the operating frequency of that second processing core is lowered to the corresponding frequency, so that the second processing core runs at the adjusted operating frequency.
In the low-load state, the correspondence between task processing duration and voltage adjustment range, and the correspondence between task processing duration and frequency adjustment range, may be set according to actual needs. For details, see the above description of the correspondences between duration and voltage adjustment range and between duration and frequency adjustment range in the low-load state, which is not repeated here.
Step S33b: reduce the number of second processing cores in the core cluster that are currently available for operation, and end the process.
When the load state of the core cluster is the low-load state, the number of second processing cores in the core cluster that are currently available for operation may be reduced, which saves power in the core cluster, lowers the power consumption of the many-core system, and conserves resources.
In some embodiments, the number of currently available second processing cores to be removed may be determined according to the degree of low load of the core cluster. The degree of low load may be characterized by, for example, the above second duration, the above task processing duration, or the above frequency count.
As an example, a correspondence between duration in the low-load state and the number of cores to be removed may be preset. When the load state of the core cluster is determined to be the low-load state from the comparison between the second duration and the second preset duration, the number of cores to be removed for this second duration may be determined from the preset correspondence, and that number of currently available second processing cores is removed from operation in the core cluster.
Similarly, the number of currently available second processing cores to be removed may be determined from a preset correspondence, in the low-load state, between memory space usage growth rate and the number of cores to be removed, between task processing duration and the number of cores to be removed, or between frequency count and the number of cores to be removed.
In some embodiments, the step of reducing the number of second processing cores in the core cluster that are currently available for operation may further include: removing at least one currently available second processing core from the core cluster;
and/or controlling at least one currently available second processing core in the core cluster to be in an off state.
FIG. 7 is a schematic diagram of an application scenario of a many-core system according to an embodiment of the present disclosure. In one application scenario, as shown in FIG. 7, the many-core system may be used to process the tasks in the task pipeline of a service. The service is, for example, a face recognition service, whose task pipeline may include tasks to be executed in sequence, such as a video stream decoding task, a face detection task, a facial feature recognition task, a facial feature extraction task, and a facial feature matching task.
Each core cluster of the many-core system may handle one task in the task pipeline, and the core clusters process their respective tasks in turn according to the pipeline order. On the task pipeline, the task data that a core cluster produces after processing its task may be sent to the buffer of the core cluster that follows it in the pipeline and cached there, so that the following core cluster can read the data as needed and start running its own task. The buffer may also cache data delivered by other external devices.
For any core cluster, the data processing logic within the core cluster may include data-parallel processing logic. FIG. 8 is a block diagram of a core cluster of a many-core system according to an embodiment of the present disclosure. As shown in FIG. 8, each core cluster contains multiple sub-clusters, each sub-cluster contains at least one second processing core currently available for operation, and the sub-clusters process the task of their core cluster in parallel. For example, if the task of the core cluster is face recognition, then after multiple frames of image data are obtained, each sub-cluster of the core cluster may perform face recognition on one or more frames. With three sub-clusters and three frames, each sub-cluster may process one of the frames.
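The data-parallel split across sub-clusters can be sketched as below. The round-robin policy and the function name `assign_frames` are assumptions made for illustration; the disclosure only states that each sub-cluster may take one or more frames.

```python
def assign_frames(frames, num_subclusters):
    """Distribute frames over sub-clusters in round-robin order (assumed policy)."""
    if num_subclusters < 1:
        raise ValueError("a core cluster needs at least one sub-cluster")
    buckets = [[] for _ in range(num_subclusters)]
    for index, frame in enumerate(frames):
        # Frame i goes to sub-cluster i mod N, so the load stays balanced.
        buckets[index % num_subclusters].append(frame)
    return buckets
```

With three sub-clusters and three frames, each sub-cluster receives exactly one frame, matching the example in the text.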
In the scenario shown in FIG. 8, when the load state of the core cluster is the busy state and the number of second processing cores currently available for operation in the core cluster is increased, the core control method may further include, after the step of increasing that number, steps S4a to S6a.
Step S4a: form a new sub-cluster of the core cluster from the newly added second processing cores currently available for operation, and obtain the configuration information of the new sub-cluster.
FIG. 9 is a block diagram of a core cluster after a new sub-cluster has been formed, according to an embodiment of the present disclosure. As shown in FIG. 9, in step S4a, the one or more newly added second processing cores currently available for operation may be taken as a new sub-cluster of the core cluster, and the configuration information of the new sub-cluster is obtained. The new sub-cluster may process the task of the core cluster in parallel with the other sub-clusters. The new sub-cluster includes the one or more newly added second processing cores currently available for operation, and the configuration information includes, but is not limited to, the number of second processing cores in the new sub-cluster and the address information of each of them.
Step S5a: send the configuration information of the new sub-cluster to a target processing core in the predecessor core cluster of the core cluster, so that the target processing core in the predecessor core cluster establishes an input branch for the new sub-cluster according to that configuration information.
In some embodiments, in step S5a, the main processing core of the core cluster sends the configuration information of the new sub-cluster to the target processing core in the predecessor core cluster of the core cluster.
The predecessor core cluster is the core cluster immediately before the core cluster on the task pipeline. The target processing core in the predecessor core cluster establishes the input branch of the new sub-cluster according to the configuration information of the new sub-cluster, the input branch being the path along which the predecessor core cluster outputs data to the new sub-cluster. The target processing core in the predecessor core cluster may be the main processing core of the predecessor core cluster, or a second processing core in the predecessor core cluster that is responsible for data output.
For example, the predecessor core cluster may include a task scheduler, which may be deployed in the second processing core of the predecessor core cluster that is responsible for data output, or in the main processing core of the predecessor core cluster. The task scheduler maintains a predecessor task list that records, for the core cluster following the predecessor core cluster on the task pipeline, the number of sub-clusters, the number of second processing cores in each sub-cluster, the address of each sub-cluster, and similar information.
Each sub-cluster of the core cluster recorded in the predecessor task list has a corresponding flag bit whose value indicates the state of that sub-cluster. For example, when the flag bit holds the valid value, the corresponding sub-cluster is currently available; when it holds the invalid value, the corresponding sub-cluster is unavailable.
The predecessor core cluster may assign tasks to the sub-clusters of the core cluster recorded in its predecessor task list according to that list. The predecessor core cluster may update the list according to update information passed from the adjacent core cluster that follows it on the task pipeline. For example, after a core cluster adds cores and forms a new sub-cluster, the main processing core of the core cluster may send the configuration information of the new sub-cluster to its predecessor core cluster, so that the target processing core of the predecessor core cluster writes the configuration information into the predecessor task list and sets the flag bit of the new sub-cluster to the valid value.
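One possible shape for the predecessor task list and its flag bits is sketched below. The class and method names are hypothetical, and the disclosure does not prescribe a concrete data structure; the sketch only shows entries holding per-sub-cluster core addresses plus a validity flag, and task assignment restricted to sub-clusters whose flag is valid.

```python
from dataclasses import dataclass, field


@dataclass
class SubClusterEntry:
    core_addresses: list   # addresses of the second processing cores
    valid: bool = True     # flag bit: True means the sub-cluster is available


@dataclass
class PredecessorTaskList:
    entries: dict = field(default_factory=dict)

    def add_subcluster(self, sub_id, core_addresses):
        """Record a new sub-cluster's configuration and mark it valid."""
        self.entries[sub_id] = SubClusterEntry(list(core_addresses), True)

    def invalidate(self, sub_id):
        """Clear the flag bit so no further tasks are routed to the sub-cluster."""
        self.entries[sub_id].valid = False

    def available(self):
        """Return the sub-clusters that may currently receive tasks."""
        return [sid for sid, entry in self.entries.items() if entry.valid]
```

Keeping the flag separate from the entry itself lets a sub-cluster be taken out of rotation without discarding its configuration.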
Step S6a: send the configuration information of the new sub-cluster to a target processing core in the successor core cluster of the core cluster, so that the target processing core in the successor core cluster establishes an output branch for the new sub-cluster according to that configuration information.
In some embodiments, in step S6a, the main processing core of the core cluster sends the configuration information of the new sub-cluster to the target processing core in the successor core cluster of the core cluster.
The successor core cluster is the core cluster immediately after the core cluster on the task pipeline. The target processing core in the successor core cluster establishes the output branch of the new sub-cluster according to the configuration information of the new sub-cluster, the output branch being the path along which the new sub-cluster outputs data to the successor core cluster. The target processing core in the successor core cluster may be the main processing core of the successor core cluster, or a second processing core in the successor core cluster that is responsible for data output.
In some embodiments, the target processing core of the successor core cluster may, as needed, maintain a successor task list that records, for the core cluster preceding the successor core cluster on the task pipeline, the number of sub-clusters, the number of second processing cores in each sub-cluster, the address of each sub-cluster, and similar information.
The successor core cluster may update the successor task list it maintains according to update information passed from the adjacent core cluster that precedes it on the task pipeline. For example, after a core cluster adds cores and forms a new sub-cluster, the main processing core of the core cluster may send the configuration information of the new sub-cluster to its successor core cluster, so that the target processing core of the successor core cluster writes the configuration information into the successor task list.
In some embodiments, the step of reducing the number of second processing cores in the core cluster that are currently available for operation may include: reducing the number of sub-clusters in the core cluster that are currently available for operation, or reducing the number of second processing cores in any one or more sub-clusters of the core cluster. After either reduction, the main processing core of the core cluster may send the update information of the core cluster to its predecessor and successor core clusters. The predecessor core cluster then updates the predecessor task list it maintains, updates the flag bits of the affected sub-clusters, and deletes the corresponding input branches; the successor core cluster updates the successor task list it maintains and deletes the corresponding output branches.
In some embodiments, in the process of detecting the real-time memory space usage rate of the buffer corresponding to the core cluster, the step of determining the load state of the core cluster may further include:
if the real-time memory space usage rate is 0, and the second duration for which it remains 0 is greater than or equal to the second preset duration and less than a third preset duration, determining that the load state of the core cluster is the idle state and that the idle state level is the first level. The idle state may be understood as an under-load or zero-load state, which is a special case of the low-load state.
The third preset duration is greater than the second preset duration and may be set according to actual needs, which is not limited by the present disclosure.
In some embodiments, in the process of detecting the real-time memory space usage rate of the buffer corresponding to the core cluster, the step of determining the load state of the core cluster may further include: if the real-time memory space usage rate is 0, and the second duration for which it remains 0 is greater than or equal to the third preset duration, determining that the load state of the core cluster is the idle state and that the idle state level is the second level.
In other words, whether the core cluster is in the idle state, and at which idle state level, may be determined from how long the real-time memory space usage rate has remained 0, so that the core cluster can be regulated accordingly and its power consumption reduced.
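The two idle levels can be expressed as a small classification function. This is a sketch under stated assumptions: the function name, the use of `None` for "not idle", and the numeric return values 1 and 2 are illustrative choices, while the comparisons follow the conditions given above.

```python
def idle_level(usage_rate, zero_duration, second_preset, third_preset):
    """Return 1 or 2 for the idle state level, or None if not idle.

    usage_rate     -- real-time memory space usage rate of the buffer
    zero_duration  -- how long the usage rate has remained 0
    second_preset  -- second preset duration
    third_preset   -- third preset duration (must exceed second_preset)
    """
    if third_preset <= second_preset:
        raise ValueError("third preset duration must exceed the second")
    if usage_rate != 0 or zero_duration < second_preset:
        return None
    # Level 1: second_preset <= duration < third_preset; level 2: beyond that.
    return 1 if zero_duration < third_preset else 2
```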
In some embodiments, when the load state of the core cluster is determined to be the idle state at the first level, the step of regulating the core cluster according to its load state includes: when the load state of the core cluster is the first-level idle state, inserting blank frames into the buffer corresponding to the core cluster. A blank frame may be a preset frame image. Inserting blank frames keeps the core cluster working and ensures that it outputs the data it has already finished processing.
In some embodiments, each second processing core of each core cluster may be provided with a corresponding gated clock that controls whether that second processing core works.
In some embodiments, when the load state of the core cluster is determined to be the idle state at the first level, blank frames are inserted into the buffer corresponding to the core cluster, and after the core cluster has output the data it has already finished processing, the gated clock of each second processing core in the core cluster is turned off so that each second processing core in the core cluster stops working. This saves resources of the many-core system and reduces its power consumption. The gated clock outputs a clock signal to the core clusters to drive them to work or not work based on the clock signal.
In some embodiments, the many-core system includes multiple core clusters, and the core control method is implemented by a first processing core, which performs load detection and management for all core clusters. When multiple core clusters process the same task and the load states of these core clusters are all detected to be the first-level idle state, the core control method further includes, after blank frames are inserted into the buffers corresponding to these core clusters: suspending the sending of synchronization signals to these core clusters so that they suspend synchronous updates. This saves resources of the many-core system and reduces its power consumption. The synchronization signal controls these core clusters to perform task processing based on a synchronization period.
For example, the same task processed by the multiple core clusters may be a face recognition task for a video to be synthesized. When the load states of these core clusters are all detected to be the first-level idle state, blank frames are first inserted into the buffer of each of these core clusters to keep them working until they have output all the data they have already finished processing. Once none of these core clusters has any data input or data output, the sending of synchronization signals to them is suspended, or all gated clocks corresponding to them are turned off at the same time.
In some embodiments, when the load state of the core cluster is determined to be the idle state at the second level, the step of regulating the core cluster according to its load state includes: when the load state of the core cluster is the second-level idle state, lowering the operating voltage and operating frequency of those second processing cores, among the second processing cores of the core cluster currently available for operation, that correspond to the adjustable voltage domain and frequency domain; or reducing the number of second processing cores in the core cluster that are currently available for operation.
FIG. 10 is a block diagram of a core control apparatus according to an embodiment of the present disclosure. Referring to FIG. 10, an embodiment of the present disclosure provides a core control apparatus 300 applied to a many-core system. The many-core system includes at least one core cluster, and each core cluster includes at least one second processing core. The core control apparatus 300 includes a load data detection module 301, a load state detection module 302, and a core regulation module 303.
The load data detection module 301 is configured to perform load detection on the corresponding core cluster and obtain the load data corresponding to the core cluster. The load state detection module 302 is configured to determine the load state of the core cluster according to that load data. The core regulation module 303 is configured to regulate the core cluster according to its load state, where the regulation includes one of the following: regulating the number of second processing cores in the core cluster that are currently available for operation; regulating the operating voltage and operating frequency of the second processing cores in the core cluster that are currently available for operation; or inserting blank frames into the buffer corresponding to the core cluster.
In some embodiments, each core cluster is provided with a corresponding buffer, which caches the task data of the task to be processed by that core cluster.
The load data detection module is configured to: detect the real-time memory space usage rate of the buffer corresponding to the core cluster; compare that real-time memory space usage rate with a first preset threshold; and record the length of time for which the real-time memory space usage rate remains greater than or equal to the first preset threshold as a first duration, the load data corresponding to the core cluster including the first duration.
The load state detection module is configured to: determine whether the first duration is greater than or equal to a first preset duration; and, when the first duration is greater than or equal to the first preset duration, determine that the load state of the core cluster is the busy state.
In some embodiments, the load data detection module is configured to: compare the real-time memory space usage rate of the buffer corresponding to the core cluster with a second preset threshold, the second preset threshold being greater than 0 and less than the first preset threshold; and record the length of time for which the real-time memory space usage rate remains less than or equal to the second preset threshold as a second duration, the load data corresponding to the core cluster including the second duration. The load state detection module is configured to: determine whether the second duration is greater than or equal to a second preset duration; and, when the second duration is greater than or equal to the second preset duration, determine that the load state of the core cluster is the low-load state.
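The threshold-and-duration logic of the two modules can be sketched together as follows. Several details here are assumptions made for illustration: usage is sampled at a fixed period, durations are measured as the trailing run of qualifying samples, the preset durations are positive, and the label "normal" for the remaining case does not appear in the disclosure.

```python
def trailing_run(samples, predicate):
    """Count how many of the most recent samples satisfy predicate in a row."""
    count = 0
    for sample in reversed(samples):
        if not predicate(sample):
            break
        count += 1
    return count


def load_state(samples, period, first_threshold, second_threshold,
               first_preset, second_preset):
    """Classify a core cluster from buffer usage samples taken every `period`."""
    # First duration: time the usage has stayed at or above the first threshold.
    first_duration = trailing_run(samples, lambda u: u >= first_threshold) * period
    # Second duration: time the usage has stayed at or below the second threshold.
    second_duration = trailing_run(samples, lambda u: u <= second_threshold) * period
    if first_duration >= first_preset:
        return "busy"
    if second_duration >= second_preset:
        return "low"
    return "normal"  # assumed label for neither busy nor low-load
```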
In some embodiments, each core cluster is provided with a corresponding buffer, which caches the task data of the task to be processed by that core cluster. The load data detection module is configured to obtain the memory space usage growth rate of the buffer corresponding to the core cluster, the load data corresponding to the core cluster including that growth rate. The load state detection module is configured to: determine whether the memory space usage growth rate is greater than or equal to a first preset growth rate value; and, when the memory space usage growth rate is greater than or equal to the first preset growth rate value, determine that the load state of the core cluster is the busy state.
In some embodiments, the load data corresponding to the core cluster further includes the real-time memory space usage rate of the corresponding buffer. The load state detection module is configured to: when the real-time memory space usage rate of the buffer corresponding to the core cluster is less than or equal to a preset usage rate, determine whether the memory space usage growth rate is less than or equal to a second preset growth rate value, the second preset growth rate value being negative; and, when the memory space usage growth rate is less than or equal to the second preset growth rate value, determine that the load state of the core cluster is the low-load state.
In some embodiments, the load data detection module is configured to detect in real time the task processing duration that the core cluster needs to process a task, the load data corresponding to the core cluster including the task processing duration. The load state detection module is configured to: determine whether the task processing duration that the core cluster needs to process a task is greater than or equal to a first preset processing duration; and, when it is, determine that the load state of the core cluster is the busy state.
在一些实施例中,所述负载状态检测模块,用于:判断该核心簇处理任务所需的任务处理时长是否小于或等于第二预设处理时长;在该核心簇处理任务所需的任务处理时长小于或等于第二预设处理时长的情况下,确定该核心簇的负载状态为低负载状态。In some embodiments, the load state detection module is configured to: determine whether the task processing duration required by the core cluster to process the task is less than or equal to the second preset processing duration; If the duration is less than or equal to the second preset processing duration, it is determined that the load state of the core cluster is a low load state.
In some embodiments, the many-core system includes a plurality of the core clusters, and the plurality of core clusters process tasks based on a synchronization period, the synchronization period being the largest of the task processing durations that the core clusters need to process the task. The load state detection module is configured to: count, within a preset detection time period, the number of times the task processing duration corresponding to the core cluster serves as the synchronization period; and, when this count is greater than or equal to a first preset count, determine that the load state of the core cluster is the busy state.
In some embodiments, the many-core system includes a plurality of the core clusters, and the plurality of core clusters process tasks based on a synchronization period, the synchronization period being the largest of the task processing durations that the core clusters need to process the task.
The load state detection module is configured to: count, within a preset detection time period, the number of times the task processing duration corresponding to the core cluster serves as the synchronization period; calculate the ratio of this count to the number of synchronization periods within the preset detection time period; and, when the ratio is greater than or equal to a first preset ratio, determine that the load state of the core cluster is the busy state.
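The two synchronization-period tests (absolute count and ratio) can be illustrated with a short sketch. The data layout, one dictionary of per-cluster durations for each synchronization period, and all function names are assumptions made for illustration. A tie for the longest duration is counted for every tied cluster, which the text does not specify.

```python
def bottleneck_count(cycles, cluster_id):
    """Number of cycles in which this cluster's duration set the synchronization period.

    cycles: list of dicts mapping cluster id -> task processing duration,
            one dict per synchronization period in the detection window.
    """
    return sum(1 for durations in cycles
               if durations[cluster_id] == max(durations.values()))


def is_busy_by_count(cycles, cluster_id, first_preset_count):
    """Busy when the count reaches the first preset count."""
    return bottleneck_count(cycles, cluster_id) >= first_preset_count


def is_busy_by_ratio(cycles, cluster_id, first_preset_ratio):
    """Busy when count / number of synchronization periods reaches the preset ratio."""
    if not cycles:
        return False  # assumption: an empty detection window is never busy
    return bottleneck_count(cycles, cluster_id) / len(cycles) >= first_preset_ratio
```

The ratio form is independent of how many synchronization periods fit into the detection window, so the same threshold can be reused when the window length changes.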
In some embodiments, the core regulation module is configured to: when the load state of the core cluster is the busy state, determine whether the core cluster has an adjustable voltage domain and frequency domain; when the core cluster has an adjustable voltage domain and frequency domain, raise the operating voltage and operating frequency of those second processing cores, among the second processing cores of the core cluster currently available for operation, that belong to the adjustable voltage domain and frequency domain; and, when the core cluster does not have an adjustable voltage domain and frequency domain, increase the number of second processing cores in the core cluster that are currently available for operation.
In some embodiments, the core regulation module is configured to: when the load state of the core cluster is the low-load state, determine whether the core cluster has an adjustable voltage domain and frequency domain; when the core cluster has an adjustable voltage domain and frequency domain, lower the operating voltage and operating frequency of those second processing cores, among the second processing cores of the core cluster currently available for operation, that belong to the adjustable voltage domain and frequency domain; and, when the core cluster does not have an adjustable voltage domain and frequency domain, reduce the number of second processing cores in the core cluster that are currently available for operation.
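The decision between voltage/frequency scaling and core-count scaling in the two embodiments above reduces to a small lookup, sketched below. The state strings and action names are hypothetical labels chosen for illustration.

```python
def choose_regulation(load_state, has_adjustable_domain):
    """Pick a regulation action for one core cluster.

    load_state            : "busy", "low_load", or anything else (no action)
    has_adjustable_domain : True if the cluster has an adjustable
                            voltage domain and frequency domain
    """
    if load_state == "busy":
        return ("raise_voltage_and_frequency" if has_adjustable_domain
                else "increase_core_count")
    if load_state == "low_load":
        return ("lower_voltage_and_frequency" if has_adjustable_domain
                else "decrease_core_count")
    return "no_change"
```

In both directions the core count is changed only when the cluster has no adjustable voltage domain and frequency domain, which matches the order of the checks in the text.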
In some embodiments, the core regulation module increasing the number of second processing cores in the core cluster that are currently available for operation includes: adding one or more idle second processing cores of the many-core system that are outside the core cluster to the core cluster, as second processing cores in the core cluster currently available for operation; and/or waking one or more second processing cores in the core cluster that are in the off state, as second processing cores in the core cluster currently available for operation.
In some embodiments, the core regulation module reducing the number of second processing cores in the core cluster that are currently available for operation includes: removing from the core cluster at least one second processing core that is currently available for operation; and/or placing at least one second processing core in the core cluster that is currently available for operation into the off state.
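A minimal sketch of the two ways to grow or shrink a cluster follows. The `CoreCluster` structure, the policy of waking switched-off cores before borrowing idle cores from outside, and the rule of always keeping one core active are assumptions for illustration. The text allows either mechanism or both, in any order.

```python
from dataclasses import dataclass, field


@dataclass
class CoreCluster:
    active: list = field(default_factory=list)    # cores currently available for operation
    sleeping: list = field(default_factory=list)  # cores of this cluster in the off state


def increase_cores(cluster, idle_pool, count):
    """Wake off-state cores first (assumed policy), then take idle cores from outside."""
    added = []
    while len(added) < count and cluster.sleeping:
        added.append(cluster.sleeping.pop(0))
    while len(added) < count and idle_pool:
        added.append(idle_pool.pop(0))
    cluster.active.extend(added)
    return added


def decrease_cores(cluster, idle_pool, count, power_down=True):
    """Switch cores off, or hand them back to the idle pool when power_down is False.

    At least one core is kept active (assumption).
    """
    removable = max(len(cluster.active) - 1, 0)
    removed = []
    for _ in range(min(count, removable)):
        core = cluster.active.pop()
        (cluster.sleeping if power_down else idle_pool).append(core)
        removed.append(core)
    return removed
```

For example, a cluster with active cores `[0, 1]` and one off-state core `2` that is asked for two more cores wakes core `2` and then takes the first core from the idle pool.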
In some embodiments, each core cluster of the many-core system corresponds to one task in a task pipeline, each core cluster contains a plurality of sub-clusters, each sub-cluster contains at least one second processing core currently available for operation, and the plurality of sub-clusters process, in parallel, the task corresponding to the core cluster to which they belong. The apparatus further includes:
a sub-cluster building module, configured to build a new sub-cluster of the core cluster from the second processing cores newly added to the core cluster as currently available for operation, and to obtain configuration information of the new sub-cluster, where the new sub-cluster includes one or more of the newly added second processing cores, and the configuration information contains the number of second processing cores in the new sub-cluster and the address information of each of them;
a first sending module, configured to send the configuration information of the new sub-cluster to a target processing core in the predecessor core cluster of the core cluster; and
a second sending module, configured to send the configuration information of the new sub-cluster to a target processing core in the successor core cluster of the core cluster.
Here, the predecessor core cluster is the core cluster immediately before the core cluster on the task pipeline, and the target processing core in the predecessor core cluster uses the configuration information of the new sub-cluster to establish an input split for the new sub-cluster, the input split being the path over which the predecessor core cluster outputs data to the new sub-cluster. The successor core cluster is the core cluster immediately after the core cluster on the task pipeline, and the target processing core in the successor core cluster uses the configuration information of the new sub-cluster to establish an output split for the new sub-cluster, the output split being the path over which the new sub-cluster outputs data to the successor core cluster.
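The hand-off of a new sub-cluster's configuration to its pipeline neighbours can be sketched as below. The class names, the representation of a core address as an `(x, y)` tuple, and the `receive` method that stands in for establishing a split are all hypothetical. The text specifies only that the configuration carries the core count and each core's address.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class SubClusterConfig:
    core_count: int        # number of second processing cores in the new sub-cluster
    core_addresses: tuple  # address of each core; (x, y) tuples are an assumption


def build_sub_cluster(new_core_addresses):
    """Group newly added cores into one sub-cluster and describe it."""
    addresses = tuple(new_core_addresses)
    if not addresses:
        raise ValueError("a sub-cluster needs at least one core")
    return SubClusterConfig(core_count=len(addresses), core_addresses=addresses)


@dataclass
class TargetCore:
    """Stand-in for the target processing core of a neighbouring core cluster."""
    splits: list = field(default_factory=list)

    def receive(self, config):
        # Record a data path to or from the new sub-cluster's cores.
        self.splits.append(config.core_addresses)


def announce(config, predecessor_target, successor_target):
    """Send the same configuration to both pipeline neighbours."""
    predecessor_target.receive(config)  # predecessor sets up the input split
    successor_target.receive(config)    # successor sets up the output split
```

Sending one immutable configuration object to both neighbours keeps the input split and the output split consistent with each other.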
In some embodiments, the load state detection module is configured to: if the real-time memory space usage rate is 0 and the second duration, for which it has remained 0, is greater than or equal to the second preset duration and less than a third preset duration, determine that the load state of the core cluster is the idle state with a first idle level; and, if the real-time memory space usage rate is 0 and the second duration for which it has remained 0 is greater than or equal to the third preset duration, determine that the load state of the core cluster is the idle state with a second idle level.
The core regulation module is configured to: when the load state of the core cluster is the first-level idle state, insert a blank frame into the buffer corresponding to the core cluster; and, when the load state of the core cluster is the second-level idle state, either lower the operating voltage and operating frequency of those second processing cores, among the second processing cores of the core cluster currently available for operation, that belong to the adjustable voltage domain and frequency domain, or reduce the number of second processing cores in the core cluster that are currently available for operation.
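The two idle levels and their corresponding actions can be sketched as follows. The function names, the action labels, and the use of `None` for "not idle" are assumptions for illustration.

```python
def idle_level(usage, zero_duration, second_preset, third_preset):
    """Return 1 or 2 for the idle level, or None if the cluster is not idle.

    usage         : real-time memory space usage rate of the cluster's buffer
    zero_duration : how long the usage rate has stayed at 0
    second_preset : second preset duration (lower bound of level 1)
    third_preset  : third preset duration (lower bound of level 2)
    """
    if not second_preset < third_preset:
        raise ValueError("the third preset duration must exceed the second")
    if usage != 0 or zero_duration < second_preset:
        return None
    return 1 if zero_duration < third_preset else 2


def idle_action(level):
    """Map an idle level to the regulation described for it."""
    if level == 1:
        return "insert_blank_frame"
    if level == 2:
        return "lower_voltage_and_frequency_or_reduce_cores"
    return "no_change"
```

A short idle spell therefore triggers only the inexpensive blank-frame insertion, while the more disruptive voltage, frequency, or core-count changes are reserved for longer idle spells.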
In some embodiments, after the blank frame is inserted into the buffer corresponding to the core cluster, the core regulation module is further configured to turn off the gated clock corresponding to each second processing core in the core cluster, where each gated clock outputs a clock signal to its corresponding second processing core in the core cluster so that the second processing core operates or does not operate according to the clock signal.
In some embodiments, the many-core system includes a plurality of core clusters. When several core clusters process the same task and the load states of these core clusters are all the first-level idle state, the core control apparatus further includes a signal suspension module, configured to suspend, after blank frames are inserted into the buffers corresponding to these core clusters, the sending of synchronization signals to these core clusters, where the synchronization signals control these core clusters to process tasks based on the synchronization period.
In addition, the core control apparatus 300 provided by the embodiments of the present disclosure is used to implement the core control method described above. For further description of the core control apparatus 300, refer to the description of the core control method above, which is not repeated here.
An embodiment of the present disclosure further provides a processing core that includes the core control apparatus described above.
An embodiment of the present disclosure further provides a many-core system that includes a plurality of processing cores. The plurality of processing cores include a first processing core and a plurality of second processing cores. Some or all of the second processing cores are divided into at least one core cluster, each core cluster includes at least one second processing core, and each core cluster has a main processing core, which is a designated second processing core in that core cluster. In the many-core system, the first processing core includes the core control apparatus described above, that is, the first processing core is a processing core containing the core control apparatus; and/or the main processing cores of at least some of the core clusters include the core control apparatus described above, that is, those main processing cores are processing cores containing the core control apparatus.
FIG. 11 is a block diagram of an electronic device provided by an embodiment of the present disclosure. Referring to FIG. 11, an embodiment of the present disclosure provides an electronic device that includes a plurality of processing cores 701 and a network-on-chip 702. The processing cores 701 are all connected to the network-on-chip 702, and the network-on-chip 702 exchanges data among the processing cores and with the outside. One or more instructions are stored in one or more of the processing cores 701, and the one or more instructions are executed by the one or more processing cores 701 so that the one or more processing cores 701 can perform the core control method described above.
In addition, an embodiment of the present disclosure further provides a computer-readable medium on which a computer program is stored, where the computer program, when executed by a processing core of a many-core system, implements the core control method described above.
An embodiment of the present disclosure further provides a computer program product that includes a computer program, where the computer program, when executed by a processing core of a many-core system, implements the core control method described above.
Those of ordinary skill in the art will understand that all or some of the steps of the methods disclosed above, and the functional modules/units of the systems and apparatuses, may be implemented as software, firmware, hardware, or suitable combinations thereof. In a hardware implementation, the division between the functional modules/units mentioned in the above description does not necessarily correspond to a division of physical components; for example, one physical component may have several functions, or one function or step may be performed by several physical components in cooperation. Some or all of the physical components may be implemented as software executed by a processor, such as a central processing unit, a digital signal processor, or a microprocessor, or as hardware, or as an integrated circuit, such as an application-specific integrated circuit. Such software may be distributed on computer-readable media, which may include computer storage media (or non-transitory media) and communication media (or transitory media). As is well known to those of ordinary skill in the art, the term computer storage media includes volatile and non-volatile, removable and non-removable media implemented in any method or technology for storing information, such as computer-readable instructions, data structures, program modules, or other data. Computer storage media include, but are not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disk (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to store the desired information and that can be accessed by a computer. In addition, as is well known to those of ordinary skill in the art, communication media typically contain computer-readable instructions, data structures, program modules, or other data in a modulated data signal such as a carrier wave or other transport mechanism, and may include any information delivery media.
Example embodiments have been disclosed herein, and although specific terms are employed, they are used and should be interpreted in a generic and descriptive sense only and not for purposes of limitation. In some instances, as will be apparent to those skilled in the art, features, characteristics, and/or elements described in connection with a particular embodiment may be used alone or in combination with features, characteristics, and/or elements described in connection with other embodiments, unless expressly stated otherwise. Accordingly, those skilled in the art will understand that various changes in form and detail may be made without departing from the scope of the present disclosure as set forth in the appended claims.

Claims (22)

  1. A core control method for a many-core system, the many-core system comprising at least one core cluster, each core cluster comprising at least one second processing core, the core control method comprising:
    for any one of the core clusters, performing load detection on the core cluster to obtain load data corresponding to the core cluster;
    determining a load state of the core cluster according to the load data corresponding to the core cluster;
    performing regulation processing on the core cluster according to the load state of the core cluster;
    wherein the regulation processing comprises one of the following regulation modes:
    regulating the number of second processing cores in the core cluster that are currently available for operation;
    regulating the operating voltage and operating frequency of the second processing cores in the core cluster that are currently available for operation;
    inserting a blank frame into a buffer corresponding to the core cluster.
  2. The core control method according to claim 1, wherein each core cluster is provided with a corresponding one of the buffers, the buffer being used to cache task data of tasks that the corresponding core cluster needs to process;
    the performing load detection on the core cluster to obtain load data corresponding to the core cluster comprises:
    detecting a real-time memory space usage rate of the buffer corresponding to the core cluster;
    detecting a result of comparing the real-time memory space usage rate of the buffer corresponding to the core cluster with a first preset threshold;
    recording, as a first duration, the length of time for which the real-time memory space usage rate remains greater than or equal to the first preset threshold, the load data corresponding to the core cluster comprising the first duration,
    wherein the determining a load state of the core cluster according to the load data corresponding to the core cluster comprises:
    determining whether the first duration is greater than or equal to a first preset duration;
    when the first duration is greater than or equal to the first preset duration, determining that the load state of the core cluster is a busy state.
  3. The core control method according to claim 2, wherein the performing load detection on the core cluster to obtain load data corresponding to the core cluster further comprises:
    detecting a result of comparing the real-time memory space usage rate of the buffer corresponding to the core cluster with a second preset threshold, the second preset threshold being greater than 0 and less than the first preset threshold;
    recording, as a second duration, the length of time for which the real-time memory space usage rate remains less than or equal to the second preset threshold, the load data corresponding to the core cluster comprising the second duration,
    wherein the determining a load state of the core cluster according to the load data corresponding to the core cluster comprises:
    determining whether the second duration is greater than or equal to a second preset duration;
    when the second duration is greater than or equal to the second preset duration, determining that the load state of the core cluster is a low-load state.
  4. The core control method according to claim 1, wherein each core cluster is provided with a corresponding one of the buffers, the buffer being used to cache task data of tasks that the corresponding core cluster needs to process;
    the performing load detection on the core cluster to obtain load data corresponding to the core cluster comprises:
    obtaining a memory space usage growth rate of the buffer corresponding to the core cluster, the load data corresponding to the core cluster comprising the memory space usage growth rate of the corresponding buffer,
    wherein the determining a load state of the core cluster according to the load data corresponding to the core cluster comprises:
    determining whether the memory space usage growth rate is greater than or equal to a first preset growth rate value;
    when the memory space usage growth rate is greater than or equal to the first preset growth rate value, determining that the load state of the core cluster is a busy state.
  5. The core control method according to claim 4, wherein the load data corresponding to the core cluster further comprises a real-time memory space usage rate of the corresponding buffer, and the determining a load state of the core cluster according to the load data corresponding to the core cluster comprises:
    when the real-time memory space usage rate of the buffer corresponding to the core cluster is less than or equal to a preset usage rate, determining whether the memory space usage growth rate is less than or equal to a second preset growth rate value, the second preset growth rate value being negative;
    when the memory space usage growth rate is less than or equal to the second preset growth rate value, determining that the load state of the core cluster is a low-load state.
  6. The core control method according to claim 1, wherein the performing load detection on the core cluster to obtain load data corresponding to the core cluster comprises:
    detecting, in real time, a task processing duration that the core cluster needs to process a task, the load data corresponding to the core cluster comprising the task processing duration,
    wherein the determining a load state of the core cluster according to the load data corresponding to the core cluster comprises:
    determining whether the task processing duration that the core cluster needs to process the task is greater than or equal to a first preset processing duration;
    when the task processing duration that the core cluster needs to process the task is greater than or equal to the first preset processing duration, determining that the load state of the core cluster is a busy state.
  7. The core control method according to claim 6, wherein the determining a load state of the core cluster according to the load data corresponding to the core cluster comprises:
    determining whether the task processing duration that the core cluster needs to process the task is less than or equal to a second preset processing duration;
    when the task processing duration that the core cluster needs to process the task is less than or equal to the second preset processing duration, determining that the load state of the core cluster is a low-load state.
  8. The core control method according to claim 6, wherein the many-core system comprises a plurality of the core clusters, the plurality of core clusters process tasks based on a synchronization period, and the synchronization period is the largest of the task processing durations that the core clusters need to process the task;
    the determining a load state of the core cluster according to the load data corresponding to the core cluster comprises:
    counting, within a preset detection time period, the number of times the task processing duration corresponding to the core cluster serves as the synchronization period;
    when the number of times the task processing duration corresponding to the core cluster serves as the synchronization period is greater than or equal to a first preset count, determining that the load state of the core cluster is a busy state.
  9. The core control method according to claim 6, wherein the many-core system comprises a plurality of the core clusters, the plurality of core clusters process tasks based on a synchronization period, and the synchronization period is the largest of the task processing durations that the core clusters need to process the task;
    the determining a load state of the core cluster according to the load data corresponding to the core cluster comprises:
    counting, within a preset detection time period, the number of times the task processing duration corresponding to the core cluster serves as the synchronization period;
    calculating the ratio of the number of times the task processing duration corresponding to the core cluster serves as the synchronization period to the number of synchronization periods within the preset detection time period;
    when the ratio is greater than or equal to a first preset ratio, determining that the load state of the core cluster is a busy state.
  10. The core control method according to claim 1, wherein the performing regulation processing on the core cluster according to the load state of the core cluster comprises:
    when the load state of the core cluster is a busy state, determining whether the core cluster has an adjustable voltage domain and frequency domain;
    when it is determined that the core cluster has an adjustable voltage domain and frequency domain, raising the operating voltage and operating frequency of the second processing cores that correspond to the adjustable voltage domain and frequency domain, among the second processing cores of the core cluster currently available for operation;
    when it is determined that the core cluster does not have an adjustable voltage domain and frequency domain, increasing the number of second processing cores in the core cluster that are currently available for operation.
  11. The core control method according to claim 1, wherein the performing regulation processing on the core cluster according to the load state of the core cluster comprises:
    when the load state of the core cluster is a low-load state, determining whether the core cluster has an adjustable voltage domain and frequency domain;
    when it is determined that the core cluster has an adjustable voltage domain and frequency domain, lowering the operating voltage and operating frequency of the second processing cores that correspond to the adjustable voltage domain and frequency domain, among the second processing cores of the core cluster currently available for operation;
    when it is determined that the core cluster does not have an adjustable voltage domain and frequency domain, reducing the number of second processing cores in the core cluster that are currently available for operation.
  12. The core control method according to claim 10, wherein the increasing the number of second processing cores in the core cluster that are currently available for operation comprises:
    adding one or more idle second processing cores of the many-core system that are outside the core cluster to the core cluster, as second processing cores in the core cluster currently available for operation; and/or
    waking one or more second processing cores in the core cluster that are in an off state, as second processing cores in the core cluster currently available for operation.
  13. The core control method according to claim 11, wherein said reducing the number of second processing cores in the core cluster that are currently available for operation comprises:
    removing from the core cluster at least one second processing core of the core cluster that is currently available for operation; and/or
    placing at least one second processing core of the core cluster that is currently available for operation in an off state.
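As a non-authoritative sketch of claims 12 and 13, the two ways of growing or shrinking a cluster can be modeled with plain Python lists. The function names, the integer core identifiers, the choice to wake powered-off cores before borrowing idle ones, and the floor of one active core are assumptions made for this example; the claims allow either option or both.

```python
def increase_cores(active, powered_off, idle_pool, n):
    """Add up to n cores to `active`; return the cores that were added."""
    added = []
    # Option 2 of claim 12: wake cores of this cluster that are in an off state.
    while len(added) < n and powered_off:
        added.append(powered_off.pop())
    # Option 1 of claim 12: take idle cores from outside the cluster.
    while len(added) < n and idle_pool:
        added.append(idle_pool.pop())
    active.extend(added)
    return added


def decrease_cores(active, powered_off, idle_pool, n, release=False):
    """Remove up to n cores from `active`, keeping at least one (assumption)."""
    removed = []
    while len(removed) < n and len(active) > 1:
        removed.append(active.pop())
    # release=True removes the cores from the cluster (option 1 of claim 13);
    # otherwise they stay in the cluster but are switched off (option 2).
    (idle_pool if release else powered_off).extend(removed)
    return removed
```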
  14. The core control method according to claim 12 or 13, wherein each core cluster of the many-core system processes a corresponding task in a task pipeline, each core cluster comprises a plurality of sub-clusters, each sub-cluster comprises at least one second processing core currently available for operation, and the plurality of sub-clusters process in parallel the task corresponding to the core cluster to which they belong; the method further comprising:
    forming a new sub-cluster of the core cluster from the second processing cores newly made available for operation in the core cluster, and obtaining configuration information of the new sub-cluster, wherein the new sub-cluster comprises one or more of the newly added second processing cores currently available for operation, and the configuration information comprises the number of second processing cores in the new sub-cluster and address information of each of those second processing cores;
    sending the configuration information of the new sub-cluster to a target processing core in a predecessor core cluster of the core cluster; and
    sending the configuration information of the new sub-cluster to a target processing core in a successor core cluster of the core cluster;
    wherein the predecessor core cluster is the core cluster immediately preceding the core cluster in the task pipeline, and the target processing core in the predecessor core cluster is configured to establish, according to the configuration information of the new sub-cluster of the core cluster, an input branch of the new sub-cluster, the input branch being a path along which the predecessor core cluster outputs data to the new sub-cluster; and
    the successor core cluster is the core cluster immediately following the core cluster in the task pipeline, and the target processing core in the successor core cluster is configured to establish, according to the configuration information of the new sub-cluster of the core cluster, an output branch of the new sub-cluster, the output branch being a path along which the new sub-cluster outputs data to the successor core cluster.
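The exchange of configuration information recited in claim 14 can be illustrated with a small data model. This is a sketch under stated assumptions: `SubClusterConfig`, `StageCluster`, the route lists, and the handling of a missing neighbor (`None`) at either end of the pipeline are hypothetical names and choices, and addresses are modeled as plain integers.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class SubClusterConfig:
    core_count: int        # number of second processing cores in the new sub-cluster
    core_addresses: tuple  # address information of each of those cores


@dataclass
class StageCluster:
    name: str
    # Kept by the predecessor: input branches it feeds (paths into new sub-clusters).
    downstream_routes: list = field(default_factory=list)
    # Kept by the successor: output branches it receives from.
    upstream_sources: list = field(default_factory=list)


def form_sub_cluster(new_core_addresses):
    """Build the configuration information for a newly formed sub-cluster."""
    addresses = tuple(new_core_addresses)
    return SubClusterConfig(core_count=len(addresses), core_addresses=addresses)


def announce_sub_cluster(config, predecessor=None, successor=None):
    """Send the configuration to the neighboring pipeline stages."""
    if predecessor is not None:
        predecessor.downstream_routes.append(config)  # establishes the input branch
    if successor is not None:
        successor.upstream_sources.append(config)     # establishes the output branch
```

In this model both neighbors receive the same immutable configuration object, so the predecessor and the successor cannot disagree about which cores make up the new sub-cluster.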
  15. The core control method according to claim 3, wherein said determining the load state of the core cluster according to the load data corresponding to the core cluster comprises:
    if the real-time memory space usage rate is 0, and the second duration for which it has remained 0 is greater than or equal to the second preset duration and less than a third preset duration, determining that the load state of the core cluster is an idle state and that the idle-state level is a first level; and
    if the real-time memory space usage rate is 0, and the second duration for which it has remained 0 is greater than or equal to the third preset duration, determining that the load state of the core cluster is an idle state and that the idle-state level is a second level;
    wherein said regulating the core cluster according to the load state of the core cluster comprises:
    when the load state of the core cluster is the first-level idle state, inserting a blank frame into the buffer corresponding to the core cluster; and
    when the load state of the core cluster is the second-level idle state, lowering the operating voltage and operating frequency of those second processing cores, among the second processing cores of the core cluster currently available for operation, that correspond to the adjustable voltage domain and frequency domain, or reducing the number of second processing cores in the core cluster that are currently available for operation.
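The two idle-state levels of claim 15 amount to a threshold comparison, sketched below. The function and parameter names are hypothetical, durations are assumed to share one unit, and the second preset duration is assumed to be shorter than the third. Claim 15 leaves open which second-level action is taken; choosing by the presence of an adjustable voltage and frequency domain is an assumption borrowed from claim 11.

```python
def classify_idle(mem_usage, zero_duration, second_preset, third_preset):
    """Return 1 or 2 for the idle-state level, or None if the cluster is not idle."""
    if mem_usage != 0:
        return None
    if zero_duration >= third_preset:
        return 2
    if zero_duration >= second_preset:
        return 1
    return None


def idle_action(level, has_adjustable_dvfs):
    """Map an idle-state level to a regulation action label."""
    if level == 1:
        return "insert_blank_frame"
    if level == 2:
        # Assumption: prefer lowering voltage/frequency when the domain is adjustable.
        return "lower_vf" if has_adjustable_dvfs else "reduce_cores"
    return "none"
```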
  16. The core control method according to claim 15, further comprising, after said inserting a blank frame into the buffer corresponding to the core cluster:
    turning off the gated clock corresponding to each second processing core in the core cluster, wherein each gated clock is configured to output a clock signal to the corresponding second processing core in the core cluster, so as to drive that second processing core to operate, or not to operate, based on the clock signal.
  17. The core control method according to claim 15, wherein the many-core system comprises a plurality of core clusters, and, when a plurality of core clusters process the same task and the load states of those core clusters are all the first-level idle state, the core control method further comprises, after blank frames are inserted into the buffers corresponding to those core clusters:
    suspending the sending of a synchronization signal to those core clusters, wherein the synchronization signal is used to control those core clusters to perform task processing based on a synchronization period.
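The follow-up steps of claims 16 and 17 can be sketched as state changes on a simple model. Everything named here is hypothetical: `ClusterModel`, the `BLANK_FRAME` placeholder, the Boolean list standing in for per-core gated clocks, and `sync_signal_enabled`. Real clock gating and synchronization would be done in hardware or firmware rather than in Python.

```python
BLANK_FRAME = None  # hypothetical placeholder for an empty frame


class ClusterModel:
    def __init__(self, core_count):
        self.buffer = []                         # buffer corresponding to the cluster
        self.clock_gates = [True] * core_count   # True means the core's clock is running

    def enter_first_level_idle(self):
        """Insert a blank frame, then turn off every core's gated clock (claim 16)."""
        self.buffer.append(BLANK_FRAME)
        self.clock_gates = [False] * len(self.clock_gates)


def sync_signal_enabled(idle_levels):
    """Return False only when all clusters sharing a task are in first-level idle (claim 17)."""
    return not (idle_levels and all(level == 1 for level in idle_levels))
```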
  18. A core control apparatus applied to a many-core system, the many-core system comprising at least one core cluster, each core cluster comprising at least one second processing core, the core control apparatus comprising:
    a load data detection module configured to perform load detection on a corresponding core cluster and obtain load data corresponding to the core cluster;
    a load state detection module configured to determine a load state of the core cluster according to the load data corresponding to the core cluster; and
    a core regulation module configured to regulate the core cluster according to the load state of the core cluster;
    wherein the regulation comprises one of the following regulation modes:
    regulating the number of second processing cores in the core cluster that are currently available for operation;
    regulating the operating voltage and operating frequency of the second processing cores in the core cluster that are currently available for operation; and
    inserting a blank frame into a buffer corresponding to the core cluster.
  19. A many-core system, comprising a plurality of processing cores, the plurality of processing cores comprising a first processing core and a plurality of second processing cores, wherein some or all of the plurality of second processing cores are divided into at least one core cluster, each core cluster comprises at least one second processing core, each core cluster has a main processing core, and the main processing core of a core cluster is a designated second processing core in that core cluster;
    wherein the first processing core comprises the core control apparatus according to claim 18, and/or the main processing cores of at least some of the core clusters comprise the core control apparatus according to claim 18.
  20. An electronic device, comprising:
    a plurality of processing cores; and
    a network-on-chip configured to exchange data among the plurality of processing cores and external data;
    wherein one or more instructions are stored in one or more of the processing cores, and the one or more instructions are executed by one or more of the processing cores to enable the one or more processing cores to perform the core control method according to any one of claims 1-17.
  21. A computer-readable medium having a computer program stored thereon, wherein the computer program, when executed by a processing core of a many-core system, implements the core control method according to any one of claims 1-17.
  22. A computer program product comprising a computer program, wherein the computer program, when executed by a processing core of a many-core system, implements the core control method according to any one of claims 1-17.
PCT/CN2021/133963 2021-05-24 2021-11-29 Core control method and apparatus for many-core system, and many-core system WO2022247189A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110566468.7A CN115391021A (en) 2021-05-24 2021-05-24 Core control method and device, processing core, system, electronic device and medium
CN202110566468.7 2021-05-24

Publications (1)

Publication Number Publication Date
WO2022247189A1 (en) 2022-12-01

Family

ID=84114532

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/133963 WO2022247189A1 (en) 2021-05-24 2021-11-29 Core control method and apparatus for many-core system, and many-core system

Country Status (2)

Country Link
CN (1) CN115391021A (en)
WO (1) WO2022247189A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115686871A (en) * 2022-12-30 2023-02-03 摩尔线程智能科技(北京)有限责任公司 Core scheduling method and device for multi-core system

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102298580A (en) * 2010-06-22 2011-12-28 Sap股份公司 Multi-core query processing system using asynchronous buffer
US20120216064A1 (en) * 2011-02-21 2012-08-23 Samsung Electronics Co., Ltd. Hot-plugging of multi-core processor
US20140317380A1 (en) * 2013-04-18 2014-10-23 Denso Corporation Multi-core processor
CN105608049A (en) * 2015-12-23 2016-05-25 魅族科技(中国)有限公司 Method and device for controlling CPU of intelligent terminal
CN107844152A (en) * 2016-09-20 2018-03-27 华为技术有限公司 Load monitor, the electric power system based on multi-core framework and voltage adjusting method
CN109144658A (en) * 2017-06-27 2019-01-04 阿里巴巴集团控股有限公司 Load-balancing method, device and the electronic equipment of limited resources
CN111198757A (en) * 2020-01-06 2020-05-26 北京小米移动软件有限公司 CPU kernel scheduling method, CPU kernel scheduling device and storage medium
CN112463367A (en) * 2020-11-19 2021-03-09 苏州浪潮智能科技有限公司 Method and system for optimizing performance of storage system, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN115391021A (en) 2022-11-25

Similar Documents

Publication Publication Date Title
US9760158B2 (en) Forcing a processor into a low power state
TWI260543B (en) Performance scheduling method and system, and computer readable medium
US20190107874A1 (en) Method and Apparatus for Managing Global Chip Power on a Multicore System on Chip
WO2005106623A1 (en) Cpu clock control device, cpu clock control method, cpu clock control program, recording medium, and transmission medium
WO2022247189A1 (en) Core control method and apparatus for many-core system, and many-core system
KR20020087388A (en) Arithmetic Processing System and Arithmetic Processing Control Method, Task Management System and Task Management Method, and Storage Medium
JP2002099433A (en) System of computing processing, control method system for task control, method therefor and record medium
US11650650B2 (en) Modifying an operating state of a processing unit based on waiting statuses of blocks
US20200301860A1 (en) Dispatching interrupts in a multi-processor system based on power and performance factors
CN109144680A (en) A kind of clock ticktack interrupts setting method and device
US20170116037A1 (en) Resource-aware backfill job scheduling
US9182797B2 (en) Decoupled power and performance allocation in a multiprocessing system
CN104598311A (en) Method and device for real-time operation fair scheduling for Hadoop
WO2017148253A1 (en) Energy-saving management implementation method and apparatus, and network device
US9632566B2 (en) Dynamically controlling power based on work-loop performance
Yao et al. A dual delay timer strategy for optimizing server farm energy
US11243603B2 (en) Power management of an event-based processing system
Wang et al. Power saving design for servers under response time constraint
JP2008217628A (en) Cpu power saving system and power saving method
CN109144693B (en) Power self-adaptive task scheduling method and system
US11803224B2 (en) Power management method, multi-processing unit system and power management module
CN116303132A (en) Data caching method, device, equipment and storage medium
WO2016058149A1 (en) Method for predicting utilization rate of processor, processing apparatus and terminal device
KR20160105209A (en) Electronic apparatus and method for contorolling power thereof
Tsai et al. Prevent vm migration in virtualized clusters via deadline driven placement policy

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application

Ref document number: 21942752

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE